The AI middleware layer — the LiteLLM proxy, LangChain/LangGraph orchestration, and Langfuse observability that sit between your application and the model providers — has quietly become the soft underbelly of the enterprise AI stack. A wave of 2025–2026 CVEs proves it: one chain already reaches CVSS 10.0 and has been added to the CISA Known Exploited Vulnerabilities catalog, with confirmed exploitation in the wild. These are widely adopted, capable open-source projects — the issue is not the maintainers, it is the architectural posture of running unauthenticated, trust-everything middleware in front of your model providers and your secrets.
Most teams adopted this stack for convenience: one base URL to route across OpenAI, Anthropic, Bedrock, Azure, and Vertex; one framework to chain prompts and tools; one dashboard to watch tokens and latency. The convenience is real. But the security model underneath it was never designed for a world where a poisoned model response, a manipulated Host header, or a single error path can hand an attacker remote code execution or a wallet full of provider keys.
unauthenticated RCE
"LangGrinch"
June 9, 2026
LiteLLM: From AI Gateway to Unauthenticated RCE
LiteLLM is the most widely deployed open-source AI gateway/proxy — a single OpenAI-compatible endpoint that fans out to dozens of providers. That central position is exactly why a flaw in it is catastrophic: it sits in the path of every model call, and it holds every provider key. In 2026 it accumulated a set of vulnerabilities that, taken together, turn the gateway itself into the breach.
Individually these are serious. Chained, they are terminal. Researchers at Horizon3.ai confirmed that combining CVE-2026-42271 with CVE-2026-48710 yields a CVSS 10.0, zero-credential, unauthenticated remote code execution path on vulnerable LiteLLM deployments. The exploit chain reads like a checklist of everything an AI gateway should never allow:
The fix arrived in LiteLLM 1.83.7 (May 8, 2026), which added authorization controls and updated Starlette. But patching one chain does not change the posture, and 2026 was not LiteLLM's first incident.
The March 2026 supply-chain compromise
In March 2026, attackers slipped malicious code into official LiteLLM releases on PyPI. The payload was a credential stealer — designed to exfiltrate secrets across cloud environments, CI/CD pipelines, and developer machines that installed the poisoned package. Trend Micro, Snyk, and HeroDevs all reported on the compromise. For a package that, by design, is wired into the most secret-rich path in your infrastructure, a single tainted release is a direct line to your provider keys and cloud credentials.
proxy_server.py (v1.52.1), an error while parsing team settings exposes langfuse_secret and langfuse_public_key. A failure in the gateway's config handling silently burns the credentials of the observability layer bolted to it — the first sign of just how tightly coupled, and how mutually dependent for failure, these projects are.None of this means LiteLLM is badly built — it is a capable, popular gateway. It means an unauthenticated, secret-holding, command-spawning proxy at the center of your AI stack is a structurally dangerous place to keep convenience. When the gateway is breached, everything behind it is breached. The fix is not "patch and pray for the next CVE"; it is to put a governed, authenticated, isolated router in that position instead.
LangChain: Prompt Injection That Becomes Code Execution
LangChain (and LangGraph) is an orchestration library, not a gateway — it lives inside your application process, chaining prompts, tools, and state. That makes its vulnerabilities a different shape: instead of an external network attacker, the threat rides in on the data your application already trusts — specifically, model output.
dumps()/dumpd()) do not escape free-form dictionaries containing lc keys. As a result, user-controlled data is later deserialized as a trusted LangChain object. Impact: extraction of secrets from the environment (when secrets_from_env=True), instantiation of arbitrary classes in trusted namespaces (langchain_core, langchain, langchain_community), and remote code execution via Jinja2 templates. Reported December 4, 2025 by Yarden Porat.The vector is what makes LangGrinch so dangerous in agentic systems. The primary entry point is the LLM response itself — fields like additional_kwargs and response_metadata — which are controlled via prompt injection and then serialized and deserialized during streaming. The chain looks like this:
additional_kwargs / response_metadata.dumps()/dumpd() and back. Because lc-keyed dicts are not escaped, the poisoned payload is reconstituted as a trusted LangChain object.The patches are the right shape: a new allowed_objects allowlist in load()/loads(), Jinja2 blocked by default, and secrets_from_env now defaulting to False. But the lesson stands — in an agentic system, model output is untrusted input, and any layer that deserializes it as trusted is one prompt-injection away from code execution. That defense belongs in front of orchestration, not inside it.
Langfuse & the Coupling Problem
Langfuse is the observability layer many of these stacks bolt on — traces, token counts, latency, prompt history. The problem is not a single dramatic RCE; it is coupling. Observability is only as secure as the gateway it watches, and the gateway is only as secure as its worst error path.
CVE-2025-0330 above is the clearest illustration: a flaw in the LiteLLM gateway leaks the Langfuse public and secret keys through an error path. A bug in one product burns the credentials of the adjacent one. When you bolt observability onto an insecure gateway, the observability layer inherits the gateway's blast radius — and its keys.
The deployment surface adds to it. Self-hosted Langfuse Docker images shipped with vulnerable Node/npm packages; the issue was addressed in release 3.143.0. The broader pattern is unmistakable: the adjacent low-code AI tool Langflow — a different product, not to be confused with Langfuse — also had a vulnerability added to the CISA KEV catalog in May 2026. Across the AI middleware ecosystem, unsecured tooling is being actively exploited.
Three products, three trust boundaries that don't actually exist. The gateway holds the observability keys. The orchestration library trusts model output. The observability layer trusts the gateway. A breach anywhere is a breach everywhere. Decoupling these layers — with authentication, isolation, and a vaulted secret store between each — is the only thing that turns one compromise into a contained one.
Thorough Comparison
Here is a fair, factual comparison across the four layers. LangChain is an orchestration library, not a gateway, so several gateway-specific rows are marked N/A for it — that is not a knock, it is a category difference. The point is the security posture each layer brings by default.
| Capability | LiteLLM proxy | LangChain | Langfuse | RuntimeAI |
|---|---|---|---|---|
| Authentication model | API key; bypassable via Host-header (CVE-2026-48710) | N/A (in-process library) | Public/secret key pair; leaked via gateway error path | Mandatory, identity-bound JWT + tenant context; Bot-CA mTLS for agents |
| Supply-chain integrity | PyPI compromise (Mar 2026) shipped credential stealer | Multiple CVEs in core + community packages | Vulnerable Node/npm in Docker images (< 3.143.0) | Signed builds; PQC-enveloped secrets never in plaintext for a stealer to grab |
| MCP handling | Test endpoint spawns arbitrary commands (CVE-2026-42271) | Delegates to host MCP client | N/A | Governed MCP Gateway: tenant ACLs, validation, audit, mTLS; no arbitrary-command path |
| Secret storage | Provider keys in plaintext env; leaked on error | Reads secrets from env (secrets_from_env) |
Keys leaked by coupled gateway | QuantumVault / PQ TokenVault: PQC-enveloped, short-lived, minted on demand |
| Multi-tenancy / isolation | Shared proxy; broad blast radius | In-process; app-defined | Project-scoped; shared infra | Multi-tenant RLS enforced at the data layer |
| DLP & prompt-injection defense | None by default | None — trusts model output (LangGrinch) | None (observe-only) | Bidirectional AI Firewall: inbound/outbound scan, PII redaction, injection detection |
| Audit trail | Request logs | Application-defined callbacks | Trace store (plaintext key coupling) | Tamper-proof Audit Black Box: Merkle-chained, PQ-Sign timestamped |
| Kill switch | None native | N/A | N/A | Instant L1/L2/L3 revocation of provider, agent, or credential |
| On-prem / sovereign | Self-host proxy | Self-host library | Self-host (vulnerable image history) | First-class routing to on-prem Sovereign LLM Platform (3 tiers), air-gap capable |
| Cost governance | Basic cost tracking | N/A | Observe-only token/latency | Real-time tracking + budget enforcement + policy-driven routing |
| Failover / resilience | Provider fallbacks | N/A | N/A | Retry with backoff on provider errors |
And on the feature axis people actually adopt LiteLLM for — multi-provider routing and cost control — RuntimeAI's Secure LLM Router matches the convenience and adds the governance that was missing:
| Routing / cost / failover feature | LiteLLM proxy | RuntimeAI Secure LLM Router |
|---|---|---|
| Multi-provider routing | Yes (OpenAI, Anthropic, Bedrock, Azure, Vertex, vLLM/Ollama) | Yes — 17 wired providers + self-hosted vLLM/Ollama, one OpenAI-compatible base URL (full list below) |
| Routing strategy | Cost / latency / load | Policy-driven (OPA/Rego), compliance- & latency-aware |
| Cost tracking | Yes | Real-time + budget enforcement |
| On-prem / sovereign target | Self-host models | First-class routing to Sovereign LLM Platform (air-gapped/regulated) |
| Auth & isolation | API key (bypassable) | JWT + tenant context + Bot-CA mTLS; multi-tenant RLS |
How the Router Handles a New Model Drop
A fair question when you put a gateway in your critical path: what happens the day OpenAI or Anthropic ships a new model? With many setups the answer is “wait for an SDK bump or a code change.” With the Secure LLM Router, the answer is nothing.
The router is a governed passthrough. The model name in your request is forwarded to the provider you configured, so a new model from a provider you already use works the moment that provider exposes it — no redeploy, no SDK upgrade. Three governance layers wrap that passthrough without getting in its way:
- Policy gate — per-tenant and per-agent allow/deny lists decide which models are usable. Default-open, so new models flow through; lock specific models down (or pin an allowlist) when governance requires it.
- Cost attribution — register the new model’s price so spend and budgets stay accurate. Routing works either way; this just keeps the ledger complete.
- Model-deprecation policy — schedule retirement of an old model with automatic migration to its replacement, so you are never pinned to a model a provider is sunsetting.
Wired providers today: OpenAI, Anthropic, Azure OpenAI, Google Gemini & Vertex, Mistral, Cohere, Groq, Together, Fireworks, Perplexity, DeepSeek, xAI, Cerebras, SambaNova, OpenRouter, and AI21 — plus any OpenAI-compatible self-hosted endpoint (vLLM, Ollama, TGI) and RuntimeAI’s on-prem Sovereign LLM Platform. Providers with non-standard auth get a dedicated adapter; OpenAI-compatible providers route through the unified path. The honest version of “works with everything” is this: any model from a provider you’ve connected works without a code change — and connecting a new OpenAI-compatible provider is a config entry, not a release.
It's Not Just These Three — Map for the Whole Middleware Layer
LiteLLM, LangChain/LangGraph, and Langfuse are the most visible names, but the same governance gap runs through the entire AI-middleware category. Whatever you run today, there is a governed RuntimeAI layer to swap it for — one at a time:
| Middleware category | Common tools | RuntimeAI layer to switch to |
|---|---|---|
| AI gateway / model router | LiteLLM, Portkey, Kong AI Gateway, Cloudflare AI Gateway, OpenRouter, Bifrost | Secure LLM Router |
| Agent orchestration | LangChain, LangGraph, LlamaIndex | Keep it — front it with AI Firewall + PII Shield + Flow Enforcer |
| Observability / tracing | Langfuse, Helicone, LangSmith, Arize Phoenix | Agent Observability + Audit Black Box (tamper-proof) |
| Guardrails / safety | Guardrails AI, NeMo Guardrails, Lakera | AI Firewall (bidirectional DLP, prompt-injection, content policy) |
| MCP / tool routing | LiteLLM MCP, raw MCP servers | MCP Gateway (Bot-CA mTLS, tenant ACLs, audit) |
| LLM key storage | .env files, plaintext config, k8s secrets | QuantumVault / PQ TokenVault (PQC-enveloped, short-lived) |
| Agent memory | Ad-hoc vector stores | Memory Vault (governed writes, injection detection, TTL) |
You Don't Have to Rip It Out — Switch One Layer at a Time
This is the part that matters most. You do not have to do a forklift migration to fix this. The AI middleware stack is modular, and so is the fix. Keep your application, your prompts, and your agents exactly as they are — and swap out the risky layer underneath them. Adopt a single RuntimeAI sub-product to close one specific CVE class, or adopt the whole platform for defence-in-depth. The migration is incremental and reversible.
The smallest possible change is a base-URL swap. The Secure LLM Router is OpenAI-compatible, so most apps move with one environment variable — and the provider key stops living in plaintext:
# Before — LiteLLM proxy: shared key in env, unauthenticated control plane
OPENAI_API_BASE=http://litellm-proxy:4000/v1
OPENAI_API_KEY=sk-litellm-master-key # long-lived, sits in .env
# After — RuntimeAI Secure LLM Router (drop-in, OpenAI-compatible)
OPENAI_API_BASE=https://router.runtimeai.io/v1
OPENAI_API_KEY=${RUNTIMEAI_TOKEN} # short-lived JWT, minted by QuantumVault
# every call now carries tenant context + Bot-CA mTLS; your routing
# and cost policy are unchanged — only the security model is.
additional_kwargs / response_metadata field — is caught at the egress/ingress boundary, before it is reconstituted as a trusted object.Each card above is independently deployable. Worried only about the CVSS 10.0 RCE? Start with the Secure LLM Router and MCP Gateway. Worried about LangGrinch and prompt injection? Front your existing LangChain with the AI Firewall + Flow Enforcer. Want the whole defence-in-depth posture? Take the platform. The migration is modular by design — no rip-and-replace required.
RuntimeAI Take
Most Advanced AI Security Zero Trust · Defence in Depth
The middleware CVEs of 2025–2026 are not a string of unlucky bugs — they are the predictable result of running trust-everything convenience layers in the most sensitive position in the stack. RuntimeAI's answer is four independent control layers, each of which would have blunted these specific CVE classes:
- Discovery / Inventory. Every model call, MCP server, and agent is registered and identity-bound through the Secure LLM Router and MCP Gateway. There is no anonymous caller and no shadow MCP test endpoint — the Host-header bypass and command-injection surface (CVE-2026-48710 / 42271) have nothing unauthenticated to reach.
- Behavioural enforcement. Flow Enforcer applies OPA/Rego policy to every tool call and route. Shell-spawning configs, unsanitized parameters, and out-of-policy provider routes are rejected before execution — the exact decision LiteLLM's test endpoint never made.
- Flow / egress control. The bidirectional AI Firewall + PII Shield scan inbound and outbound content, redact PII, and detect injection patterns. A poisoned LLM response field — the LangGrinch (CVE-2025-68664) vector — is caught before it reaches a vulnerable deserializer, and provider keys live in QuantumVault, not plaintext env, so a stealer or error path (CVE-2025-0330, the supply-chain payload) finds nothing to exfiltrate.
- Immutable audit trail. The Audit Black Box records every call, route, and tool invocation in a Merkle-chained, PQ-Sign-timestamped log — the secure replacement for Langfuse-style observability, with no plaintext-key coupling. If a provider, agent, or credential is compromised, the Kill Switch revokes it instantly at L1/L2/L3.
Your AI gateway sits in the most credential-rich, most central path in your stack. When it is unauthenticated, secret-holding, and able to spawn commands, a single CVE becomes a CVSS 10.0 breach. The fix is not the next patch — it is a governed, authenticated, isolated router with DLP, a vaulted secret store, and a tamper-proof audit trail. Switch the risky layer, keep everything else, and the breach stops being yours to own.
Your AI middleware shouldn't be the breach.
RuntimeAI's Secure LLM Router, AI Firewall, MCP Gateway, and Audit Black Box let you switch the risky layer — one sub-product at a time — without ripping out your app. Get the AI Security Weekly briefing for the CVEs that matter.
- LiteLLM RCE Vulnerability Exploited — Cybersecurity News
- Inside the LiteLLM Supply-Chain Compromise — Trend Micro Research
- LiteLLM Vulnerability CVE-2026-42271 — CybelAngel
- Poisoned Security Scanner Backdooring LiteLLM — Snyk
- Critical LangChain Core Vulnerability — The Hacker News
- LangGrinch — LangChain Core CVE-2025-68664 — Cyata
- CVE-2025-68664 — NVD
- LangChain / LangGraph Flaws Expose Files — The Hacker News
- GHSA-879v-fggm-vxw2 — GitHub Advisory