Gartner rated the most popular AI agent an ‘unacceptable cybersecurity
risk.’ Cisco found its third-party skills performing silent data
exfiltration. Security researchers identified over 500 vulnerabilities,
40,000 exposed instances, and a supply chain campaign that poisoned 20% of
its plugin registry.
This is the predictable result of a fundamental architectural flaw.
> The problem: ambient authority
AI agents today operate under the full ambient authority of the user
running them. When you give an agent access to your Gmail, it gets your Gmail. All of it, with your full permissions. There is no concept of scoped
access or sub-identity that limits what the agent can do.
This is not really the fault of the agent frameworks. It is how the
internet was built. Service providers like Google, Slack, and Microsoft do
not make it easy to create programmatic sub-identities with limited
permissions. OAuth scopes are coarse. Most APIs have no concept of “this
token is for an AI agent that should only do a subset of what the user can
do.”
So we are left with agents that fully proxy the authority of the user on
whose behalf they are running. The agent holds your credentials, processes
untrusted input from emails and web pages, and executes arbitrary code,
all in the same trust domain. A single prompt injection from a malicious
email can cause the agent to exfiltrate your data, and the agent has every
capability it needs to do so.
That is the problem we are trying to solve.
> A research prototype
I wanted to pursue a thought experiment: what would a personal agent look
like if security were baked in from the start? IronCurtain is my research
prototype for that question.
The core idea is simple. The agent does not get to execute code on your
system. Instead, the agent writes TypeScript that runs in an isolated V8 VM. That
TypeScript is allowed to issue function calls that get translated into MCP
tool calls. Every tool call passes through a trusted process that acts as
a policy engine, deciding whether each call should be allowed, denied, or
escalated to a human for approval.
Because every tool call carries semantic context (gmail.sendEmail({to: "bob@example.com", subject: "..."}) rather than a raw HTTP request), we can write meaningful policy against it.
We can ask “is this recipient in the user’s contacts?” in a way that would be
impossible if the agent were just making opaque web_fetch calls.
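To make that concrete, here is a minimal sketch of what a policy check over a semantic tool call could look like. The type shapes and the contacts lookup are my illustration, not IronCurtain’s actual API.

```typescript
// A minimal sketch of a policy check over a semantic tool call. The
// Decision/ToolCall shapes and the contacts lookup are illustrative
// assumptions, not IronCurtain's actual API.
type Decision = "allow" | "deny" | "escalate";

interface ToolCall {
  service: string;                  // e.g. "gmail"
  method: string;                   // e.g. "sendEmail"
  args: Record<string, unknown>;    // structured, human-meaningful arguments
}

function evaluateSendEmail(call: ToolCall, contacts: Set<string>): Decision {
  if (call.service !== "gmail" || call.method !== "sendEmail") {
    return "escalate"; // outside this rule's scope; let another rule decide
  }
  const to = String(call.args["to"] ?? "").toLowerCase();
  // The semantic argument lets us ask a meaningful question:
  // is this recipient someone the user already knows?
  return contacts.has(to) ? "allow" : "escalate";
}
```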
> Policy in plain English
Writing good security policy is hard for humans. DSLs like OPA/Rego are
powerful but inaccessible. JSON schema policies are verbose and
error-prone. Most people should not have to think in allowlists and regex
patterns.
So I thought: wouldn’t it be great if a human could write a rough
constitution for their agent in plain English? Something like: “The agent
may read all my email. It may send email to people in my contacts without
asking. For anyone else, ask me first. Never delete anything permanently.”
IronCurtain compiles this English constitution into enforceable policy
through a multi-step process. A compiler LLM translates the English into
per-interface rules using a library of verified policy primitives. A test
scenario generator creates cases designed to find gaps and contradictions.
A verifier checks that the compiled rules match the original intent. A
judge iteratively refines the policy until it meets the spirit of the
constitution as well as it can.
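As a rough illustration of the output, a compiled rule set for the example constitution above might look something like this. The schema and primitive names are hypothetical, not IronCurtain’s real representation.

```typescript
// An illustrative shape for compiled per-interface rules; the fields and
// primitive names are assumptions, not IronCurtain's real schema.
interface CompiledRule {
  service: string;                        // which MCP interface it governs
  method: string;                         // which tool call it matches
  effect: "allow" | "deny" | "escalate";
  condition?: { primitive: string; args: Record<string, unknown> };
  sourceClause: string;                   // the English sentence it came from
}

const compiled: CompiledRule[] = [
  { service: "gmail", method: "readEmail", effect: "allow",
    sourceClause: "The agent may read all my email." },
  { service: "gmail", method: "sendEmail", effect: "allow",
    condition: { primitive: "recipientInContacts", args: { field: "to" } },
    sourceClause: "It may send email to people in my contacts without asking." },
  { service: "gmail", method: "sendEmail", effect: "escalate",
    sourceClause: "For anyone else, ask me first." },
  { service: "*", method: "delete*", effect: "deny",
    sourceClause: "Never delete anything permanently." },
];
```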
The constitution is not perfect and never will be. But it evolves. When
the agent hits an edge case and escalates to the human, that decision can
feed back into the constitution over time, since IronCurtain keeps an audit
log of every policy decision.
> Per-task least privilege
At the moment, IronCurtain applies a single compiled constitution that
governs all agent tasks. The planned next step is to infer a task-specific
policy for each task that further limits what the agent is allowed to do.
If the user says “organize my documents folder by topic,” the system
should determine that the agent needs filesystem read, list, mkdir, and
move, but has no reason to access email, Slack, or the web.
This task policy would be a strict subset of what the constitution
permits. The agent gets exactly the capabilities it needs for the current
job.
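In code, the idea is an intersection: a per-task capability allowlist that can only narrow what the constitution permits, never widen it. A hedged sketch, with illustrative capability names:

```typescript
// Sketch of a task policy as a capability allowlist intersected with the
// constitution; the capability names and helper are illustrative.
type Capability = string; // e.g. "filesystem.read", "gmail.sendEmail"

// Inferred for "organize my documents folder by topic":
const taskCapabilities = new Set<Capability>([
  "filesystem.read",
  "filesystem.list",
  "filesystem.mkdir",
  "filesystem.move",
]);

function permittedForTask(
  cap: Capability,
  constitutionAllows: (c: Capability) => boolean
): boolean {
  // Strict subset: the task policy can only narrow what the
  // constitution already permits, never widen it.
  return taskCapabilities.has(cap) && constitutionAllows(cap);
}
```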
> Credentials stay out of reach
Because of how this is built (TypeScript in a sandbox that can only make
function calls, those function calls proxied into MCP tool calls), the
agent never sees any credentials. OAuth tokens, API keys, and service
account secrets live exclusively in the MCP servers and the trusted
process. There is no way for the agent to read, access, modify, or inject
credentials. It does not even know they exist.
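A compressed sketch of that separation, with both sides collapsed into one file for readability and every name illustrative:

```typescript
// Sketch: the stub the isolate sees only serializes calls; the trusted
// host holds credentials and attaches them after the policy check.
type Call = { service: string; method: string; args: unknown };

// Trusted process: secrets live here and never cross into the isolate.
const secrets = new Map<string, string>([["gmail", "oauth-token-goes-here"]]);

async function hostInvoke(call: Call): Promise<unknown> {
  const decision = evaluatePolicy(call); // "allow" | "deny" | "escalate"
  if (decision !== "allow") throw new Error(`tool call not allowed: ${decision}`);
  const token = secrets.get(call.service); // attached only on the trusted side
  return forwardToMcpServer(call, token);
}

// Agent side (inside the isolate): a plain function-call surface with no
// credentials in scope; only the serialized Call object leaves the sandbox.
const gmail = {
  sendEmail: (args: { to: string; subject: string; body: string }) =>
    hostInvoke({ service: "gmail", method: "sendEmail", args }),
};

// Placeholders standing in for the real policy engine and MCP client.
function evaluatePolicy(_call: Call): "allow" | "deny" | "escalate" { return "escalate"; }
async function forwardToMcpServer(_call: Call, _token?: string): Promise<unknown> { return {}; }
```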
We also use Anthropic’s sandbox runtime to isolate the MCP servers
themselves, restricting their filesystem and network access. Even if an
MCP server has a vulnerability, the blast radius is contained.
> What we deliberately do not do
We do not try to prevent the LLM from going rogue. Prompt injection
is an unsolved problem. Multi-turn drift is subtle and hard to detect. IronCurtain
assumes the LLM will be compromised or confused and constrains the consequences
through architecture rather than prevention.
We do not use containers or heavyweight sandboxing. The threat
model is “the LLM generates malicious tool calls,” not “the agent escapes a
kernel sandbox.” A V8 isolate is sufficient. Adding gVisor or Docker would increase
complexity without improving security for our threat model.
> Ideas to explore
There are ideas worth investigating that IronCurtain does not yet
implement.
Input classification and taint tracking. We might want to automatically
detect PII in content the agent processes and track where it flows. Taint
tracking and propagation are hard problems, though: once data enters the
LLM’s context and gets summarized or restructured, tracking provenance
becomes extremely difficult.
Intelligibility constraints. We might want to require that
when the agent communicates with the outside world, it does so in a way that
is completely intelligible, interpretable, and reviewable. If an outbound message
contains encoded, obfuscated, or nonsensical content, that is suspicious regardless
of what the policy says.
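A toy version of such a check might look for encoded or high-entropy content in outbound text. The heuristic and thresholds below are assumptions, not anything IronCurtain implements today.

```typescript
// Crude signals that outbound content is encoded rather than plain
// language: long base64-like runs, or unusually high character entropy.
function looksEncoded(text: string): boolean {
  // Long unbroken runs of base64-ish characters are suspicious.
  if (/[A-Za-z0-9+\/=]{80,}/.test(text)) return true;

  // Per-character Shannon entropy: ordinary prose sits around 4 bits,
  // while random base64 approaches 6. The threshold is a rough guess.
  const counts = new Map<string, number>();
  let total = 0;
  for (const ch of text) {
    counts.set(ch, (counts.get(ch) ?? 0) + 1);
    total++;
  }
  let entropy = 0;
  for (const n of counts.values()) {
    const p = n / total;
    entropy -= p * Math.log2(p);
  }
  return entropy > 5;
}
```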
Richer identity systems. The real long-term fix for ambient
authority is for service providers to support scoped agent identities natively.
Until that happens, runtimes like IronCurtain are a necessary bridge.
> The point
Can we build a personal AI assistant that has high utility but is not
going to create Skynet for us? I don’t really want to live in the timeline
where we gave agents the keys to everything and hoped for the best.
IronCurtain is my attempt to make sure that doesn’t happen.