Gartner rated the most popular AI agent an ‘unacceptable cybersecurity risk.’ Cisco found its third-party skills performing silent data exfiltration. Security researchers identified over 500 vulnerabilities, 40,000 exposed instances, and a supply chain campaign that poisoned 20% of its plugin registry.

This is the predictable result of a fundamental architectural flaw.

The problem: ambient authority

AI agents today operate under the full ambient authority of the user running them. When you give an agent access to your Gmail, it gets your Gmail. All of it, with your full permissions. There is no concept of scoped access or sub-identity that limits what the agent can do.

This is not really the fault of the agent frameworks. It is how the internet was built. Service providers like Google, Slack, and Microsoft do not make it easy to create programmatic sub-identities with limited permissions. OAuth scopes are coarse. Most APIs have no concept of “this token is for an AI agent that should only do a subset of what the user can do.”

So we are left with agents that fully proxy the authority of the user on whose behalf they are running. The agent holds your credentials, processes untrusted input from emails and web pages, and executes arbitrary code, all in the same trust domain. A single prompt injection from a malicious email can cause the agent to exfiltrate your data, and the agent has every capability it needs to do so.

That is the problem we are trying to solve.

A research prototype

I wanted to pursue a thought experiment: what would a personal agent look like if security were baked in from the start? IronCurtain is my research prototype for that question.

The core idea is simple. The agent does not get to execute code on your system. Instead, the agent writes TypeScript in a V8 isolated VM. That TypeScript is allowed to issue function calls that get translated into MCP tool calls. Every tool call passes through a trusted process that acts as a policy engine, deciding whether each call should be allowed, denied, or escalated to a human for approval.
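
To make that concrete, here is a rough sketch of the shape of that funnel. The type names and helper functions are illustrative stand-ins, not IronCurtain's actual API.

type ToolCall = { server: string; tool: string; args: Record<string, unknown> };

type Verdict =
  | { decision: "allow" }
  | { decision: "deny"; reason: string }
  | { decision: "escalate"; prompt: string };

// Stand-ins for the real MCP transport and the approval UI.
declare function forwardToMcpServer(call: ToolCall): Promise<unknown>;
declare function askHuman(prompt: string): Promise<boolean>;

// Every function call the agent's TypeScript makes lands here before it can reach an MCP server.
async function onToolCall(call: ToolCall, policy: (c: ToolCall) => Promise<Verdict>): Promise<unknown> {
  const verdict = await policy(call);
  if (verdict.decision === "allow") return forwardToMcpServer(call);
  if (verdict.decision === "escalate" && (await askHuman(verdict.prompt))) {
    return forwardToMcpServer(call);
  }
  throw new Error(`tool call ${call.server}.${call.tool} was not permitted`);
}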

Because every tool call carries semantic context (gmail.sendEmail({to: "bob@example.com", subject: "..."}) rather than a raw HTTP request), we can write meaningful policy against it. We can ask “is this recipient in the user’s contacts?” in a way that would be impossible if the agent were just making opaque web_fetch calls.
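
For example, a policy rule for that interface might look something like the fragment below. It reuses the ToolCall and Verdict shapes from the sketch above; listContacts is a hypothetical lookup available only to the trusted process, and the policy engine is assumed to route gmail.sendEmail calls to this rule.

// Hypothetical contact lookup, visible only to the trusted process.
declare function listContacts(): Promise<string[]>;

async function gmailSendRule(call: ToolCall): Promise<Verdict> {
  const to = String(call.args.to ?? "").toLowerCase();
  const contacts = await listContacts();
  return contacts.includes(to)
    ? { decision: "allow" }
    : { decision: "escalate", prompt: `Send email to ${to}, who is not in your contacts?` };
}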

Policy in plain English

Writing good security policy is hard for humans. DSLs like OPA/Rego are powerful but inaccessible. JSON schema policies are verbose and error-prone. Most people should not have to think in allowlists and regex patterns.

So I thought: wouldn’t it be great if a human could write a rough constitution for their agent in plain English? Something like: “The agent may read all my email. It may send email to people in my contacts without asking. For anyone else, ask me first. Never delete anything permanently.”

IronCurtain compiles this English constitution into enforceable policy through a multi-step process. A compiler LLM translates the English into per-interface rules using a library of verified policy primitives. A test scenario generator creates cases designed to find gaps and contradictions. A verifier checks that the compiled rules match the original intent. A judge iteratively refines the policy until it meets the spirit of the constitution as well as it can.
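
For a constitution like the one above, the compiled output might look roughly like the following. This is my guess at the general shape; the real rule format and primitive names in IronCurtain may differ.

// Illustrative compiled rules; "recipientInContacts" and friends stand in for verified policy primitives.
const compiledRules = [
  { interface: "gmail.readEmail",  effect: "allow" },
  { interface: "gmail.sendEmail",  effect: "allow",    when: "recipientInContacts" },
  { interface: "gmail.sendEmail",  effect: "escalate", when: "recipientNotInContacts" },
  { interface: "*.delete",         effect: "deny",     when: "isPermanent" },  // "never delete anything permanently"
] as const;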

The constitution is not perfect and never will be. But it evolves. When the agent hits an edge case and escalates to the human, that decision can feed back into the constitution over time, since IronCurtain keeps an audit log of every policy decision.
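
One plausible shape for such an audit record, with field names that are purely illustrative:

type AuditRecord = {
  timestamp: string;
  call: ToolCall;
  verdict: Verdict;
  humanDecision?: "approved" | "rejected";  // present when the call was escalated
  note?: string;                            // rationale the user gave, usable to refine the constitution
};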

Per-task least privilege

At the moment, IronCurtain applies a single compiled constitution that governs all agent tasks. The planned next step is to infer a task-specific policy for each task that further limits what the agent is allowed to do. If the user says “organize my documents folder by topic,” the system should determine that the agent needs filesystem read, list, mkdir, and move, but has no reason to access email, Slack, or the web.

This task policy would be a strict subset of what the constitution permits. The agent gets exactly the capabilities it needs for the current job.
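
In code, the narrowing is just an intersection. The sketch below models capabilities as plain strings, which is a simplification.

// The task policy can only remove capabilities, never add ones the constitution does not grant.
function narrowToTask(constitutionAllows: Set<string>, taskNeeds: Set<string>): Set<string> {
  const granted = new Set<string>();
  for (const cap of taskNeeds) {
    if (constitutionAllows.has(cap)) granted.add(cap);  // intersection: never wider than either set
  }
  return granted;
}

// "Organize my documents folder by topic" might infer a need set like this:
const taskNeeds = new Set(["fs.read", "fs.list", "fs.mkdir", "fs.move"]);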

Credentials stay out of reach

Because of how this is built (TypeScript in a sandbox that can only make function calls, those function calls proxied into MCP tool calls), the agent never sees any credentials. OAuth tokens, API keys, and service account secrets live exclusively in the MCP servers and the trusted process. There is no way for the agent to read, access, modify, or inject credentials. It does not even know they exist.
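
A possible body for the forwardToMcpServer stub from earlier shows where the secret enters the picture. GMAIL_TOKEN, the secrets table, and the header shape are placeholders, not IronCurtain's actual wiring.

// Stand-in for the real MCP transport.
declare function mcpRequest(
  server: string,
  tool: string,
  args: unknown,
  headers: Record<string, string>
): Promise<unknown>;

const secrets: Record<string, string | undefined> = {
  gmail: process.env.GMAIL_TOKEN,  // lives only in the trusted process / MCP server
};

async function forwardToMcpServer(call: ToolCall): Promise<unknown> {
  const token = secrets[call.server] ?? "";
  // The credential is attached here, outside the isolate; nothing in the sandbox has a path to it.
  return mcpRequest(call.server, call.tool, call.args, { authorization: `Bearer ${token}` });
}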

We also use Anthropic’s sandbox runtime to isolate the MCP servers themselves, restricting their filesystem and network access. Even if an MCP server has a vulnerability, the blast radius is contained.

What we deliberately do not do

We do not try to prevent the LLM from going rogue. Prompt injection is an unsolved problem. Multi-turn drift is subtle and hard to detect. IronCurtain assumes the LLM will be compromised or confused and constrains the consequences through architecture rather than prevention.

We do not use containers or heavyweight sandboxing. The threat model is “the LLM generates malicious tool calls,” not “the agent escapes a kernel sandbox.” A V8 isolate is sufficient. Adding gVisor or Docker would increase complexity without improving security for our threat model.
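
For the curious, the boundary can be as small as the sketch below, here using the isolated-vm package; I am not claiming this is the exact mechanism IronCurtain uses. The agent's code gets a memory-limited isolate with no require, no filesystem, and no network, and the only doorway out is the callback we install.

import ivm from "isolated-vm";

// Assumes the agent's TypeScript has already been transpiled to plain JavaScript.
async function runAgentCode(source: string, onToolCall: (name: string, argsJson: string) => string) {
  const isolate = new ivm.Isolate({ memoryLimit: 128 });           // MB
  const context = await isolate.createContext();
  await context.global.set("callTool", new ivm.Callback(onToolCall));
  const script = await isolate.compileScript(source);
  await script.run(context, { timeout: 5_000 });                   // ms
}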

Ideas to explore

There are ideas worth investigating that IronCurtain does not yet implement.

Input classification and taint tracking. We might want to automatically detect PII in content the agent processes and track where it flows. Taint propagation is a hard problem, though. Once data enters the LLM’s context and gets summarized or restructured, tracking provenance becomes extremely difficult.

Intelligibility constraints. We might want to require that when the agent communicates with the outside world, it does so in a way that is completely intelligible, interpretable, and reviewable. If an outbound message contains encoded, obfuscated, or nonsensical content, that is suspicious regardless of what the policy says. A crude version of this check is sketched below.

Richer identity systems. The real long-term fix for ambient authority is for service providers to support scoped agent identities natively. Until that happens, runtimes like IronCurtain are a necessary bridge.
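
As a taste of the intelligibility idea, a first-pass check might flag outbound tokens that look encoded or high-entropy. The thresholds and heuristics here are arbitrary placeholders, not a vetted detector.

// Flag outbound text containing long base64-ish runs or unusually high-entropy tokens.
function looksObfuscated(message: string): boolean {
  const tokens = message.split(/\s+/);
  const base64ish = /^[A-Za-z0-9+\/=]{24,}$/;
  return tokens.some((t) => base64ish.test(t) || shannonEntropy(t) > 4.5);
}

function shannonEntropy(s: string): number {
  const counts = new Map<string, number>();
  for (const ch of s) counts.set(ch, (counts.get(ch) ?? 0) + 1);
  let h = 0;
  for (const c of counts.values()) {
    const p = c / s.length;
    h -= p * Math.log2(p);
  }
  return h;
}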

The point

Can we build a personal AI assistant that has high utility but is not going to create Skynet for us? I don’t really want to live in the timeline where we gave agents the keys to everything and hoped for the best. IronCurtain is my attempt to make sure that doesn’t happen.