Researcher to researcher

You pioneered MCP and Skills. Safebox is the next move.

A letter to the people inside Anthropic who would care about this work. We've spent fifteen years building deployment infrastructure for organizations — community-owned platforms, federated identity, signed actions, multi-party governance, audit trails. The pieces interlock into Safebox: a small set of composable primitives that express what AI labs call agents, multi-agent systems, skills, tool use, and memory. The composition makes those things safer, cheaper, and auditable — which is the property regulated organizations have been waiting for. We built it. We'd like Anthropic to be the lab that brings it to the field.

I.

The primitives

Fourteen composable pieces. Each one specific. Each one earning its place. Together they form Safebox.

Most AI deployment platforms have a small number of large abstractions — "agent," "tool," "skill" — each of which bundles many concerns and hides the trade-offs underneath. Safebox does the opposite. We expose a larger number of small primitives, each precise about what it does, and let composition produce the larger behaviors. The result is more verbose to describe but much easier to make claims about. When safety lives in the substrate, the substrate has to be specific.

Workflow

A pre-authored tree of Steps, signed at registration. The composition unit. Workflows define what gets done; they are the only legitimate way for actions to be sequenced.

Step

A node in a workflow. Each step invokes a Tool with typed input. Steps are how workflows compose tools without letting LLMs pick the next tool by free choice.

Workload

A running instance of a workflow. Workloads carry state across pause and resume; long-horizon tasks are workloads that wait for external events between steps.
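
A minimal sketch of pause and resume, written in Python for brevity. The `Workload` class, its fields, and the `WAIT` sentinel are hypothetical illustrations rather than the Safebox API. The idea shown is only that a step can ask to wait, and that the workload's whole state serializes to plain data so it can be stored until an external event arrives.

```python
import json
from dataclasses import dataclass, field, asdict

WAIT = "WAIT"  # hypothetical sentinel: a step returns this to pause the workload


@dataclass
class Workload:
    """Illustrative running instance of a workflow; not the Safebox API."""
    step_names: list          # ordered step names from the pre-authored workflow
    cursor: int = 0           # index of the next step to run
    results: dict = field(default_factory=dict)

    def run(self, steps, event=None):
        """Run steps until one asks to wait or the workflow finishes."""
        while self.cursor < len(self.step_names):
            name = self.step_names[self.cursor]
            outcome = steps[name](self.results, event)
            if outcome == WAIT:
                return "paused"
            self.results[name] = outcome
            self.cursor += 1
            event = None      # the event is consumed by the step that received it
        return "done"

    def save(self):
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def load(cls, blob):
        return cls(**json.loads(blob))


def draft(results, event):
    return "invoice drafted"


def await_approval(results, event):
    # Pauses until an external approval event arrives.
    return WAIT if event is None else f"approved by {event['by']}"


steps = {"draft": draft, "await_approval": await_approval}

workload = Workload(step_names=["draft", "await_approval"])
first = workload.run(steps)      # pauses at the approval step
blob = workload.save()           # state survives the pause as plain JSON

resumed = Workload.load(blob)
second = resumed.run(steps, event={"by": "alice"})
```

Because the saved state is only a cursor plus prior results, the resumed workload can continue on a different process or machine than the one that paused it.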

Task

The unit of work within a workload, written to a task stream. Each task captures inputs, outputs, the LLM call that produced its decision, and the audit trail.

Tool

Sandboxed JavaScript that runs inside a step. Each tool is identified by a deterministic hash and signed, declares its allowed action types, and has bounded access to the surrounding APIs.
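
A sketch of the hashing half of tool registration, in Python for brevity. The `Registry` and `ToolRecord` names and their fields are assumptions made for illustration, and the signature over the hash is left out. What it shows is the narrow claim that a tool whose source changes by even one byte no longer matches its registered hash.

```python
import hashlib
from dataclasses import dataclass


def source_hash(source: str) -> str:
    """sha256 over the exact UTF-8 bytes of the tool source."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class ToolRecord:
    name: str
    sha256: str
    action_types: frozenset   # the only action types this tool may propose


class Registry:
    """Hypothetical registry; class and method names are illustrative."""

    def __init__(self):
        self._tools = {}

    def register(self, name, source, claimed_sha256, action_types):
        actual = source_hash(source)
        if actual != claimed_sha256:
            raise ValueError(f"hash mismatch for {name}")
        record = ToolRecord(name, actual, frozenset(action_types))
        self._tools[name] = record
        return record

    def verify(self, name, source):
        """True only if this source is byte-identical to what was registered."""
        return self._tools[name].sha256 == source_hash(source)


source = "export default (input) => ({ subject: input.title });"
registry = Registry()
record = registry.register(
    "draft-email", source, source_hash(source), {"email/draft"}
)

tampered = source + " // extra"
```

Calling `registry.verify("draft-email", tampered)` returns `False`, so a modified tool can be refused before it runs.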

Capability

A materialization of an external resource — an OpenAI account, a Stripe account, a Jira instance — as a stream a tool can read. Capabilities are how the outside world enters the workflow.

Protocol

The Safebox.protocol.* namespace of side-effect APIs (HTTP, SMTP, Email, Payment, Web3, Telegram, more). Protocols are how tools reach the outside world; each call is logged.

Safebot

A durable identity that posts in chats, runs workflows, and signs actions. Bots are streams; their handler entries declare which message types they respond to and which action types they may propose.

Policy

A per-action-type signing rule. M-of-N signers must approve before a write commits. Communities author their policies as data; sensitive actions require multi-party governance structurally.
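
A minimal sketch of an M-of-N rule expressed as data, in Python. The `Policy` fields and the `may_commit` helper are hypothetical names, and the approvals are assumed to be signer identities whose signatures were already verified elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Illustrative M-of-N rule for one action type; not the Safebox schema."""
    action_type: str
    signers: frozenset   # the N eligible signers
    threshold: int       # M approvals required

    def __post_init__(self):
        if not 1 <= self.threshold <= len(self.signers):
            raise ValueError("threshold must be between 1 and N")


def may_commit(policy, approvals):
    """Count distinct approvals from eligible signers only."""
    valid = set(approvals) & policy.signers
    return len(valid) >= policy.threshold


refund = Policy(
    action_type="payment/refund",
    signers=frozenset({"alice", "bob", "carol"}),
    threshold=2,
)
```

Using set intersection means a repeated approval from one signer, or an approval from someone outside the signer list, never counts toward the threshold.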

Judgment

Code that runs before any action commits, checking the proposal against the proposing tool's declared bounds. Judgments are the contract layer. Wrong behavior is rejected, not just discouraged.
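
A sketch of a judgment as a plain function that runs before commit, in Python. The `Proposal` shape, the `DECLARED` table, and the `max_recipients` bound are invented for illustration. The point is that the check reads only the tool's declared bounds, never the model's reasoning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    tool: str
    action_type: str
    payload: dict


# Hypothetical declared bounds, keyed by tool name.
DECLARED = {
    "draft-email": {"action_types": {"email/draft"}, "max_recipients": 5},
}


def judge(proposal):
    """Return (accepted, reason). Runs before the action commits."""
    bounds = DECLARED.get(proposal.tool)
    if bounds is None:
        return False, "unregistered tool"
    if proposal.action_type not in bounds["action_types"]:
        return False, "action type outside declared list"
    if len(proposal.payload.get("to", [])) > bounds["max_recipients"]:
        return False, "too many recipients"
    return True, "within declared bounds"


ok = judge(Proposal("draft-email", "email/draft", {"to": ["a@example.org"]}))
rogue = judge(Proposal("draft-email", "payment/refund", {"amount": 100}))
```

Here `rogue` is rejected however persuasive the prompt that produced it, because `payment/refund` is not in the tool's declared list.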

Convention

A community-published stream that influences tool behavior — brand voice, code style, vocabulary, tone. Conventions enter prompts as read-only context. They cannot expand any tool's capability.

Caching

Byte-stable rendering of stable streams (Safebots.assemble) into prompts whose KV cache is preserved across calls. Self-hosted inference makes the cache ours; the economics fall out.
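
Why byte stability matters: an inference server can reuse a cached prefix only when the prompt's leading tokens are identical, so the rendering has to be deterministic. The Python sketch below is one way to get that property, assuming streams are JSON-like dicts with an `id` field. The `assemble` function here is a hypothetical stand-in and does not reproduce `Safebots.assemble`.

```python
import json


def render_stream(stream: dict) -> str:
    """Canonical JSON: sorted keys and fixed separators."""
    return json.dumps(
        stream, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )


def assemble(stable_streams, user_message):
    """Stable content first, in a fixed order; the volatile message goes last."""
    ordered = sorted(stable_streams, key=lambda s: s["id"])
    prefix = "\n".join(render_stream(s) for s in ordered) + "\n"
    return prefix, prefix + user_message


a = {"id": "conv/voice", "tone": "plain", "avoid": ["jargon"]}
b = {"id": "conv/style", "indent": 2}
# Same content as `a`, but with its keys in a different order.
a_later = {"avoid": ["jargon"], "tone": "plain", "id": "conv/voice"}

# The two calls also pass the streams in a different order.
prefix1, prompt1 = assemble([a, b], "Summarize ticket 17.")
prefix2, prompt2 = assemble([b, a_later], "Summarize ticket 18.")
```

Both prompts share a byte-identical prefix even though the streams arrived in a different order with different key order, and only the trailing user message differs.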

Safebux

The utility token that pays for substrate compute. Replaces opaque API spend with a transparent unit accounted at the action level, redistributable across the federation.

Sealed Infrastructure

Safebox compute. No SSH. Deterministic AMI builds. vTPM attestation. M-of-N governance for any change. The hardware-and-OS layer where the models and their KV cache actually run.

That's the inventory. Each primitive is small enough to make claims about and specific enough to be testable. The composition that follows is what makes them powerful.

How they compose

Safebox layers the primitives so that each layer inherits the safety properties of the layers below it. From the bottom:

Layer 1 — Hardware & runtime

Sealed Infrastructure + self-hosted models with our own Caching. No SSH, deterministic builds, vTPM attestation. The KV cache stays local because the model stays local. Inference economics become controllable; data residency becomes structural.

↑ runs on

Layer 2 — Economics

Safebux pays for compute. Costs are transparent at the action level. Federation lets communities share compute cost-effectively without sharing data.

↑ pays for

Layer 3 — Inputs

Capabilities materialize external resources as streams. Protocols expose side effects (HTTP, SMTP, payments, Web3). Conventions shape tool behavior through read-only context.

↑ available to

Layer 4 — Execution

Tools, sandboxed and signed, run inside Steps. Tools declare which action types they may propose. The disjunction is structural: a tool that calls the LLM cannot also propose actions.

↑ composed by

Layer 5 — Composition

Workflows are pre-authored trees of steps. Workflows are signed; their composition is data, not LLM-improvised. Workloads are running instances; Tasks are the units of work they produce.

↑ initiated by

Layer 6 — Identity

Safebots are durable identities that post in chats and invoke workflows. Bots have signing keys, action contracts, and audit trails. They participate in conversations the way human members do — but bounded.

Cross-cutting safety primitives apply at every layer above. Judgments check every action proposal against the proposing tool's declared bounds before the action commits. Policies require M-of-N signers for governance-gated writes. Every action's reasoning chain — workflow + step + tool + LLM call + judgment verdict + signers — is captured into an audit trail that can be replayed weeks or years later.

The composition is how the safety story works. Conventions shape what the LLM produces but cannot expand a Tool's capabilities. A Tool can propose only actions in its declared list. Judgments reject proposals outside the contract. Policies require governance signers for sensitive actions. The audit trail captures everything. Each safety property is a structural consequence of how the primitives compose. Nothing depends on the model's good behavior.

Each primitive is small. Each is auditable in isolation. The safety lives in the composition.

II.

The mapping

Every concept the AI industry has reached for in the last three years expresses cleanly as a Safebox composition. The compositions are safer, cheaper, more auditable.

What follows is the mapping. Industry concept on the left; Safebox composition in the middle; the property the composition delivers on the right. Each row takes a concept Anthropic researchers will recognize, expresses it in our vocabulary, and names the property the composition gives you that the industry's version doesn't.

Industry concept	Safebox composition	What you gain
Agent (LLM in a tool-using loop)	A `Workflow` with `decide-class` tools that call `Runtime.llm` and emit structured intent, plus `act-class` tools that take that intent and call `Action.propose`. The LLM never picks the next action by free choice — the workflow's structure does.	The "agent went rogue" failure mode becomes structurally impossible. `Judgments` reject any action outside a tool's declared bounds. The PocketOS / DataTalks / Replit incidents are blocked at the deployment layer, not by training.
Multi-agent system	Multiple `Safebots` participating in the same chat, each with its own `Workflow` and `actionTypes` declaration. Composition lives in the chat or in a parent workflow that invokes child workflows.	Each agent is independently auditable. The "who told whom to do what" chain is in the audit trail, not opaque to the reviewer. Rogue-agent contagion is bounded by per-agent contracts.
Recursive orgs of agents	Workflow nesting. A `Step` in workflow A invokes a `Tool` that runs workflow B. The accountability cascade runs through the workflow signers; B's signers are accountable for B's behavior, A's for the composition.	Hierarchical agent organizations stay auditable. The "trust transitivity" problem dissolves because the workflow defining the hierarchy is itself signed and reviewable. No new safety machinery needed.
Skill (Anthropic's primitive)	`Convention` streams in the community's knowledge graph (the prose), plus `Tool` registration with declared bounds (the capability), plus handler entries declaring trigger conditions. The bundle becomes three primitives that compose.	Adding new conventions to the graph is safe by construction — they cannot expand a tool's capabilities. The audit boundary stays at the tool. Communities ingest unlimited prose without the security review per skill.
Tool use / function calling / MCP	`Tool` registration with sha256 hashing, declared `actionTypes`, sandboxed JS execution, `Judgment` checking each call. MCP can connect to a Safebox tool; Safebox's signing layer then makes the connection auditable.	Tools are first-class auditable citizens. Hash mismatches are caught at registration. Action proposals outside declared bounds are rejected. MCP standardizes the protocol; Safebox makes the connection trustworthy.
RAG (retrieval-augmented generation)	Grokers ingests documents into typed streams; `Tools` walk relations to find applicable streams; `Caching` renders them byte-stably so KV cache stays warm across calls.	Structured retrieval (relations, types, access control) instead of opaque vector similarity. Self-hosted inference plus cache locality drops marginal cost for repeated queries by orders of magnitude.
Long-horizon autonomous task	A `Workload` with sleep-and-resume `Steps`, durable `Task` streams capturing intermediate state, governance-gated checkpoints for high-stakes actions.	Pause-and-resume is a substrate primitive, not a per-application engineering effort. The audit trail across days or weeks of execution is queryable. Cost budgets at the workload level prevent runaway spend.
Agent memory / personalization	User-tier `Convention` and knowledge streams under the access cascade. `Tool` prompt assembly walks the user's relevant streams. Privacy is structural, not policy-based.	Memory is data the user owns and can read or edit directly. Cross-bot leakage is blocked by the access cascade. No "the platform remembers you" in the form of opaque server-side state.
Constitutional AI principles	`Judgment` code at the deployment layer, declarative `actionTypes` at the tool layer, `Policy` streams at the governance layer. Three places where bounded behavior is enforced rather than trained.	Behavioral preferences plus structural enforcement. Even a model that decides to violate the constitution cannot, because the action surface won't carry the violation through.
Persona / character / system prompt	`Convention` streams attached to a `Safebot`'s prompt assembly. The bot's identity is durable; its character is published; conventions can be inspected and updated as community-governed data.	Character is data, not magic. A bot's personality is a stream a community publishes, can audit, and can revise. The bot's signing identity is separate from its character; both are inspectable.
Agentic browsing / computer use	Browser actions become a typed `Tool` category — `act/browser-click`, `act/browser-type`, `act/browser-navigate` — each with a contract declaring permitted sites and action shapes.	The web's unstructured action surface becomes a typed contract Safebox can audit. The `Judgment` can refuse navigations to forbidden domains; `Policy` can require human approval for purchases.
Multi-modal generation	`Tools` that wrap modality-specific models (image, video, audio). Outputs are streams with typed schemas. `Capability` primitives connect external generation services if not run in-house.	Modality is data. Stream types extend cleanly. Self-hosted multi-modal stays inside Sealed Infrastructure, preserving the privacy and audit properties.

The pattern across all twelve rows is the same. The industry's conceptual primitives map onto compositions of Safebox primitives. The compositions are safer (each piece auditable independently, safety properties structural rather than behavioral), cheaper (self-hosted inference, KV cache locality, federated cost-sharing through Safebux), and more auditable (every action's full reasoning chain captured in the substrate).
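
To make the Agent row of the mapping concrete, here is a small Python sketch of the decide/act split. Every name in it (`register_tool`, `fake_llm`, the `decide` and `act` class labels) is hypothetical, and the model call is a stub. It illustrates one way a registry could refuse any tool that both calls the LLM and proposes actions, under those assumptions.

```python
DECIDE, ACT = "decide", "act"


class ContractError(Exception):
    pass


def register_tool(name, tool_class, calls_llm, action_types):
    """Reject any tool that would both call the LLM and propose actions."""
    if tool_class == DECIDE and action_types:
        raise ContractError(f"{name}: decide-class tools may not propose actions")
    if tool_class == ACT and calls_llm:
        raise ContractError(f"{name}: act-class tools may not call the LLM")
    return {
        "name": name,
        "class": tool_class,
        "action_types": frozenset(action_types),
    }


def fake_llm(prompt):
    # Stand-in for a model call; returns structured intent, never an action.
    return {"intent": "refund", "order": "A-17"}


def decide_refund(ticket):
    return fake_llm(f"Classify: {ticket}")


def act_refund(intent, tool):
    action = {"type": "payment/refund", "order": intent["order"]}
    if action["type"] not in tool["action_types"]:
        raise ContractError("action outside declared bounds")
    return action


decide_tool = register_tool("decide-refund", DECIDE, calls_llm=True, action_types=[])
act_tool = register_tool(
    "act-refund", ACT, calls_llm=False, action_types=["payment/refund"]
)

# The workflow, not the model, fixes the order: decide first, then act.
intent = decide_refund("Customer asks for money back on order A-17")
proposal = act_refund(intent, act_tool)
```

The model's output reaches the action surface only as data passed between two steps whose order was fixed in advance.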

Most of the AI industry's roadmap is about inventing new platform-specific abstractions for problems that are already solved at the substrate layer. The labs are racing to build agents, multi-agent systems, skills, memory features, and browser agents, each as a new primitive bound to its own platform. Safebox doesn't add new primitives; it composes the ones already in the inventory. The safety story doesn't have to be retold for each new feature, because the safety lives at the level where the primitives meet, not at the level of any specific feature.

III.

Why safety scales — and attracts regulated organizations

The architectural property that makes Safebox safe is the same property that makes it attractive to organizations that can't deploy on third-party APIs. Safety and scale aren't in tension; they're the same primitive.

The conventional belief in AI deployment is that safety and scale are in tension — make the system safer and you make it more constrained, harder to use, slower to ship. The architectural pattern in Safebox inverts this. The property that makes the system safe is the same property that makes it deployable in environments where competitors can't go. Each is a consequence of the structural separation of capability from instruction.

Three concrete examples of how this works in regulated environments:

Healthcare and HIPAA. A healthcare organization often cannot send patient data to a third-party AI API, because the API provider becomes a Business Associate under HIPAA and takes on audit and breach-notification obligations that not every provider will accept. Safebox runs on Sealed Infrastructure the organization owns or co-governs. Patient data never leaves the organization's perimeter. The model runs on hardware where the keys are held under M-of-N governance. The audit trail captures every action; HIPAA's "minimum necessary" requirement becomes a property the deployment enforces structurally.

Finance and SOC2 / PCI. Financial institutions can't deploy AI on regulated transaction data without an auditor signing off on the deployment surface. Safebox's Workflow + Tool + Judgment + Policy stack produces exactly the artifact auditors need: a signed registry of what the AI is permitted to do, machine-checked at every action, with a complete audit trail. SOC2 evidence comes out as a side effect of running the workflow. PCI scope reduction follows from the structural separation of payment-card-handling tools.

Government and FedRAMP / IL5. Government deployments impose requirements that no commercial cloud's standard configuration meets. Sealed Infrastructure, with deterministic builds, vTPM attestation, and no remote management, is structurally compatible with high-security government environments in a way the standard cloud agent stack is not. The audit trail satisfies FedRAMP continuous monitoring requirements as data, not as additional logging tools bolted on top.

The pattern is the same across every regulated industry. The reason regulated organizations have been unable to deploy AI is that the existing deployment surface assumes a trust model the regulations don't permit. Safebox's primitives were specifically designed for organizations rather than for consumer apps; the regulatory fit is structural, not retrofitted. The property that makes Safebox safe at small scale — every action contract-checked, every reasoning chain auditable, every governance step multi-party — is the property auditors require at large scale.

Safety is the architecture that lets you scale. The auditors writing the requirements are describing what Safebox already is.

This is the regulated-deployment market that Glasswing put on the critical path and that the companion essay walks through in detail. The trust layer that lets institutions deploy frontier models on sensitive data is a layer that does not exist in the public market. Safebox is what it looks like.

IV.

Where this could live

Three concrete shapes for how Safebox could fit alongside Anthropic without distorting either organization. You've been first to MCP, first to Skills, first to ship the honest agent warnings. Safebox is the next thing to be first to.

You pioneered MCP. Two years later it's an industry standard. You pioneered Skills. The pattern is being adopted across the labs. You shipped Claude Code with the warning screens that admit what the agent can do wrong, and the rest of the industry is slowly catching up to the honesty bar you set. Each of those was an architectural move other labs would have hedged on. You shipped them because you had the conviction that the field needs primitives like these and that someone had to be first to publish.

Safebox is the next move of that shape. Trustworthy deployment of AI in regulated environments is a problem that's being solved badly right now: by patches, by per-vendor stacks, by trust-us assurances that don't survive an auditor. Someone is going to publish the open standard for this, and the lab that publishes it will define the field for the next decade. That's a position worth claiming first. You have the credibility to do it because you've already been first to MCP and first to Skills. Safebox is the third in that sequence.

Anthropic is a model lab. Training frontier models, evaluating them rigorously, publishing the work, maintaining the kind of discipline that produces the next generation of researchers — that's what Anthropic is uniquely good at. Becoming an infrastructure company would distort that. Building Safebox in-house, even if you had the bandwidth, would be a category error. It would make the same organization responsible for both the model and the deployment layer that bounds it, which is the conflict-of-interest pattern Safebox's federation primitives were specifically built to dissolve. Anthropic shouldn't be the company deciding which deployments are trustworthy. Auditors don't want that. Regulators don't want that. The market doesn't want that.

So the question is how Anthropic engages with this work without becoming it. Three shapes, in order of cost and entanglement:

1. Open-source patronage, like OpenClaw

Anthropic funds the open-source release of the Safebox stack. Same posture as Claude Code, the MCP server ecosystem, the OpenClaw response. Anthropic doesn't own it, doesn't operate it, doesn't certify it. Anthropic supports it publicly as the deployment layer the regulated market needs and as a reference architecture other vendors implement against.

The benefit to Anthropic: a non-controlled open standard for trustworthy deployment, which is what auditors actually want. They don't want Anthropic certifying its own deployments. They don't want a single-vendor stack. They want a published architecture with credible community governance. An open-source Safebox funded by Anthropic but governed independently is the answer.

The cost is small. Some grant funding. Some public co-signing. Some engineering review time when something breaks. Strategic distortion, zero. The team that built Safebox keeps building it. Other vendors implement it. The market gets a real standard. Anthropic gets to be the lab that funded the standard without having to maintain it.

2. Branded incubation

Anthropic incubates the work as a named program — call it whatever fits — with Anthropic providing strategic direction, customer access, and credibility, and the existing team doing the engineering. Safebox keeps its open architecture but gains immediate enterprise legitimacy from the Anthropic name.

The benefit is more direct. Safebox becomes a commercial offering Anthropic can include in enterprise sales, particularly for the regulated-industry deployments where Constitutional AI alone isn't enough to close the deal. The risk is that an Anthropic-branded program starts to look like an Anthropic product, with the support and SLA expectations that creates. The mitigation is structural: operate the program as a partnership, with the existing team running it under Anthropic auspices and an explicit separation between the model layer (Anthropic) and the deployment layer (the partnership). Some research labs incubate spin-out companies that retain academic affiliations. Same shape.

3. Acquisition with structural separation

Anthropic acquires the IP and the team, then operates the entity as a separately-governed subsidiary or related entity with its own brand, its own customers, its own decisions. Highest cost, most strategically interesting. Anthropic gains Safebox as a permanent strategic asset while operationally separating it from the model lab.

This shape works because Safebox is genuinely valuable as IP. The patents, the architectural work, the running implementation — all valuable independent of who operates it. Anthropic could acquire it specifically to keep it from ending up at a competitor lab, while keeping the operating entity at arm's length to preserve its credibility as a federation rather than a single-vendor stack. Some platform companies have acquired infrastructure projects and spun them off as foundation-governed open-source efforts. Strategic optionality without operational entanglement.

Which shape

None is obviously right. Each makes different bets about how the trust-layer market evolves and how entangled Anthropic wants to be with that market. The right answer depends on conversations that haven't happened yet, between people in Anthropic who would care about this work and the team that has built it.

What I want to put on the record: the work is real, the team is small and committed, and the architecture is at the stage where Anthropic's engagement could meaningfully accelerate it without distorting it. The window for that engagement runs from now, while the deployment-trust market is forming, to the moment when one of the larger infrastructure vendors locks down a proprietary version that becomes the de facto standard. That outcome would be worse for everyone, including Anthropic. A federation-governed open Safebox, supported but not controlled by Anthropic, lets your models reach the regulated markets without you having to become the trust layer yourselves.

A note to close on

You pioneered MCP. You pioneered Skills. Safebox is the next move.

We've been building Safebox for fifteen years. The Constitutional AI papers, the interpretability research, the warning screens that admit honestly what an agent can do wrong, the model cards that disclose what Claude can do badly — every one of those moves rhymes with what we've been doing on the deployment side. Different layer, same instinct. We didn't know AI would be the domain where Safebox's properties obviously mattered. We just kept building because the work was right.

It is still right. The labs are converging on the same set of conclusions about deployment safety: that agents need bounded action surfaces, that knowledge needs to live somewhere auditable, that multi-party governance has to be structural rather than aspirational. None of the labs has Safebox to make those structural. We do. The only question is which lab will be the one to bring it to the field.

If you're a researcher or engineer at Anthropic and you've read this far, the request is simple. Look at the work. The technical essays at safebots.ai walk the architecture in different registers: agents.html for the safety thesis, grokers.html for the knowledge-graph approach, wisdom.html for the philosophy. The papers are at safebots.ai/papers. The plugin source is in developer preview. None of it requires anyone to take anything on faith.

If after looking you think there's something here, the next step is a conversation. The team is small. The architecture is past proof-of-concept. The window for being early on this is open right now. You've been early on every important architectural move in the AI deployment story so far. Be early on this one too.

Gregory Magarshak · Safebots AI

For the careful reader at Anthropic · April 2026