Safebox + Safebots
One page · For Anthropic leadership · April 2026
A note to Anthropic leadership

The trust layer your own model just created the market for.

Glasswing put Claude in the hands of JPMorgan, the Linux Foundation, and the major hyperscalers, and within days Treasury and the Fed were convening the bank CEOs over cyber risk. Mythos has exposed just how vulnerable legacy systems are. What if Anthropic helped establish a new, open, standardized layer for handling institutional data, one that also strengthened compliance with everything from SOC 2 to PCI DSS to GDPR to HIPAA? Safebox reimagines what trusted, general-purpose computing needs to look like in the age of AI, and it finally attracts institutional data in a form Anthropic's AI workloads can run on.

The moment Anthropic made

Glasswing put the most capable model in the world inside eleven enterprises that employ armies of security, compliance, and audit staff. Those teams all ran the same evaluation process and reached versions of the same conclusion: the model is impressive, and they still can't sign a production contract until somebody shows them how the deployment would survive an auditor. No infrastructure layer currently on the market answers that question well enough to close those contracts, and the OpenClaw episode — forty thousand exposed instances on Shodan leaking Anthropic API keys, Telegram tokens, and Slack credentials — has made CISOs considerably more, rather than less, cautious about what they're willing to wave through.

The mirror

OpenAI responded to the agentic moment by hiring Peter Steinberger, the Austrian developer who built OpenClaw in an hour on top of Claude Opus 4.5, and whose plain-text credential storage, unauthenticated WebSocket, and one-click RCE produced the CVE that now defines agentic risk in most enterprise threat models. Steinberger will run personal agents at OpenAI. Sam Altman called him a genius. The move is efficient in the way Altman's moves usually are: it captures the interface layer and the developer mindshare fast, and it leaves the substrate problem for someone else to solve later, or not at all.

OpenAI's move

The person who built the interface. Fast, viral, retrofitted security. The consumer-facing surface.

Anthropic's opening

The person who built the substrate. Attested execution, governed writes, 50+ proved theorems. The enterprise-facing trust layer.

The interface and the substrate are both needed for an agentic stack to function at enterprise scale, and the two pieces aren't substitutes for each other. OpenAI has taken the interface. The substrate is still available, and it's the half of the stack that enterprises actually pay for, because the interface is what makes agents usable and the substrate is what makes them deployable. Anthropic sitting under the whole market, with Claude inside every Safebox, is a position that appreciates with each enterprise contract and doesn't depreciate with the next model release.

Why the trust is actually trustworthy

Every current attempt at verifiable AI execution ends up making the same ask of the customer, which is to trust that some opaque image running somewhere in a TEE is actually the thing it claims to be. A byte-level attestation of an opaque image tells you that nothing has changed since the image was sealed, but it tells you nothing about what's inside the image in the first place. That's the gap Safebox closes, and closing it is what every other claim on this page ultimately rests on.

The patent-pending build pipeline produces two AMIs in sequence. The first is a development image, assembled from hash-verified source with SSH still available for inspection and reproducible bit-for-bit by any third party who wants to audit what's in it. The second is the production appliance: the same image with SSH removed, sealed, and cryptographically attested byte-for-byte. The customer can audit the first AMI, reproduce it independently, satisfy themselves that the software inside does what it claims and nothing else, and then verify that the production appliance is byte-identical to the image they just audited. The black box is provably the same as the glass box. That's the step every other verifiable-execution vendor skips or finesses, and it's why their attestation stories don't survive a serious compliance review.
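
As a concrete illustration, here is a minimal sketch of the customer-side check in Python. The manifest format, file names, and digest fields are hypothetical stand-ins for the actual pipeline artifacts, not its real interface.

```python
# Illustrative only: manifest format, file names, and digest fields are
# hypothetical stand-ins, not the actual Safebox pipeline artifacts.
import hashlib
import json

def sha256_file(path: str) -> str:
    """Stream a disk image through SHA-256 and return its hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_two_ami_build(manifest_path: str,
                         rebuilt_dev_image: str,
                         production_image: str) -> bool:
    """Check both halves of the glass-box/black-box equivalence:
    1. the dev image rebuilt from source matches the published digest, and
    2. the sealed production appliance matches the digest the vendor
       attests for the audited image with SSH stripped out."""
    with open(manifest_path) as f:
        manifest = json.load(f)
    dev_ok = sha256_file(rebuilt_dev_image) == manifest["dev_image_sha256"]
    prod_ok = sha256_file(production_image) == manifest["production_image_sha256"]
    return dev_ok and prod_ok

if __name__ == "__main__":
    ok = verify_two_ami_build("safebox-manifest.json",
                              "rebuild/dev-ami.img",
                              "vendor/prod-ami.img")
    print("byte-identical to audited source" if ok else "MISMATCH: do not deploy")
```

The point the sketch makes is that both comparisons are mechanical: no step asks the customer to trust an artifact they couldn't rebuild and hash themselves.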

Once the compute itself is auditable in this way, data protection becomes a special case of it. The Safebox isn't protecting data because it has a clever data-protection feature; it's protecting data because it's a compute environment that can be trusted to do exactly what the auditor verified and nothing else. Health records can be processed inside the box and have aggregate statistics produced without differential-privacy workarounds. Cryptographic keys can be generated and used inside the box without the blockchain-smart-contract trade-off where public verifiability forces public key material. The model can read sensitive inputs, reason over them, and return only the results the policy permits, with the raw data never crossing the boundary. It works the way a glovebox in a biolab works: the operator reaches in with attested instruments, the dangerous material stays inside, and the work gets done without contamination in either direction.
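
A toy sketch of that glovebox pattern follows, assuming a made-up release policy (aggregates only, over cohorts of at least 25 records); the real policy engine and its thresholds belong to Safebox and aren't shown here.

```python
# Toy illustration of the glovebox pattern: raw records stay inside the
# attested perimeter, and only policy-approved aggregates cross out.
# The minimum-cohort rule below is a made-up example policy.
from statistics import mean

MIN_COHORT = 25  # hypothetical floor below which nothing leaves the box

def inside_the_box(records: list[dict]) -> dict:
    """Runs inside the perimeter with full access to raw health records."""
    ages = [r["age"] for r in records]
    return {"count": len(ages), "mean_age": round(mean(ages), 1)}

def governed_write(result: dict) -> dict:
    """The only door out of the box: refuse anything the policy forbids."""
    if result["count"] < MIN_COHORT:
        raise PermissionError("cohort too small to release")
    return result  # aggregates only; raw records never reach this function
```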

The enterprise gravity this creates is the part that compounds. Once the box is trusted, the organization's sensitive data starts flowing into it rather than away from it, because the box is the one place in the infrastructure where sensitive data can be worked on without creating new exfiltration risk. Model weights colocate next to the data in their own adjacent box, protected by the same mechanism. Other tenants' data flows in. Other models flow in. The Safebox becomes the trusted compute substrate that the whole enterprise AI deployment orbits around, and the switching cost of moving off it is the switching cost of rebuilding every audit, every attestation, every regulator-facing compliance story from scratch.

I · Two-AMI build

A reproducible development image any third party can rebuild bit-for-bit, sealed into a production appliance that's byte-identical to the audited source.

II · Nitro attestation

Hardware-rooted cryptographic proof, before any data is sent, that the running environment is the exact image the customer approved.

III · M-of-N signing

Governance keys held by auditors the customer organization chooses, with thresholds and roles configured to match the compliance posture they already run.

The three primitives compose into the full guarantee. Everything else described on this page — governed writes, deterministic replay, protected weights, regulated data flowing into the box instead of away from it — is a consequence of these three working together. They're the tip of the iceberg, and they're what the conversation should start with.
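
To make the composition concrete, here is a hedged sketch of the release check a client might run. A production verifier would parse and validate the signed CBOR/COSE attestation document that Nitro Enclaves actually emit, and would verify real auditor signatures rather than comparing key identifiers; the field names and sets below are illustrative simplifications.

```python
# Simplified stand-in for the client-side release check. A real verifier
# validates the Nitro attestation document's certificate chain and checks
# genuine auditor signatures; the dict fields and key-ID sets here are
# illustrative simplifications.

def verify_release(attestation: dict,
                   approved_measurement: str,
                   approval_key_ids: set[str],
                   auditor_key_ids: set[str],
                   threshold: int) -> bool:
    # Primitive II: the running environment must measure as exactly the
    # image the customer audited (the measurement Primitive I produced).
    if attestation.get("image_measurement") != approved_measurement:
        return False
    # Primitive III: at least M of the N customer-chosen auditors must
    # have signed off on that measurement before any data is sent.
    valid_approvals = approval_key_ids & auditor_key_ids
    return len(valid_approvals) >= threshold
```

Only after both checks pass does the client open the channel and send data, which is what "before any data is sent" means operationally.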

Want the architecture walk-through? Forty-five minutes, live demo, and the detail under these three primitives.
Book a conversation
Diagram: customer data and Anthropic weights, each inside their own attested perimeter. The data never leaves the Safebox, and the model is colocated, with its weights protected in a Safebox of their own.

What Safebox unlocks commercially

Regulated data that can't reach Claude today. Banks under FFIEC supervision, hospitals under HIPAA, European institutions under the AI Act, law firms under privilege, defense primes under ITAR. These organizations have the highest willingness to pay for AI-assisted work in the economy, and they also have legal obligations that prevent them from sending their data to a public model endpoint. The architecture Safebox enables is a pair of attested perimeters sitting side by side: customer data stays inside the customer's Safebox and never leaves, while Claude's weights run inside a separate adjacent Safebox that protects them from the customer's infrastructure the same way the customer's infrastructure is protected from Anthropic's. Both sides get cryptographic attestation of what the other side is running, neither side can exfiltrate the other, and the auditors on each side hold their own signing keys. A market segment Anthropic structurally cannot serve today becomes addressable on terms a CISO, general counsel, and board can sign, and the weights never touch infrastructure Anthropic doesn't have provable control over.
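
A minimal sketch of the handshake between the two perimeters, with invented function and field names; it shows only the symmetry of the check, not the actual protocol.

```python
# Invented names throughout: this shows the symmetry of the dual-perimeter
# check, not the actual Safebox protocol. Each side holds the measurement
# its own auditors approved for the *other* side.

def mutual_attest(customer_approved_model_box: str,
                  model_box_measurement: str,
                  anthropic_approved_customer_box: str,
                  customer_box_measurement: str) -> bool:
    """Neither data nor weights move unless both directions check out."""
    customer_trusts_model_box = (
        model_box_measurement == customer_approved_model_box)
    anthropic_trusts_customer_box = (
        customer_box_measurement == anthropic_approved_customer_box)
    return customer_trusts_model_box and anthropic_trusts_customer_box
```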

Better economics on the inference that actually gets served. The capex conversation across the industry has shifted in a specific direction over the past few quarters. Stargate's scale-back, Anthropic's own tier-pricing adjustments, and the broader recalibration of serving margins all point toward the same economic reality, which is that the sustainable business is revenue per high-value query rather than volume at thin margin. Safebots' graph layer routes low-value repetitive interactions into a deterministic wisdom library that handles them without invoking a model at all, so the queries that do reach Claude are the ones doing work a knowledge worker would otherwise bill for. The call volume goes down; the realized value per call goes up; the gross margin on the remaining calls is the kind of margin that justifies the infrastructure investment in the first place.
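
A sketch of the routing idea, under the assumption that the wisdom library behaves like a deterministic lookup keyed on a normalized query; the actual Safebots graph layer is richer than a flat dictionary, and this only shows where the economics come from.

```python
# Assumption: the "wisdom library" behaves like a deterministic lookup.
# The actual Safebots graph layer is richer than this; the sketch only
# shows why realized value per model call goes up as call volume drops.
from typing import Callable

WISDOM_LIBRARY = {
    "reset my password": "Settings > Security > Reset password.",
    "what are your hours": "Support is staffed 9am-6pm ET, Monday-Friday.",
}

def route(query: str, call_model: Callable[[str], str]) -> str:
    """Serve repetitive queries deterministically at near-zero cost;
    reserve the model for queries that carry billable-work value."""
    key = query.strip().lower().rstrip("?")
    if key in WISDOM_LIBRARY:
        return WISDOM_LIBRARY[key]   # no model invocation at all
    return call_model(query)         # high-value query reaches Claude
```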

Why this is already real

The longer case, with architecture diagrams and the full competitive picture, is at safebots.ai/proposal.html. Briefly: the patent-pending Safebox AMI pipeline is production-grade, with nine rounds of external security audit and twenty-four findings remediated; the formal foundation is six academic papers carrying fifty-plus proved theorems across two research programs; the patent portfolio is seven filings deep; and the graph substrate these components run on is Qbix Streams, in continuous production since 2011 across seven million users in more than a hundred countries. The thing being offered to Anthropic here is not an idea with a roadmap attached. It's a set of running systems that need the right institutional partner to reach the enterprise market, and the reason to move on the conversation now rather than in six months is that Phala, Fortanix, Wirken, and a handful of stealth efforts are converging on verifiable execution from the infrastructure-only side, without the graph or the governance model or the formal papers. The vocabulary enterprises will use to talk about "running Mythos safely" will be settled by whoever establishes the reference architecture first.

Gregory Magarshak · [email protected]
Confidential · April 2026