Open Standards · Hands-Off Deployment

The missing trust layer for AI.

Sealed compute environments, cryptographic governance, durable execution, and AI agents that compose them, where the trust comes from the substrate, not from contracts and certifications. And a system that grows its own capability without recursive self-improvement, in a form a static analyzer can check.

Every generation of technology is defined by what its trust layer makes possible. Standardized, reliable shipping containers made global trade explode. HTTPS enabled digital commerce. Blockchains enabled decentralized applications. Each time, once safety and reliability were in place, the layer unlocked an explosion of applications above it. Now is the time to get safety and reliability right for AI, and stave off the AIpocalypse.

1989–2000
The Open Web
HTTP, HTML, TLS. Trust by certificate authority. The first explosion of applications.
2004–2015
The Social Web
Facebook, Twitter, YouTube. Trust the platform with your data and audience.
2009–2020
The Crypto Web
Bitcoin, Ethereum, MetaMask. Trust an autonomous network of code and hardware.
2025 →
The Trust Layer for AI
Infrastructure, Safebox, Safebots. Trust the substrate. Agents that act on verifiable records.
The stack at a glance

Several layers. One unified vision.

Each layer trusts the layer below as little as it can, and exposes a narrow auditable interface to the layer above. Read it bottom-up: hardware first, then the substrate, then execution, then the applications.

The capability question

It grows its own capability — without recursive self-improvement.

× What Safebox is not

Recursive self-improvement

The optimizer rewrites the optimizer. The target moves. No fixed surface a defender can reason about.

✓ What Safebox is

Continuous directed evolution

The model is fixed. A vetted toolkit grows by composition, steered by humans. A surface a static analyzer can check.

The fear that organizes the entire AI-safety debate is recursive self-improvement: a system that improves its own ability to improve, compounding in a direction no one outside it chose. That is the capability everyone worries about, and the one no one can secure. Safebox reaches roughly ninety-nine percent of what that promises by a different route: the model never changes; what grows is a library of vetted tools, recombined into workflows, steered by judgments humans set. The name, continuous directed evolution (CDE), borrows from the Nobel-honored work on directed evolution of enzymes: variation and selection on a fixed substrate, steered toward function by a hand that is always human.

[Figure: two panels. Left, RECURSIVE SELF-IMPROVEMENT: the optimizer rewrites the optimizer, producing a stronger model with new primitive powers; target set by the system, unbounded. Verdict: undecidable, the rules move. Right, CONTINUOUS DIRECTED EVOLUTION: vetted, M-of-N approved tools compose into workflows; capability grows, model unchanged. The gate: humans approve primitives, static check on every composition. Verdict: decidable, bounded by the approved set.]
Left: RSI's loop feeds back into the model, acquiring powers no one approved toward a target it sets — nothing fixed to check. Right: CDE only composes approved primitives; the model never changes, and every composition passes the gate before it runs.

Because workflows are written in a restricted declarative language and every tool carries typed metadata, the safety questions become statically decidable. Defense turns into something as tractable as a compiler pass.

# a workflow is a declared graph; the analyzer reasons over it BEFORE it runs
workflow vendor_outreach {
  step find  : tool=search.web    // read · net: search-API
  step draft : tool=llm.complete  // no effect · no net
  step send  : tool=smtp.send     // WRITE-EXTERNAL · smtp
}
// taint · capability · effect: all decidable, statically, before execution
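To make the "compiler pass" idea concrete, here is a minimal Python sketch of such a check. It is an illustration only, not Safebox's analyzer: the `ToolMeta` fields, the `Step` shape, the `analyze` function, and the assumption that each step consumes the previous step's output are all hypothetical. What it shows is that approval, network capability, and taint-to-effect questions can be answered by walking the declared graph and reading metadata, without running a single tool.

```python
from dataclasses import dataclass

# Hypothetical tool metadata; field names are assumptions, not Safebox's schema.
@dataclass(frozen=True)
class ToolMeta:
    effect: str                    # "none", "read", or "write-external"
    net: frozenset                 # network destinations the tool may reach
    approved: bool                 # True once humans have approved the tool
    emits_untrusted: bool = False  # output may contain attacker-controlled text

TOOLS = {
    "search.web":   ToolMeta("read", frozenset({"search-API"}), True, emits_untrusted=True),
    "llm.complete": ToolMeta("none", frozenset(), True),
    "smtp.send":    ToolMeta("write-external", frozenset({"smtp"}), True),
}

@dataclass(frozen=True)
class Step:
    name: str
    tool: str
    inputs: tuple = ()  # names of earlier steps whose output feeds this step

def analyze(steps, allowed_net, tools=TOOLS):
    """Return a list of findings by inspecting declarations only; nothing runs."""
    findings = []
    seen = set()     # steps declared so far
    tainted = set()  # steps whose output may carry untrusted data
    for step in steps:
        meta = tools.get(step.tool)
        if meta is None or not meta.approved:
            findings.append(f"{step.name}: tool {step.tool!r} is not approved")
            seen.add(step.name)
            continue
        # Structure check: data may only flow forward from declared steps.
        for src in step.inputs:
            if src not in seen:
                findings.append(f"{step.name}: input {src!r} is not an earlier step")
        # Capability check: the tool's network needs must be granted.
        extra = meta.net - allowed_net
        if extra:
            findings.append(f"{step.name}: network access {sorted(extra)} not granted")
        # Taint check: untrusted data must not silently reach an external write.
        tainted_input = any(src in tainted for src in step.inputs)
        if meta.effect == "write-external" and tainted_input:
            findings.append(f"{step.name}: untrusted data reaches an external write")
        if meta.emits_untrusted or tainted_input:
            tainted.add(step.name)
        seen.add(step.name)
    return findings

# The vendor_outreach workflow, with an assumed linear data flow.
vendor_outreach = [
    Step("find", "search.web"),
    Step("draft", "llm.complete", inputs=("find",)),
    Step("send", "smtp.send", inputs=("draft",)),
]
findings = analyze(vendor_outreach, allowed_net=frozenset({"search-API", "smtp"}))
for finding in findings:
    print(finding)  # send: untrusted data reaches an external write
```

Under these assumptions the sketch flags the `send` step, because web search results flow through the draft into an outbound email. Whether such a finding blocks the workflow or routes it to a human for review is a policy decision the sketch does not settle.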
O(n): trust spent. Humans approve each tool once, M-of-N.
O(2ⁿ): governed capability gained. Every checkable composition.
1: environment to harden and attest, not a million combinations.
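The arithmetic behind these figures can be sketched in a few lines of Python. The council names, the threshold, and both function names are hypothetical, and a real M-of-N gate would verify cryptographic signatures rather than compare names. The count uses non-empty subsets of the approved tools as a simple proxy for "compositions"; ordered workflows would number more, and only those that pass the static check are usable.

```python
def m_of_n_approved(signers, council, m):
    """True when at least m distinct council members have signed off."""
    return len(set(signers) & set(council)) >= m

def nonempty_tool_sets(n):
    """Number of non-empty subsets of n approved tools: 2**n - 1."""
    return 2 ** n - 1

council = {"alice", "bob", "carol"}  # hypothetical approvers
two_of_three = m_of_n_approved({"alice", "carol"}, council, m=2)
outsider_vote = m_of_n_approved({"alice", "mallory"}, council, m=2)
print(two_of_three, outsider_vote)  # True False

# Approvals grow linearly with the toolkit; candidate tool sets grow exponentially.
for n in (5, 10, 20):
    print(f"{n} approvals -> {nonempty_tool_sets(n)} tool sets")
# 5 approvals -> 31 tool sets
# 10 approvals -> 1023 tool sets
# 20 approvals -> 1048575 tool sets
```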

The honest boundary: static analysis decides a class of properties, not all of them, and the metadata is itself an attack surface. Safebox does not claim safety is solved — it claims defense is relocated out of the adversarial runtime into three things you can harden: the analyzer's soundness, the metadata's truthfulness, and the language's decidable boundary.

RSI rewrites itself in the dark. CDE grows in the light, under a gate, where a defender can read it.

Read the full argument — Directed Evolution →

One environment, not a million

You harden one sealed box — not every combination an org runs.

An organization running open-ended agents defends a combinatorial sprawl of environments — every laptop, runner, cloud account, and credential scope a distinct attack surface. Safebox inverts it: one attested, egress-controlled box, hardened and analyzed once, with the same properties holding for every workflow and tenant.

[Figure: two panels. Left, OPEN-ENDED AGENTS: a separate attack surface per environment (laptop, CI runner, cloud account, credentials, IDE config, API keys, prod DB, SaaS tokens); harden each, combinatorial, never finished; the defender's surface grows with every combination. Right, SAFEBOX: one sealed environment, attested, deterministic, egress-controlled, running workflows A, B, C for tenants 1, 2, 3 with the same primitives, the same manifests, and the same M-of-N gate; harden once, analyze once; properties of the substrate; a bug inside still cannot become an action.]
Left: every environment an agent touches is its own surface to harden, and the set grows combinatorially. Right: every Safebox workflow runs inside one box under one set of primitives — so the defensive properties hold for every workflow, tenant, and org at once, because they belong to the substrate, not the task.
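One way to read "egress-controlled" is a default-deny allowlist that every outbound connection must pass, regardless of which workflow or tenant requested it. The Python sketch below illustrates that reading only; the `EgressPolicy` class, its `permits` method, and the host names are hypothetical and are not taken from Safebox.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class EgressPolicy:
    """Default-deny: only explicitly listed (host, port) pairs may be reached."""
    allowed: frozenset = field(default_factory=frozenset)

    def permits(self, host, port):
        # Host names are compared case-insensitively; anything unlisted is denied.
        return (host.lower(), port) in self.allowed

# One policy for the whole box, shared by every workflow and tenant.
policy = EgressPolicy(frozenset({("smtp.example.com", 587), ("search.example.com", 443)}))

print(policy.permits("SMTP.example.com", 587))    # True
print(policy.permits("attacker.example.net", 443))  # False
```

Because the check sits at the boundary of the box rather than inside any one workflow, a compromised step that tries to reach an unlisted destination is refused in the same way for every tenant.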
Why now

Three major things just happened.

The conditions for the AI trust layer to mature aligned in the last twelve months. None are speculative; each is already shipping.

01 · MODELS
Open weights caught up.
Llama, Qwen, DeepSeek, Mistral, Gemma: open weights, many under permissive licenses such as Apache 2.0 or MIT, multimodal, running on a 16 GB laptop within 5–10% of frontier. Self-hosting is now economically obvious for anything sensitive.
02 · AGENTS
Open-ended agents proved dangerous.
Four production incidents in twelve months — PocketOS, DataTalks, SaaStr/Replit's "rollback impossible" lie, Opus 4.7 mass-emailing. Workflows over agents is now the only safe path.
03 · COMPLIANCE
Trust became a line item.
EU AI Act in force. SEC AI disclosure rules. HIPAA-ready BAAs gating procurement. Sovereign-AI mandates across the EU and DoD. Gartner: 75% of enterprise AI workloads will require attested compute by 2029.
For different audiences

Why this matters to you.

Frontier labs
A standard for deployment safety.
The trust layer your customers are starting to ask for — open, verifiable, complementary to your model weights. Constitutional AI as a deployed reality, not just a training discipline.
Safety researchers
Containment that's checkable, and an honest ceiling.
Environment-first containment — the conclusion Anthropic's own engineering team published — plus CDE that deliberately renounces RSI and makes a defined class of deployment risks statically decidable. Not "safety solved." A bounded, provable subset. The argument →
Investors
The infrastructure bet.
Verisign, Cloudflare, Stripe, Coinbase captured the markets that grew on top of them. AI is where the web was in 1995, and the trust layer is being defined now.
Regulated organizations
One environment instead of six audits.
SOC 2, PCI DSS, HIPAA, GDPR, ISO 27001 — replace the audit treadmill with one verifiable substrate and your own auditors. The audit math →
Developers & creators
Open source, open standards.
Build agents, workflows, and applications customers can trust because they can verify them. No vendor lock-in. No black box. github.com/Safebots
For the "why"
The argument for getting this right.
Why the trust layer matters more than the model layer, why open standards beat closed providers, and why this work matters now. The longer argument →

The next layer of computing is being built right now.

If you're a researcher, engineer, investor, or organization thinking about how AI deployment ought to work, we'd be glad to spend thirty minutes walking you through what we've built. All four layers are open source, running today, and yours to inspect, fork, deploy, or critique.

Schedule a conversation