Open Standards · Hands-Off Deployment

The missing trust layer for AI.

Sealed compute environments, cryptographic governance, durable execution, and AI agents that compose them — where the trust comes from the substrate, not from contracts and certifications. And a system that grows its own capability without recursive self-improvement, in a form a static analyzer can defend.

See the stack github.com/Safebots

Every generation of technology is defined by what its trust layer makes possible. Standardized, reliable containers made global shipping explode. HTTPS enabled digital commerce. Blockchains enabled decentralized applications. Each time, once safety and reliability were in place, the layer unlocked an explosion of applications above it. Now is the time to get safety and reliability right for AI, and to stave off the AIpocalypse.

1989–2000

The Open Web

HTTP, HTML, TLS. Trust by certificate authority. The first explosion of applications.

2004–2015

The Social Web

Facebook, Twitter, YouTube. Trust the platform with your data and audience.

2009–2020

The Crypto Web

Bitcoin, Ethereum, MetaMask. Trust an autonomous network of code and hardware.

2025 →

The Trust Layer for AI

Infrastructure, Safebox, Safebots. Trust the substrate. Agents that act on verifiable records.

The stack at a glance

Several layers. One unified vision.

Each layer trusts the layer below as little as it can, and exposes a narrow auditable interface to the layer above. Read it bottom-up: hardware first, then the substrate, then execution, then the applications.

Foundation · Hardware

Infrastructure

The sealed compute environment. Hardware-attested AMI, M-of-N privileged operations, ZFS rollback, and a privileged surface small enough to audit in five minutes. The Bitcoin layer of the stack.

Execution · Governance

Safebox

The durable trust execution layer. Workflows, sandboxed tools, M-of-N action governance, OpenClaim signatures, a cryptographic audit trail verifiable from a browser. The Ethereum layer.

Applications · Collaboration

Safebots

AI agents that compose tools, hold conversations, and act on behalf of users — all inside the Safebox runtime. The MetaMask-and-beyond layer where consumer applications live.

Data Ingestion · Code Analysis

Grokers

Deterministic code-analysis agents that operate on whole repositories. Tree-sitter parsers for 10+ languages, swarm scheduling, replayable execution. A Safebox plugin.

Code Generation · Management

Code

Workflow-driven code generation with deterministic streaming. Uses Safebox's action governance for every file write. No surprises, no rogue commits.
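
Two ideas recur across these layers: M-of-N approval of an action, and an audit trail that anyone can re-verify. The sketch below illustrates both in a few lines of Python. It is not Safebox's implementation and it does not reproduce the OpenClaim signature format: the approver names, keys, and function names are hypothetical, and HMAC stands in for real public-key signatures so the example runs on the standard library alone.

```python
import hashlib
import hmac
import json

# Hypothetical approver keys. HMAC is a stand-in for public-key signatures.
APPROVER_KEYS = {"alice": b"key-a", "bob": b"key-b", "carol": b"key-c"}
GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def sign(approver, action):
    """Return the approver's tag over a canonical encoding of the action."""
    payload = json.dumps(action, sort_keys=True).encode()
    return hmac.new(APPROVER_KEYS[approver], payload, hashlib.sha256).hexdigest()


def is_approved(action, signatures, m):
    """True when at least m distinct known approvers signed this exact action."""
    valid = {
        who for who, tag in signatures.items()
        if who in APPROVER_KEYS and hmac.compare_digest(tag, sign(who, action))
    }
    return len(valid) >= m


def entry_hash(prev, action, signatures):
    """Hash an entry together with the hash of the entry before it."""
    body = json.dumps({"prev": prev, "action": action, "sigs": signatures},
                      sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()


def append_entry(log, action, signatures):
    """Append an entry chained to the current tail of the log."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev, "action": action, "sigs": signatures,
                "hash": entry_hash(prev, action, signatures)})


def verify_log(log):
    """Recompute every hash; an edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        expected = entry_hash(prev, entry["action"], entry["sigs"])
        if entry["prev"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True


action = {"op": "write_file", "path": "README.md"}
sigs = {who: sign(who, action) for who in ("alice", "bob")}
log = []
if is_approved(action, sigs, m=2):      # a 2-of-3 gate
    append_entry(log, action, sigs)
print(verify_log(log))                  # True
log[0]["action"]["path"] = "other.md"   # tamper with the recorded action
print(verify_log(log))                  # False
```

The design point is that verification needs only the log itself: a reader who recomputes the hashes can detect a changed entry without trusting whoever stored it.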

The capability question

It grows its own capability — without recursive self-improvement.

× What Safebox is not

Recursive self-improvement

The optimizer rewrites the optimizer. The target moves. No fixed surface a defender can reason about.

✓ What Safebox is

Continuous directed evolution

The model is fixed. A vetted toolkit grows by composition, steered by humans. A surface a static analyzer can check.

The fear that organizes the entire AI-safety debate is recursive self-improvement (RSI): a system that improves its own ability to improve, compounding in a direction no one outside it chose. That is the capability everyone worries about, and the one no one can secure. Safebox aims to reach roughly ninety-nine percent of what RSI promises by a different route: the model never changes; what grows is a library of vetted tools, recombined into workflows and steered by judgments humans set. We borrow the term from Nobel-honored work on the directed evolution of enzymes and call this continuous directed evolution (CDE): variation and selection on a fixed substrate, steered toward function by a hand that is always human.

Left: RSI's loop feeds back into the model, acquiring powers no one approved toward a target it sets — nothing fixed to check. Right: CDE only composes approved primitives; the model never changes, and every composition passes the gate before it runs.

Because the workflow is a restricted declarative language and every tool carries typed metadata, the safety questions become statically decidable — defense turns into something as tractable as a compiler pass.

// a workflow is a declared graph; the analyzer reasons over it BEFORE it runs
workflow vendor_outreach {
  step find  : tool=search.web     // read · net: search-API
  step draft : tool=llm.complete   // no effect · no net
  step send  : tool=smtp.send      // WRITE-EXTERNAL · net: smtp
}
// taint · capability · effect: all decidable statically, before execution
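
To make "as tractable as a compiler pass" concrete, here is a minimal sketch of such a pass over the vendor_outreach workflow, written in Python. Everything in it is an assumption made for illustration: the metadata fields (effect, net, emits_untrusted), the registry, and the analyze function are hypothetical and are not Safebox's actual schema or analyzer. The taint rule is deliberately conservative: it treats every step as consuming the output of all earlier steps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolMeta:
    """Typed metadata a tool declares. Field names are hypothetical."""
    effect: str                    # "read", "none", or "write-external"
    net: frozenset                 # network destinations the tool may reach
    emits_untrusted: bool = False  # output may carry attacker-controlled text


# Hypothetical registry mirroring the three tools in the workflow above.
REGISTRY = {
    "search.web": ToolMeta("read", frozenset({"search-api"}), emits_untrusted=True),
    "llm.complete": ToolMeta("none", frozenset()),
    "smtp.send": ToolMeta("write-external", frozenset({"smtp"})),
}


@dataclass(frozen=True)
class Step:
    name: str
    tool: str


def analyze(steps, allowed_net, approved_effects):
    """Walk the declared steps in order and return findings, running nothing."""
    findings = []
    tainted = False  # True once untrusted data has entered the flow
    for step in steps:
        meta = REGISTRY.get(step.tool)
        if meta is None:  # capability check: only approved tools may appear
            findings.append(f"{step.name}: unapproved tool {step.tool}")
            continue
        extra = meta.net - allowed_net
        if extra:         # capability check: network reach
            findings.append(f"{step.name}: net not allowed {sorted(extra)}")
        if meta.effect not in approved_effects:  # effect check
            findings.append(f"{step.name}: effect {meta.effect} not approved")
        if tainted and meta.effect == "write-external":  # taint check
            findings.append(f"{step.name}: untrusted data reaches external write")
        tainted = tainted or meta.emits_untrusted
    return findings


VENDOR_OUTREACH = [
    Step("find", "search.web"),
    Step("draft", "llm.complete"),
    Step("send", "smtp.send"),
]

findings = analyze(
    VENDOR_OUTREACH,
    allowed_net=frozenset({"search-api", "smtp"}),
    approved_effects={"read", "none", "write-external"},
)
for line in findings:
    print(line)  # send: untrusted data reaches external write
```

The pass flags the send step because web search results flow, through the draft, into an external write. In a real deployment that finding would presumably route the step to a human gate rather than block it outright; the sketch only shows that the question can be answered from the declaration alone.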

O(n)

trust spent — humans approve each tool once, M-of-N

O(2ⁿ)

governed capability gained — every checkable composition

1

environment to harden and attest, not a million combinations
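
The arithmetic behind the first two figures can be checked directly. The sketch below assumes a "composition" means a non-empty subset of approved tools, which gives 2ⁿ − 1 candidates for n approvals; counting ordered workflows would give more, and the analyzer would admit only those that pass its checks, so read the figure as the size of the space being governed. The function name and tool list are illustrative.

```python
from itertools import combinations


def candidate_compositions(tools):
    """Return every non-empty subset of an approved tool set."""
    return [c for r in range(1, len(tools) + 1) for c in combinations(tools, r)]


approved = ["search.web", "llm.complete", "smtp.send"]  # 3 approvals spent
subsets = candidate_compositions(approved)
print(len(approved), len(subsets))  # 3 7
```

Each newly approved tool adds one unit of human review and doubles the subsets available, which is the gap between O(n) and O(2ⁿ).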

The honest boundary: static analysis decides a class of properties, not all of them, and the metadata is itself an attack surface. Safebox does not claim safety is solved — it claims defense is relocated out of the adversarial runtime into three things you can harden: the analyzer's soundness, the metadata's truthfulness, and the language's decidable boundary.

RSI rewrites itself in the dark. CDE grows in the light, under a gate, where a defender can read it.

Read the full argument — Directed Evolution →

One environment, not a million

You harden one sealed box — not every combination an org runs.

An organization running open-ended agents defends a combinatorial sprawl of environments — every laptop, runner, cloud account, and credential scope a distinct attack surface. Safebox inverts it: one attested, egress-controlled box, hardened and analyzed once, with the same properties holding for every workflow and tenant.

Left: every environment an agent touches is its own surface to harden, and the set grows combinatorially. Right: every Safebox workflow runs inside one box under one set of primitives — so the defensive properties hold for every workflow, tenant, and org at once, because they belong to the substrate, not the task.

Why now

Three major things just happened.

The conditions the AI trust layer needs in order to mature all aligned in the last twelve months. None is speculative; each is already shipping.

01 · MODELS

Open weights caught up.

Llama, Qwen, DeepSeek, Mistral, Gemma: openly licensed (some under Apache 2.0 or MIT, others under vendor community licenses), multimodal, and running on a 16GB laptop within 5–10% of frontier. Self-hosting is now economically obvious for anything sensitive.

02 · AGENTS

Open-ended agents proved dangerous.

Four production incidents in twelve months — PocketOS, DataTalks, SaaStr/Replit's "rollback impossible" lie, Opus 4.7 mass-emailing. Workflows over agents is now the only safe path.

03 · COMPLIANCE

Trust became a line item.

EU AI Act in force. SEC AI disclosure rules. HIPAA-ready BAAs gating procurement. Sovereign-AI mandates across the EU and DoD. Gartner: 75% of enterprise AI workloads will require attested compute by 2029.

For different audiences

Why this matters to you.

Frontier labs

A standard for deployment safety.

The trust layer your customers are starting to ask for — open, verifiable, complementary to your model weights. Constitutional AI as a deployed reality, not just a training discipline.

Safety researchers

Containment that's checkable, and an honest ceiling.

Environment-first containment — the conclusion Anthropic's own engineering team published — plus CDE that deliberately renounces RSI and makes a defined class of deployment risks statically decidable. Not "safety solved." A bounded, provable subset. The argument →

Investors

The infrastructure bet.

Verisign, Cloudflare, Stripe, Coinbase captured the markets that grew on top of them. AI is where the web was in 1995, and the trust layer is being defined now.

Regulated organizations

One environment instead of five audits.

SOC 2, PCI DSS, HIPAA, GDPR, ISO 27001 — replace the audit treadmill with one verifiable substrate and your own auditors. The audit math →

Developers & creators

Open source, open standards.

Build agents, workflows, and applications customers can trust because they can verify them. No vendor lock-in. No black box. github.com/Safebots

For the "why"

The argument for getting this right.

Why the trust layer matters more than the model layer, why open standards beat closed providers, and why this work matters now. The longer argument →

The next layer of computing is being built right now.

If you're a researcher, engineer, investor, or organization thinking about how AI deployment ought to work, we'd be glad to spend thirty minutes walking you through what we've built. All four layers are open source, running today, and yours to inspect, fork, deploy, or critique.

Schedule a conversation