Sealed compute environments, cryptographic governance, durable execution, and AI agents that compose them, where trust comes from the substrate rather than from contracts and certifications. And a system that grows its own capability without recursive self-improvement, in a form a static analyzer can check.
Every generation of technology is defined by what its trust layer makes possible. Once shipping containers became reliable, global shipping exploded. HTTPS enabled digital commerce. Blockchains enabled decentralized applications. Each time, once safety and reliability were achieved, the layer unlocked an explosion of applications above it. Now is the time to get safety and reliability right for AI, and to stave off the AIpocalypse.
Each layer trusts the layer below as little as it can, and exposes a narrow auditable interface to the layer above. Read it bottom-up: hardware first, then the substrate, then execution, then the applications.
Recursive self-improvement: the optimizer rewrites the optimizer. The target moves. There is no fixed surface a defender can reason about.
Continuous directed evolution: the model is fixed. A vetted toolkit grows by composition, steered by humans. The result is a surface a static analyzer can check.
The fear that organizes the entire AI-safety debate is recursive self-improvement (RSI): a system that improves its own ability to improve, compounding in a direction no one outside it chose. That is the capability everyone worries about, and the one no one can secure. Safebox reaches roughly ninety-nine percent of what RSI promises by a different route: the model never changes; what grows is a library of vetted tools, recombined into workflows, steered by judgments humans set. Borrowing the term from Nobel Prize-winning work on the directed evolution of enzymes, the name is continuous directed evolution (CDE): variation and selection on a fixed substrate, steered toward function by a hand that is always human.
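To make "variation and selection on a fixed substrate" concrete, here is a minimal sketch of one round of such a loop. It is illustrative only and is not Safebox's implementation: the tool names, the `score` function, and the `human_approves` gate are all hypothetical stand-ins. The point it shows is that nothing in the loop modifies the model or the loop itself; only the set of approved compositions grows.

```python
from itertools import permutations
from typing import Callable, Iterable

Workflow = tuple[str, ...]


def variations(tools: Iterable[str], length: int = 2) -> Iterable[Workflow]:
    """Variation: enumerate ordered compositions of already-vetted tools."""
    return permutations(sorted(tools), length)


def directed_step(
    tools: Iterable[str],
    score: Callable[[Workflow], float],
    human_approves: Callable[[Workflow], bool],
    threshold: float,
) -> list[Workflow]:
    """Selection: keep workflows that score well AND pass the human gate."""
    kept = []
    for workflow in variations(tools):
        if score(workflow) >= threshold and human_approves(workflow):
            kept.append(workflow)
    return kept


# Hypothetical toolkit and judgments, for illustration only.
TOOLS = {"fetch", "parse", "summarize"}


def score(workflow: Workflow) -> float:
    # Stand-in for a fitness measure set by humans.
    return 1.0 if workflow in {("fetch", "parse"), ("parse", "summarize")} else 0.0


def human_approves(workflow: Workflow) -> bool:
    # Stand-in for a human decision: here, reject anything that fetches.
    return "fetch" not in workflow


library = directed_step(TOOLS, score, human_approves, threshold=0.5)
```

In this toy run, two candidates pass the score threshold but only one also passes the human gate, so the gate, not the score, has the final say over what enters the library.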
Because workflows are written in a restricted declarative language and every tool carries typed metadata, the safety questions become statically decidable, and defense becomes as tractable as a compiler pass.
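As a rough illustration of what a "compiler pass" over a workflow could look like, the sketch below checks a linear pipeline against typed tool metadata before anything runs. Everything in it is an assumption made for the example: the `ToolMeta` fields, the effect names, and the tools in `REGISTRY` are hypothetical and do not describe Safebox's actual language or metadata schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolMeta:
    """Hypothetical typed metadata declared by each tool."""
    name: str
    input_type: str
    output_type: str
    effects: frozenset  # side effects the tool may perform, e.g. {"network"}


def check_workflow(steps, registry, source_type, allowed_effects):
    """Statically check a linear workflow; return a list of violations.

    The workflow is a plain list of tool names with no loops or branches,
    so a single pass over it always terminates.
    """
    violations = []
    current = source_type
    for i, name in enumerate(steps):
        meta = registry.get(name)
        if meta is None:
            violations.append(f"step {i}: unknown tool {name!r}")
            break  # cannot type-check past an unknown tool
        if meta.input_type != current:
            violations.append(
                f"step {i}: {name} expects {meta.input_type}, got {current}"
            )
        extra = meta.effects - allowed_effects
        if extra:
            violations.append(
                f"step {i}: {name} needs disallowed effects {sorted(extra)}"
            )
        current = meta.output_type
    return violations


# Hypothetical registry, for illustration only.
REGISTRY = {
    "parse_csv": ToolMeta("parse_csv", "bytes", "table", frozenset()),
    "summarize": ToolMeta("summarize", "table", "text", frozenset()),
    "post_webhook": ToolMeta("post_webhook", "text", "none", frozenset({"network"})),
}

ok = check_workflow(["parse_csv", "summarize"], REGISTRY, "bytes", frozenset())
bad = check_workflow(["parse_csv", "post_webhook"], REGISTRY, "bytes", frozenset())
```

Here `ok` comes back empty, while `bad` reports both a type mismatch and a disallowed network effect without executing any tool. The verdict is only as trustworthy as the declared metadata: a tool that misreports its effects would pass this check.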
The honest boundary: static analysis decides a class of properties, not all of them, and the metadata is itself an attack surface. Safebox does not claim safety is solved — it claims defense is relocated out of the adversarial runtime into three things you can harden: the analyzer's soundness, the metadata's truthfulness, and the language's decidable boundary.
RSI rewrites itself in the dark. CDE grows in the light, under a gate, where a defender can read it.
An organization running open-ended agents defends a combinatorial sprawl of environments — every laptop, runner, cloud account, and credential scope a distinct attack surface. Safebox inverts it: one attested, egress-controlled box, hardened and analyzed once, with the same properties holding for every workflow and tenant.
The conditions the AI trust layer needs in order to mature have aligned in the last twelve months. None is speculative; each is already shipping.
If you're a researcher, engineer, investor, or organization thinking about how AI deployment ought to work, we'd be glad to spend thirty minutes walking you through what we've built. All four layers are open source, running today, and yours to inspect, fork, deploy, or critique.
Schedule a conversation