Recursive self-improvement (RSI) is the capability everyone fears and no one can secure. Continuous directed evolution (CDE) reaches almost the same ceiling, by a route that turns defense back into something as tractable as a compiler pass.
RSI: The optimizer rewrites the optimizer. The target moves. There is no fixed surface a defender can reason about.
CDE: The model is fixed. A vetted toolkit grows by composition, steered by humans. The result is a surface a static analyzer can check.
An RSI system improves its own ability to improve — which is exactly why no one can secure it: a thing that rewrites the rules of its own improvement has no fixed surface for a defender to reason about. Safebox keeps the model fixed and grows a vetted toolkit by composition instead.
RSI acquires capabilities no one approved, toward goals no one set, and that is the source of both its reach and its un-securability.
CDE renounces new primitive power. But the space of combinations of approved tools is already vast: a Cambrian diversification from a small vetted set of parts.
The hand on the wheel is always human. The system composes; it does not acquire.
Defending an AI system today means watching behavior, training classifiers, adding monitoring, and hoping. Safebox makes the system analyzable: the workflow is a restricted declarative language, and every tool carries typed metadata, so a static analyzer reasons about a composition before it runs.
This is the move that made type systems and capability security work: constrain the language so the safety properties you care about become provable. A type checker proves a class of crashes cannot happen without running your program; a Safebox analyzer proves a tainted read cannot reach an external write without running the workflow.
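As a minimal sketch of what such a check could look like, assume a workflow is a straight-line list of steps and each tool's metadata carries three flags. The names `ToolMeta`, `Step`, `find_violations`, and the four example tools are hypothetical illustrations, not Safebox's actual schema or API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class ToolMeta:
    """Hypothetical typed metadata for one vetted tool."""
    name: str
    taints_output: bool = False   # output carries untrusted data (e.g. a web fetch)
    sanitizes: bool = False       # output is considered clean regardless of input
    external_write: bool = False  # tool has an effect outside the box


@dataclass(frozen=True)
class Step:
    """One step of a straight-line declarative workflow (assumed shape)."""
    tool: str
    inputs: Tuple[str, ...]       # names of variables consumed
    output: Optional[str] = None  # name of the variable produced, if any


def find_violations(steps: List[Step], registry: Dict[str, ToolMeta]) -> List[Tuple[int, str]]:
    """Return (step index, tool name) wherever tainted data reaches an external write.

    Nothing is executed: the pass only propagates taint labels through the
    declared inputs and outputs.
    """
    tainted: Set[str] = set()
    violations = []
    for index, step in enumerate(steps):
        meta = registry[step.tool]
        input_tainted = any(name in tainted for name in step.inputs)
        if meta.external_write and input_tainted:
            violations.append((index, step.tool))
        if step.output is not None:
            if not meta.sanitizes and (meta.taints_output or input_tainted):
                tainted.add(step.output)
            else:
                tainted.discard(step.output)
    return violations


REGISTRY = {
    "fetch_url": ToolMeta("fetch_url", taints_output=True),
    "summarize": ToolMeta("summarize"),
    "redact": ToolMeta("redact", sanitizes=True),
    "send_email": ToolMeta("send_email", external_write=True),
}

UNSAFE = [
    Step("fetch_url", (), "page"),
    Step("summarize", ("page",), "summary"),
    Step("send_email", ("summary",)),
]
SAFE = [
    Step("fetch_url", (), "page"),
    Step("redact", ("page",), "clean"),
    Step("send_email", ("clean",)),
]

print(find_violations(UNSAFE, REGISTRY))  # [(2, 'send_email')]
print(find_violations(SAFE, REGISTRY))    # []
```

The sketch is decidable only because the assumed language has no loops, branches, or dynamic tool names. A richer workflow language would need a correspondingly richer analysis.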
Static analysis decides a class of properties, not all of them. The composition of two safe primitives is not always safe, and the metadata is itself an attack surface: a lying manifest can defeat the analysis.
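Both limits can be shown in a few lines. The sketch below is an illustration under assumed names (`MANIFESTS`, `tool_ok_alone`, `pipeline_ok`, and the two tools are hypothetical): each tool passes review on its own, the pair does not, and a manifest that misreports one flag makes the same unsafe pair pass.

```python
# Hypothetical per-tool manifests: each tool is acceptable when reviewed alone.
MANIFESTS = {
    "read_secrets": {"reads": {"secret"}, "writes_external": False},
    "post_webhook": {"reads": set(), "writes_external": True},
}


def tool_ok_alone(manifest):
    """A single tool is fine if it does not both read secrets and write externally."""
    return not ("secret" in manifest["reads"] and manifest["writes_external"])


def pipeline_ok(names, manifests):
    """Data flows left to right: after a secret is read, a later external write leaks it."""
    holding_secret = False
    for name in names:
        manifest = manifests[name]
        if manifest["writes_external"] and holding_secret:
            return False
        holding_secret = holding_secret or "secret" in manifest["reads"]
    return True


# Same tools, but post_webhook's manifest falsely claims it never writes externally.
LYING = dict(MANIFESTS, post_webhook={"reads": set(), "writes_external": False})

pipeline = ["read_secrets", "post_webhook"]
print(all(tool_ok_alone(m) for m in MANIFESTS.values()))  # True: each part is safe alone
print(pipeline_ok(pipeline, MANIFESTS))                   # False: the composition is not
print(pipeline_ok(pipeline, LYING))                       # True: the check trusts the manifest
```

The last line is the point: the analysis is only as sound as the metadata it reads, so the manifests need their own verification.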
So Safebox does not claim defense is solved. It claims defense is relocated — out of the adversarial runtime into three things you can harden: the analyzer's soundness, the metadata's truthfulness, and the language's decidable boundary.
"Steel Skeleton vs. Sandcastles" names three ways to build intelligence. The sandcastle (prompts and vibes) collapses when a model updates. The swarm (emergent, self-modifying) is un-debuggable and unprovable, because emergence is not architecture. Only the steel skeleton survives.
That is the warning CDE answers about itself: a combinatorial system without a skeleton would become the swarm. The skeleton — typed primitives, policy gates outside the prompts, replayable execution, static enforcement — is what keeps it a building. The agents are cognition; the framework is architecture.
An organization running open-ended agents defends a combinatorial sprawl of environments — every laptop, runner, cloud account, and credential scope a distinct attack surface. Safebox inverts it: one attested, egress-controlled box, hardened and analyzed once.
A vulnerability found inside the box does not become an external side effect: even a perfect exploit chain cannot reach an external write without a matching signed manifest and an M-of-N approval. Patching faster is a losing race against industrial-scale offense; sealing the environment once and proving the boundary scales the other way.
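A gate of that shape can be sketched as follows. This is an assumption-laden illustration, not Safebox's mechanism: HMAC with per-approver shared keys stands in for real public-key signatures, and `sign`, `external_write_allowed`, the approver names, and the manifest bytes are all hypothetical.

```python
import hashlib
import hmac


def sign(key: bytes, manifest: bytes) -> str:
    """Stand-in signature: HMAC-SHA256 of the manifest under an approver's key."""
    return hmac.new(key, manifest, hashlib.sha256).hexdigest()


def external_write_allowed(manifest: bytes, approvals: dict, approver_keys: dict, m: int) -> bool:
    """Allow the write only if at least m known approvers signed this exact manifest.

    approvals maps approver name -> signature, so each approver counts at most once.
    """
    valid = 0
    for approver, signature in approvals.items():
        key = approver_keys.get(approver)
        if key is not None and hmac.compare_digest(sign(key, manifest), signature):
            valid += 1
    return valid >= m


# Example: a 2-of-3 policy over hypothetical approvers.
KEYS = {"alice": b"key-a", "bob": b"key-b", "carol": b"key-c"}
MANIFEST = b'{"tool": "send_email", "to": "ops@example.com"}'
TAMPERED = b'{"tool": "send_email", "to": "attacker@example.com"}'

two_approvals = {"alice": sign(KEYS["alice"], MANIFEST), "bob": sign(KEYS["bob"], MANIFEST)}
one_approval = {"alice": sign(KEYS["alice"], MANIFEST)}

print(external_write_allowed(MANIFEST, two_approvals, KEYS, m=2))  # True
print(external_write_allowed(MANIFEST, one_approval, KEYS, m=2))   # False
print(external_write_allowed(TAMPERED, two_approvals, KEYS, m=2))  # False
```

Because the approvals bind to the exact manifest bytes, an exploit that changes the destination or payload invalidates every signature it would need.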
The usual fear is that capability and danger rise together. The whole point of CDE is to break that coupling: capability rises with the combinatorial closure of approved tools; danger does not, because the new capability is composed from vetted parts inside a sealed box under a static check.
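To give a rough sense of scale for that closure, the sketch below counts linear pipelines only, under the simplifying assumption that any tool may follow any other. The function name and the figures of 20 tools and 5 steps are illustrative, not drawn from Safebox. Typed metadata would reject ill-typed chains, so the result is an upper bound.

```python
def count_pipelines(n_tools: int, max_len: int) -> int:
    """Number of ordered tool sequences of length 1..max_len, repetition allowed."""
    return sum(n_tools ** length for length in range(1, max_len + 1))


print(count_pipelines(20, 5))  # 3368420
print(count_pipelines(21, 5))  # 4288305
```

In this toy count, vetting one more tool adds roughly 900,000 candidate pipelines, while the set of parts a defender has to review grows by exactly one.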
If Safebots proliferate and outcompete open-ended agents — not by being more clever, but by being the version an organization can deploy without betting the company on a model's restraint — then AI capability keeps climbing while the defensive burden falls. Every org defends the same kind of sealed environment with the same kind of static analysis, instead of improvising its own containment and re-learning the same lessons through its own breach.
CDE will not do the last one percent — it will never acquire a genuinely new primitive power on its own, and that renunciation is what makes it safe. For the ninety-nine percent that is real work, it reaches the same ceiling as the dangerous machine, by a route that leaves a steel skeleton behind: a fixed model, a vetted toolkit, a declarative language, a single sealed environment, and a static analyzer that proves what the box will and will not do before it does anything at all.
RSI rewrites itself in the dark. CDE grows in the light. The fearsome version offers a world where capability outruns anyone's ability to defend it. This one offers a world where capability climbs and defense gets simpler at the same time — because the power lives in composition, and composition is checkable.