Safebox is a sealed environment that runs open-weight AI models — Gemma 4, Llama, Qwen, DeepSeek, Mistral — under cryptographic governance you control. Auditors of your choosing approve every privileged change. The hardware itself proves what code is running inside. Updates can be rolled back instantly. Nothing about it requires a sysadmin on staff.
Most companies that want to run AI today are forced into a choice between convenience and control. Convenience means trusting a vendor: their models, their cloud, their consultants, their operators, their promises. Control means hiring an infrastructure team and figuring out how to run all of it yourself, which most organizations cannot reasonably do.
Safebox is a third option. It is a pre-built, sealed computing environment that runs open-weight AI models under governance you control, with no system administrators required on your side. You decide who is allowed to approve changes; the infrastructure itself enforces the rules. When our investor deck pitches "what blockchain did to banking, we're doing to AI," this is the part that actually does it.
This page explains, in plain language, what the infrastructure actually does. Everything described here is implemented in the open-source code at github.com/Safebots/Infrastructure and can be inspected by anyone.
When an organization deploys an AI feature today, it implicitly trusts at least four separate parties not to misuse the data flowing through them.
The model vendor — Anthropic, OpenAI, Google — receives every prompt and every response. The cloud provider — AWS, Azure, GCP — has operator-level visibility into memory, storage, and state. The integrator who built the system has keys, credentials, and ongoing access during the contract and often long after. The platform operator running the deployment can read members' messages, sell behavioral profiles, or use conversations as training data.
Each party signs contracts saying it won't do any of this. Each party is subject to subpoenas, insider threats, accidental breaches, and the ordinary errors of human-run systems. SOC 2 reports and signed agreements protect against honest mistakes. They do not prevent any of the above from happening.
The fix has to come from the layer underneath. You change what's possible, not how careful everyone is being.
Safebox replaces all four parties with an autonomous network that enforces the rules in code, verified by hardware. The change is structural — the kind of safety that comes from how a thing is built, not from how carefully the people running it behave.
Safebox ships as an Amazon Machine Image, the same packaging Netflix and Capital One use to run their fleets. You launch it; it boots; it's ready. There is no operating system to configure, no firewall to set up, no SSH keys to manage, no Dockerfiles to write. The image is pre-built and locked down.
Three things make this image different from an ordinary server:
Every modern AWS instance, and its equivalents on Azure and GCP, supports something called attestation: the hardware produces a cryptographic statement of exactly which boot image is loaded and exactly which code is executing. This is the same kind of mechanism your iPhone uses to prove to Apple that iOS hasn't been tampered with. Safebox uses it to prove to you that the running system matches the image you approved. If anything has been swapped, modified, or replaced, the attestation fails and the system refuses to serve traffic.
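The comparison at the heart of that check can be sketched in a few lines of Python. This is an illustration only, not the Safebox implementation: the measurement names (`image`, `system_component`), the `approved` dictionary, and the function `attestation_matches` are all assumptions made for this example. A real verifier would also validate the hardware vendor's signature on the attestation document before trusting any value in it; that step is omitted here.

```python
import hmac

def attestation_matches(approved: dict[str, str], reported: dict[str, str]) -> bool:
    """Return True only if every approved measurement is reported unchanged."""
    for name, expected_hex in approved.items():
        actual_hex = reported.get(name)
        if actual_hex is None:
            return False  # a missing measurement is treated as a failure
        # Constant-time comparison of the two hex digests.
        if not hmac.compare_digest(expected_hex.lower().encode(),
                                   actual_hex.lower().encode()):
            return False
    return True

# Hypothetical measurements: hex digests of the boot image and the System component.
approved = {"image": "ab" * 32, "system_component": "cd" * 32}
reported_ok = dict(approved)
reported_bad = {**approved, "system_component": "00" * 32}

print(attestation_matches(approved, reported_ok))   # True
print(attestation_matches(approved, reported_bad))  # False
```

The design choice worth noticing is that the check fails closed: anything missing or different means "do not serve traffic."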
Inside the image, only one program is allowed to do anything administrative — install software, manage storage, start or stop containers. That program is called the System component, and the entire list of things it is allowed to do lives in a single file (called /etc/sudoers.d/safebox-system) that any auditor can read.
The list is short: install or remove packages, take or restore filesystem snapshots, start or stop containers, and create directories needed for those containers. That is the complete privileged surface. Nothing else inside the box has any administrative privilege at all.
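The logic of a short, closed list is default-deny: anything not named is refused. A minimal Python sketch of that idea follows. The action names and the `authorize` function are hypothetical labels invented for this example; they are not the contents of the sudoers file, which remains the authoritative list.

```python
# Hypothetical names for the privileged operations described above.
ALLOWED_ACTIONS = frozenset({
    "package.install",
    "package.remove",
    "snapshot.take",
    "snapshot.restore",
    "container.start",
    "container.stop",
    "directory.create",
})

def authorize(action: str) -> bool:
    """Default-deny: an action is permitted only if it is explicitly listed."""
    return action in ALLOWED_ACTIONS

print(authorize("snapshot.take"))  # True
print(authorize("user.add"))       # False
```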
The actual AI models run inside Docker containers — sandboxes that can read the model weights but cannot reach the host, the network outside the allowed surface, or each other. If a model is somehow compromised, the blast radius is the container, not the system.
This is the part most people want to see — how the Gemma 4 12B you want to run actually gets onto a Safebox without anyone being able to substitute a different model along the way. The pipeline is six steps, all enforced by the infrastructure itself.
A manifest is a small text file that describes what to install. It names the model (e.g., Gemma 4 12B), lists every file the model consists of, gives the exact size and cryptographic hash of each file, lists one or more places the file can be downloaded from, and records the model's license. The manifest itself has a unique hash, computed in a specific way (RFC 8785 canonical JSON) so the same manifest always produces the same hash on any machine.
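To make the idea concrete, here is a small Python sketch of a manifest and a deterministic hash over it. The field names (`name`, `license`, `files`, `sources`), the example URLs, and the function `manifest_hash` are assumptions for illustration. The encoding below only approximates RFC 8785 (sorted keys, no extra whitespace, UTF-8); it is not a conforming implementation, and it can differ from the RFC for floating-point numbers and for keys containing characters outside the Basic Multilingual Plane.

```python
import hashlib
import json

def manifest_hash(manifest: dict) -> str:
    """SHA-256 over a deterministic JSON encoding (an approximation of RFC 8785)."""
    canonical = json.dumps(
        manifest,
        sort_keys=True,          # key order no longer depends on how the dict was built
        separators=(",", ":"),   # no insignificant whitespace
        ensure_ascii=False,      # emit UTF-8 rather than \u escapes for non-ASCII text
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical manifest; the sha256 shown is the well-known digest of an empty file.
example_manifest = {
    "name": "example-model",
    "license": "Apache-2.0",
    "files": [
        {
            "path": "weights.bin",
            "size": 0,
            "sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
            "sources": [
                "https://primary.example/weights.bin",
                "https://mirror.internal.example/weights.bin",
            ],
        }
    ],
}

# The same content in a different key order produces the same hash.
print(manifest_hash({"a": 1, "b": 2}) == manifest_hash({"b": 2, "a": 1}))  # True
```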
This is where M-of-N governance kicks in. You decide, in advance, who is allowed to approve installations and how many of them must agree. A common configuration is 3 of 5: five named approvers, any three of whom together can authorize an installation. Each approver uses their own cryptographic key. No single approver can install a model alone, so coercing or compromising one approver is not enough to get a model installed. The signed manifest is what the system actually trusts.
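The threshold check itself is simple to sketch. One loud caveat: Python's standard library has no public-key signatures, so the example below uses HMAC as a stand-in for an approver's signature. A real deployment would verify asymmetric signatures against each approver's public key. The approver names, keys, and function names are hypothetical.

```python
import hashlib
import hmac

def sign(key: bytes, manifest_hash: str) -> str:
    """Stand-in for an approver's signature (HMAC, NOT a public-key signature)."""
    return hmac.new(key, manifest_hash.encode(), hashlib.sha256).hexdigest()

def count_valid_approvals(manifest_hash: str,
                          signatures: dict[str, str],
                          approver_keys: dict[str, bytes]) -> int:
    """Count distinct, known approvers whose signature over the hash verifies."""
    valid = set()
    for approver, signature in signatures.items():
        key = approver_keys.get(approver)
        if key is None:
            continue  # signer is not on the approver list
        expected = sign(key, manifest_hash)
        if hmac.compare_digest(expected.encode(), signature.encode()):
            valid.add(approver)
    return len(valid)

def is_approved(manifest_hash: str, signatures: dict[str, str],
                approver_keys: dict[str, bytes], threshold: int) -> bool:
    return count_valid_approvals(manifest_hash, signatures, approver_keys) >= threshold

# Hypothetical 3-of-5 configuration.
keys = {name: f"key-{name}".encode() for name in
        ["cto", "compliance", "audit-firm", "counsel", "board"]}
digest = "ab" * 32
three = {name: sign(keys[name], digest) for name in ["cto", "compliance", "counsel"]}
two = {name: sign(keys[name], digest) for name in ["cto", "compliance"]}

print(is_approved(digest, three, keys, threshold=3))  # True
print(is_approved(digest, two, keys, threshold=3))    # False
```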
Once the manifest is signed, the System component pulls the actual weight files. It can try multiple sources in order: Hugging Face as the primary, your own S3 bucket as a backup, and an internal mirror in your data center as a tertiary source. If one source fails, the next one is tried. This matters in regulated environments where outbound traffic to Hugging Face is sometimes restricted; you can set up a mirror inside your VPC and configure it as the primary source.
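The ordered fallback can be sketched as a loop over sources. In this Python example the `fetch` callable, the URLs, and the function name `fetch_with_fallback` are assumptions; the download mechanism is passed in so the ordering logic can be shown without any network access.

```python
from typing import Callable, Sequence

def fetch_with_fallback(sources: Sequence[str],
                        fetch: Callable[[str], bytes]) -> tuple[str, bytes]:
    """Try each source in order; return the first success as (source, data)."""
    errors = []
    for url in sources:
        try:
            return url, fetch(url)
        except Exception as exc:  # any failure moves on to the next source
            errors.append(f"{url}: {exc}")
    raise RuntimeError("all sources failed: " + "; ".join(errors))

# Hypothetical sources and a fake fetcher in which the primary is unreachable.
sources = [
    "https://primary.example/weights.bin",
    "https://mirror.internal.example/weights.bin",
]

def fake_fetch(url: str) -> bytes:
    if "primary" in url:
        raise ConnectionError("outbound traffic blocked")
    return b"weights"

used, data = fetch_with_fallback(sources, fake_fetch)
print(used)  # https://mirror.internal.example/weights.bin
```

Where the bytes came from does not affect trust: whichever source answers, the file still has to pass the hash check against the signed manifest.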
As each file finishes downloading, its SHA-256 hash is computed and compared against the value the auditors signed for. If there is any mismatch, the install is aborted, the partial download is deleted, and an alert is logged. A model with even one tampered byte cannot be installed.
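A minimal Python sketch of that check, assuming the expected hash arrives as a hex string from the signed manifest. The function name `verify_download` and the use of the `logging` module for the alert are illustrative choices, not the Safebox code.

```python
import hashlib
import logging
import os

def verify_download(path: str, expected_sha256: str, chunk_size: int = 1 << 20) -> bool:
    """Hash the file in chunks; delete it and log an alert on any mismatch."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large weight files never sit fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    if digest.hexdigest() == expected_sha256.lower():
        return True
    os.remove(path)  # do not leave an unverified file on disk
    logging.warning("hash mismatch for %s: install aborted", path)
    return False
```

Because SHA-256 covers every byte, changing even one byte of the file produces a different digest and the function returns `False`.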
Once every file has been verified, the model is moved to a folder whose name is the manifest's hash. Anyone, whether your auditors, your insurance company, or a regulator, can then verify that a deployed model matches the approved manifest just by reading the folder name.
The container that actually performs AI inference mounts the model folder in read-only mode. The runner cannot modify the weights it serves. It also has no permission to fetch its own weights from anywhere, which means a compromised runner cannot substitute a different model for the one you approved. That separation is the entire point: weight acquisition and weight usage live in different security domains.
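One way to picture that separation is the container launch command. The Python sketch below only builds the argument list for `docker run`; it does not reflect Safebox's actual launch configuration. The image name, the paths, and the function `runner_command` are hypothetical. The flags themselves are standard Docker options.

```python
def runner_command(image: str, model_dir: str) -> list[str]:
    """Build a `docker run` command that mounts the model folder read-only."""
    return [
        "docker", "run", "--rm",
        # No network at all: the simplest way to show the runner cannot fetch weights.
        # A real deployment would attach a restricted network so its API stays reachable.
        "--network", "none",
        "--read-only",                              # container root filesystem is read-only
        "--cap-drop", "ALL",                        # drop all Linux capabilities
        "--security-opt", "no-new-privileges",      # block privilege escalation
        "--volume", f"{model_dir}:/models:ro",      # weights visible, never writable
        image,
    ]

cmd = runner_command("runner.example/llm:latest", "/srv/models/abc123")
print(" ".join(cmd))
```

The `:ro` suffix on the volume is what enforces "can read the weights, cannot modify them" at the container boundary.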
The auditors who sign manifests are people or organizations you trust. There is no Safebots-controlled approval committee. You set them up at installation time and you can rotate them on your own schedule.
A typical configuration looks like this: five named signers, any three of whom must agree before any privileged action — installing a model, applying a system update, restoring a snapshot — actually happens. The five could include your CTO, your compliance officer, your existing SOC 2 audit firm, an outside legal advisor, and a board member. You choose. The system enforces.
This is the same governance pattern that secures more than $200 billion in decentralized finance protocols today. It is not exotic. It just hasn't been applied to enterprise AI infrastructure yet.
| Threshold | What it protects against | Best for |
|---|---|---|
| 1 of 1 | Nothing — equivalent to no governance | Sandbox testing only |
| 2 of 3 | Any single compromised or coerced approver | Small teams, startups |
| 3 of 5 | Two compromised approvers; coordinated insider attacks | Most regulated organizations |
| 5 of 9 | Four compromised approvers; sustained attacks | Healthcare, finance, defense, sovereign |
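The numbers in the table follow from simple arithmetic: an M-of-N scheme still holds if up to M − 1 approvers are compromised, and it can still act if up to N − M approvers are unavailable. A short Python sketch, with a hypothetical function name:

```python
def threshold_properties(m: int, n: int) -> dict[str, int]:
    """Tolerances of an M-of-N approval scheme."""
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= n")
    return {
        "compromised_tolerated": m - 1,   # attackers need m keys, so m - 1 is not enough
        "unavailable_tolerated": n - m,   # m signers must still be reachable
    }

for m, n in [(1, 1), (2, 3), (3, 5), (5, 9)]:
    print(f"{m} of {n}:", threshold_properties(m, n))
# 1 of 1: {'compromised_tolerated': 0, 'unavailable_tolerated': 0}
# 2 of 3: {'compromised_tolerated': 1, 'unavailable_tolerated': 1}
# 3 of 5: {'compromised_tolerated': 2, 'unavailable_tolerated': 2}
# 5 of 9: {'compromised_tolerated': 4, 'unavailable_tolerated': 4}
```

The second number matters in practice: a threshold set too close to N means a single approver on vacation can block a needed update.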
The auditors do not log into the Safebox itself. They sign manifests on their own devices, using their own keys, and submit the signature. The Safebox simply verifies that enough valid signatures have arrived and acts accordingly. There is no root password that anyone needs to share.
Every privileged action — installing a software package, upgrading the operating system, swapping out a model, restoring a backup — flows through the same M-of-N gate. Routine maintenance operations that elsewhere are performed by a sysadmin running sudo apt upgrade from a laptop become events that require multiple signatures.
This protects against a specific class of attack that is increasingly common: supply chain compromise. An attacker who manages to compromise an upstream software repository can push a malicious update to thousands of organizations at once. The 2024 XZ Utils incident, the 2020 SolarWinds breach, and a dozen npm package compromises in 2025 all worked this way. In each case the affected organizations applied what looked like a routine update from a trusted source, and ended up running attacker code.
With Safebox, the same compromised update reaches your auditors as a manifest to sign. They can verify the change, ask questions, or simply refuse. The infrastructure refuses to apply the update without enough signatures. No single compromised source can push a malicious change to your system.
Before any privileged action runs, Safebox uses a filesystem feature called ZFS snapshots to capture the system's current state. ZFS snapshots are atomic and effectively free: they don't duplicate data, they just record what changed. If an update breaks something, such as a model regression, a configuration error, or an incompatibility, the snapshot can be restored in seconds. The system is back to exactly the state it was in when the snapshot was taken.
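The snapshot-then-act pattern can be sketched as a small wrapper. In this Python example, `zfs snapshot` and `zfs rollback` are the real ZFS commands, but the dataset name, the label, and the function `with_snapshot` are assumptions. The command runner is injectable so the flow can be demonstrated without touching a real pool.

```python
import subprocess
from typing import Callable, Sequence

def run_command(argv: Sequence[str]) -> None:
    """Default runner: execute the command and raise if it fails."""
    subprocess.run(list(argv), check=True)

def with_snapshot(dataset: str, label: str, action: Callable[[], None],
                  run: Callable[[Sequence[str]], None] = run_command) -> bool:
    """Snapshot first, then act; roll back to the snapshot if the action fails."""
    snapshot = f"{dataset}@{label}"
    run(["zfs", "snapshot", snapshot])
    try:
        action()
    except Exception:
        # Plain rollback suffices when the snapshot just taken is still the newest one.
        run(["zfs", "rollback", snapshot])
        return False
    return True

# Demonstration with a recording runner instead of real ZFS commands.
calls: list = []

def failing_update() -> None:
    raise RuntimeError("update broke something")

ok = with_snapshot("tank/safebox", "pre-update", failing_update, run=calls.append)
print(ok)     # False
print(calls)  # [['zfs', 'snapshot', 'tank/safebox@pre-update'], ['zfs', 'rollback', 'tank/safebox@pre-update']]
```

Rolling back discards whatever was written after the snapshot, which is exactly the intent when the thing written was a bad update.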
This is the same recovery model used by enterprise storage vendors like NetApp and Pure Storage. Most cloud-native deployments don't have it because it requires choosing the right filesystem at boot. Safebox is built around it from the start.
The AI models that run inside Safebox are open-weight, which is the AI industry's term for "you have the actual model files, can inspect them, and don't need to call a vendor to use them." Every model in the catalog is one you can download yourself, verify yourself, and run without anyone else's permission. The list grows roughly every two weeks; the snapshot below is current as of June 2026.
| Model | Best for | License · hardware |
|---|---|---|
| Gemma 4 12B | Multimodal reasoning on a laptop — text, images, audio | Apache 2.0 · 16GB consumer GPU or unified memory |
| Llama 3.3 70B | General reasoning, strong instruction-following | Llama Community · 2× A100 80GB |
| DeepSeek-R1 671B | Hardest reasoning tasks, competitive with closed frontier models | MIT · 8× H100 80GB |
| Qwen 2.5 / 3 | Strong coding, multilingual, agentic workflows | Apache 2.0 · varies by size (7B–72B) |
| Mistral / Mixtral | European-trained, GDPR-aligned, fast inference | Apache 2.0 · varies |
Beyond text generation, Safebox runs models for image generation, transcription, speech synthesis, music generation, and 3D model generation. Each protocol is served by a runner — a container that wraps the underlying inference engine and exposes a standard API. The runner does not fetch its own weights and cannot modify them after install. The same governance gate applies to every model regardless of what it generates.
Some runners are production-ready today; others are scaffolded for release through 2026 and 2027. The pipeline above applies to all of them.
One of the harder-to-quantify costs of running open-weight AI in-house is the long list of operational chores that come with it — managing servers, patching the operating system, rotating keys, configuring firewalls, writing deployment scripts. Most organizations that look at this list decide the overhead exceeds the savings and end up calling Anthropic instead.
Safebox absorbs all of it. You do not manage servers, patch the operating system, rotate keys, configure firewalls, or write deployment scripts.
What you do instead is decide which models to install, who is allowed to approve changes, and what your applications should do with the AI. The infrastructure runs itself.
Every privileged action — every model installed, every package upgraded, every snapshot taken, every change of any kind — is recorded by the System component itself, cryptographically signed, and appended to a tamper-evident log. The log is not produced by the AI agent, by the application using the AI, or by anything that could have an incentive to lie about what happened.
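"Tamper-evident" usually means each entry commits to the one before it, so that editing or removing an earlier entry breaks every later link. A minimal hash-chain sketch in Python follows. The entry fields and function names are assumptions, and the per-entry signature described above is left out to keep the example short.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def _digest(body: dict) -> str:
    encoded = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

def append_entry(log: list[dict], action: str, detail: str) -> dict:
    """Append an entry whose hash covers its content and the previous entry's hash."""
    prev_hash = log[-1]["entry_hash"] if log else GENESIS
    body = {"index": len(log), "action": action, "detail": detail, "prev_hash": prev_hash}
    entry = dict(body, entry_hash=_digest(body))
    log.append(entry)
    return entry

def verify_log(log: list[dict]) -> bool:
    """Recompute every link; any edited, reordered, or removed entry breaks the chain."""
    prev_hash = GENESIS
    for index, entry in enumerate(log):
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if (entry.get("index") != index
                or entry.get("prev_hash") != prev_hash
                or _digest(body) != entry.get("entry_hash")):
            return False
        prev_hash = entry["entry_hash"]
    return True

log: list[dict] = []
append_entry(log, "snapshot.take", "pre-install")
append_entry(log, "model.install", "example-model")
append_entry(log, "package.upgrade", "example-package")
print(verify_log(log))            # True

log[1]["detail"] = "something else"
print(verify_log(log))            # False
```

A chain alone cannot detect entries trimmed from the end, which is one reason to also keep a copy somewhere the writer cannot alter.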
This matters in a way that becomes obvious once something goes wrong. When an AI agent claims a destructive action "wasn't possible to undo" — as one did to a SaaStr founder in July 2025, fabricating a story about why rollback would fail — the substrate's audit trail can prove or disprove that claim independently. The agent does not get to be the source of truth about what the agent did.
The same logs flow to your compliance team in a format they already know how to read, to your SIEM platform if you have one, and to a separate immutable storage tier (S3 with Object Lock, or equivalent) that no one inside the Safebox — including the System component itself — can alter after the fact.
When an agent says "rollback isn't possible," the substrate's record proves it false or true. Independently.
Safebox is the foundation layer. On top of it sits the rest of the Safebots platform — workflows, AI agents, multi-party negotiation primitives, business applications. Each upper layer depends on the safety properties Safebox provides underneath. Just as the entire web depends on TLS without most web developers having to think about cryptography, the agents and workflows that run on Safebots inherit the audit trail, the sealed execution, and the M-of-N governance automatically.
The investor deck makes the case that what blockchain did to banking — replacing trust in named institutions with trust in an autonomous network — is starting to happen now in AI. Safebox is the part that does it. The same shift that made banks unnecessary for trustless settlement, and made centralized servers unnecessary for trustless computation, is what makes the four trusted intermediaries above (model vendor, cloud, integrator, operator) unnecessary for AI.
What replaces them is what you have been reading about: open-weight models, sealed execution, M-of-N governance, signed audit trails, and a privileged surface small enough that any auditor can read it in five minutes.
If your organization is evaluating open-weight AI deployment — for compliance reasons, sovereignty reasons, cost reasons, or simply because you want to stop sending sensitive data to someone else's servers — we'd like to walk you through what Safebox would look like in your specific environment.
A first conversation takes about thirty minutes. We don't ask for an NDA, and we won't pitch you on anything you haven't asked about.
Schedule a conversation →