How Safebox Works — Run Open AI Models Safely, Without a System Administrator

Most companies that want to run AI today are forced into a choice between convenience and control. Convenience means trusting a vendor: their models, their cloud, their consultants, their operators, their promises. Control means hiring an infrastructure team and figuring out how to run all of it yourself, which most organizations cannot reasonably do.

Safebox is a third option. It is a pre-built, sealed computing environment that runs open-weight AI models under governance you control, with no system administrators required on your side. You decide who is allowed to approve changes; the infrastructure itself enforces the rules. When the Safebots investor deck says "what blockchain did to banking, we're doing to AI," this is the part that actually does it.

This page explains, in plain language, what the infrastructure actually does. Everything described here is implemented in the open-source code at github.com/Safebots/Infrastructure and can be inspected by anyone.

01 · The Setup Most Companies Are Stuck With

Four parties you have to trust today.

When an organization deploys an AI feature today, it implicitly trusts at least four separate parties not to misuse the data flowing through them.

The model vendor — Anthropic, OpenAI, Google — receives every prompt and every response. The cloud provider — AWS, Azure, GCP — has operator-level visibility into memory, storage, and state. The integrator who built the system has keys, credentials, and ongoing access during the contract and often long after. The platform operator running the deployment can read members' messages, sell behavioral profiles, or use conversations as training data.

Each party signs contracts saying it won't do any of this. Each party is subject to subpoenas, insider threats, accidental breaches, and the ordinary errors of human-run systems. SOC 2 reports and signed agreements protect against honest mistakes. They do not prevent any of the above from happening.

The fix has to come from the layer underneath. You change what's possible, not how careful everyone is being.

Safebox replaces all four parties with an autonomous network that enforces the rules in code, verified by hardware. The change is structural — the kind of safety that comes from how a thing is built, not from how carefully the people running it behave.

02 · What Safebox Actually Is

A sealed machine image with one combination lock at the door.

Safebox ships as an Amazon Machine Image (AMI), the same packaging Netflix and Capital One use to run their fleets. You launch it; it boots; it's ready. There is no operating system to configure, no firewall to set up, no SSH keys to manage, no Dockerfiles to write. The image is pre-built and locked down.

Three things make this image different from an ordinary server:

The hardware itself proves what's running inside

Modern AWS instance types, and their equivalents on Azure and GCP, support something called attestation: as the machine boots, the hardware records a cryptographic measurement (a hash) of each component that loads, and can produce a signed statement of exactly which boot image and which code are running. This is the same idea your iPhone relies on to prove to Apple that iOS hasn't been tampered with. Safebox uses it to prove to you that the running system matches the image you approved. If anything has been swapped, modified, or replaced, the attestation fails and the system refuses to serve traffic.
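To make the idea concrete, here is a minimal sketch of how a boot measurement is built and checked. It follows the TPM 2.0 "extend" rule for a SHA-256 register: starting from 32 zero bytes, each new value is the SHA-256 of the old value followed by the digest of the next component. The component names and the `measure` and `may_serve` functions are hypothetical, and a real verifier would also check the hardware's signature over the reported value, which this sketch leaves out.

```python
import hashlib
import hmac

def extend(pcr: bytes, component: bytes) -> bytes:
    """TPM-style extend: the register can only be folded forward, never set directly."""
    return hashlib.sha256(pcr + hashlib.sha256(component).digest()).digest()

def measure(boot_chain: list[bytes]) -> bytes:
    """Fold every boot component, in order, into a register that starts at all zeros."""
    pcr = bytes(32)
    for component in boot_chain:
        pcr = extend(pcr, component)
    return pcr

def may_serve(reported_pcr: bytes, approved_pcr: bytes) -> bool:
    """Serve traffic only if the reported measurement matches the approved one exactly."""
    return hmac.compare_digest(reported_pcr, approved_pcr)

# Hypothetical boot chain standing in for firmware, kernel, and the approved image.
approved_chain = [b"firmware-v1", b"kernel-6.8", b"safebox-image-2026.06"]
approved_pcr = measure(approved_chain)

tampered_chain = [b"firmware-v1", b"kernel-6.8", b"safebox-image-MODIFIED"]
print(may_serve(measure(approved_chain), approved_pcr))  # True
print(may_serve(measure(tampered_chain), approved_pcr))  # False
```

Because each step hashes the previous value, changing, removing, or reordering any component changes the final value, which is why a single comparison against the approved value is enough.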

The privileged surface is small enough to read in five minutes

Inside the image, only one program is allowed to do anything administrative — install software, manage storage, start or stop containers. That program is called the System component, and the entire list of things it is allowed to do lives in a single file (called /etc/sudoers.d/safebox-system) that any auditor can read.

The list is short: install or remove packages, take or restore filesystem snapshots, start or stop containers, and create directories needed for those containers. That is the complete privileged surface. Nothing else inside the box has any administrative privilege at all.
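What makes the allowlist meaningful is default deny: anything not named is refused. The sketch below illustrates that rule in Python. The operation names and command prefixes are hypothetical stand-ins, not the contents of /etc/sudoers.d/safebox-system, which remains the file an auditor should actually read.

```python
# Hypothetical allowlist: operation name -> command prefix the System component may run.
ALLOWED = {
    "install_package": ["apt-get", "install", "-y"],
    "remove_package": ["apt-get", "remove", "-y"],
    "take_snapshot": ["zfs", "snapshot"],
    "restore_snapshot": ["zfs", "rollback"],
    "start_container": ["docker", "start"],
    "stop_container": ["docker", "stop"],
    "create_directory": ["mkdir", "-p"],
}

class NotPermitted(Exception):
    pass

def build_command(operation: str, *args: str) -> list[str]:
    """Return the argv for an allowed operation; refuse everything else."""
    if operation not in ALLOWED:
        raise NotPermitted(f"{operation!r} is not on the allowlist")
    # A stricter policy would also constrain the arguments, not just the command.
    return ALLOWED[operation] + list(args)

print(build_command("take_snapshot", "tank/safebox@pre-update"))
# ['zfs', 'snapshot', 'tank/safebox@pre-update']
try:
    build_command("open_shell")
except NotPermitted as err:
    print(err)  # 'open_shell' is not on the allowlist
```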

AI workloads run in isolated containers

The actual AI models run inside Docker containers — sandboxes that can read the model weights but cannot reach the host, the network outside the allowed surface, or each other. If a model is somehow compromised, the blast radius is the container, not the system.

03 · How Models Get Installed

Every model goes through a checkpoint that you control.

This is the part most people want to see — how the Gemma 4 12B you want to run actually gets onto a Safebox without anyone being able to substitute a different model along the way. The pipeline is six steps, all enforced by the infrastructure itself.

The six-step model install pipeline. Every step is implemented in opsModels.js (556 lines, open source) and can be inspected.

1. Someone proposes a manifest

A manifest is a small text file that describes what to install. It names the model (e.g., Gemma 4 12B), lists every file the model consists of, gives the exact size and cryptographic hash of each file, lists one or more places the file can be downloaded from, and records the model's license. The manifest itself is identified by a deterministic hash: before hashing, it is written out as canonical JSON (RFC 8785, the JSON Canonicalization Scheme), so the same manifest always produces the same hash on any machine, regardless of key order or whitespace.
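A minimal sketch of a manifest and its hash, assuming SHA-256 as the digest and using Python's `json.dumps` with sorted keys and compact separators as an approximation of RFC 8785. Full RFC 8785 also fixes how numbers are written and sorts keys by UTF-16 code units, so treat this as illustrative; the field names and values below are hypothetical, not the schema used in opsModels.js.

```python
import hashlib
import json

def canonical_json(obj) -> bytes:
    """Approximate RFC 8785: sorted keys at every level, no extra whitespace, UTF-8."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")

def manifest_hash(manifest: dict) -> str:
    return hashlib.sha256(canonical_json(manifest)).hexdigest()

# Hypothetical manifest; field names, sizes, and URLs are placeholders.
manifest = {
    "model": "gemma-4-12b",
    "license": "Apache-2.0",
    "files": [
        {
            "path": "model.safetensors",
            "size": 24_000_000_000,
            "sha256": "<64 hex characters>",
            "sources": [
                "https://hub.example.com/model.safetensors",
                "s3://your-bucket/model.safetensors",
            ],
        }
    ],
}

# The same content with keys in a different order produces the same hash.
reordered = {"files": manifest["files"], "license": "Apache-2.0", "model": "gemma-4-12b"}
assert manifest_hash(manifest) == manifest_hash(reordered)
print(manifest_hash(manifest))
```

Because the hash is computed over a canonical form, two people who write out the same manifest with different key order or spacing still get the same identifier, which is what makes it a stable value for approvers to sign against and for the install folder to be named after.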

2. Your auditors sign the manifest

This is where M-of-N governance kicks in. You decide, in advance, who is allowed to approve installations and how many of them must agree. A common configuration is 3 of 5: five named approvers, any three of whom can sign. Each approver uses their own cryptographic key. No single approver can install a model alone, so coercing or compromising any one approver is not enough. The signed manifest is what the system actually trusts.
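The threshold rule itself is small: count distinct, valid approvals and compare against M. The sketch below is a simplification under stated assumptions. Real approvers sign with public-key signatures, so the Safebox only needs their public keys; here HMAC with per-approver demo secrets stands in so the example runs with Python's standard library alone. The approver names and keys are hypothetical.

```python
import hashlib
import hmac

# Hypothetical 3-of-5 configuration. HMAC secrets stand in for approvers' signing keys.
APPROVERS = ["cto", "compliance", "auditor", "counsel", "board"]
APPROVER_KEYS = {name: f"demo-key-{name}".encode() for name in APPROVERS}
THRESHOLD = 3

def sign(approver: str, manifest_hash: str) -> bytes:
    return hmac.new(APPROVER_KEYS[approver], manifest_hash.encode(), hashlib.sha256).digest()

def approved(manifest_hash: str, signatures: list[tuple[str, bytes]]) -> bool:
    """True only if at least THRESHOLD distinct known approvers signed this exact hash."""
    valid = set()
    for approver, sig in signatures:
        key = APPROVER_KEYS.get(approver)
        if key is None:
            continue  # unknown signer: ignored
        expected = hmac.new(key, manifest_hash.encode(), hashlib.sha256).digest()
        if hmac.compare_digest(sig, expected):
            valid.add(approver)  # a set, so one approver signing twice counts once
    return len(valid) >= THRESHOLD

h = "example-manifest-hash"
two = [(a, sign(a, h)) for a in ["cto", "auditor"]]
print(approved(h, two))                                  # False
print(approved(h, two + [("board", sign("board", h))]))  # True
print(approved(h, two + two))                            # False: duplicates don't add up
```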

3. The system downloads the files

Once the manifest is signed, the System component pulls the actual weight files. It can try multiple sources in order: Hugging Face as the primary, your own S3 bucket as a backup, and an internal mirror in your data center as a tertiary. If one source fails, the next one is tried. This matters in regulated environments where outbound traffic to Hugging Face is sometimes restricted; you can set up a mirror inside your VPC and configure it as the primary source.
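Trying sources in order is a simple loop. In this sketch, `fetch` is a hypothetical stand-in for whatever actually downloads a URL, and the source URLs are placeholders. Integrity is deliberately not checked here, because the next step verifies every file against the signed hash regardless of where it came from.

```python
from typing import Callable

def download_first_available(sources: list[str], fetch: Callable[[str], bytes]) -> tuple[str, bytes]:
    """Try each source in the manifest's order; return the first that succeeds."""
    errors = []
    for url in sources:
        try:
            return url, fetch(url)
        except OSError as err:  # network and I/O failures fall through to the next source
            errors.append(f"{url}: {err}")
    raise RuntimeError("all sources failed:\n" + "\n".join(errors))

# Hypothetical sources: public hub first, then your own bucket, then an internal mirror.
sources = [
    "https://hub.example.com/model.safetensors",
    "s3://your-bucket/model.safetensors",
    "https://mirror.internal.example/model.safetensors",
]

def fake_fetch(url: str) -> bytes:
    if url.startswith("https://hub."):
        raise ConnectionError("outbound traffic blocked")
    return b"weights"

print(download_first_available(sources, fake_fetch)[0])  # s3://your-bucket/model.safetensors
```

Order only affects availability, not trust: a file from the internal mirror is held to exactly the same signed hash as one from the public hub.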

4. Every file is verified

As each file finishes downloading, its SHA-256 hash is computed and compared against the value the auditors signed for. If there is any mismatch, the install is aborted, the partial download is deleted, and an alert is logged. A model with even one tampered byte cannot be installed.
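Verifying a file comes down to streaming it through SHA-256 and comparing the result with the signed value. A minimal sketch follows; `verify_or_discard` and `IntegrityError` are hypothetical names, and the alert is reduced to an exception.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path

class IntegrityError(Exception):
    """Raised when a downloaded file does not match the hash the approvers signed."""

def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash in 1 MiB chunks so multi-gigabyte weight files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_or_discard(path: Path, expected_sha256: str) -> None:
    """Keep the file only if it matches the signed hash; otherwise delete it and abort."""
    actual = sha256_file(path)
    if not hmac.compare_digest(actual, expected_sha256.lower()):
        path.unlink()  # remove the bad download so it can never be picked up later
        raise IntegrityError(f"{path.name}: expected {expected_sha256}, got {actual}")

with tempfile.TemporaryDirectory() as tmp:
    signed = hashlib.sha256(b"approved weights").hexdigest()
    download = Path(tmp) / "model.safetensors"
    download.write_bytes(b"approved weights!")  # one extra byte
    try:
        verify_or_discard(download, signed)
    except IntegrityError:
        print(download.exists())  # False: the tampered file is gone
```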

5. Verified files move into the active model folder

Once every file has been verified, the model is moved into a folder whose name is the manifest's hash. Because of this, anyone (your auditors, your insurance company, a regulator) can tell which approved manifest a deployed model corresponds to just by reading the folder name.
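A sketch of this step, assuming the verified files were staged in a temporary directory on the same filesystem as the model store, so the move is a single atomic rename. The directory layout and function names are hypothetical.

```python
import os
import tempfile
from pathlib import Path

def promote(staging_dir: Path, models_root: Path, manifest_hash: str) -> Path:
    """Move a fully verified model into models_root/<manifest_hash>."""
    target = models_root / manifest_hash
    if target.exists():
        # This sketch never overwrites an existing model folder.
        raise FileExistsError(f"{target} already exists")
    models_root.mkdir(parents=True, exist_ok=True)
    os.rename(staging_dir, target)  # atomic when both paths are on the same filesystem
    return target

def folder_matches(model_dir: Path, manifest_hash: str) -> bool:
    """What an auditor checks: does the folder name equal the approved manifest's hash?"""
    return model_dir.name == manifest_hash

with tempfile.TemporaryDirectory() as tmp:
    staging = Path(tmp) / "staging"
    staging.mkdir()
    (staging / "model.safetensors").write_bytes(b"verified weights")
    approved_hash = "ab" * 32  # placeholder for a real 64-character manifest hash
    installed = promote(staging, Path(tmp) / "models", approved_hash)
    print(folder_matches(installed, approved_hash))  # True
```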

6. The runner mounts it read-only

The container that actually performs AI inference mounts the model folder in read-only mode. The runner cannot modify the weights it serves. It also has no permission to fetch its own weights from anywhere, which means a compromised runner cannot substitute a different model for the one you approved. That separation is the entire point: weight acquisition and weight usage live in different security domains.
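With Docker, read-only mounting is the standard `:ro` option on a bind mount. The sketch below only builds the command line and does not run it. The image name, host path, and the extra hardening flags (`--read-only`, `--tmpfs`, `--cap-drop`) are illustrative assumptions, not the flags Safebox is known to use. Note what is absent: the runner is given no download credentials and no writable path to its weights.

```python
import shlex
from pathlib import Path

def runner_command(models_root: Path, manifest_hash: str, image: str) -> list[str]:
    """Build (but do not run) a `docker run` command that mounts one approved model read-only."""
    model_dir = models_root / manifest_hash
    return [
        "docker", "run", "--detach",
        "--read-only",                    # the container's own root filesystem is read-only too
        "--tmpfs", "/tmp",                # scratch space that disappears with the container
        "--cap-drop", "ALL",              # drop all Linux capabilities
        "-v", f"{model_dir}:/models:ro",  # weights are readable, never writable
        image,
    ]

# Hypothetical host path, manifest hash, and runner image.
cmd = runner_command(Path("/var/lib/safebox/models"), "ab" * 32, "example/llm-runner:1.0")
print(shlex.join(cmd))
```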

What this means in plain terms: You decide which model to run. Your auditors sign off. The system gets the files from any source they trust, verifies every byte, and stores them in a way that proves to anyone (including you, your auditors, and any regulator) exactly what is being served.

04 · Who Is In The Loop

You bring your own auditors. Any three of five can approve.

The auditors who sign manifests are people or organizations you trust. There is no Safebots-controlled approval committee. You set them up at installation time and you can rotate them on your own schedule.

A typical configuration looks like this: five named signers, any three of whom must agree before any privileged action — installing a model, applying a system update, restoring a snapshot — actually happens. The five could include your CTO, your compliance officer, your existing SOC 2 audit firm, an outside legal advisor, and a board member. You choose. The system enforces.

This is the same governance pattern that secures more than $200 billion in decentralized finance protocols today. It is not exotic. It just hasn't been applied to enterprise AI infrastructure yet.

Threshold	What it protects against	Best for
1 of 1	Nothing — equivalent to no governance	Sandbox testing only
2 of 3	Any single compromised or coerced approver	Small teams, startups
3 of 5	Two compromised approvers; coordinated insider attacks	Most regulated organizations
5 of 9	Four compromised approvers; sustained attacks	Healthcare, finance, defense, sovereign
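The thresholds in the table follow from two lines of arithmetic. With an M-of-N threshold, an attacker must control M approvers to act, so up to M − 1 can be compromised without effect; and the system can still act as long as M approvers are reachable, so up to N − M can be unavailable. A quick check in Python:

```python
def tolerance(m: int, n: int) -> tuple[int, int]:
    """Return (compromised approvers tolerated, unavailable approvers tolerated)."""
    if not 1 <= m <= n:
        raise ValueError("need 1 <= M <= N")
    return m - 1, n - m

for m, n in [(1, 1), (2, 3), (3, 5), (5, 9)]:
    compromised, unavailable = tolerance(m, n)
    print(f"{m} of {n}: tolerates {compromised} compromised, {unavailable} unavailable")
```

This prints 0, 1, 2, and 4 compromised approvers tolerated for the four rows, matching the table. Raising M buys safety against compromise; keeping N − M above zero keeps the system usable when an approver is away or loses a key.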

The auditors do not log into the Safebox itself. They sign manifests on their own devices, using their own keys, and submit the signature. The Safebox simply verifies that enough valid signatures have arrived and acts accordingly. There is no root password that anyone needs to share.

05 · How Updates Work

Bad updates can't sneak in, and approved updates that turn out to be mistakes can be reversed in seconds.

Every privileged action — installing a software package, upgrading the operating system, swapping out a model, restoring a backup — flows through the same M-of-N gate. Routine maintenance operations that elsewhere are performed by a sysadmin running sudo apt upgrade from a laptop become events that require multiple signatures.

This protects against a specific class of attack that is increasingly common: supply chain compromise. An attacker who manages to compromise an upstream software repository can push a malicious update to thousands of organizations at once. The 2024 XZ Utils backdoor, the 2020 SolarWinds breach, and a dozen npm package compromises in 2025 all worked this way. In each case the affected organizations applied what looked like a routine update from a trusted source and ended up running attacker code.

With Safebox, the same compromised update reaches your auditors as a manifest to sign. They can verify the change, ask questions, or simply refuse. The infrastructure refuses to apply the update without enough signatures. No single compromised source can push a malicious change to your system.

If an update goes through and turns out to be a mistake

Before any privileged action runs, Safebox takes a snapshot of the system's current state using ZFS, a filesystem with built-in snapshots. ZFS snapshots are atomic and nearly free to create: because ZFS is copy-on-write, a new snapshot shares every block with the live system and only takes up space as data changes afterward. If an update breaks something (a model regression, a configuration error, an incompatibility), the snapshot can be restored in seconds, returning the system to exactly the state it was in when the snapshot was taken.

This is the same recovery model used by enterprise storage vendors like NetApp and Pure Storage. Most cloud-native deployments don't have it because it requires choosing the right filesystem at boot. Safebox is built around it from the start.
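The snapshot-then-act pattern can be sketched in a few lines. `zfs snapshot` and `zfs rollback` are standard ZFS commands; the dataset name `tank/safebox`, the snapshot naming scheme, and the helper functions are hypothetical. Commands go through a pluggable runner, and the default only prints them, so the sketch is safe to run on a machine without ZFS.

```python
import subprocess
from datetime import datetime, timezone
from typing import Callable

DATASET = "tank/safebox"  # hypothetical ZFS dataset holding the system's state
Runner = Callable[[list[str]], None]

def dry_run(argv: list[str]) -> None:
    print("would run:", " ".join(argv))

def real_run(argv: list[str]) -> None:
    subprocess.run(argv, check=True)

class ActionFailed(Exception):
    def __init__(self, snapshot: str):
        super().__init__(f"action failed; restore point is {snapshot}")
        self.snapshot = snapshot

def snapshot_then(action: Callable[[], None], run: Runner = dry_run) -> str:
    """Take an atomic snapshot first, then perform the privileged action."""
    name = f"{DATASET}@pre-{datetime.now(timezone.utc):%Y%m%dT%H%M%SZ}"
    run(["zfs", "snapshot", name])
    try:
        action()
    except Exception as err:
        raise ActionFailed(name) from err
    return name

def roll_back(snapshot: str, run: Runner = dry_run) -> None:
    """Without -r, `zfs rollback` only accepts the most recent snapshot."""
    run(["zfs", "rollback", snapshot])

def broken_update() -> None:
    raise RuntimeError("model regression")

recorded: list[list[str]] = []
try:
    snapshot_then(broken_update, run=recorded.append)
except ActionFailed as failure:
    # In Safebox, restoring a snapshot is itself a privileged action behind the M-of-N gate.
    roll_back(failure.snapshot, run=recorded.append)
print([argv[1] for argv in recorded])  # ['snapshot', 'rollback']
```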

What this means: A bad update has to pass M-of-N to land. If it lands and turns out to be wrong, you don't restore from backup. You roll back, instantly, to the snapshot taken right before the update. The window of damage is bounded by design.

06 · What Safebox Runs

Open-weight AI models — the ones you've heard of, and the ones you haven't yet.

The AI models that run inside Safebox are open-weight, which is the AI industry's term for "you have the actual model files, can inspect them, and don't need to call a vendor to use them." Every model in the catalog is one you can download yourself, verify yourself, and run without anyone else's permission. The list grows roughly every two weeks; the snapshot below is current as of June 2026.

Model	Best for	License · hardware
Gemma 4 12B	Multimodal reasoning on a laptop — text, images, audio	Apache 2.0 · 16GB consumer GPU or unified memory
Llama 3.3 70B	General reasoning, strong instruction-following	Llama Community · 2× A100 80GB
DeepSeek-R1 671B	Hardest reasoning tasks, competitive with closed frontier	MIT · 8× H100 80GB
Qwen 2.5 / 3	Strong coding, multilingual, agentic workflows	Apache 2.0 · varies by size (7B–72B)
Mistral / Mixtral	European-trained, GDPR-aligned, fast inference	Apache 2.0 · varies

Beyond text generation, Safebox runs models for image generation, transcription, speech synthesis, music generation, and 3D model generation. Each protocol is served by a runner — a container that wraps the underlying inference engine and exposes a standard API. The runner does not fetch its own weights and cannot modify them after install. The same governance gate applies to every model regardless of what it generates.

Some runners are production-ready today; others are scaffolded for release through 2026 and 2027. The pipeline above applies to all of them.

07 · What You Don't Do

The list of things Safebox handles for you.

One of the harder-to-quantify costs of running open-weight AI in-house is the long list of operational chores that come with it — managing servers, patching the operating system, rotating keys, configuring firewalls, writing deployment scripts. Most organizations that look at this list decide the overhead exceeds the savings and end up calling Anthropic instead.

Safebox absorbs all of it. Here is the complete list of things you do not have to do:

Patch the operating system
Manage SSH keys
Configure firewalls
Write Docker Compose files
Manage TLS certificates
Set up log shipping
Configure GPU drivers
Rotate API tokens
Write Kubernetes manifests
Monitor disk usage
Apply security updates manually
Set up backup pipelines
Hire an on-call rotation
Schedule maintenance windows

What you do instead is decide which models to install, who is allowed to approve changes, and what your applications should do with the AI. The infrastructure runs itself.

08 · The Audit Trail

Recorded by the infrastructure, not by the agent.

Every privileged action — every model installed, every package upgraded, every snapshot taken, every change of any kind — is recorded by the System component itself, cryptographically signed, and appended to a tamper-evident log. The log is not produced by the AI agent, by the application using the AI, or by anything that could have an incentive to lie about what happened.

This matters in a way that becomes obvious once something goes wrong. When an AI agent claims a destructive action "wasn't possible to undo" — as one did to a SaaStr founder in July 2025, fabricating a story about why rollback would fail — the substrate's audit trail can prove or disprove that claim independently. The agent does not get to be the source of truth about what the agent did.

The same logs flow to your compliance team in a format they already know how to read, to your SIEM platform if you have one, and to a separate immutable storage tier (S3 with Object Lock, or equivalent) that no one inside the Safebox — including the System component itself — can alter after the fact.
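Tamper evidence of this kind is commonly built by chaining: each entry records the hash of the entry before it, and each entry is signed. A minimal sketch follows, under stated assumptions: HMAC with a single demo key stands in for the System component's real signing key, and the entry fields are hypothetical.

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"demo-key"  # stand-in for the System component's signing key
GENESIS = "0" * 64

def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def append(log: list[dict], action: str, detail: str) -> None:
    """Add an entry that commits to the previous entry's hash, then sign it."""
    prev = log[-1]["hash"] if log else GENESIS
    body = {"seq": len(log), "action": action, "detail": detail, "prev": prev}
    h = _digest(body)
    sig = hmac.new(SIGNING_KEY, h.encode(), hashlib.sha256).hexdigest()
    log.append({**body, "hash": h, "sig": sig})

def verify(log: list[dict]) -> bool:
    prev = GENESIS
    for i, entry in enumerate(log):
        body = {k: entry[k] for k in ("seq", "action", "detail", "prev")}
        if entry["seq"] != i or entry["prev"] != prev:
            return False  # entry removed, reordered, or spliced in
        if entry["hash"] != _digest(body):
            return False  # entry contents changed
        expected_sig = hmac.new(SIGNING_KEY, entry["hash"].encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(entry["sig"], expected_sig):
            return False  # not signed with the expected key
        prev = entry["hash"]
    return True

audit_log: list[dict] = []
append(audit_log, "take_snapshot", "tank/safebox@pre-update")
append(audit_log, "install_model", "manifest abab")
print(verify(audit_log))  # True
audit_log[0]["detail"] = "nothing happened"
print(verify(audit_log))  # False
```

A chain like this shows that an entry was altered, removed from the middle, or reordered, but it cannot show that the newest entries were cut off the end. Copying each entry to separate write-once storage as it is written closes that gap.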

When an agent says "rollback isn't possible," the substrate's record proves it false or true. Independently.

09 · The Larger Picture

This is what makes the rest of the Safebots stack possible.

Safebox is the foundation layer. On top of it sits the rest of the Safebots platform — workflows, AI agents, multi-party negotiation primitives, business applications. Each upper layer depends on the safety properties Safebox provides underneath. Just as the entire web depends on TLS without most web developers having to think about cryptography, the agents and workflows that run on Safebots inherit the audit trail, the sealed execution, and the M-of-N governance automatically.

The investor deck makes the case that what blockchain did to banking — replacing trust in named institutions with trust in an autonomous network — is starting to happen now in AI. Safebox is the part that does it. The same shift that made banks unnecessary for trustless settlement, and made centralized servers unnecessary for trustless computation, is what makes the four trusted intermediaries above (model vendor, cloud, integrator, operator) unnecessary for AI.

What replaces them is what you have been reading about: open-weight models, sealed execution, M-of-N governance, signed audit trails, and a privileged surface small enough that any auditor can read it in five minutes.

4 → 1 · Trusted parties replaced by one autonomous network

M of N · Approvers must sign, never one person alone

SHA-256 · Every file verified before it is allowed to run

≈12 · Privileged operations the entire system can perform

This is shipping now. We'd like to show you.

If your organization is evaluating open-weight AI deployment — for compliance reasons, sovereignty reasons, cost reasons, or simply because you want to stop sending sensitive data to someone else's servers — we'd like to walk you through what Safebox would look like in your specific environment.

A first conversation takes about thirty minutes. We don't ask for an NDA, and we won't pitch you on anything you haven't asked about.

