The durable trust layer · how it works

The Safebox is the part that makes the rest safe.

Safebox is an open-source execution environment for AI workflows. Every state change passes through cryptographic governance you control. Anyone with a web browser can verify they're talking to a real Safebox. One standardized environment replaces the patchwork of SOC 2, PCI, and GDPR audits most companies cobble together.

Built on the Safebox Infrastructure · Source at github.com/Safebots · Reading time ~14 minutes

If Infrastructure is the sealed room with one combination lock at the door, Safebox is the rule-book the people inside the room have to follow. The rules are written in code. Every action taken inside the room is recorded and signed. The rules can only be changed with the same M-of-N approval that opens the door — and crucially, anyone watching from outside can verify both that the rules haven't changed and that everything happening inside is following them.

This page explains, in plain language, what Safebox actually does, what it replaces for businesses today, and why standardizing on one verifiable environment costs less than maintaining the audit-and-certification treadmill most organizations are currently on. Everything described here is implemented in the open-source code and can be inspected by anyone.

The big idea in one sentence — Safebox replaces the question "do I trust the company running this AI?" with the question "do I trust the cryptographic record of what this AI did?" The first question requires audits, certifications, and faith in human institutions. The second can be answered from a web browser.

01 · The Audit Treadmill

What businesses actually pay for today.

Before Safebox, an organization that wants to deploy AI responsibly ends up paying for trust in pieces. Each piece is its own line item, its own audit, its own renewal cycle, its own team to maintain. The total cost is large and grows linearly with the number of environments, vendors, and jurisdictions the business operates in.

A mid-sized company running AI services for healthcare, finance, or anything regulated typically pays for some combination of:

Audit / certification	Annual cost	What it actually attests
SOC 2 Type II	$20K–$100K + $50K–$500K prep	A third party reviewed your controls during a sample window. Says nothing about what your code did after the audit closed.
PCI DSS	$50K–$200K	Your card-data systems pass a checklist. Says nothing about the rest of your environment.
HIPAA / BAA	$20K–$100K + legal	A signed contract says your vendor will follow rules. Says nothing about whether they did.
GDPR / SCC review	$30K–$150K legal	A lawyer attests your data flows comply with EU rules — at the moment of the review.
ISO 27001	$30K–$120K	Your management system matches a standard. Again, point-in-time.
Penetration tests	$20K–$80K per engagement	Someone tried to break in for two weeks. Hopefully they found things. Maybe they didn't.

A regulated mid-market company easily spends $300K to $1.5M per year on this stack, plus internal engineering time that runs 5–15% of total engineering payroll. Bigger companies spend orders of magnitude more. And every one of these certifications expires, has to be renewed, and only attests to a moment in time — what your systems looked like during the audit window, not what they did yesterday or what they're doing right now.

None of this is bad. Auditors do useful work. The problem is the architecture. Trust is being purchased from many vendors, attested by many different parties, in many different formats, with many different scopes, on overlapping schedules. The same engineering team has to satisfy all of them. Most of the cost is reconciliation — making sure the controls that satisfied auditor A also satisfy auditor B, then putting that together into something the board and the regulators can both read.

The audit industry is enormous because every customer pays for its own version of the same answer.

02 · The Standard Environment

One environment. Open source. Bring your own auditors.

Safebox replaces this patchwork with a single environment that's the same wherever it runs. Same cryptographic guarantees in your data center, in AWS, in Azure, on-prem, or air-gapped. Same audit trail format. Same governance model. Same code, same hashes, same signatures.

You don't pay an auditor to certify Safebox itself — the design is verifiable from the source. You pay your auditors to verify that your deployment is configured correctly for your requirements. That's a much smaller, much cheaper question.

How this changes the audit math

The audit cost shifts from "do these controls exist and are they working" (which requires sampling activity and trusting that what's sampled is representative) to "is the cryptographic record consistent with the policy" (which is a deterministic check anyone can run). The second question is verifiable from the audit logs themselves. An auditor can answer it in hours rather than weeks.

You still need auditors. You still need certifications for regulators that require them. The difference is that the auditors are looking at verifiable records instead of process documentation. The cost falls dramatically because the work being done is fundamentally easier.

The other change: your auditors are your auditors. They sign with their own keys, on their own devices. There is no Safebots-controlled audit committee. There is no third party that has to bless your configuration before you can deploy. You bring whoever you trust — your existing SOC 2 firm, your in-house compliance team, your board's audit committee, an outside legal advisor — and they each get a key. M of N of them must agree before anything privileged happens.

03 · Verifiable From A Browser

Cryptographic attestation, all the way to the user.

This is the part that most people don't realize is possible until they see it work. A Safebox publishes a verifiable claim about what it is and what it just did. The claim is signed with a key whose ancestry traces back to the hardware attestation produced by the underlying Infrastructure. A web browser — without any plugin, any extension, any special setup — can verify that signature chain and confirm:

1. The Safebox is running the code its operators claim it is running.
2. The code hasn't been modified since the last M-of-N-signed update.
3. The specific action being attested actually happened, was approved by the right policy, and was executed by the running code.
4. The attestation was generated by hardware that the user (or their auditor) can independently verify exists in a known cloud or data-center environment.

The verification uses OpenClaim signatures — an open standard for ES256 (P-256 with SHA-256) JSON claims. Browsers have implemented this since 2017. The standard is widely deployed for FIDO2 hardware tokens, signed JWTs, and decentralized identity systems. There is no proprietary cryptography in this path. There is no need to install anything. There is no certificate authority to trust.

OpenClaim — A signed JSON envelope using P-256 elliptic curve cryptography and SHA-256 hashing, with deterministic canonicalization (RFC 8785) so the same logical claim always produces the same bytes to sign. Compatible across Node.js, PHP, browsers, and mobile platforms by design.

What this enables: a customer interacting with a Safebox-hosted application can be shown, in their own browser, a verifiable receipt that proves the AI processed their data inside a specific sealed environment, that the data was never exfiltrated, that the result was produced by a specific approved version of the code, and that the entire interaction is logged in a signed, append-only record. The customer doesn't have to trust the platform operator. They can verify it themselves.

For regulated industries, this changes the compliance conversation. Instead of saying "we promise we follow HIPAA," the platform says "here is the cryptographic record of every action we took with this protected data. Verify it yourself, or hand it to your auditor."

04 · M-of-N Governance

Every state change goes through approval.

Inside a Safebox, code does not have free rein to change state. The architecture has a strict separation between Tools (which run code and observe the world) and Actions (which actually modify it). Tools can propose actions. Actions don't execute themselves. Each action type has a policy. The policy decides whether the action runs automatically, requires a single human approval, requires a quorum of approvals, or is rejected outright.

Tool — A piece of code that runs inside the Safebox sandbox to do work: query a database, call an LLM, fetch a web page, summarize a document. Tools can read the world. They cannot write to it directly. Every write goes through an Action.

Action — A proposed state change with a defined shape (create a stream, send a message, transfer funds, deploy a model). Each action carries an actionType. Each actionType has a policy. The policy determines whether the action runs immediately, after one signature, after M of N signatures, or never.

This is the same governance pattern used by every serious decentralized finance protocol — except instead of protecting cryptocurrency, it's protecting your AI workflows. Any 3 of 5 signers must agree before privileged actions execute. The 5 could be your CTO, your compliance officer, your existing audit firm, a board member, and an outside legal advisor. You choose. The system enforces.

Policies that come pre-defined

Safebox ships with a small library of policy primitives that cover most regulated workflows:

Policy	How it decides	Used for
Auto-approve	Approve immediately if the action passes deterministic checks (shape, scope, allow-list)	Read-only actions, internal cache updates, idempotent reconciliation
Single approval	Wait for one valid signature from any authorized signer	Most user-initiated workflow steps
M-of-N quorum	Require M signatures from the N-member committee	Privileged changes, model loads, key rotations, configuration updates
Time-locked	Approve, but delay execution by T hours so signers can object	Irreversible actions, fund transfers, data exports
Custom code	Run a sandboxed policy script that evaluates the action's attributes	Domain-specific rules — "no transfers over $10K to unknown wallets between 6pm and 9am"

Policies are themselves streams managed by the same governance system. Changing a policy is itself an action that requires the appropriate approval. There is no special back door for changing the rules.

05 · The Sandbox

Code that can't reach out and touch the world.

When a tool executes inside Safebox, it runs in a sandboxed JavaScript context with no filesystem access, no network access, and no access to the host's environment variables. The only way for the tool to do anything that touches the outside world is to call methods on the Protocol object, which is the audited interface the platform provides.

Safebox uses three sandbox backends, picked at runtime based on what's available:

Backend	Isolation level	When used
Q.Sandbox (worker threads)	Separate Node.js worker thread with RPC-only communication	Default. Native to Node.js, no extra dependencies.
isolated-vm	Genuine V8 isolate — separate heap, hard memory cap, no shared prototype chain	When the optional npm dependency is installed. Closes the well-known vm.createContext escapes.
vm.createContext	Same-process sandbox — not a security boundary	Fallback only, for governance-approved code on hosts where neither of the above is available.

Every execution produces an execution hash: a SHA-256 of the tool's source code, its input, its random seed, every RPC call it made into the host, and the final result. This hash is signed and recorded. If anyone disputes what the tool did, the execution can be replayed deterministically against the same hash and verified bit-for-bit.

Why determinism matters here

An adversarial tool author cannot prove their code is innocent if the only record is what they tell you happened. With an execution hash, the burden of proof reverses: the platform records what actually executed, and any party can replay it. Disputes become verifiable instead of testimonial. This is the same property that makes blockchain transactions impossible to repudiate — applied to ordinary computation, with much less drama.

API keys, OAuth tokens, secrets, and credentials never appear in tool source code. Tool authors write placeholder tokens like {{keychain:openai-key}}, and the platform substitutes the real values inside the sandbox at execution time — from an encrypted keychain that's wrapped with a per-community HKDF-derived key. The substitution happens after the tool's source has been hashed, so the cryptographic record never contains the secrets, but the running code has what it needs.

06 · The Protocol Layer

How Safebox talks to AI — including your local models.

The Protocol object is the auditable interface between sandboxed tools and the outside world. Every external call any tool makes goes through it. The surface is small enough to enumerate:

Protocol method	What it does
`Protocol.LLM`	Chat completions. Auto-routes between OpenAI, Anthropic, and local-model adapters based on model name. KV cache aware.
`Protocol.Diffusion`	Text-to-image. Stability AI and compatible endpoints. Routes to local image runners when configured.
`Protocol.HTTP`	Generic web requests. Method, URL, headers, body, timeout. Rate-limited and audit-logged.
`Protocol.SMTP`	Email sending through configured providers.
`Protocol.SMS`	SMS through Twilio or compatible.
`Protocol.Push`	Mobile push notifications.
`Protocol.Telegram`	Telegram bot operations.
`Protocol.Web3`	EVM-chain transactions.
`Protocol.Files`	Read and write the platform's content-addressed file storage.

Local models — connecting to your Safebox Infrastructure

The interesting wiring is between Protocol.LLM and the model runners on your Safebox Infrastructure. The same tool code that talks to Anthropic in development can route to Gemma 4 12B running locally on your infrastructure in production — without code changes. The Protocol layer detects the model name and dispatches to the appropriate adapter.

This matters for two reasons. First, it means tools written against the platform are portable across deployments: same code, different inference backend, same governance and audit trail. Second, it lets organizations adopt local AI gradually — start with hosted models for prototyping, swap in self-hosted models for production, without rewriting the application.

KV cache awareness for prefix-heavy workloads

The Anthropic adapter implements full prompt-caching support — system blocks marked with cache_control are cached on the provider side, and the platform tracks cache_read_input_tokens and cache_creation_input_tokens in usage metrics. When tools route to local llama.cpp runners via the Infrastructure layer, the same prefix gets pinned to a per-conversation cache slot that persists across container restarts.

For agentic workloads — where many requests share a long stable system prompt of tools, persona instructions, and RAG context — this is the difference between a 50ms response and a 5-second response. Safebox makes the cache-friendly path easy to take and impossible to mismeasure.

The detailed story on which inference runtime to use for what — vLLM, llama.cpp, Ollama, SGLang — is on the runners page.

07 · How It Stacks

Infrastructure → Safebox → Safebots.

Safebox is one layer of a three-layer architecture. Each layer adds a safety property the next layer builds on. The investor deck describes this as "what blockchain did to banking, applied to AI" — and the three layers map directly onto Bitcoin, Ethereum, and the consumer-app ecosystem that grew on top.

Infrastructure Like Bitcoin (2010)

A sealed compute environment with hardware attestation, M-of-N privileged operations, ZFS snapshots, and a privileged surface small enough to read in five minutes. Replaces the trusted system administrator. Read more →

Safebox Like Ethereum (2014)

The durable trust execution layer. Workflows, tools, sandboxed actions, M-of-N governance, cryptographic attestation, OpenClaim signatures. Anything more complex than installing a package goes here. (This page)

Safebots Like MetaMask (2017)

Consumer-facing AI agents and applications built on top of Safebox. The chat interface, the agent personalities, the business logic. Read more →

The pattern is the same as it has been in every wave of computing: each layer pushes safety down, so the layer above can be built freely. Bitcoin did the math; Ethereum built the programming model; consumer apps grew on top. Safebox Infrastructure does the hardware attestation; Safebox does the governance and execution; Safebots and the apps that follow grow on top.

Other open-source pieces that fit on this stack

The same architecture supports a growing family of components, each handling a different concern but sharing the same governance and audit primitives:

Safebots — AI agents that compose tools, chat, and respond inside the Safebox runtime.
Grokers — open-source code analysis tools that operate on entire repositories deterministically.
Code — workflow-driven code generation with deterministic streaming.

Each of these is open-source, each operates under the same M-of-N governance, and each can be deployed independently or alongside the others depending on what an organization needs.

08 · The Bottom Line

What this is worth to a business.

For a CFO or general counsel evaluating this, the relevant comparison is to the cost of the audit treadmill described in section one. Safebox doesn't eliminate compliance work — there are still regulators with checklists, auditors with opinions, and contracts that need signing. What it eliminates is the engineering cost of producing evidence that satisfies those checklists, because the evidence already exists in cryptographically verifiable form.

5–15%

Of engineering payroll typically spent on audit prep

$300K–$1.5M

Mid-market annual audit spend

Hours

Time to verify a Safebox audit log vs weeks for a SOC 2 review

Open

Source code, source standards, source signatures

What changes for the customer

The customer of a Safebox-deployed service gets a verifiable receipt for their interaction with the AI. They don't have to take anyone's word for what was done with their data. They can show the receipt to their own auditor, their own regulator, their own lawyer, and the receipt verifies independently. This is closer to how the financial industry treats trade execution than how the AI industry currently treats inference.

What changes for the regulator

A regulator who asks "show me every AI interaction involving protected information for the period in question" gets a cryptographically signed log, not a Jira export. The log is consistent across vendors, formats, and jurisdictions because Safebox is one standard. There is less to argue about because there is less ambiguity to argue from.

What changes for the investor

An investor looking at AI infrastructure is currently looking at companies that bolt safety onto products that were not designed with it. Safebox is one of a small number of efforts going the other direction — designing safety into the substrate and letting products grow on top. The premise is that the safety substrate is itself a market: as regulators tighten and as customers demand verifiability, the demand for a standard that delivers both compounds.

This is the bet of the investor deck — that the AI industry is now where the financial industry was when Bitcoin published its whitepaper. The infrastructure layer is being rewritten. The companies that build the rewrite will define the standard for what comes after.

Audits attest to a moment. Cryptographic records attest to everything.

This is shipping now. We'd like to show you.

If your organization is evaluating compliance-grade AI deployment — for regulatory reasons, sovereignty reasons, cost reasons, or because the audit treadmill has gotten unsustainable — we'd be glad to walk through what Safebox would look like in your specific environment.

Schedule a conversation →