The substrate that coordinates user data coordinates code. Same forks, same governance, same audit. Add ZFS for filesystem-level forking and one-way git push for human review, and parallel refactoring, governed pull requests, contract-checked rippling, and CI replacement fall out of primitives that already exist. Conversation lives one layer above, as Safebots. No swarms of agents. No new infrastructure. Just workflows.
For most of its life Safebox operated on what you'd call ordinary data. Emails arriving at a community inbox. Jobs queued for processing. Posts and articles materialized from external feeds. Calendar events, customer records, conversational messages. The verbs were ordinary too: route, summarize, schedule, file, answer.
Then we pointed it at code. Specifically, we pointed it at a Qbix codebase that Grokers had already indexed — every function, method, and class extracted into a Grokers/symbol stream, every call site into a Grokers/calls relation, every cross-language event hook into a Grokers/extern bridge node. The codebase, in other words, became a graph in the substrate's existing stream model. The act of writing code didn't change. The representation of code did.
What we discovered was that Safebox's machinery — workflows, tools, capabilities, action proposals, governance gates, judgment dispatch, audit chains — operated on this graph identically to how it operated on email or jobs. The verbs changed (modify, rename, split, document, test) but the verb framework didn't. A workflow that modified a function looked structurally indistinguishable from a workflow that filed a customer support ticket. Both forked a workspace. Both proposed actions. Both passed through the same governance. Both produced the same audit chain.
This is the realization that organized everything that follows. Workflows aren't a user-data abstraction. They're a coordinated state abstraction, and code is just another kind of state that needs coordinating. Once Grokers turned code into structured graph nodes, the rest was already built.
The substrate operates on streams. A function is a stream. A refactor is a workflow. A pull request is a workspace fork pushed to git. None of this required Safebox to grow a code-specific subsystem.
Start with a concrete example: change the signature of a function whose callers are spread across several files. The classic refactor. In a swarm-of-agents architecture this is improvised — one agent reads the target, another reads the callers, they share notes through chat, they hope nothing was missed. In Safebox the same task decomposes into a workflow with explicit steps and tools.
| Step | What runs |
|---|---|
| Snapshot | The workflow forks a workspace from the current Grokers/repo stream. The substrate creates the workspace stream; a hook fires and takes a ZFS snapshot of the dataset backing the codebase, then clones it. The clone is the workspace's filesystem reality from this point on. |
| Read | Tools fetch the target function's source, its callers, applicable conventions, and dependency context. All of this is one round-trip per query against Grokers' pre-built graph and the substrate's Safebox/convention streams. No re-inference. No file walks at LLM time. |
| Propose | A decide-class tool calls the LLM with the target source, the change description, and the conventions that apply. The LLM returns a rewrite. The tool emits structured intent — not an action, not a write. Just the proposed new source. |
| Verify | The proposed rewrite is applied inside the workspace clone. Tests run against the clone — not against the live codebase. If they pass, the workflow advances. If they fail, the LLM is invoked again with the failure as additional context. Bounded retries. After the cap, escalate. |
| Triage callers | The graph already knows every caller. A tool walks them, classifies each as mechanical (signature delta only), semantic (caller logic needs adjustment), or structural (caller needs redesign). Mechanical and semantic are batched into parallel sub-workflows. Structural escalates to a human. |
| Apply | Each callsite update is an Action.propose against its symbol stream, gated by the same governance the substrate applies to any write. The workspace's symbol streams update; the workspace's clone reflects the new source. The base codebase stays untouched. |
| Publish | If the workflow's last step succeeds, a hook fires that runs git commit against the workspace clone and pushes to a branch on the upstream remote. Branch name encodes the workflow ID. The branch becomes a pull request humans can review. If no git remote is configured, the substrate state is the only state. |
Every step composes existing primitives. Forking workspaces, proposing actions, walking relations, dispatching judgments, taking snapshots, posting messages — all of it was already in the substrate before we pointed it at code. The verbs are new. The machinery isn't.
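The table's control flow can be sketched in a few lines. This is an illustrative sketch, not Safebox's implementation: `Workspace`, `run_modify`, the step labels, and the default retry cap of 3 are hypothetical names chosen here, and the `propose` and `verify` callables stand in for the decide-class tool and the test run inside the clone.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Workspace:
    """Hypothetical stand-in for a forked workspace; it only records steps."""
    workflow_id: str
    log: List[str] = field(default_factory=list)


def run_modify(
    ws: Workspace,
    propose: Callable[[Optional[str]], str],
    verify: Callable[[str], Optional[str]],
    max_retries: int = 3,
) -> str:
    """Run the steps in table order; retry propose/verify up to a cap."""
    ws.log += ["snapshot", "read"]
    failure: Optional[str] = None
    for attempt in range(1, max_retries + 1):
        rewrite = propose(failure)   # the last failure is fed back as context
        ws.log.append(f"propose#{attempt}")
        failure = verify(rewrite)    # None means the tests passed
        if failure is None:
            break
    else:
        ws.log.append("escalate")    # cap reached without a passing rewrite
        return "escalated"
    ws.log += ["triage-callers", "apply", "publish"]
    return "published"


# Toy run: the first proposal fails its tests, the second passes.
ws = Workspace("wf-1")
outcome = run_modify(
    ws,
    propose=lambda failure: "good" if failure else "bad",
    verify=lambda rewrite: None if rewrite == "good" else "tests failed",
)
```

Nothing after the verify loop runs unless a rewrite has passed, and the only exits are publication or escalation.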
Sequential agent loops dress up improvisation as autonomy. Workflows make the same work auditable, governed, and parallel-safe — without taking the human out of the loop.
The hardest part of any refactor isn't writing new code. It's making sure the new code keeps the promises the old code was making — and finding everything else that breaks if it doesn't.
Every function that Grokers analyzes carries a structured contract: Grokers/preconditions (what must be true to call it), Grokers/postconditions (what's true after it returns), Grokers/sideEffects (what it changes outside its own scope), Grokers/invariants (what holds throughout). These are prose, written by the analyzer during the comprehension pass. They're not formal logic — they're claims a careful reader could verify against the source. Their value isn't in being theorem-checkable; it's in being explicit, so downstream workflows can reason about whether they still hold.
When a workflow proposes a refactor, the verification step doesn't just run tests. It re-derives the contract from the new source and compares against the old contract. Three classifications fall out, and each routes to a different downstream behavior.
| Classification | What it means | What the workflow does next |
|---|---|---|
| Contract preserved | The new source makes equivalent claims about preconditions, postconditions, side effects, and invariants. The function does the same thing, just expressed differently. | Update the symbol stream. Update the symbol's source on disk. No caller changes needed beyond signature and type adjustments. Done. |
| Upward-compatible | Preconditions relaxed (accepts more inputs) or postconditions strengthened (guarantees more about outputs). Existing callers were satisfying the stricter old contract; they still satisfy the new one. | Update the symbol stream. Note the relaxation in the audit trail. Callers don't need semantic changes, but downstream documentation that referenced the old stricter contract should be updated as a follow-on. |
| Incompatible change | Preconditions strengthened (requires more from callers) or postconditions weakened (guarantees less). Some existing callers will fail to meet the new requirements or fail to handle the weaker guarantees. | Walk callers via Grokers/calls reverse traversal. For each caller, evaluate whether it still works under the new contract. If yes, signature update only. If no, recurse — the caller becomes its own code/modify sub-workflow with its own verification gate. |
The third case is where the architecture earns its complexity. Without explicit contract checking, a refactor that subtly weakens a postcondition propagates the weakness into every caller silently. With it, the workflow surfaces the change, walks the callers, and either fixes them or escalates. The recursion is bounded because the ripple walks the call graph as a directed acyclic graph; Grokers handles cycles separately, as concepts to investigate. Most ripples terminate within two or three levels.
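The three classifications and the caller ripple can be sketched as follows. This is a deliberate simplification: the contracts described above are prose compared by re-derivation, whereas this sketch models each contract as sets of claim strings so that "relaxed" and "strengthened" become subset checks. `Contract`, `classify`, `ripple`, and the depth cap are hypothetical names, and the sketch conservatively assumes that a caller needing modification may itself affect its own callers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Contract:
    pre: frozenset
    post: frozenset
    side_effects: frozenset = frozenset()
    invariants: frozenset = frozenset()


def classify(old: Contract, new: Contract) -> str:
    """Compare contracts modeled as sets of claims (an assumption)."""
    if old == new:
        return "preserved"
    same_rest = (old.side_effects == new.side_effects
                 and old.invariants == new.invariants)
    # Fewer preconditions = accepts more; more postconditions = guarantees more.
    if same_rest and new.pre <= old.pre and new.post >= old.post:
        return "upward-compatible"
    return "incompatible"


def ripple(
    symbol: str,
    callers_of: Dict[str, List[str]],
    still_works: Callable[[str, str], bool],
    max_depth: int = 3,
    _depth: int = 1,
    _seen: Optional[Set[str]] = None,
) -> List[Tuple[str, str]]:
    """Walk callers in reverse; recurse only into callers that break."""
    seen = _seen if _seen is not None else {symbol}
    plan: List[Tuple[str, str]] = []
    for caller in callers_of.get(symbol, []):
        if caller in seen:           # guard against revisiting a symbol
            continue
        seen.add(caller)
        if still_works(caller, symbol):
            plan.append((caller, "signature-only"))
        elif _depth >= max_depth:
            plan.append((caller, "escalate"))
        else:
            plan.append((caller, "modify"))
            plan += ripple(caller, callers_of, still_works,
                           max_depth, _depth + 1, seen)
    return plan


old = Contract(pre=frozenset({"x > 0"}), post=frozenset({"returns int"}))
relaxed = Contract(pre=frozenset(), post=frozenset({"returns int"}))
stricter = Contract(pre=frozenset({"x > 0", "x < 10"}),
                    post=frozenset({"returns int"}))
calls = {"f": ["g", "h"], "g": ["k"]}   # reverse call edges: callee -> callers
plan = ripple("f", calls, still_works=lambda caller, callee: caller != "g")
```

In the toy graph only `g` breaks under the new contract, so it becomes its own modify step and the walk continues one level up to `k`; `h` needs a signature update and nothing more.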
When the contract changes, the docstring is wrong by the same change. The code/proposeRewrite tool produces the new source and the new docstring as a coupled output — they're not separable artifacts. Verification then checks both. And critically, when the docstring needs updating in callers (because the caller's docstring references the called function's behavior), that update is targeted: edit the two sentences that became wrong, leave the rest of the prose alone. Reviewers see meaningful diffs, not stylistic churn.
A function with associated tests has Grokers/covers relations from Grokers/test streams pointing at it. When the workflow finishes the rewrite, the verification step runs those tests against the new source inside the workspace clone. If they pass, the contract was probably preserved correctly. If they fail, the workflow either retries the rewrite with the failure as additional context (bounded retries), or marks the existing tests as themselves needing update — which kicks off a code/test sub-workflow whose contract is "produce a test that exercises the new behavior, verify it actually exercises the symbol via mutation testing, register it."
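The mutation check mentioned above reduces to a simple rule: a test exercises a symbol only if it passes on the original and fails on a mutated copy. A minimal sketch, assuming tests are plain callables that raise `AssertionError` on failure; `passes`, `exercises_symbol`, and the toy `add` functions are hypothetical.

```python
from typing import Callable


def passes(test: Callable, impl: Callable) -> bool:
    """Run a test against one implementation; an assertion error means failure."""
    try:
        test(impl)
        return True
    except AssertionError:
        return False


def exercises_symbol(test: Callable, original: Callable, mutant: Callable) -> bool:
    """A test counts only if it can tell the original from the mutant."""
    return passes(test, original) and not passes(test, mutant)


def add(a, b):
    return a + b


def add_mutant(a, b):      # mutation: '+' replaced by '-'
    return a - b


def strong_test(impl):
    assert impl(2, 3) == 5


def weak_test(impl):       # passes on both versions, so it proves nothing
    assert impl(0, 0) == 0


strong_ok = exercises_symbol(strong_test, add, add_mutant)
weak_ok = exercises_symbol(weak_test, add, add_mutant)
```

Here `weak_test` would be rejected before registration, because it cannot distinguish the symbol from its mutant.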
Two layers of grounding make the LLM's proposals match the codebase rather than match the LLM's training corpus.
Conventions are governed substrate streams of type Safebox/convention, with an appliesTo attribute marking the contexts where they apply — code-style/php, code-style/js, tool-generation, prompt-assembly, and so on. Some are mechanical: tabs versus spaces, brace placement, identifier casing, line-length norms. Others are idiomatic: this codebase prefers callbacks over Promises in async paths, errors are returned as tuples rather than thrown, validation lives in handlers not models. Grokers extracts the mechanical ones during indexing automatically. The idiomatic ones are LLM-inferred during the comprehension pass and proposed for human approval before being injected into rewrite prompts. Conventions can be added, edited, voted on, and superseded the same way any other governed stream can.
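Selecting conventions for a rewrite prompt is then a filter over the context and the approval state. The sketch below is an assumption about shape, not the Safebox/convention schema: the `Convention` fields and `conventions_for` are hypothetical, and only the `appliesTo` idea and the approval rule for idiomatic conventions come from the description above.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Convention:
    text: str
    applies_to: Set[str]        # contexts, e.g. "code-style/php"
    kind: str                   # "mechanical" or "idiomatic"
    approved: bool = False      # only meaningful for idiomatic conventions


def conventions_for(context: str, conventions: List[Convention]) -> List[str]:
    """Mechanical conventions always apply; idiomatic ones need approval first."""
    return [
        c.text
        for c in conventions
        if context in c.applies_to
        and (c.kind == "mechanical" or c.approved)
    ]


library = [
    Convention("Indent with tabs", {"code-style/php"}, "mechanical"),
    Convention("Return errors as tuples", {"code-style/php"}, "idiomatic"),
    Convention("Prefer callbacks over Promises", {"code-style/js"},
               "idiomatic", approved=True),
]
php_rules = conventions_for("code-style/php", library)
js_rules = conventions_for("code-style/js", library)
```

The unapproved idiomatic convention stays out of the PHP prompt until a human approves it.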
Hints are framework-specific patterns Grokers recognizes during analysis. The Qbix indexer ships hints files that pre-confirm patterns like "Q::event(name, params) is a hook fire — bind the params to the named hook," "Q.batcher wraps a function in batching — its callers receive arrays not individuals," "Q.Method is a lazy-load stub — the real implementation is in a sibling file." These hints feed into the prompts the rewrite tool assembles, so when the LLM is asked to refactor a function that interacts with a framework convention, it starts informed rather than inferring from scratch.
Together, conventions plus hints mean the LLM is reasoning in a context where most ambiguity has already been resolved. The codebase's style is a fact; the framework's idioms are facts; the analyzed contract is a fact. The LLM's job is to write code that respects the facts. The verification step catches the cases where it didn't.
The workflow re-derives the contract after the rewrite. Equivalent or upward-compatible: done. Incompatible: ripple to callers, recurse, verify each level. Tests, docs, conventions, hints all feed into the same machinery. No magic. No agent improvising the path. Just steps, each with declared inputs and outputs, each governed.
The substrate's stream layer captures the symbol-level changes — what a function used to be, what it now is, who proposed the change, which tests verified it, which judgment voted to allow it. Below that, two more layers of state need to exist for code to actually compile, link, and ship.
A workspace fork in the substrate is a logical claim — "this branch of state diverges from base here." That claim has to land somewhere on disk if the workflow is going to run tests, invoke compilers, or produce binaries. ZFS clones provide this naturally. When the substrate creates a workspace stream, a Safebox hook takes a ZFS snapshot of the relevant dataset and clones it to a writable mountpoint. The workspace's filesystem operations happen in that clone. The base dataset is untouched.
ZFS is the right choice for three reasons. Snapshots are atomic and cheap, so taking one at workflow start adds negligible latency. Clones are copy-on-write, so two workspaces forked from the same base share storage until one of them writes, which means parallel workflows don't multiply disk usage. And destruction is quick: when a workflow fails verification and the workspace is closed, the clone is destroyed and the base is exactly as it was. Failed workflows leave no filesystem residue.
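The fork and close hooks map onto three standard ZFS commands: `zfs snapshot`, `zfs clone`, and `zfs destroy`. The sketch below only builds the argument lists and does not execute them. The dataset name, the `-ws-` clone naming, and the `/workspaces` mount root are hypothetical choices for illustration, not Safebox's actual layout.

```python
from typing import List


def fork_commands(dataset: str, workflow_id: str,
                  mount_root: str = "/workspaces") -> List[List[str]]:
    """Commands a fork hook might run: snapshot the base, then clone it."""
    snapshot = f"{dataset}@{workflow_id}"
    clone = f"{dataset}-ws-{workflow_id}"      # hypothetical naming scheme
    return [
        ["zfs", "snapshot", snapshot],
        ["zfs", "clone", "-o", f"mountpoint={mount_root}/{workflow_id}",
         snapshot, clone],
    ]


def close_commands(dataset: str, workflow_id: str) -> List[List[str]]:
    """Commands to discard a workspace; the clone must go before its snapshot."""
    return [
        ["zfs", "destroy", f"{dataset}-ws-{workflow_id}"],
        ["zfs", "destroy", f"{dataset}@{workflow_id}"],
    ]


fork = fork_commands("tank/code", "wf42")
close = close_commands("tank/code", "wf42")
```

A real hook would pass each list to something like `subprocess.run(cmd, check=True)`. Neither function ever writes to the base dataset itself, which is what keeps the base untouched.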
What ZFS handles that the substrate alone cannot: large binary blobs that don't belong in stream rows (compiled artifacts, model weights, image assets), database state that needs to be snapshotted alongside the code (test fixtures, migration baselines), and test suite outputs that benefit from being captured as part of the same atomic snapshot as the code that produced them.
An obvious-looking shortcut is worth ruling out explicitly: using symlinks plus moves to fake snapshots on hosts that don't run ZFS. It doesn't work. Atomicity is fake (mv plus symlink is several operations, not one), tools that resolve paths produce surprises (some test runners write outputs to the symlink target's directory, contaminating the supposed isolation), and inode-level operations misbehave (file watchers see two different files where there's logically one). Where ZFS isn't available, the architectural alternatives are btrfs subvolumes (the closest equivalent, with the same copy-on-write semantics) or plain copy-on-fork (full duplication per workspace, expensive on disk but semantically clean). Symlink hacks are the worst option of the three.
One detail of how the substrate represents code matters enough to call out. The canonical state of a function lives in its Grokers/symbol stream — the source text is the stream's content; the contract is its attributes. The file on disk is a derived artifact, assembled from symbols in their declared file-order plus the non-symbol regions between them (imports, top-level declarations, comments outside any symbol). Line numbers are not stored on symbol streams. They're computed when needed, from the file's current content inside the workspace clone.
This matters because it means modifying one symbol doesn't require updating every other symbol in the same file. The substrate doesn't have to ripple line-number changes through all the unaffected symbols every time a refactor shifts source positions. The shift exists in the file as a consequence of the splice, but it isn't substrate state. When a workflow finishes, the workspace's modified files get re-grokked (call site line numbers stored as Grokers/calls relation weights are updated), but the substrate doesn't carry stale line numbers between operations. The symbol is canonical; the file is the projection of the symbols at any given moment; line numbers are properties of the projection.
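A small sketch makes the projection idea concrete. It assumes a file is stored as an ordered list of regions, each either a named symbol or an unnamed non-symbol span, and that every region ends with a newline; `project_file`, `line_of`, and the region layout are hypothetical.

```python
from typing import List, Optional, Tuple

Region = Tuple[Optional[str], str]   # (symbol name, or None for non-symbol text)


def project_file(regions: List[Region]) -> str:
    """The file on disk is just the regions concatenated in declared order."""
    return "".join(text for _, text in regions)


def line_of(regions: List[Region], symbol: str) -> int:
    """Compute a symbol's 1-based start line from the current projection."""
    line = 1
    for name, text in regions:
        if name == symbol:
            return line
        line += text.count("\n")
    raise KeyError(symbol)


regions = [
    (None, "import os\n\n"),
    ("a", "def a():\n    return 1\n"),
    (None, "\n"),
    ("b", "def b():\n    return a()\n"),
]
before = line_of(regions, "b")

# Splice a longer body into symbol 'a'; symbol 'b' is not touched at all.
regions[1] = ("a", "def a():\n    x = 1\n    return x\n")
after = line_of(regions, "b")
```

The start line of `b` moves from 6 to 7 even though nothing about `b` was written, which is why line numbers belong to the projection rather than to the symbol.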
Humans expect to review code changes in git. Pull requests, branch comparisons, blame, history. Safebox accommodates this by treating git as a projection layer — a one-way export of completed work to a system humans operate. Each successful workflow that touches a git-tracked codebase produces a branch on the upstream remote. The branch's commit message references the workflow's audit chain so a reviewer can trace any line of code back to the workload that proposed it, the judgments that voted on it, and the tests that verified it.
The relationship runs one direction. Safebox pushes to git; it does not pull. This is a deliberate architectural commitment, and it's worth being explicit about why. If Safebox pulled from upstream, the substrate would have to reconcile its view of the codebase with whatever changes humans (or other Safebox instances) made independently. Reconciliation is a hard problem in distributed systems generally and a wickedly hard problem when one side of the merge is a structured graph and the other is text diffs. Forward-only avoids it.
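The one-way rule is easy to see in the commands a publish hook would issue. The sketch below builds argument lists for standard git subcommands without running them. The `safebox/` branch prefix, the `Audit-Chain:` trailer, and the function name are hypothetical; the source says only that the branch name encodes the workflow ID and the commit message references the audit chain.

```python
from typing import List, Optional


def publish_commands(clone_path: str, workflow_id: str, audit_ref: str,
                     remote: Optional[str]) -> List[List[str]]:
    """Build the push-only git sequence; no remote means nothing to publish."""
    if remote is None:
        return []                                  # substrate state is the only state
    branch = f"safebox/{workflow_id}"              # hypothetical naming scheme
    message = f"Workflow {workflow_id}\n\nAudit-Chain: {audit_ref}"
    git = ["git", "-C", clone_path]                # operate inside the workspace clone
    return [
        git + ["checkout", "-b", branch],
        git + ["add", "--all"],
        git + ["commit", "-m", message],
        git + ["push", remote, branch],
    ]


commands = publish_commands("/workspaces/wf42", "wf42", "audit/wf42", "origin")
subcommands = [cmd[3] for cmd in commands]
```

The sequence contains no `fetch`, `pull`, or `merge`, so nothing from upstream can flow back into the workspace.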
The cost: the substrate's view of the codebase can drift relative to upstream. The benefit: the substrate's view is always internally consistent, and any branch it produces is grounded in a known-good baseline. When drift becomes a problem operationally — upstream has moved on, the substrate's grokked view is stale — the operator clones upstream again at a new commit. A new Grokers/repo stream comes into existence; Grokers indexes it; new workflows fork from the new baseline. Old workspaces remain queryable as historical record. Three operations — initial deployment, periodic refresh, historical analysis — all collapse into the same pattern.
| State layer | Owner | Purpose |
|---|---|---|
| Stream graph | Substrate (Streams plugin) | Symbol-level state, relations, audit chain, governance, message log. The canonical truth about what has happened. |
| ZFS clones | Safebox hooks on substrate fork events | Filesystem-level state for each workspace. Where compilers, test runners, and build tools actually operate. Lifecycle bound to workspace stream. |
| Git branches | Safebox hooks on workflow completion | Optional projection for human review. Pushed only when a workflow completes successfully and the repo declares a git remote. Long-lived branches; no merging back into the substrate. |
Substrate is canonical. ZFS is the filesystem realization of substrate forks. Git is a projection humans look at. Information flows substrate → ZFS → git. Never the other way.
Once the basic shape is right — fork a workspace, read context with conventions and hints, propose with the LLM, verify against the contract, apply through governance, publish via the projection layer — the same shape carries a whole family of code operations. None of them require new substrate primitives.
| Workflow | What it does | How ripple works |
|---|---|---|
| code/modify | Change a function's behavior. Re-derive the contract. Update callers if the contract changed incompatibly. | Walk Grokers/calls reverse. Recurse on incompatible callers. Bounded depth. |
| code/rename | Rename a symbol. Update the symbol's stream identity, update every call expression that named it. | Mechanical at every call site. The graph already knows them all. Single wave of parallel updates. |
| code/split | Decompose a large function into smaller ones. The original keeps its public signature as a wrapper or dispatcher. | Callers untouched: the public contract is preserved by construction. Internal change only. |
| code/inline | Eliminate a helper function. Propagate its body to all callers, with parameter substitution. | Every caller gets the body inlined at the call site. Mechanical. The helper's stream gets closed. |
| code/document | Generate or update documentation for a symbol or set of symbols. Targeted edits to existing Grokers/doc streams, never wholesale regeneration. | If the change reflects a contract update, walk Grokers/calls for callers whose docs mention the called function. |
| code/test | Generate tests for symbols flagged as untested or under-covered. Verify orthogonality (the test fails on a mutated symbol), register the test stream with a Grokers/covers relation. | No ripple: tests are leaves in the dependency graph. |
| code/audit | Walk a query result (all symbols flagged with dynamic dispatch, all symbols missing comprehension, all symbols matching a security pattern), produce findings as Grokers/clue streams for an investigator. | No ripple: audit is read-only. The clues themselves trigger downstream workflows when investigators pick them up. |
| code/migrate | Apply a transformation pattern across many symbols at once. Rename an API, deprecate a pattern, port a syntactic idiom. Batch version of code/modify. | Contract verification per symbol. Symbols that pass verify get applied; symbols that fail escalate. One workflow, many parallel sub-workflows. |
| code/notify | When a contract changes, identify other parties (humans subscribed to the symbol's stream, downstream services that call it via an extern bridge, plugins that depend on the changed plugin's ontology) and post messages to them describing the change. | Notification follows substrate subscription primitives. The same machinery that notifies users about new email notifies developers about API contract changes. |
Each entry in this table is a separate workflow with its own definition, but the shape is uniform. They share infrastructure — snapshot machinery, convention lookup, contract verification, audit chain, governance gating — and differ only in their middle steps. The family isn't held together by inheritance or by a common framework class; it's held together by composing the same substrate primitives in different orders.
One workflow in that table — code/notify — is worth dwelling on, because it's where the substrate's existing user-data heritage shows through clearly. When a Qbix function's contract changes, the people affected are: developers maintaining callers, operators of services that depend on the function via an HTTP endpoint or an event bridge, plugin authors whose ontology references the changed plugin's ontology, and humans who explicitly subscribed to the symbol's stream because they wanted to be told.
The substrate already has machinery for this. Streams.subscribe. Subscription rules. Notification dispatch via email, mobile, or in-app. Filter rules to suppress notifications under certain conditions. All of it built for ordinary user-data flows and now applicable, without modification, to code-state changes. A developer who wants to know when a function they depend on changes its contract subscribes to that function's symbol stream the same way they'd subscribe to a chat channel. They get a notification. They review the change. They update their code if needed.
Cross-language notifications work the same way. A PHP function fires a Q::event hook; a JS handler listens. Grokers represents this as both sides pointing at the same Grokers/extern/hook/... bridge node. When the PHP function's contract changes, the workflow's code/notify step traverses the bridge node to find every listener — across languages, across plugins — and posts notifications to their authors. The substrate doesn't care that one side is PHP and the other JS. The relation graph is uniform.
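The bridge traversal is a two-hop walk: from the changed symbol to the bridge nodes it fires, then from each bridge node to its listeners. A minimal sketch with plain dictionaries standing in for the relation graph; the symbol names, the bridge identifier, and the author addresses are hypothetical placeholders rather than real Grokers stream names.

```python
from typing import Dict, List


def authors_to_notify(
    changed_symbol: str,
    fires: Dict[str, List[str]],      # symbol -> bridge nodes it fires
    listens: Dict[str, List[str]],    # bridge node -> listener symbols
    author_of: Dict[str, str],        # listener symbol -> responsible author
) -> List[str]:
    """Follow symbol -> bridge -> listeners, whatever language each side is in."""
    authors = set()
    for bridge in fires.get(changed_symbol, []):
        for listener in listens.get(bridge, []):
            authors.add(author_of[listener])
    return sorted(authors)


fires = {"php:Users::register": ["hook:Users/register"]}
listens = {"hook:Users/register": ["js:onRegister", "php:Welcome::send"]}
author_of = {"js:onRegister": "dana@example.org",
             "php:Welcome::send": "eli@example.org"}
recipients = authors_to_notify("php:Users::register", fires, listens, author_of)
```

The traversal never inspects the language of either endpoint; a JS listener and a PHP listener on the same bridge node are found by the same lookup.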
The same machinery that notifies users about new mail notifies developers about contract changes. The substrate doesn't distinguish kinds of state — it distinguishes kinds of subscription.
The interesting property of this family is that the workflows compose without anybody orchestrating them. A code/modify that produces an incompatible contract change spawns a code/notify as a downstream effect, which posts messages, which triggers human review, which may produce its own code/modify on a downstream caller, which may spawn another code/notify. The graph of activity unfolds as workflows trigger each other through stream events and substrate messages, not through a central planner deciding what runs next.
This is why the substrate doesn't need to know what an "agent" is. The activity that an agent-driven architecture would describe as "the agent decided to update the documentation after the refactor" is, here, "the code/modify workflow's audit message triggered the Streams/messageType handler that spawned a code/document workflow against the same symbol." The decision-making is structural: declared in the substrate, governed at every step, queryable by anyone with read access to the audit chain.
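Planner-free composition can be sketched as message handlers that spawn follow-on work. This is an illustration of the pattern only: the `on` decorator, the `post` loop, the tuple message format, and the `contract-changed` message type are hypothetical, not the Streams plugin's API.

```python
from collections import defaultdict, deque
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, str]                      # (message type, symbol)
handlers: Dict[str, List[Callable]] = defaultdict(list)


def on(message_type: str):
    """Register a handler that may emit follow-on messages."""
    def register(fn):
        handlers[message_type].append(fn)
        return fn
    return register


def post(first: Message, limit: int = 100) -> List[Message]:
    """Deliver messages until none remain; the returned list is the audit trail."""
    queue, audit = deque([first]), []
    while queue and len(audit) < limit:        # limit guards against runaway chains
        message = queue.popleft()
        audit.append(message)
        for handler in handlers[message[0]]:
            queue.extend(handler(message))
    return audit


@on("contract-changed")
def spawn_notify(message: Message) -> List[Message]:
    return [("code/notify", message[1])]


@on("contract-changed")
def spawn_document(message: Message) -> List[Message]:
    return [("code/document", message[1])]


audit = post(("contract-changed", "Users::register"))
```

No function in the sketch decides what runs next. The sequence in `audit` falls out of which handlers were declared for which message types.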
Once code modification is a workflow primitive, multiple teams can run modifications in parallel against the same upstream codebase. Each team operates its own Safebox instance, with its own community, its own governance policies, its own credential vault, its own ZFS pool. Each instance clones the same upstream repo independently. Each instance forks workspaces independently. Each instance pushes branches to git independently.
Because forks are forward-only and the substrate has no merge primitive, two teams working in parallel produce two parallel branches in upstream git, both grounded in the same baseline commit. The substrate makes no attempt to reconcile them. Reconciliation, if needed, happens upstream — by humans reviewing both PRs, by an integration team merging them sequentially, by whatever process the upstream project already has for handling concurrent contributions. Safebox stays out of that decision. It produces well-formed proposals; it doesn't insist on how they get integrated.
This composes with the existing Safebox governance model in a useful way. A team's Safebox enforces M-of-N approval before any branch leaves the substrate. A reviewer signs off as a judgment vote, captured as a substrate stream, before the git push happens. The PR that lands on GitHub or GitLab carries provenance: which workflow, which workload, which approvers, which test results. A reviewer on the upstream side can verify the chain back to its source — the same way a regulator can verify any other Safebox-produced action.
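The M-of-N gate itself is a small predicate evaluated before the push. A minimal sketch, assuming votes arrive as a mapping from voter to approve/reject; `may_push` and the names are hypothetical.

```python
from typing import Dict, Set


def may_push(votes: Dict[str, bool], approvers: Set[str], m: int) -> bool:
    """Allow the push once at least m distinct designated approvers say yes."""
    approvals = {who for who, approve in votes.items()
                 if approve and who in approvers}
    return len(approvals) >= m


approvers = {"ana", "bo", "cy"}                        # N = 3 designated approvers
early = may_push({"ana": True, "bo": False, "mallory": True}, approvers, m=2)
later = may_push({"ana": True, "bo": False, "cy": True}, approvers, m=2)
```

A yes from someone outside the approver set does not count, so the first call is refused even though two votes were positive.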
| What teams used to do separately | What they can now do under one substrate model |
|---|---|
| Each team sets up its own CI on its own infrastructure. | Each team runs a Safebox that runs workflows including test execution. CI becomes a workflow class. Test results land in streams. The audit trail is uniform across teams. |
| Refactoring is a single-developer activity, sequenced by humans avoiding stepping on each other. | Multiple workflows fork independent workspaces in parallel. The substrate's existing access cascade prevents collision; the absence of merge means no contention. Teams produce branches; humans integrate. |
| PR review involves trusting that CI ran, that no one manually pushed, that the diff is what it claims to be. | Every commit carries the audit chain back to the workflow that produced it. Reviewers can re-run the verification step against the same workspace clone. Replay is a property, not a hope. |
| Cross-team coordination requires shared infrastructure, shared secrets, shared ops. | Each team's Safebox is independent. Federation primitives let them share specific streams (an ontology, a convention library, a test corpus) without sharing infrastructure. No shared ops surface. |
Continuous integration today is a separate stack. GitHub Actions, GitLab CI, CircleCI, Jenkins. Each one runs on its own runners, with its own credential management, its own caching layer, its own audit story. When a CI run fails, you read its log. When it succeeds, you trust that it ran. The integration with the codebase is a YAML file in the repo and a webhook on the git provider.
Most of what CI does — run tests, run linters, build artifacts, publish results, notify reviewers — is already what a Safebox workflow's verification step does. The difference is that a Safebox workflow's verification step happens before the commit, not after. The workspace's tests run inside the workspace's ZFS clone, with the workspace's proposed changes already applied, while the workflow can still abort. A failing test in CI prompts a developer to push a fix. A failing test in Safebox prompts the workflow to retry, escalate, or roll back without anything leaving the substrate.
For codebases that already use CI, Safebox doesn't replace it overnight — the workflow can push a branch, and existing CI runs against the branch as it always has. The audit chain captures both: the substrate-side verification that the workflow performed before pushing, and the CI-side verification that ran afterward. Over time, as Safebox workflows accrete coverage, the CI step becomes redundant. It can be retired one project at a time.
For new codebases, the picture is simpler. There's no separate CI to integrate with. Tests, linters, builds, deploys are all workflows. The substrate is the integration point. If a project uses GitHub for code review and wants pull requests, it gets pull requests via the projection layer. If it doesn't, it doesn't need them — the substrate's audit chain replaces what humans were using PR review to enforce.
Cost. CI infrastructure costs money. Self-hosted runners cost money. Cloud-hosted CI costs more. A team running its workflows under Safebox pays for compute it would have paid for anyway, but doesn't separately pay for a CI stack it now doesn't need. For organizations running many parallel projects, the savings compound.
Everything described so far is mechanical. Once a human says "modify this function to handle the new error case," the workflow runs without further human input until completion or escalation. But humans rarely arrive at "modify this function to handle the new error case" already knowing exactly what they want. The act of figuring out what to do is conversational.
That conversational layer is what Safebots is for. A developer talks to a bot about a bug they're seeing, the bot queries the Grokers graph, the bot proposes hypotheses, the developer pushes back, the bot revises, eventually they arrive at a concrete change that needs to happen. The bot then proposes a workflow invocation as its action — "I'd like to run code/modify on these three functions with this change description" — and the workflow takes over. Conversation produces the intent; workflows produce the change.
This is the layering that distinguishes Safebox's architecture from the swarm-of-agents pattern in current AI tooling. In a swarm, the conversational layer and the doing layer are entangled — the agent talks AND acts, picking its own tool calls in free-form sequence, with no clean separation between "we're still figuring out what to do" and "we're now doing it." Safe behavior in a swarm depends on the agent making good choices about which actions to take when. The substrate doesn't help, because the substrate isn't involved.
The decoupling is the safety property. Safebots talks all it wants — the conversation is itself substrate-tracked, captured as messages in chat streams, queryable in the audit history, but it commits no changes. Acting requires invoking a workflow, and workflows go through the substrate's existing machinery: declared action types, governance gates, judgment dispatch, contract verification, ZFS-backed isolation, audit chain. The conversational layer's job is to figure out the right workflow to run, with the right inputs, at the right time. The doing layer's job is to run that workflow safely, regardless of how exotic the conversation that produced it.
Industry vocabulary names a thing called an "agent" — an LLM-driven loop that talks, decides, and acts in one process. Safebox doesn't have one. What it has instead is two cleanly-separated processes: a conversational system (Safebots) that generates intent, and a workflow system (Safebox) that realizes intent under governance. Each is auditable independently. The conversational side can be wrong about what to do, and the worst that happens is a workflow proposal that governance rejects. The workflow side can fail mid-execution, and the worst that happens is a workspace fork that gets rolled back. Neither failure mode produces an unbounded blast radius.
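The seam between the two processes can be shown as a data handoff: the conversational side can only produce a proposal value, and only the workflow side can run one. This is a sketch under assumptions; `WorkflowProposal`, `submit`, and the set of declared workflows are hypothetical names used to illustrate the separation.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

DECLARED_WORKFLOWS = {"code/modify", "code/rename", "code/document"}


@dataclass(frozen=True)
class WorkflowProposal:
    """What the conversation produces: intent as data, with no side effects."""
    workflow: str
    symbols: Tuple[str, ...]
    description: str


def submit(proposal: WorkflowProposal,
           governance_approves: Callable[[WorkflowProposal], bool],
           run: Callable[[WorkflowProposal], str]) -> str:
    """The only path from intent to action; both checks precede any execution."""
    if proposal.workflow not in DECLARED_WORKFLOWS:
        return "rejected: undeclared workflow"
    if not governance_approves(proposal):
        return "rejected: governance"
    return run(proposal)


executed = []


def run_workflow(proposal: WorkflowProposal) -> str:
    executed.append(proposal.workflow)
    return "completed"


good = WorkflowProposal("code/modify", ("parseDate",), "handle the new error case")
odd = WorkflowProposal("shell/exec", ("anything",), "something exotic")
ok = submit(good, lambda p: True, run_workflow)
denied = submit(good, lambda p: False, run_workflow)
unknown = submit(odd, lambda p: True, run_workflow)
```

However unusual the conversation that produced `odd`, its worst outcome is a rejection string; `executed` records only the one approved run.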
The trick is that the conversational layer doesn't lose any of the capabilities the agent framing was promising. A developer can have a long-running, exploratory, multi-turn conversation with Safebots about a refactor. The bot can read code through Grokers tools, run hypothetical analyses, draft changes, ask clarifying questions, revise its understanding. All of that is just substrate operations at a conversational pace. When the conversation converges on a change to make, the bot proposes the workflow, and the developer (or governance, or both) approves. The conversation continues, perhaps about the next change, perhaps about the workflow's results. The substrate is in the loop the whole time, not only at decision moments.
We have decoupled agents. Conversation is one substrate operation; workflows are another. Neither pretends to be the other. Safety lives at the seam.
Notice that this same decoupling shows up at each layer of the stack. Grokers indexes; Safebox runs workflows on the index; Safebots converses about what workflows to run. Each layer is a distinct substrate operation. None of them does the others' job. None of them depends on a unified "agent" abstraction that conflates them.
A team using all three layers gets: Grokers' pre-computed graph (so the conversation has accurate context), Safebox's workflow machinery (so the conversation's outputs get realized safely), and Safebots' conversational layer (so the team can think out loud about complex changes without hand-authoring every workflow invocation). A team using just Grokers and Safebox skips the conversational layer and authors workflow invocations directly — fine for batch operations, scheduled maintenance, and CI. A team using just Grokers integrates pre-computed code understanding into whatever tooling they already have without committing to the rest of the stack. The layers are independent. Adoption can be staged.
The architecture sketched here is an extension of the substrate, not a replacement for the tools developers already use. Three things worth being explicit about.
Every workflow's writes pass through governance. M-of-N approval, judgment dispatch, declared action types, audit chain. A workflow that proposes a change to a function cannot bypass review by being in a hurry. The substrate's safety properties are structural — the workflow can't write outside its declared bounds even if the LLM driving it tries to. This is the same property described in architecture.html, applied to code instead of email.
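As a rough illustration of what "structural" means here, the following Python sketch combines an M-of-N approval check with a declared-bounds check on a single write path. Everything in it is an assumption made for the example: the `Policy` fields, the `guarded_write` function, and the symbol names are hypothetical and are not taken from Safebox. The point is only that the bounds check sits in the write path itself, so no amount of urgency or persuasion on the caller's side can route around it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    approvers: frozenset        # the N parties eligible to approve
    required: int               # M approvals needed
    allowed_targets: frozenset  # the workflow's declared write bounds


class BoundsViolation(Exception):
    """Raised when a write targets something outside the declared bounds."""


def is_approved(policy, approvals):
    # Only approvals from eligible approvers count, and each counts once.
    valid = set(approvals) & policy.approvers
    return len(valid) >= policy.required


def guarded_write(policy, approvals, target, value, store, audit):
    # Bounds are checked first: an out-of-bounds write fails even if
    # every approver signed off.
    if target not in policy.allowed_targets:
        audit.append(("out_of_bounds", target))
        raise BoundsViolation(target)
    if not is_approved(policy, approvals):
        audit.append(("awaiting_approval", target))
        return False
    store[target] = value
    audit.append(("written", target))
    return True
```

A workflow driven by an LLM would call something like `guarded_write` for every change it proposes. If the model asks for a target outside `allowed_targets`, the call raises before any approval logic runs.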
For a single-file fix, the overhead of forking a workspace is unwarranted. Existing IDE-integrated coding tools remain the right tool for in-the-flow editing. Safebox's workflow model earns its complexity at a different scale: refactors that touch tens or hundreds of symbols, parallel changes across teams, audit-required modifications, scheduled maintenance work, governance-gated production deploys. The relevant comparison isn't to Cursor; it's to a small team's combined CI+PR+refactoring tooling, which is the surface this replaces.
For codebases that already live in git, git stays. The projection layer pushes Safebox-produced branches there. Humans review PRs the way they always have. Git history continues to grow normally. What changes is what produces the commits and what verifies them before they land. For codebases that don't live in git, the substrate is the source of truth; git can be added later as a projection if humans need to look at the code that way.
The phrase comes from inside the team. We had spent months thinking about workflows as the right abstraction for ordinary work: answering email, scheduling meetings, processing claims. When code modification turned out to fit the same shape, the obvious question was: what else does? The answer seems to be most things. Infrastructure configuration. Scientific data curation. Regulatory filing. Training data preparation. Anything that involves coordinated state that multiple parties want to operate on, that needs an audit trail, that benefits from forward-only history with explicit divergence rather than retroactive reconciliation.
The substrate doesn't care what domain it's operating in. It composes the same twelve primitives — streams, relations, forks, workspaces, messages, actions, policies, judgments, conventions, capabilities, tools, workflows — onto whatever shape of state needs coordinating. Code happens to be one shape. There will be others. The shape we'll be most surprised by is probably not the one we're building for next; it's the one that, like code, was sitting in plain sight the whole time waiting for the right representation.
The substrate's bet is that most coordination problems have the same structure underneath, and a system that solves the structural problem solves most of the surface problems automatically. Code as the second domain confirms the first.