The Next Claude Code

The idea, briefly

Imagine an AI engineer who remembers your codebase.

Today's best AI coding tools — Claude Code, Cursor, Copilot — are brilliant short-term thinkers. You ask one to change something in your code, and it cracks open your project, reads whatever files it thinks are relevant, makes its best guess about how everything fits together, and writes the change. The work is impressive. It's also amnesia. The next conversation, it does the whole thing over again, from zero. Nothing it learned about your code persists.

What we've built starts from a different premise: read the code once, understand it carefully, remember what you learned, and use that memory every time someone asks for a change. The reading and the understanding happen ahead of time, like a new engineer spending their first week getting oriented before touching anything. When a change request comes in, the system already knows what depends on what, what each function promises to do, which tests cover which behaviors, and where the same idea shows up in different programming languages across the codebase. It doesn't have to discover any of this. It can just use it.

The four pieces

Grokers — the map-maker: Reads the codebase once, building a map of what every piece of code does and how it connects to everything else. Updates the map automatically as the code changes.
Code — the change-maker: When a change is needed, walks the map to figure out exactly what to update and in what order. Verifies the change preserves the promises the code was making before. Tests it. Pushes a branch.
Safebots — the conversation layer: Where humans and AI bots discuss what should change. Multiple parties can weigh in. The conversation itself becomes part of the permanent record — six months later, the question "why does this code look like this" is a query, not an archaeology project.
Safebox — the safety net: Sealed execution. Every action goes through a governance check before it touches your code. Hallucinated reasoning produces, at worst, a rejected proposal — never a silently broken codebase.

The problem

The amnesia tax.

If you've watched an AI coding tool work, you know the shape of it. You ask it to rename a function, and it spends thirty seconds running searches, opening files, reading them, deciding which ones matter. Then it makes the change. Then it tries to update the callers, missing one or two along the way because they used the function in a way it didn't see. Then you point out the misses, it makes those updates, you ship it.

That thirty seconds of search-and-read is the amnesia tax. The tool didn't know your codebase coming in, so it had to discover it. The next time you ask anything about the same code, it pays the tax again. Across an engineering team, dozens of developers pay this tax dozens of times a day against codebases that haven't meaningfully changed since the last time anyone looked at them. The model is brilliant, but it has no structural memory.

For small, contained changes — single file, single function, no surprises — the tax is fine. You pay it, you get the change, you move on. The trouble starts when the change isn't contained. A function used in fifty places. A schema that flows through three services in different languages. A library that needs deprecating across a half-dozen teams. The model has to discover all of this, every time, and the discoveries don't always find everything. Things get missed. Production breaks in the small ways that only show up under real load.

The first empirical measurements of where tokens actually go in agentic software engineering have started arriving, and they confirm what the architecture predicts. Researchers at Concordia's Data-driven Analysis of Software lab instrumented a multi-agent coding framework running on GPT-5 across thirty development tasks. They found that more than half of all tokens spent (54%, on average) were input: the agent re-reading context to figure out what's going on. The iterative refinement phase, where the system loops over its own work trying to verify and fix it, consumed roughly 59% of total tokens. The cost of initial code generation was a rounding error by comparison. The waste isn't speculative anymore. It's been measured. In its closing recommendations, the paper calls for "more token-efficient collaboration protocols, moving beyond naive full-context passing." That is the polite academic phrasing of the problem we set out to solve.

Some of the highest-leverage software work — refactors, deprecations, cross-system migrations — is exactly the work that suffers most from the amnesia. The tools are best at the easy stuff and weakest where the stakes are highest.

The shift

Read once, walk forever.

The change in approach is small to describe and consequential in practice. Instead of letting the AI re-discover your codebase on every request, you do the reading once, comprehensively, and store what you learned in a database. From then on, any AI that needs to reason about the code reads from the database, not from the raw files.

The map looks like this: every function, class, and module is a node. Connections between them — "this function calls that one," "this test covers that behavior," "this code in Python publishes events that this code in Go subscribes to" — are recorded as labeled links between nodes. The AI's role in the map-building is to spend a little time with each node and write down what it does — not how it does it, but what it promises. What it expects from its inputs. What it guarantees about its outputs. What side effects it has. What invariants it maintains. About two hundred words per node, smaller than the code itself.
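To make the shape of the map concrete, here is a minimal sketch of nodes, contracts, and labeled links in Python. It is an illustration under stated assumptions, not the system's actual schema: the class names (`Contract`, `Node`, `CodeMap`), the field names, and the `py:billing.charge` style identifiers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Contract:
    """What a node promises, not how it works. Field names are illustrative."""
    expects: list[str] = field(default_factory=list)        # requirements on inputs
    guarantees: list[str] = field(default_factory=list)     # promises about outputs
    side_effects: list[str] = field(default_factory=list)
    invariants: list[str] = field(default_factory=list)


@dataclass
class Node:
    node_id: str      # hypothetical naming scheme, e.g. "py:billing.charge"
    kind: str         # "function", "class", or "module"
    contract: Contract = field(default_factory=Contract)


class CodeMap:
    """Nodes plus labeled, directed links between them."""

    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}
        self.links: set[tuple[str, str, str]] = set()   # (source, label, target)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def link(self, source: str, label: str, target: str) -> None:
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both ends of a link must already be nodes")
        self.links.add((source, label, target))

    def incoming(self, target: str, label: str) -> list[str]:
        """Every node that points at `target` with the given label."""
        return sorted(s for s, lab, t in self.links if t == target and lab == label)


code_map = CodeMap()
code_map.add_node(Node("py:billing.charge", "function", Contract(
    expects=["amount_cents is a positive integer"],
    guarantees=["returns a receipt id, never None"],
    side_effects=["writes one ledger row"],
)))
code_map.add_node(Node("py:checkout.pay", "function"))
code_map.add_node(Node("py:tests.test_charge", "function"))
code_map.link("py:checkout.pay", "calls", "py:billing.charge")
code_map.link("py:tests.test_charge", "covers", "py:billing.charge")

print(code_map.incoming("py:billing.charge", "calls"))
print(code_map.incoming("py:billing.charge", "covers"))
```

The contract lives on the node, so anything reasoning about `py:billing.charge` can read its promises without opening the source.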

This is the part that pays off forever. Once a function has its contract written down, you don't need to re-read the function's source code to reason about it. You read its contract. When the function changes, the contract gets re-derived automatically. The map stays current. The amnesia tax is paid once, when the system first meets your codebase, and amortized across every conversation that follows.
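One plausible way to keep contracts current is to fingerprint each function's source and re-derive the contract only when the fingerprint changes. The sketch below assumes that approach; the text does not say how re-derivation is actually triggered. `ContractCache` is a hypothetical name, and `fake_derive` is a stand-in for whatever model call writes the real contract.

```python
import hashlib
from typing import Callable


def fingerprint(source: str) -> str:
    """Stable digest of a node's source text."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


class ContractCache:
    """Re-derives a contract only when the node's source has changed."""

    def __init__(self, derive: Callable[[str], str]) -> None:
        self._derive = derive
        self._entries: dict[str, tuple[str, str]] = {}   # node_id -> (digest, contract)
        self.derivations = 0                             # counts the expensive calls

    def contract_for(self, node_id: str, source: str) -> str:
        digest = fingerprint(source)
        cached = self._entries.get(node_id)
        if cached is not None and cached[0] == digest:
            return cached[1]                  # source unchanged: reuse the contract
        contract = self._derive(source)       # source is new or changed: pay once
        self.derivations += 1
        self._entries[node_id] = (digest, contract)
        return contract


def fake_derive(source: str) -> str:
    # Placeholder for the model call that would write the real contract.
    return f"contract derived from {len(source.splitlines())} line(s) of source"


cache = ContractCache(fake_derive)
v1 = "def area(w, h):\n    return w * h\n"
v2 = "def area(w, h):\n    return abs(w * h)\n"

cache.contract_for("py:geometry.area", v1)   # first sight: derived
cache.contract_for("py:geometry.area", v1)   # unchanged: served from the cache
cache.contract_for("py:geometry.area", v2)   # edited: derived again
print(cache.derivations)
```

Three requests produce two derivations, which is the amortization described above in miniature.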

The compounding insight

The expensive work (comprehension, structural analysis, cross-language linking) happens once. From then on, a query against the map is a lookup rather than a fresh read of the source, so its cost is negligible by comparison. The system grows more useful over time, not more expensive.

The cross-language part deserves a moment. In any codebase past the smallest size, the same idea shows up in multiple programming languages. A Python function exposes an API endpoint; a TypeScript file calls that endpoint; a Go service subscribes to events the Python function publishes. To today's AI tools, these are three unrelated pieces of code. To the map, they're three nodes pointing at the same underlying operation — and changing any one of them surfaces the consequences for the other two immediately. No discovery required. No language-specific tooling required. Just the map.
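The Python, TypeScript, and Go example above can be sketched as nodes in different languages linked to shared operation nodes. Everything here is an assumption made for illustration: the `op:` prefix, the relation labels, and the function and endpoint names are not taken from the actual system.

```python
from collections import defaultdict

# (code node, relation, shared operation). All names are hypothetical.
LINKS = [
    ("py:api.create_order", "exposes", "op:POST /orders"),
    ("ts:client.createOrder", "calls", "op:POST /orders"),
    ("py:api.create_order", "publishes", "op:event order.created"),
    ("go:shipping.OnOrderCreated", "subscribes", "op:event order.created"),
]


def affected_by(node: str, links=LINKS) -> list[str]:
    """Other code nodes that share at least one operation with `node`."""
    nodes_by_operation = defaultdict(set)
    for source, _relation, operation in links:
        nodes_by_operation[operation].add(source)

    touched = set()
    for members in nodes_by_operation.values():
        if node in members:
            touched |= members
    touched.discard(node)
    return sorted(touched)


print(affected_by("py:api.create_order"))
print(affected_by("go:shipping.OnOrderCreated"))
```

Changing the Python endpoint surfaces the TypeScript caller and the Go subscriber in one lookup, with no language-specific parsing at query time.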

The consequence

What "the right context" actually means.

When an AI coding tool makes a mistake, the underlying cause is almost always the same: it didn't have the right information when it needed it. It didn't see the function that depends on the one being changed. It didn't know about the test that exercises the corner case. It didn't realize the configuration file three directories over reads a value the changed code produces.

Today's tools deal with this by reading more — more files, more context, larger token windows, longer "thinking" before acting. This works to a point and then gets prohibitively expensive. Every additional file the AI reads is a few thousand more tokens it has to pay for, and the gains shrink the further afield it reads.

The map changes this calculation. When a change to function X comes in, the system can ask the map for exactly what's relevant: every function that calls X, every test that covers it, every place its outputs flow to. The answer comes back instantly, without re-reading anything. The AI gets a focused, accurate picture of what's affected — not a guess assembled from twenty file reads, but the actual structural truth pre-computed and ready.
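As a rough sketch of what "asking the map" could look like, the snippet below gathers the callers, covering tests, and downstream outputs for one function from a set of labeled links. The link labels (`calls`, `covers`, `flows_to`) and all node names are hypothetical, and a production system would query a database rather than scan an in-memory set.

```python
from collections import deque

# (source, label, target). Illustrative data only.
LINKS = {
    ("checkout.pay", "calls", "billing.charge"),
    ("api.post_order", "calls", "checkout.pay"),
    ("tests.test_charge", "covers", "billing.charge"),
    ("billing.charge", "flows_to", "ledger.rows"),
    ("reports.monthly", "calls", "ledger.read"),
}


def sources(target: str, label: str) -> list[str]:
    return sorted(s for s, lab, t in LINKS if t == target and lab == label)


def targets(source: str, label: str) -> list[str]:
    return sorted(t for s, lab, t in LINKS if s == source and lab == label)


def transitive_callers(target: str) -> list[str]:
    """Breadth-first walk up the 'calls' links."""
    seen: set[str] = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for caller in sources(current, "calls"):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return sorted(seen)


def context_for(target: str) -> dict[str, list[str]]:
    """The slice of the map that matters for a change to `target`."""
    return {
        "direct_callers": sources(target, "calls"),
        "all_callers": transitive_callers(target),
        "tests": sources(target, "covers"),
        "outputs_flow_to": targets(target, "flows_to"),
    }


context = context_for("billing.charge")
print(context)
```

Note that `reports.monthly` never appears in the result: it is unrelated to `billing.charge`, so it stays out of the context.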

This is what the load-bearing phrase "right context every time" actually means. The AI receives exactly the slice of the codebase that matters for the change at hand. No more, no less. It can spend its reasoning on the change itself, not on the discovery work that precedes it. And the discovery work, when it does need to happen, returns answers measured in milliseconds rather than seconds.

The verification

Catching the silent mistakes.

The most dangerous AI coding mistakes are the ones that look right. The tests still pass. The diff looks clean. The function still does most of what it used to. But somewhere in the rewrite, a guarantee that some caller depended on quietly weakened. The function used to never return null; now it does, in an edge case nobody thought to test. Three weeks later, a service crashes in production for reasons that take a week to trace.

The contracts in the map are how this gets caught. When the system rewrites a function, it doesn't just check that the tests pass — it re-reads the new function and derives a fresh contract from it, then compares the new contract against the old one. Three things can happen:

The contracts match. Same behavior, different phrasing. Safe to apply.

The new contract is stronger. The function now promises more than it used to, or asks for less. Existing callers are still satisfied. Safe to apply, log the improvement.

The new contract is incompatible. The function now promises less than it used to, or asks for more. Some existing callers will break. Instead of applying, the system walks the map to find every caller, examines each against the new contract, and recursively repeats the process for any caller that's now broken. The whole ripple either succeeds together or fails together. No partial changes that look fine until production.
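The three outcomes above can be sketched as a comparison between two contracts. This is a deliberate simplification: it models a contract as two sets of exact strings (what the function requires and what it guarantees), whereas the contracts described here are prose and would need semantic comparison. The names `Contract`, `compare`, and `broken_callers` are hypothetical, and the recursive, all-or-nothing ripple is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contract:
    requires: frozenset[str]     # what the function asks of its callers
    guarantees: frozenset[str]   # what the function promises in return


def compare(old: Contract, new: Contract) -> str:
    """Classify a rewrite as 'match', 'stronger', or 'incompatible'."""
    if new == old:
        return "match"
    asks_no_more = new.requires <= old.requires
    promises_no_less = new.guarantees >= old.guarantees
    if asks_no_more and promises_no_less:
        return "stronger"
    return "incompatible"


def broken_callers(new: Contract, relies_on: dict[str, set[str]]) -> list[str]:
    """Callers that depend on a guarantee the new contract no longer makes."""
    return sorted(c for c, needs in relies_on.items() if not needs <= new.guarantees)


old = Contract(frozenset({"user_id is an int"}), frozenset({"never returns None"}))
stronger = Contract(frozenset(), frozenset({"never returns None", "result is cached"}))
weaker = Contract(frozenset({"user_id is an int"}), frozenset())

# Which guarantees each caller depends on (illustrative data).
RELIES_ON = {
    "profile.render": {"never returns None"},
    "audit.log_lookup": set(),
}

print(compare(old, old))
print(compare(old, stronger))
print(compare(old, weaker))
print(broken_callers(weaker, RELIES_ON))
```

The weakened contract is the silent-mistake case: it drops the "never returns None" guarantee, and the caller that relied on it is flagged before anything is applied.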

This kind of verification is structurally impossible for AI tools that don't have a map. You can't re-derive a contract you never derived in the first place. You can't compare against a baseline you don't have. The map makes the verification possible, and the verification is what makes the system trustworthy enough to let it write code without a human reading every line.

The conversation layer

Agreement becomes a record.

Most software work at any real scale isn't editing code. It's deciding what to edit. The engineer who wants to deprecate an old API has to talk to the teams that depend on it. The teams negotiate timelines, backward-compatibility requirements, migration help. The conversation happens in Slack, in meetings, in long email threads, in comments on a planning document. When the dust settles, the agreement exists in the heads of the people who were there. Six months later, when someone asks why a particular function is the way it is, the answer is in scrollback nobody will scroll back through.

The conversation layer makes this conversation itself a first-class record. When engineering opens a discussion about deprecating an API, the topic becomes an object — typed, queryable, lockable. The downstream teams' representatives (humans or AI bots they delegate to) participate. Proposals get made, contested, amended, signed. When enough signatures arrive, the agreement commits, and the agreement is permanent and queryable. The reasoning is preserved. The alternatives that were rejected are preserved. The constraint that ruled out the obvious solution is preserved.

This connects to the change-making side in a specific way: an agreement can carry an instruction to execute a particular workflow when it commits. So the cross-team negotiation about an API deprecation can end not with "we agreed, now somebody go write the code," but with the act of signing the agreement itself triggering the rewrite, grounded in the terms everyone signed off on. The link between "we agreed" and "the code changed" is automated, with a permanent audit trail running from the original message all the way to the resulting commit.
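A minimal sketch of an agreement that triggers a workflow when it commits might look like the following. It assumes a simple signature threshold and an in-memory audit log; the `Agreement` class, its fields, and the team names are hypothetical, and the workflow here is just a callback that records the terms it was handed.

```python
from typing import Callable


class Agreement:
    """Collects signatures and runs a workflow once enough have arrived."""

    def __init__(self, topic: str, terms: str, threshold: int,
                 on_commit: Callable[[str], None]) -> None:
        self.topic = topic
        self.terms = terms
        self.threshold = threshold
        self.on_commit = on_commit
        self.signers: list[str] = []
        self.committed = False
        self.audit_log: list[str] = [f"opened: {topic}"]

    def sign(self, signer: str) -> None:
        if self.committed:
            raise RuntimeError("agreement is already committed")
        if signer in self.signers:
            return  # a repeated signature does not count twice
        self.signers.append(signer)
        self.audit_log.append(f"signed: {signer}")
        if len(self.signers) >= self.threshold:
            self.committed = True
            self.audit_log.append(f"committed: {self.terms}")
            self.on_commit(self.terms)          # signing is what starts the work
            self.audit_log.append("workflow triggered")


triggered: list[str] = []
agreement = Agreement(
    topic="deprecate the v1 orders API",
    terms="remove v1 after all callers migrate to v2",
    threshold=2,
    on_commit=triggered.append,
)
agreement.sign("team-checkout")
agreement.sign("team-checkout")   # ignored: same signer
agreement.sign("team-shipping")   # second distinct signature commits

print(agreement.committed, triggered)
print(agreement.audit_log)
```

The audit log runs from the opening of the topic through each signature to the triggered workflow, which is the chain from "we agreed" to "the code changed" in small.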

This is the kind of thing AI coding can become when the tools stop ending at the IDE and start spanning the actual shape of software work. Today, the AI participates in the rewrite. Next, it participates in the agreement that leads to the rewrite, and the workflow that executes the agreement, and the verification that catches the regression. The boundaries between those phases stop being walls and start being checkpoints in one continuous chain.

Where it stands

This isn't a research idea. It's running.

Everything described here is built and working. Not a slide deck, not a roadmap, not a research project. The map-maker, the change-maker, the conversation layer, the safety net — these are open-source plugins, in production, under the Qbix umbrella that has been quietly powering apps with seven million installs across a hundred countries for the last fifteen years.

The foundation matters here. A system like this can only be as solid as what it's built on, and the substrate underneath — a graph database with per-entity access control, copy-on-write workspaces, audit logs that can't be tampered with — has had fifteen years of production hardening across a wide variety of applications. The AI coding system is the newest thing built on top, not a greenfield experiment that might fall over the first time it meets real scale.

You can see deeper technical writing on the two main pieces:

Grokers: the map-maker, in depth
Code: the change-maker, in depth

The honest case

Where today's tools are still better.

To be useful, the comparison has to be honest in both directions. Claude Code, Cursor, and Copilot are remarkable tools and the right answer for plenty of work.

For a developer in their editor, changing one file, the tools we've all been using are the right choice. The map-making investment doesn't pay off for one-shot edits. The systems described here are designed for repeated interactions with a codebase over weeks or months — that's when the up-front comprehension cost gets amortized across hundreds of subsequent queries.

For exploring an unfamiliar codebase, today's tools are immediate. The map-making takes some time — minutes to hours, depending on the size — before the system can help. If you're going to touch the codebase once and never again, skip the indexing and use what you have. The system pays off for codebases you're going to live with.

For prototyping and exploration, the governance and verification machinery is overhead. When the entire point is to move fast and break things, the safety net is in the way. Most of the value here shows up in production codebases where mistakes have consequences.

These aren't concessions. They mark where the boundary between the two approaches falls. The next Claude Code isn't a replacement for the current one; it's what the current one grows into once the constraints become serious. The two tools can sit on the same machine and serve the same developer.

For Anthropic

Why we're writing this down.

Anthropic is going to build something like this. Claude Code's users are already pressing it past the single-developer-single-file boundary; the requests are coming in for everything described above. Persistent codebase memory. Cross-team coordination. Automatic detection of consequential changes. Audit trails that survive turnover. Each of these is a small ask from a user's perspective. None of them is a small piece of engineering. And the underlying inefficiency that motivates them is no longer a matter of opinion: it has been measured. The Concordia study cited above is the first of its kind, and it won't be the last. The pressure to address the amnesia tax structurally, rather than by buying more context window, is going to compound from here.

The system we've built has been in development for years. The foundation — the graph database, the access control model, the sealed-execution substrate — predates the AI coding work by over a decade and represents tens of thousands of engineering hours we don't expect anyone to redo. The AI coding plugins built on top are the newest layer, and the part that's freshly relevant to the moment Anthropic finds itself in.

The proposition we want to put in front of Anthropic is straightforward. This is what the next Claude Code can be. We have built it. It works. The engineering work an in-house team would need to do to reach the same destination from scratch is probably a year, possibly more, and involves some non-obvious infrastructure decisions that are easy to get wrong. We have done that work and would rather see it become the standard than watch it be reinvented in parallel. The codebase is dual-licensed — AGPL today, with a commercial path available. Conversations about acquisition, partnership, or co-development are open.

If you're at Anthropic and the technical case here is interesting, the next step is a one-hour walk-through of the architecture, the codebase, and the production deployments. The intent isn't a sales pitch. It's to give technical decision-makers enough context to decide whether building this in-house is the path you'd choose, knowing that the alternative is available.
