Architecture · AI Tooling · Knowledge Systems

Grokking vs. Guessing: Why Pre-Computed Knowledge Graphs Beat On-Demand Code Inference

Cursor and Claude Code are powerful, but they re-infer the dependency graph on every query. Grokers pre-computes it once and makes it queryable in milliseconds — not just for code, but for any hyperlinked knowledge structure.

Qbix / Intercoin Platform Architecture Series

The Hidden Cost of Every AI Code Edit

When you ask Cursor to "rename this function throughout the codebase" or ask Claude Code to "update all callers after this API change," something expensive happens before the first character is written. The model has to infer the dependency graph from scratch.

It calls find, reads files it thinks might be relevant, extracts imports and call expressions, reasons about which symbols reference which, and builds a mental model of the structure. On a codebase with a thousand files, this means dozens of tool calls, thousands of tokens of context, and — critically — it happens every single time. Next turn, next query, the model re-runs the same process because nothing was persisted.

The model can miss dependencies — files it didn't read, dynamic imports, reflection, configuration files that drive behavior. It makes educated guesses and sometimes guesses wrong.

This is the fundamental architecture of the current generation of AI coding tools: inference at query time, no persistent graph. It works well for simple edits. It breaks down — in cost, latency, and correctness — as the surface of change grows.

Grokers takes the opposite approach. The expensive walk happens once, during the grokers index and grokers analyze passes. The result is materialized as Streams and relations in a database. At query time — when an agent needs to answer "what depends on this symbol?" — it calls Streams.related() and gets the answer in one round-trip, no inference required.
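
A minimal sketch of that query-time path, with the caveat that the exact Streams.related() signature and return shape are assumed here (an options-object variant of the pseudocode shown later in this section):

```js
// Sketch only: assumes a Streams client is in scope (e.g. the Qbix
// Streams plugin). The signature and return shape are assumptions.
async function whatDependsOn(symbolStreamName) {
    // One database round-trip: no file reads, no LLM tokens.
    const related = await Streams.related(symbolStreamName, 'Grokers/calls', {
        reverse: true,  // follow edges backwards: callers, not callees
        depth: 5        // transitive dependents, up to five hops
    });
    return related.map(r => r.streamName); // assumed relation shape
}
```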

What Each System Actually Does

Let's be precise about the mechanics, because the difference matters enormously at scale.

Cursor and Claude Code: on-demand graph inference

Query time (every turn):

1. model asks: find . -name "*.ts" -o -name "*.js" ← filesystem traverse
2. model reads: src/api/users.ts, src/lib/auth.ts, ... ← heuristic file selection
3. model infers: import graph, call graph from source text ← reasoning step, per-turn cost
4. model reasons: "changing X affects Y because..." ← may miss dynamic dispatch
5. model writes: edit commands to N files ← may miss file M entirely

Cost structure: O(files_read × tokens_per_file) per query
Graph is discarded after the turn. Next turn: repeat from step 1.

Grokers: pre-computed graph, queryable at near-zero cost

Index time (once, amortized):

1. tree-sitter parses every file → SymbolRecords, ExternRefs, CallEdges
2. graph.js builds call graph, resolves cross-language references
3. builder.js writes Grokers/symbol + Grokers/calls streams to DB
4. LLM analyzes each symbol bottom-up: summaries, contracts, clues
5. enricher annotates call sites with parameter names from callee signatures
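
For concreteness, here is a rough sketch of step 1 for JavaScript sources, using the node tree-sitter bindings. The SymbolRecord and CallEdge shapes are simplified stand-ins for whatever builder.js actually persists; resolving call targets to symbols happens later, in graph.js:

```js
// Parse one file and emit simplified symbol and call-edge records.
// Requires the `tree-sitter` and `tree-sitter-javascript` packages.
const Parser = require('tree-sitter');
const JavaScript = require('tree-sitter-javascript');

function indexFile(filePath, source) {
    const parser = new Parser();
    parser.setLanguage(JavaScript);
    const tree = parser.parse(source);

    // One SymbolRecord per named function declaration.
    const symbols = tree.rootNode
        .descendantsOfType('function_declaration')
        .map(node => ({
            file: filePath,
            name: node.childForFieldName('name').text,
            line: node.startPosition.row
        }));

    // One CallEdge per call expression; the callee is still just text
    // here and is resolved to a symbol in the graph-building pass.
    const calls = tree.rootNode
        .descendantsOfType('call_expression')
        .map(node => ({
            file: filePath,
            callee: node.childForFieldName('function').text,
            line: node.startPosition.row
        }));

    return { symbols, calls };
}
```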

Query time (any subsequent agent turn):

Streams.related(symbol, 'Grokers/calls', reverse=true, depth=5)
→ returns transitive dependents in one DB query
→ <10ms, no LLM tokens, no file reads

The Core Insight

The graph inference cost in Cursor/Claude Code scales with every query. The graph computation cost in Grokers scales with the codebase, paid once. For any project you touch more than a few times, Grokers' model is orders of magnitude cheaper.

Cost Comparison: Query-Time Inference vs. Pre-Computed Graph

| Operation | Cursor / Claude Code | Grokers | Advantage |
|---|---|---|---|
| Find all callers of function X | grep/ripgrep + model reads matches + reasoning; ~5–20 tool calls, 500–5k tokens | fetchRelated(X, 'Grokers/calls', reverse); 1 DB query, 0 LLM tokens | 100–1000× |
| Transitive impact of API change | Model reads callers, reads their callers, reasons about chain; 10–50 file reads, expensive, often incomplete | getTransitiveDependents(X, maxDepth=5); BFS over DB relations, seconds, not minutes | 50–500× |
| What config keys does this read? | Model reads function source, infers Q_Config::get calls; 1 file read, but misses transitive config access | Pre-indexed Grokers/extern/config/* relations; literal key paths extracted at parse time | 10–50× |
| What DB tables does a module touch? | Grep for table names, read ORM files, reason about inheritance; often misses indirect access via helper methods | ORM convention auto-indexes table externs; includes inherited access, lifecycle hooks | Complete vs. partial |
| Understand an unfamiliar function | Model reads function + inlines callee logic; context-window limited, may miss important callees | Pre-computed summary, preconditions, side effects; bottom-up: callee summaries already in DB | Richer + cheaper |
| First-time repository setup | Immediate — no preprocessing needed | Index pass required first (minutes to hours) | Cursor wins |
| One-off single-file edit | Fast — read one file, make change | Grokers provides context; edit still direct | Roughly equal |

Correctness and the Cost of Mistakes

Cost comparison is only half the story. The other half is what happens when the system gets it wrong. In a large codebase or knowledge base, a missed dependency is not a minor inconvenience — it is a broken deployment, corrupted data, or a security hole that slips through review.

| Failure Mode | Cursor / Claude Code Risk | Grokers Risk | Notes |
|---|---|---|---|
| Missed caller of renamed function | High: model may not read all files; dynamic dispatch invisible | Low: static call graph is complete; dynamic dispatch flagged as a clue | Runtime crash or silent wrong behavior |
| Missed config key after schema change | High: config access is invisible without running code | Low: literal keys extracted at parse time, indexed as externs | Silent wrong config read at runtime |
| Cross-language missed dependency (PHP writes event, JS listens) | Very high: model rarely correlates PHP Q::event with JS Q.on handlers | Low: both sides indexed under the same endpoint extern stream | Feature silently breaks across the language boundary |
| Missed lifecycle hook interaction (ORM beforeSave ↔ afterSave) | Medium: model reads individual hooks but may miss the pattern | Low: lifecycle-handoff clue surfaced by analyzer; concept promoted | Data integrity bug under load |
| Stale comprehension after code change | N/A: re-infers every turn | Medium: must re-index changed files; watcher triggers incremental re-analysis | Trade-off: freshness vs. pre-computation cost |
| Hallucinated function signature | High: model invents plausible-looking signatures when a file is not read | None: signatures extracted from the AST with type hints; LLM infers only when hints are absent | Wrong call convention generates broken code |

The Compounding Problem

In AI-assisted development, errors compound. A missed dependency in step 1 of a refactor means every subsequent edit is built on a wrong mental model. Grokers' graph is computed bottom-up — leaves before callers — so by the time the analyzer reaches a complex function, all its dependencies are already correctly understood.

Grokers Is Not Just for Code

This is perhaps the most underappreciated aspect of the architecture: Grokers is a general knowledge graph materializer. The "symbol" abstraction — a node with attributes, call/dependency edges, and comprehended summaries — maps naturally onto any structured knowledge domain.

Websites and documentation

A website's pages are symbols. Internal hyperlinks are Grokers/calls relations. External links are Grokers/extern/endpoint/*. A section that references another section is a dependency edge. Once indexed, answering "what pages link to this one?" or "what pages would break if this URL changes?" is a single getTransitiveDependents query — not a crawl-and-parse operation at query time.

Hyperlinked knowledge bases

Obsidian vaults, Notion wikis, academic paper citation graphs: any system where documents reference other documents. The groker walks the graph once, materializes the links as Streams relations, and the LLM comprehends each node in context of what it links to — exactly as it comprehends a function in context of its callees. "What does changing this concept definition affect?" becomes a graph query, not a search-and-reason operation.
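
For illustration, a wikilink indexer over a flat markdown vault fits in a few lines; the edge shape below is hypothetical, but it mirrors the from/to structure of the code indexer's call edges:

```js
// Flat markdown vault for brevity; [[wikilinks]] become the same kind
// of from/to edge the code indexer produces (edge shape assumed).
const fs = require('fs');
const path = require('path');

function indexVault(dir) {
    const edges = [];
    for (const file of fs.readdirSync(dir)) {
        if (!file.endsWith('.md')) continue;
        const text = fs.readFileSync(path.join(dir, file), 'utf8');
        // Matches [[Target Note]] and [[Target Note|display text]]
        for (const m of text.matchAll(/\[\[([^\]|]+)(?:\|[^\]]*)?\]\]/g)) {
            edges.push({
                from: file,
                to: m[1].trim() + '.md',
                type: 'Grokers/calls'
            });
        }
    }
    return edges; // later materialized as Streams relations
}
```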

APIs and OpenAPI schemas

An API endpoint is a symbol. Its parameters are Grokers/params. The clients that call it are reverse Grokers/calls relations. Changing an endpoint's schema and finding all affected clients is getTransitiveDependents over the API dependency graph.

Data pipelines

A pipeline stage that reads table X and writes table Y has Grokers/reads and Grokers/writes extern relations. The lineage graph — "which downstream stages are affected if table X's schema changes?" — is pre-computed and queryable in milliseconds.
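
Sketched in code, the lineage query is a breadth-first search over stored relations, as described in the comparison table above. The fetchReverseRelated() helper is hypothetical; it stands for the single-level "which streams relate to these?" lookup:

```js
// BFS over pre-computed relations. fetchReverseRelated() is a
// hypothetical helper: one DB query answering "which streams relate
// to any of these, with relation type Grokers/calls?"
async function getTransitiveDependents(symbol, maxDepth = 5) {
    const seen = new Set([symbol]);
    let frontier = [symbol];
    for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
        const nextLevel = await fetchReverseRelated(frontier, 'Grokers/calls');
        frontier = nextLevel.filter(s => !seen.has(s));
        frontier.forEach(s => seen.add(s));
    }
    seen.delete(symbol); // the changed symbol itself is not a dependent
    return [...seen];
}
```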

Domain "Symbol" "Call edge" "Extern" Query: "what breaks if X changes?"
Code Function/method Function call DB table, config key, API endpoint All callers, transitively
Website Page/section Hyperlink External URL, media asset All pages linking here
Knowledge base Document/concept [[wikilink]] or citation External source, image All documents that reference this concept
API schema Endpoint/operation Client call Auth scope, header dependency All clients using this endpoint
Data pipeline Transform stage Data dependency DB table, file path, S3 bucket All downstream stages

How Grokers Fits with Safebox and Safebots

Grokers is not a standalone tool. It is the knowledge layer for the Qbix platform's AI stack. To understand where it sits, it helps to see the full picture.

Safebots — AI agent platform for non-developers
"Build an AI workflow over a weekend"
Workflow editor → Steps → Tools → Capabilities
ZFS-snapshotted isolated execution environments
M-of-N governance for write proposals

Safebox — sealed computation with warrant-governed execution
Deterministic AMIs, TPM attestation, no SSH
Every tool call goes through Action.propose
No plaintext, no private keys, no root access
Streams as the audit trail and state layer

Grokers — knowledge graph for any structured artifact
Parse → Index → Analyze (bottom-up LLM)
Streams as the persistent graph database
Query at <10ms, no re-inference
Works on code, docs, websites, APIs, data pipelines

────────────────────────────────────────────────
Qbix Streams — the shared substrate
syncRelations, registerRelations, subscriptions
Every stream is a live observable with participants

The integration is tight and intentional:

Safebox provides the execution environment for Grokers' workers. Each analysis worker runs inside a Safebox instance — sealed, attested, no remote access. The LLM calls go through Safebox's capability system; the outputs are proposed via Action.propose and approved by the governance layer before being written to Streams. This means no Grokers worker can corrupt the graph unilaterally, even if compromised.
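
Sketched in code, the governed write path looks roughly like this; Action.propose is the platform's named entry point, but the argument shape here is an assumption:

```js
// Sketch of the governed write path; the argument shape is assumed.
async function writeComprehension(symbolStreamName, summary) {
    // Workers never write to Streams directly: they propose, and the
    // M-of-N governance layer applies the write only after approval.
    return Action.propose({
        type: 'Streams/attribute/set',
        stream: symbolStreamName,
        attributes: { 'Grokers/summary': summary }
    });
}
```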

Grokers feeds knowledge into Safebots' tool context. When a Safebot workflow needs to "understand the codebase before making changes," it calls Streams.related() on the Grokers graph rather than running ad-hoc file reads. The comprehended summaries, parameter signatures, and dependency edges are pre-loaded into the Safebot's tool context at the start of each workflow step. The bot reasons about already-understood knowledge, not raw source.

Safebots' ZFS snapshots complement Grokers' graph. ZFS provides rollback — if a Safebot proposes a change set and it turns out to be wrong, you snapshot-rollback the filesystem. Grokers provides the graph — which things to change and what the consequences are. They are orthogonal concerns. ZFS answers "can I undo this?" Grokers answers "what do I need to change and what will I break?"

IDE Integration and the Non-Developer Path

Grokers is designed to work at both ends of the user sophistication spectrum.

For developers: IDE integration

Because Grokers materializes its graph into standard Qbix Streams — queryable via HTTP with the same API any Qbix plugin uses — IDE plugins are straightforward. A VS Code extension can query Streams.related(currentSymbol, 'Grokers/calls', reverse) to show a "what depends on this function?" panel in real time. Hover-over summaries come from Grokers/summary attributes. Refactoring previews come from getTransitiveDependents. The IDE plugin doesn't need to parse code — Grokers already did that.
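
A minimal sketch of such an extension follows. The localhost endpoint and its response shape are hypothetical; the vscode API calls are standard:

```js
// Minimal VS Code extension entry point. The endpoint and JSON shape
// are assumptions; Grokers already parsed the code, so the hover just
// fetches a pre-computed summary.
const vscode = require('vscode');

function activate(context) {
    context.subscriptions.push(
        vscode.languages.registerHoverProvider('javascript', {
            async provideHover(document, position) {
                const range = document.getWordRangeAtPosition(position);
                if (!range) return;
                const symbol = document.getText(range);
                const res = await fetch(
                    `http://localhost:9898/Grokers/summary?symbol=${symbol}`);
                const { summary } = await res.json();
                return new vscode.Hover(summary); // no LLM call needed
            }
        })
    );
}

module.exports = { activate };
```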

The Grokers CLI (scripts/Grokers/Grokers.js) exposes all of this as command-line tools: grokers ask for natural language queries over the graph, grokers docs to generate living documentation, grokers status for comprehension progress. CI/CD pipelines can run grokers index --incremental on each commit and grokers analyze --changed-only to update comprehension for modified symbols.

For non-developers: the weekend project

Here is what a non-developer can do with Safebots + Grokers over a weekend:

They take a knowledge base — a Notion export, a set of markdown files, a website crawl — and run grokers index on it. The indexer walks the files, extracts links and references, builds the dependency graph. grokers analyze runs overnight: the LLM comprehends each document in the context of what it links to, building summaries, identifying key concepts, surfacing patterns across the corpus.

On day two, they create a Safebot workflow. The workflow has one tool: Streams.related(). They give it a prompt: "When someone asks about topic X, find all documents related to X by traversing Grokers/calls relations up to depth 3, return the top 5 most relevant summaries." They deploy this workflow. Now they have a knowledge base chatbot backed by a pre-computed semantic graph — not keyword search, not ad-hoc LLM document reads, but a graph-based retrieval system built in a weekend.
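
Sketched as code, that workflow is little more than one traversal and a ranking step. The workflow schema, the topicStream() helper, and the use of relation weights as a relevance proxy are illustrative assumptions:

```js
// Illustrative workflow definition; the schema is invented for this
// sketch. topicStream() (topic -> document stream name) is hypothetical.
const workflow = {
    name: 'kb-chatbot',
    tools: ['Streams.related'],
    async answer(topic) {
        // Traverse the pre-computed link graph, three hops out.
        const related = await Streams.related(
            topicStream(topic), 'Grokers/calls', { depth: 3 });
        // Use relation weights as a relevance proxy and return the
        // five best pre-computed summaries.
        return related
            .sort((a, b) => (b.weight || 0) - (a.weight || 0))
            .slice(0, 5)
            .map(r => r.attributes['Grokers/summary']);
    }
};
```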

The ZFS snapshot means they can experiment freely. Snapshot before running the analyzer. Try different comprehension prompts. If the results are wrong, roll back and try again. No production data at risk.

The Weekend Promise

A non-developer can index a knowledge base, analyze it with LLM comprehension, and deploy a Safebot workflow querying the pre-computed graph — all over a weekend, without writing a line of code. The graph persists. The analysis accumulates. Next weekend, they extend it. This is the Safebots value proposition made concrete.

Honest Tradeoffs

Grokers is not universally better than Cursor or Claude Code. It is better for a specific, important class of problem. The honest comparison requires acknowledging where it falls short.

| Criterion | Cursor / Claude Code | Grokers |
|---|---|---|
| First-query latency on a new repo | Seconds | Hours (index + analyze) |
| Works on arbitrary filesystem structure | Yes — reads any file | Needs parser support for the language |
| Handles highly dynamic code (eval, reflection) | Partially — model reads and reasons | Static analysis misses it; clues flag it for the LLM |
| Systematic changes across 1000+ files | Expensive, incomplete, error-prone | Graph query + workspace creation |
| Understanding an unfamiliar large codebase | Model reads, forgets next turn | Pre-computed summaries, persistent graph |
| Cross-language dependency tracking | Rarely correct across the PHP↔JS boundary | Unified extern stream for event/endpoint edges |
| Single-file quick fix | Fast, low friction | Context from graph makes the fix more accurate |
| Non-code knowledge bases | Not designed for this | First-class: websites, wikis, APIs, pipelines |
| Governance and audit trail | None — model writes directly | Every write is Action.propose + M-of-N approval |
| Non-developer accessible | Requires coding context in prompts | Safebots workflow over a pre-built graph |

Parallel Refactoring via Topological Scheduling

Everything discussed so far has been about understanding code faster. But the pre-computed call graph unlocks something more consequential: changing code in parallel at a speed that is structurally impossible for today's AI tools.

When Cursor or Claude Code performs a refactor — rename a function, change a method signature, propagate a schema change — it works sequentially. It reads the target, proposes a change, reads a caller, proposes a change, reads that caller's caller, and so on. The graph is re-inferred at each step. Each edit is a fresh conversation. The model has no memory of what it already changed.

Grokers makes a different approach possible. The call graph is already materialized as Streams relations. The getTransitiveDependents() query returns the complete set of affected symbols in one round-trip. And the topological order of those symbols — callees before callers, leaves before roots — is already computed by the Kahn sort that ran during indexing.

The topological refactoring algorithm

Given a change to symbol X, the refactoring scheduler does this:

1. Query: getTransitiveDependents(X) → {A, B, C, D, E, F, ...}
2. Sort: topoOrder = kahnSort(dependents) → leaves first, callers last
3. Group: levels = groupByDepth(topoOrder) → [{A,B}, {C,D}, {E}, {F}]

4. For each level in order:
dispatch all symbols in this level in parallel
each worker receives:
• the original change description
• the symbol's source
• comprehended summaries of its already-updated callees (from prev level)
• the proposed changes already written to those callees
each worker proposes: Action.propose(edit for this symbol)

5. All proposals collected → governance review → batch apply

The key is step 3: grouping by depth. Symbols at depth 0 (the direct callers of X) have no dependencies on each other — they can all be refactored simultaneously, in parallel workers. Symbols at depth 1 depend only on depth-0 symbols, which are already done. Each level is a wave of parallel work, and each wave has the fully updated context from the previous wave available before it starts.
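
A compact sketch of steps 2 through 4, assuming the call graph is acyclic by this point (cycles collapsed during indexing) and a proposeEdit() worker that wraps the per-symbol LLM call and emits an Action.propose:

```js
// Steps 2-4 as a wave scheduler. `symbols` is a Set of affected
// symbol names, `edges` maps each symbol to the callees it depends
// on, and proposeEdit() is a stand-in for the per-symbol LLM worker.
async function runWaves(symbols, edges, proposeEdit) {
    // depth(s) = 0 for leaves; otherwise 1 + max depth of its callees
    // that fall inside the affected set.
    const depth = new Map();
    const depthOf = (s) => {
        if (!depth.has(s)) {
            const callees = (edges.get(s) || []).filter(c => symbols.has(c));
            depth.set(s, callees.length === 0
                ? 0
                : 1 + Math.max(...callees.map(depthOf)));
        }
        return depth.get(s);
    };

    // Group into levels: if a calls b, depth(a) > depth(b), so no two
    // symbols in the same level can depend on each other.
    const levels = [];
    for (const s of symbols) {
        const d = depthOf(s);
        (levels[d] ||= []).push(s);
    }

    const proposals = [];
    for (const level of levels) { // leaves first, roots last
        // One wave: every symbol in the level runs in parallel.
        proposals.push(...await Promise.all(level.map(proposeEdit)));
    }
    return proposals; // reviewed as governed batches before applying
}
```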

For a refactor touching 200 symbols across 6 dependency levels, the sequential approach (read → edit → read → edit...) takes 200 sequential LLM calls. The topological approach takes 6 waves, with the average wave containing ~33 parallel workers. Wall-clock time drops from O(N) to O(depth) — typically O(log N) for real codebases where dependency trees are wide and shallow.

| Refactoring scenario | Cursor / Claude Code | Grokers topological scheduler | Speedup |
|---|---|---|---|
| Rename method, 50 callers | 50 sequential edits; ~5 min, model re-reads each file | 1 wave of 50 parallel workers; ~15 sec, graph pre-built | ~20× |
| Change function signature, propagate 3 levels deep | Sequential per-file, manually tracked; often misses deep callers entirely | 3 waves, each parallel; complete — no callers missed | Complete vs. partial |
| Extract interface from class, update 200 implementors | Model loses track after ~20 files; context window exhausted, restarts needed | Topological waves; each worker sees only its level, with no context-window pressure | Scales where Cursor can't |
| Propagate DB schema change through ORM + handlers + tests | Manual tracking across layers; cross-layer dependencies invisible without a graph | Graph includes ORM, handler, and test relations; all three layers in one topological plan | Correct vs. incorrect |

Why each worker stays small and accurate

In sequential refactoring, each step accumulates context — the model is asked to remember all previous changes while making the next one. Context windows fill. Earlier changes are forgotten or hallucinated. Consistency degrades as the chain grows longer.

In the topological model, each worker is scoped to a single symbol. It receives only what it needs: the change description, its symbol's source, and the already-updated summaries of its direct callees. The context window stays small regardless of the total size of the refactor. A 1,000-symbol rename uses the same per-worker context as a 10-symbol rename — the difference is in how many workers run in parallel, not in what each one sees.

And because each symbol's stream is updated as workers complete, the Grokers graph stays consistent throughout the refactor. A worker at depth 2 reads the new comprehension of its depth-1 callees — it reasons about the updated API, not the old one. This is the key property: each level's workers start with a truthful picture of the world below them.

The Deeper Point

This is not incremental improvement over sequential refactoring — it is a different computational model. Sequential tools are O(N) in the number of affected symbols. Topological tools are O(depth), which for typical codebases is O(log N). At scale — enterprise monorepos, large knowledge bases, documentation systems — this difference is not a convenience, it is the boundary between feasible and infeasible.

Governance at refactoring scale

Safebox's Action.propose + M-of-N approval model fits naturally here. Each worker in a topological wave produces a proposal, not a direct edit. The full set of proposals from a wave can be reviewed as a coherent batch — "here are all 33 callers that need updating at this level, proposed simultaneously." Reviewers see the full picture before anything is committed. If one proposal is wrong, the wave can be re-run for that symbol without touching the others. ZFS snapshots provide a clean rollback if the entire plan needs to be discarded.

Sequential tools have no equivalent. They write directly, one file at a time, with no coherent batch boundary and no governed proposal phase. The refactor is either all done or partially done, with no clean checkpoint in between.

Conclusion: Grokking Is the Right Architecture

The current generation of AI coding tools — Cursor, Claude Code, Copilot — is an impressive demonstration of what's possible with on-demand inference. These tools work because developers are willing to pay the cost: slower queries, occasional hallucinations, missed dependencies, re-inference on every turn. For small codebases and one-off edits, that cost is acceptable.

The approach doesn't scale, though. A codebase with 50,000 symbols, cross-language event bridges, ORM lifecycle hooks, and dynamically dispatched configuration reads cannot be reliably navigated by a model that re-infers the graph on each turn. The mistakes compound. The costs multiply. The correctness degrades precisely where it matters most — on the complex, consequential changes that span many files and many layers.

Grokers makes the right trade: pay the comprehension cost once, amortize it across every subsequent query. The pre-computed graph is not a cache of inference results — it is a structured knowledge artifact that grows richer over time, persists across turns, and is queryable by any agent or tool without further LLM cost. The bottom-up analysis order ensures that by the time a complex function is comprehended, all its dependencies are already understood — so the LLM is reasoning about knowledge, not guessing about code.

The generalization is the deeper insight. Grokers is not a code tool that happens to use Streams. It is a knowledge graph materializer for any hyperlinked structure. The same architecture that pre-computes a PHP codebase's call graph also pre-computes a documentation site's link graph, a knowledge base's citation graph, an API schema's dependency graph. Any domain where "what depends on this?" is a meaningful question is a domain where Grokers' approach is the right one.

Integrated with Safebox's sealed execution model and Safebots' workflow automation, Grokers completes the picture: a platform where AI agents can understand complex systems reliably, propose changes governed by human approval, and experiment freely in ZFS-snapshotted isolation — accessible to developers writing CLI commands and non-developers building weekend projects alike.

The graph is the answer. Grokking is the right approach.