Memory Unlocks the Next Level

Every category of infrastructure goes through the same arc. The first round is internal to each tool, with whatever constraints fit that tool's design. The second round abstracts the capability into a portable layer. The third round is the governance layer: who can read what, who can write what, what is auditable, what happens when the data crosses publisher boundaries. This is the arc that turns a capability into infrastructure other people can build on without having to think about it.

Agent memory is the next category going through this arc. The harness-native round was 2025–2026. The portable retrieval round is what Mem0 announced in June with a clear, well-documented survey of nine implementations and an SDK that abstracts across all of them. The substrate-level round is what this essay covers: typed graph nodes with per-node ACLs, append-only history, forks and workspaces, federation across publishers, reactivity through subscriptions. Each round completes the previous one. Mem0's retrieval layer composes naturally on top of a governed substrate; the substrate makes the retrieval safe to use in shared, multi-party, regulated contexts where the surveyed implementations cannot yet go.

I · The mem0 survey · A field arriving at the same problem

Nine harnesses shipped memory.
The shared challenge points at what comes next.

Mem0's June 2026 survey covers Claude Code, Anthropic Managed Agents, OpenAI Codex, GitHub Copilot, OpenClaw, Hermes, AWS Bedrock AgentCore, Windsurf, and Cognition Devin. Each implementation reflects a careful set of design choices, and each ran into the same structural pattern as the category matured. Reading the survey together makes the next layer obvious. Here are the nine implementations and what each tells us about where the field is going.

Claude Code — @AnthropicAI

Auto-extracted notes under ~/.claude/projects/<repo>/memory/, 200 lines / 25KB cap, four categories, filename-based selection.

What this points to: Selection is by filename, not semantics. A file with a relevant name beats a file with relevant contents. Past the cap, files drop silently. No embeddings. Team sharing exists behind a flag but underneath is local markdown.

Managed Agents — @AnthropicAI

Append-only event log; memory stores at /mnt/memory/, 8 per workspace, ~100KB each, immutable versioning.

What this points to: Built for multi-agent coordination at workspace scale, not long-term personal memory. The 100KB-per-store ceiling and workspace scoping mean cross-session personal context needs another layer.

OpenAI Codex — @OpenAI

Markdown at ~/.codex/memories/; two-phase write path, grep on read, off by default behind a feature flag.

What this points to: 5,000-token summary truncates silently. Grep is substring-only, so paraphrased facts are invisible. The six-hour idle gate means back-to-back sessions may never consolidate. Local-only. Unavailable at launch in the EEA, UK, Switzerland.

GitHub Copilot

Just-in-time citation verification; memories are structured objects with file-and-line citations; auto-expire after 28 days.

What this points to: The citation schema can't cleanly hold ungroundable or preference-based facts ("prefers minimal abstraction"). Strictly repo-scoped. The only deployed staleness mechanism in the survey with published outcome data — A/B PR merge rate 83%→90% — but only because it ignores the half of memory that doesn't fit a file:line schema.

OpenClaw

Markdown at ~/.openclaw/workspace with per-agent SQLite, embeddings, hybrid retrieval (70% vector / 30% BM25).

What this points to: What survives compaction is whatever the model decides to write to disk in a single "silent internal turn." Long-term memory is selective and inconsistent. The Mem0 plugin removes that dependency — which is largely what drove 247K stars in six months.
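A 70/30 hybrid split can be pictured as a weighted sum of two normalized scores. The sketch below is illustrative only: the survey summary above gives the weights but not the fusion method, so the max-normalization, the field names `vectorScore` and `bm25Score`, and the function `hybridRank` are all assumptions.

```javascript
// Weights taken from the 70% vector / 30% BM25 split described above.
const VECTOR_WEIGHT = 0.7;
const BM25_WEIGHT = 0.3;

// Assumption: scale each signal by its maximum so the two are comparable.
function normalize(scores) {
  const max = Math.max(...scores);
  return scores.map((s) => (max > 0 ? s / max : 0));
}

// Hypothetical fusion: weighted sum of the normalized signals, best first.
function hybridRank(candidates) {
  const vec = normalize(candidates.map((c) => c.vectorScore));
  const bm25 = normalize(candidates.map((c) => c.bm25Score));
  return candidates
    .map((c, i) => ({
      id: c.id,
      score: VECTOR_WEIGHT * vec[i] + BM25_WEIGHT * bm25[i],
    }))
    .sort((a, b) => b.score - a.score);
}

const ranked = hybridRank([
  { id: 'note-a', vectorScore: 0.9, bm25Score: 1.0 },
  { id: 'note-b', vectorScore: 0.6, bm25Score: 8.0 },
  { id: 'note-c', vectorScore: 0.3, bm25Score: 0.0 },
]);
console.log(ranked.map((r) => r.id));
```

In this toy data, note-b overtakes note-a because a strong keyword match compensates for weaker semantic similarity. Neither signal helps if the model never wrote the note to disk, which is the dependency described above.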

Hermes Agent — @NousResearch

Three built-in layers (MEMORY.md ~2,200 chars; USER.md ~1,375 chars; SQLite FTS5 over sessions); plus eight pluggable providers.

What this points to: ~800 tokens of durable memory total. FTS5 is keyword-only — "429 errors" won't match "rate limiting." Local. Provider slot for Mem0 exists because the built-in layers cannot scale.
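The keyword-only limitation is easy to reproduce with a toy matcher. This is not Hermes's FTS5 index or Codex's grep path, only a minimal stand-in for the same family of lookup; the sample notes and the function `keywordSearch` are made up for illustration.

```javascript
// Two notes about the same underlying problem, phrased differently.
const memories = [
  'API returned 429 errors during the nightly sync',
  'Client should back off when the service applies rate limiting',
];

// Case-insensitive phrase match: finds text only if the query appears literally.
function keywordSearch(query, notes) {
  const needle = query.toLowerCase();
  return notes.filter((note) => note.toLowerCase().includes(needle));
}

const byPhrase = keywordSearch('rate limiting', memories);
const byCode = keywordSearch('429 errors', memories);
console.log(byPhrase.length, byCode.length);
```

Each query finds exactly one note, and neither finds the other, even though both notes describe the same fact. Closing that gap takes a semantic signal such as embeddings, which is what the pluggable provider slot is for.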

Bedrock AgentCore — @awscloud

Three async extraction strategies; ~20–40s to extract, ~200ms to retrieve; INVALID markers preserve lineage; LoCoMo 70.58 / PrefEval 79 / PolyBench-QA 83.02.

What this points to: AWS-specific (ecosystem lock-in). Published LoCoMo sits well below leading memory systems. Closest of the surveyed systems to a real memory infrastructure — but tied to one cloud and one tenant model.

Windsurf

Cascade engine writes workspace-scoped files at ~/.codeium/windsurf/memories/; no developer workflow.

What this points to: What's captured is Cascade's call, not the developer's. Workspace-scoped (invisible across projects). Local (no cross-device or team sharing).

Cognition Devin

Knowledge (human-curated trigger-content facts, no auto-capture) + DeepWiki (30 pages, 100 notes, 10K chars each).

What this points to: Approval gate keeps quality high but is friction. Teams that don't review accumulate nothing. Devin-only — knowledge doesn't transfer to other tools.

The shared pattern. Storage is bounded and local. Retrieval is mostly keyword. Memory is harness-scoped, so Claude Code's memory does not yet transfer to Codex. Staleness handling is early. Isolation is at the namespace level.

The survey also names a structural finding: 57–71% cross-user contamination under normal usage (arXiv:2604.01350), with poisoning attacks succeeding at 6–38%. Memory in this generation of implementations is becoming an attack surface. The next layer needs to address this at the data structure, not at the application.

II · The mem0 layer · What it adds, what it leaves open

Portable retrieval is round two.
Governance is round three.

Mem0's architecture is the right round-two answer to the survey it published. A hybrid store — vector for semantic retrieval, knowledge graph for relational reasoning, key-value for fast metadata. The v3 algorithm (April 2026) moved to single-pass ADD-only extraction, multi-signal retrieval (semantic + BM25 + entity linking) in one pass, entity linking inside the vector store. ~6,900 tokens per query at 1.44s, against ~26,000 tokens and 17.12s for full-context retrieval. Identity-scoped namespaces. Plugins into Claude Code, Codex, Hermes, OpenClaw, AWS Strands. Memory as portable infrastructure, not a per-harness feature. This is genuine progress and the categories above it compose on top of it cleanly.

What the mem0 layer provides

Cross-harness portability via plugins
Semantic retrieval over hybrid vector + graph + KV
Unbounded storage (no 25KB cap, no FTS5 ceiling)
Single-pass extraction with bounded latency
Identity namespaces — one user can't accidentally read another's memory
Drop-in distribution across the major harnesses

What composes on top at the substrate layer

Per-item access control (per-stream, not just per-user)
Audit history — who wrote, who read, when, under what claim
Forking and workspace semantics — try changes without writing through
Federation across publishers — your memory and your team's and your organization's
Stream-level subscriptions — reactive memory, not pull-only
Sealed execution — the host running the memory layer trusted by attestation

Identity namespaces are the floor of governed memory, not the ceiling. A namespace says "user A's memory and user B's memory are different keyspaces." It doesn't say "this specific note within user A's memory is shared with their teammate, that specific note is private, that specific note is part of a community knowledge graph readable by the community's moderators." Granularity at the item level — what every collaborative tool from email to Slack to Google Docs has had for two decades — is missing from every system mem0 surveyed and from mem0 itself.
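The difference between the two guarantees fits in a few lines. The sketch below is a simplified model, not mem0's or Qbix's actual data layout: the note shape (`ownerId`, `readers`) and both function names are hypothetical.

```javascript
// Namespace isolation: the only question is "whose keyspace is this?"
function canReadNamespace(requesterId, ownerId) {
  return requesterId === ownerId;
}

// Item-level control: each note carries its own reader list (hypothetical shape).
function canReadItem(requesterId, note) {
  return note.ownerId === requesterId || note.readers.includes(requesterId);
}

const notes = [
  { id: 'standup-summary', ownerId: 'userA', readers: ['teammate'] },
  { id: 'private-journal', ownerId: 'userA', readers: [] },
];

// Under namespaces the teammate sees nothing of userA's; per item, one note.
const namespaceView = notes.filter((n) => canReadNamespace('teammate', n.ownerId));
const itemView = notes.filter((n) => canReadItem('teammate', n)).map((n) => n.id);
console.log(namespaceView.length, itemView);
```

With namespaces the choice is all or nothing. With a reader list on each note, sharing one item and keeping another private is the same operation applied twice.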

This matters more for agent memory than for ordinary collaboration tools because agents read more. A coding agent that reads a teammate's private notes about an upcoming layoff and surfaces them in a code review comment is a different kind of incident than a Slack message accidentally sent to the wrong channel. The blast radius of memory leakage in an agent world is proportional to how much memory the agent reads each turn, and the input-to-output ratio for agents is 153:1. They read a lot.

III · What governed graph memory actually is

Per-node access control.
Audit history. Forks. Federation.

The substrate underneath every Safebots and Safebox deployment is Qbix Streams — a typed property graph stored as ordinary MySQL/MariaDB tables, with bidirectional indexed edges, attribute-derived relations via syncRelations, history as append-only message timelines, and access control on every node. The full architectural treatment is in the community post Qbix Streams as a Graph Database; the relevant properties for memory infrastructure are these:

Property	What it means	What mem0 and the surveyed harnesses have
Nodes are typed streams	Every memory item is a `(publisherId, streamName)` pair with attributes, an access list, and an append-only message timeline. Type carries semantics; the substrate knows what kind of thing it is.	Vector store entries, KV entries, knowledge-graph triples. No type discipline.
Edges are first-class	Relations are rows in `streams_related_to` / `streams_related_from`, indexed in both directions. Faceted search, neighborhood traversal, multi-hop joins are native SQL operations.	Vector similarity is the primary edge. Entity linking layer added in mem0 v3. Edges exist but aren't queryable as edges.
Per-node ACLs	Every node carries its own participant list with read/write/admin levels. Four-tier inheritance: public, contact-label, participant role, direct grant. Sharing one memory item is a routine operation.	Identity namespaces (one user's keyspace vs another's). No item-level ACLs. No inheritance.
Append-only history	Every change to a node is a message in its timeline. Replay, audit, rollback are built-in.	Bedrock AgentCore marks changed facts INVALID rather than deleting. The rest of the surveyed systems mutate in place or use file-system timestamps.
Forks and workspaces	`Streams::fork()` snapshots a relation neighborhood at a point in time. The workspace cascade resolves reads transparently. Try experimental changes without writing through to the canonical graph. Discard the fork if it doesn't work.	Anthropic Managed Agents has filesystem-level workspaces (8 per, 100KB each). No memory-level workspaces in the others.
Federation by publisher	Streams belong to a `publisherId` — a community, organization, or person. Publishers host their own data. The schema is global; relations cross publisher boundaries seamlessly.	None. Every surveyed system assumes a single tenant or a single cloud. mem0 has namespaces, not federated publishers.
SQL as the query engine	No custom query language. Faceted search, aggregations with COUNT/HAVING, multi-valued attributes — all standard SQL on indexed tables. The graph is queryable from anything that speaks SQL.	Vector store APIs, mem0's SDK, individual harness APIs. Not portable as queries.

The architectural difference in one line. Mem0 unified memory storage and retrieval across harnesses. Qbix Streams unifies memory storage, retrieval, governance, history, federation, and reactivity — and has done so in production since 2011.
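Two of the properties in the table, append-only history and forks, can be modeled in memory to show how they relate. This is a conceptual sketch and not the Qbix Streams API: the class `Node`, its `post` and `stateAt` methods, and the standalone `fork` function are hypothetical names chosen for illustration.

```javascript
// Hypothetical in-memory model of a node with an append-only timeline.
class Node {
  constructor() {
    this.timeline = []; // messages are only ever appended
  }
  post(author, changes) {
    this.timeline.push(Object.freeze({ author, changes, at: this.timeline.length }));
  }
  // Replay the first `upTo` messages to rebuild state at that point in history.
  stateAt(upTo = this.timeline.length) {
    return this.timeline
      .slice(0, upTo)
      .reduce((state, msg) => ({ ...state, ...msg.changes }), {});
  }
}

// A fork overlays experimental changes on a snapshot; the original is untouched.
function fork(node) {
  const base = node.stateAt();
  const overlay = {};
  return {
    set: (key, value) => { overlay[key] = value; },
    read: (key) => (key in overlay ? overlay[key] : base[key]),
  };
}

const note = new Node();
note.post('robert', { editor: 'vim' });
note.post('robert', { editor: 'emacs', theme: 'dark' });

const trial = fork(note);
trial.set('editor', 'nano');
console.log(note.stateAt(1), note.stateAt(), trial.read('editor'));
```

Rollback and audit come from replaying a prefix of the timeline. Discarding the fork is simply dropping the overlay, since reads fall through to the snapshot for anything the fork did not change.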

IV · Why per-node access control changes everything

The same words from a different mouth
get a different answer.

The clearest demonstration of what governed memory means at user level is in the Safebots live show interface. The host and the guest can speak the same words into the system. The system responds differently because the speaker's identity has different read access to the underlying memory graph.

Host — authenticated as Robert

"Give me directions to Bob's house"

Stream Places/user/location/home — Bob shared with Robert
ACL evaluates: Robert in participant list, read level granted
Result: map card committed to shared screen
Audience sees a 340 W 57th St directions card

Guest — authenticated as Greg

"Give me directions to Bob's house"

Same stream Places/user/location/home
ACL evaluates: Greg not in participant list
Result: access denied, error visible only to Greg
Shared screen unchanged. No awkwardness. No leakage.

This is what governed memory looks like at the substrate level. The access control runs at the data layer, before anything reaches the model. Bob never gave Greg access to his home address. The system reflects that fact structurally. The model never receives the data, so it cannot leak the data, whether through a prompt injection, a jailbreak, or a routine summarization.
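The host and guest exchange can be reduced to a filter that runs before any model context is assembled. The record shape and the function `contextFor` below are hypothetical; only the stream name, the address, and the three people come from the example above.

```javascript
// Hypothetical stream record mirroring the Bob / Robert / Greg example.
const stream = {
  name: 'Places/user/location/home',
  participants: { robert: 'read' }, // Bob shared this with Robert only
  content: '340 W 57th St',
};

// Build model context from only what the speaker is allowed to read.
function contextFor(speaker, streams) {
  return streams
    .filter((s) => s.participants[speaker] === 'read')
    .map((s) => s.content);
}

const hostContext = contextFor('robert', [stream]);
const guestContext = contextFor('greg', [stream]);
console.log(hostContext, guestContext);
```

For Greg the context is empty, so there is nothing for a prompt injection or a careless summary to surface.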

The same property applies up the stack. A coding agent reasoning about a teammate's notes sees only the notes the teammate granted access to. A community moderator agent sees only the messages it has admin level on. A medical-records assistant retrieves records only for the patient currently authenticated. The ACL is the filter that decides what the agent can read in the first place, before any retrieval, before any inference.

Mem0's identity namespaces operate at the user-vs-user level. The item-level, role-level, community-level, and multi-publisher levels are where most organizational deployments live and where the substrate-level layer takes over.

The 57–71% contamination figure is a substrate question. Cross-user memory leakage in the surveyed systems comes from memory layers operating at the namespace level rather than the item level. Per-node ACLs with inheritance move the isolation guarantee from "one user's keyspace versus another's" to "this specific note shared with these specific people, that one private, that one shared with the community moderators only." Same data, different guarantee, granularity at the level where the actual sharing decisions live.
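Inheritance across the four tiers can be sketched as a resolution function. The tier names come from the table above, but everything else is an assumption made for illustration: a single ordered scale of levels, the rule that the highest grant from any tier wins, and the field names `publicLevel`, `labelLevels`, `roleLevels`, and `directGrants`. The real resolution order in Qbix Streams may differ.

```javascript
// Assumption: one ordered scale, weakest to strongest.
const LEVELS = ['none', 'read', 'write', 'admin'];

// Hypothetical rule: take the highest level granted by any of the four tiers.
function effectiveLevel(node, user) {
  const candidates = [
    node.publicLevel,                                        // tier 1: public
    ...user.labels.map((label) => node.labelLevels[label]),  // tier 2: contact label
    node.roleLevels[user.role],                              // tier 3: participant role
    node.directGrants[user.id],                              // tier 4: direct grant
  ];
  const best = Math.max(...candidates.map((lvl) => LEVELS.indexOf(lvl ?? 'none')));
  return LEVELS[best];
}

const flaggedThread = {
  publicLevel: 'none',
  labelLevels: { moderators: 'read' },
  roleLevels: { editor: 'write' },
  directGrants: { alice: 'admin' },
};

const moderator = effectiveLevel(flaggedThread, { id: 'mod1', labels: ['moderators'] });
const alice = effectiveLevel(flaggedThread, { id: 'alice', labels: [], role: 'editor' });
const stranger = effectiveLevel(flaggedThread, { id: 'eve', labels: [] });
console.log(moderator, alice, stranger);
```

The isolation guarantee here attaches to the node, not to a user's keyspace: the same thread resolves to a different level for each requester.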

V · What this looks like for normal people

Memory you can actually share
with the right person.

The argument so far has been technical. The user-facing version is simpler: governed graph memory is what makes AI memory work the way humans already understand memory to work.

When you tell a friend something in person, you have an implicit ACL in mind. Some things are for everyone. Some things are for your work team but not your family. Some things are for your spouse only. Some things are only for the person you said them to. You don't think of these as "memory namespaces" — you think of them as who knows what. Real-life memory is per-item access control with social inheritance.

Every memory system in the mem0 survey collapses this into one of two extremes. Either everything in your memory is yours alone (Claude Code, Codex, Hermes, Windsurf) or your memory is one keyspace within a multi-tenant store (mem0, AgentCore). Neither matches how people actually want to share information with assistants who help them coordinate with other people.

The household scenario

One AI assistant for the household
Parents see kids' homework progress; kids see chores list
Babysitter sees emergency contacts, not parents' calendar
Grandparents see medical schedule, not budget
Every item has the right ACL; the assistant respects all of them

The small-business scenario

One AI assistant for the team of six
Founder sees runway numbers; staff sees task queues
Customer-success rep sees customer notes for their accounts only
Contractor sees scope-of-work; not employee salaries
Same substrate; the ACL decides who reads what each turn

The community scenario

One Safebots community of 500 members
Public posts visible to all; private threads visible to participants
Moderators read flagged content; ordinary members don't
Each member's profile has separate read levels per relation
Federation: this community ↔ neighbor communities

The healthcare scenario

One AI assistant for a clinic
Doctor sees patient charts; receptionist sees appointments only
Patient sees their own records; another patient's are not in the graph for them
Audit trail of every read, for compliance
HIPAA falls out of the substrate, not bolted on top

None of these scenarios are exotic. They are the ordinary shape of human memory sharing. The reason memory-equipped agents don't yet work in any of them is that the surveyed memory layers have no primitive for "this fact is shared with these specific people." The Qbix Streams substrate has had that primitive for fifteen years.

The same property scales up to organizations, down to individuals, and across publisher boundaries. A graph node belongs to its publisher. Its access list is local to it. Anyone the access list grants read level to can see it — whether they are on the same server, in the same community, in a federated community, or across the public internet with the right delegation. There is no "scope" tradeoff because the substrate doesn't have a scope. It has nodes and edges and ACLs.

VI · The comparison ledger

Twelve dimensions.
The full capability picture.

A capability comparison across the dimensions that matter for production deployment. Three columns of memory infrastructure: the harness-native median (Claude Code, Codex, Hermes, OpenClaw, Windsurf, Devin), mem0 as the portable retrieval layer, and Qbix Streams as the substrate.

Dimension	Harness-native	Mem0 layer	Qbix Streams
Storage scale	Bounded (25KB–100KB)	Unbounded	Unbounded
Retrieval	Keyword / grep / filename	Hybrid (vector + BM25 + entity)	SQL + faceted graph + ACL filter
Cross-harness portability	No — harness-scoped	Yes — plugins	Yes — substrate
Per-item access control	No	No (namespaces only)	Yes — read/write/admin per node
ACL inheritance	No	No	Yes — four-tier (public, label, role, direct)
Append-only history	No (mostly)	Partial	Yes — every change is a message
Forks / workspaces	Partial (Anthropic only)	No	Yes — copy-on-write
Federation across publishers	No	No	Yes — built into schema
Subscriptions / reactivity	No (mostly)	No	Yes — every stream is observable
Staleness handling	Auto-expire / TTL	ADD-only with reranking	Fork-and-replace; audit preserved
Contamination resistance	57–71% leakage observed	Namespace isolation	Substrate-enforced per-node
Production years shipping	Months (2026)	~1 year (mem0 v1 2025)	~15 years (since 2011)

Two honest acknowledgments. Mem0 wins on retrieval ergonomics — semantic search with entity linking in one pass is genuinely good, and the plugin distribution into nine harnesses is a real distribution achievement. The Qbix Streams retrieval story today is SQL-fast and ACL-aware but doesn't yet ship a vector index inside the graph. The integration between the two is the natural shape: Qbix Streams as the governance/audit/federation substrate; mem0 as the semantic-retrieval layer on top, scoped per-node by the substrate's ACL.

VII · Why Safebox · One environment, every memory

One substrate. One auth identity.
Every memory composes.

The Safebox argument for memory is structural. Every other system mem0 surveyed lives inside one harness — Claude Code's memory means nothing to Codex; Codex's means nothing to Hermes; Hermes's means nothing to Windsurf. The mem0 portable layer fixes the storage-and-retrieval part of that by exposing the same API in each harness. It does not fix the environment problem.

The environment problem is that an AI assistant helping a person has work to do across many contexts. The same person — same identity, same authentication, same expectations about who knows what — uses an assistant in a podcast studio (live show interface), in their codebase (Grokers), in their household, in their organization. Each context has its own memory. Most systems force you to maintain a separate memory store per harness, manually copying facts between them.

Safebox is one environment. The same Qbix Streams graph is the substrate underneath:

The show interface

Speaker identity tied to authenticated session
Archive queries against the same graph as everything else
Private coaching written to the host's own stream
Public commits written to the show stream

Grokers (code & docs)

Codebase parsed into the same typed graph
Comprehended summaries are streams with attributes
Cross-language symbol externs are first-class edges
The same ACLs that gate personal memory gate code memory

Safebots workflows

Workflows are trees of streams
Tools propose writes via Action.propose
Capabilities materialize external data into the same graph
Memory of past runs persists exactly like everything else

Realtime collaboration

Voice interfaces over Qbix WebSocket transport
Chat orchestration as a stream of messages
Phone-as-clicker, follower screens, polling
All against the same memory the assistant uses

Because everything is on the same substrate, there is no syncing. The fact you taught the assistant in the podcast studio is available to it in your codebase, scoped by whatever ACL you set when you taught it. The codebase comprehension your team built collaboratively is available to anyone with the right read level, in any of the four contexts above. The audit trail of who knew what when is one graph rather than seven.

The "one environment" property at the substrate level comes from the data structure: your assistant's memory, your team's chat, your show's archive, your code's call graph, and your household's shared calendar are all stored in the same typed graph, with one auth identity governing access across all of it.

The cross-context story. The same memory you query through a Hermes Agent plug-in is the memory the show interface gates by speaker, the memory Grokers writes summaries into, and the memory your household assistant reads with the right ACLs. Mem0 makes that memory portable across harnesses; the substrate makes it composable across contexts, identities, and publishers. One graph. One audit trail. One identity.

VIII · AI as the interface to governed memory

Any collaborative web interface.
Generated. Governed. Real-time.

A property worth naming directly: because Qbix integrates with the same graph database that holds the memory, AI can generate collaborative web interfaces over any subset of the graph — without breaking the ACLs, without leaking through the model, and without requiring custom integration work per interface.

A Safebot workflow can produce: a household dashboard for the family scenario above; a small-business operations view for the team-of-six; a community moderation panel for the 500-member community; a patient-portal view for the clinic. Each is a generated interface against the same underlying graph, scoped by the viewer's identity, served over Qbix's WebSocket transport so the views update reactively whenever the underlying streams change.
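Reactive views that respect access can be sketched with Node's built-in `EventEmitter` standing in for the WebSocket transport. The class `ObservableStream` and its methods are hypothetical and are not the Qbix API; the point is only that the access check happens at subscription time, before any update is delivered.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical observable stream; a local stand-in for a socket subscription.
class ObservableStream extends EventEmitter {
  constructor(readers) {
    super();
    this.readers = new Set(readers);
  }
  // Refuse the subscription outright if the user has no read access.
  subscribe(userId, onMessage) {
    if (!this.readers.has(userId)) return false;
    this.on('message', onMessage);
    return true;
  }
  post(message) {
    this.emit('message', message);
  }
}

const chores = new ObservableStream(['parent', 'kid']);
const seenByKid = [];
const seenByBabysitter = [];

const kidOk = chores.subscribe('kid', (m) => seenByKid.push(m));
const sitterOk = chores.subscribe('babysitter', (m) => seenByBabysitter.push(m));

chores.post('Take out the recycling');
console.log(kidOk, sitterOk, seenByKid, seenByBabysitter);
```

A generated dashboard built on this pattern would not need its own permission logic. A view that cannot subscribe never receives the update.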

The same property extends to voice. Realtime voice interfaces over the graph — speak a query, the substrate filters by speaker identity, the answer respects what the speaker has read access to. This is exactly the architecture the show interface demonstrates at host-and-guest scale; the same pattern works for any voice-enabled context where memory governance matters.

Generated interfaces inherit the substrate's properties. AI can build interfaces over governed graph memory because it is rendering the substrate, not wrapping it. The model never receives data the requester does not have access to. The model never writes data the requester does not have write permission for. Every generated UI for the household scenario, the small business, the community, or the clinic inherits the ACLs from the underlying streams; the developer of the interface does not need to re-implement permissions.

IX · Integration with mem0 and the harness ecosystem

The layers compose into one stack.

None of this is an argument against mem0. The natural architectural shape is the integration: Qbix Streams as the governed substrate; mem0 as the semantic-retrieval layer scoped per-node by the substrate's ACL; the surveyed harnesses (Claude Code, Codex, Hermes, OpenClaw, Bedrock AgentCore, Devin) as the agent runtimes that read and write against the combined system.

The shape works because each layer does what it does well. Mem0's hybrid retrieval is genuinely better than SQL-only graph queries for "find the relevant fact even if the phrasing has drifted." Qbix Streams' ACLs, history, forks, and federation are properties mem0 is not trying to provide. The harnesses' developer-facing UX is what gets agents in front of users in the first place.

    // Hypothetical integration sketch: concrete, not committed
    const Streams = require('qbix-streams');
    const Mem0 = require('mem0ai');

    // retrieve: substrate filter first, semantic second
    async function recall(query, userId) {
      // 1. ACL filter against substrate: what can THIS user see?
      const visibleStreams = await Streams.query({
        user: userId,
        readLevelGTE: 'view',
        type: ['memory/note', 'memory/fact', 'memory/preference']
      });

      // 2. Hand off to mem0 for semantic retrieval, scoped to visible nodes
      const results = await Mem0.search({
        query: query,
        scope: visibleStreams.map(s => s.toUri()),
        limit: 10
      });
      return results; // every result is something the user is permitted to see
    }

    // write: substrate decides if the action is allowed at all
    async function remember(fact, userId) {
      // Action.propose: the substrate gates before mem0 ever sees it
      await Streams.actions.propose({
        type: 'Memory.write',
        userId: userId,
        content: fact,
        manifest: { storeIn: 'mem0', acl: 'private:userId' }
      });
      // If governance approves, the propose handler writes through to mem0
    }

The architectural division is clean: the substrate decides "can this user read or write this?", and mem0 decides "which of the things this user can read is most relevant to this query?". Mem0's retrieval gets the ACL guarantees for free; the substrate gets semantic retrieval for free.
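That division of labor can be exercised end to end with in-memory stand-ins. Nothing below is the real Qbix Streams or mem0 interface: `substrate`, `retriever`, the node shape, and the word-overlap scoring are all hypothetical, and the scoring is a deliberately crude placeholder for semantic retrieval.

```javascript
// In-memory stand-in for the governed substrate (hypothetical shape).
const substrate = {
  nodes: [
    { uri: 'alice/memory/note/1', readers: ['alice', 'bob'], text: 'deploys happen on Friday' },
    { uri: 'alice/memory/note/2', readers: ['alice'], text: 'salary review notes for Friday' },
  ],
  visibleTo(userId) {
    return this.nodes.filter((n) => n.readers.includes(userId));
  },
};

// Stand-in for the retrieval layer. Toy relevance: number of shared words.
const retriever = {
  search(query, candidates, limit) {
    const words = new Set(query.toLowerCase().split(/\s+/));
    return candidates
      .map((n) => ({
        uri: n.uri,
        score: n.text.toLowerCase().split(/\s+/).filter((w) => words.has(w)).length,
      }))
      .filter((r) => r.score > 0)
      .sort((a, b) => b.score - a.score)
      .slice(0, limit)
      .map((r) => r.uri);
  },
};

function recall(query, userId) {
  const visible = substrate.visibleTo(userId); // 1. governance decides the candidate set
  return retriever.search(query, visible, 10); // 2. relevance ranks within it
}

const bobResults = recall('what happens Friday', 'bob');
const aliceResults = recall('what happens Friday', 'alice');
console.log(bobResults, aliceResults);
```

Both notes match the query equally well, yet Bob's results contain only the shared one. The retriever never sees the private note for Bob, so no ranking mistake can expose it.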

This is the same composition pattern that makes Safebox useful generally. The substrate defines the rules; specialized layers provide capabilities on top; everything inherits the governance because the substrate is what runs the gates.

X · What comes next

Memory unlocks the next level.
Governed memory lets everyone use it.

Memory is the infrastructure category the agent field is now building. Every major harness shipped a memory implementation in 2026 because agents without persistent memory hit a hard ceiling fast. The mem0 survey is correct about that. The portable retrieval layer is the right round-two answer and it is shipping into nine harnesses already.

The substrate-level round — governance on the graph — is what extends this from a single-user developer feature into infrastructure households, small businesses, communities, regulated institutions, and federated organizations can build on. Per-item access control, append-only history, forks, federation, reactivity. The implementation has been running since 2011, serving over seven million users across a hundred countries, underneath the Safebots show interface, the Grokers code-comprehension layer, and the broader Qbix platform.

The 57–71% contamination figure, the 200-line Claude Code memory cap, the 5,000-token Codex truncation, the 800-token Hermes ceiling: these are points along the trajectory the category is moving through, and they get resolved at the substrate layer where access, history, and federation live together. Better retrieval (mem0) and bigger storage (every harness expanding its cap) help, and they compose cleanly with a governed graph underneath.

2011

Year the substrate started shipping

7M+

Users across 100+ countries on Groups, the production deployment

∀

Every memory item has its own ACL, audit trail, and federation surface

The unit of memory worth building infrastructure around is the typed graph node with per-node access control. Mem0 is the right second-round move. The substrate that composes underneath it has been ready for fifteen years.

Memory unlocks the next level.
Governed memory is what makes it work at scale.
