llm memory research

Six deployment shapes for agent memory. Did Supermemory get it right?

11 May 2026 · ~15 min read · by Steven Batchelor-Manning

Contents

The question operators actually answer first
The six shapes the field has settled into
So, did Supermemory get it right?
Why the shape often dominates the paradigm
Three trade-offs the shape forces, every time
The systems that span more than one shape
The honest checklist for picking now
Where this leaves the field

Six deployment shapes for agent memory. Did Supermemory get it right? - hero image.

Most agent-memory comparisons argue about the wrong axis. After 19 systems deep on this, what’s clear is the architecture everyone debates publicly isn’t the thing that decides how a memory system actually feels to use. The deployment shape is. Same architecture in two different shapes is two completely different products.

Six shapes have surfaced across the field. Most teams pick one as primary. The loudest of those six right now is the managed API service, and the loudest exemplar of that shape is Supermemory, which has done a competent job of building exactly what the shape demands. Whether that’s the right shape to have built for is a different question, and one I think the corpus implies an honest answer to.

Let’s look at the six, then at where Supermemory sits in them.

The question operators actually answer first

Most published comparisons of memory systems classify them by paradigm. Is the engine flat vector RAG, knowledge-graph augmented, progressive compression, multi-index hybrid, LLM-as-retriever, trace-as-memory, a Karpathy-style wiki, or filesystem-native? That’s a useful question. The previous piece in this series was dedicated to it.

It’s not the question an operator answers first when picking a memory system. The question they answer first is shaped like this. Do I want to call an HTTPS endpoint, link a library against my agent, point a desktop app at a folder, run a CLI from a script, or install a skill into the agent I already use?

That’s the deployment-shape question, and it’s orthogonal to paradigm. The same paradigm, multi-index hybrid retrieval, has been built and shipped four ways across the 19 systems. As a managed API service (Supermemory at api.supermemory.ai). As a self-hostable open-source server with a managed endpoint on the side (mem9 at api.mem9.ai). As an in-process Go library that doubles as an MCP server (graymatter). As a filesystem-native context store with an MCP front door (OpenContext). Four products. One paradigm. To the operator they feel like four different categories of thing.

That’s the gap I want to dig into. Why deployment shape dominates the daily experience, what the six shapes actually look like, and whether the most prominent bet in the most prominent shape, Supermemory’s bet on the managed API, is the bet operators picking now should be making.

The six shapes the field has settled into

Six deployment shapes have emerged across the 19 systems. Each one forces a particular operational shape, a particular trust boundary, and a particular set of trade-offs the operator has to live with from day one.

Managed API service. A hosted endpoint the agent calls over HTTPS. The vendor runs the storage, the embedding pipeline, the upgrades, the on-call rotation. The operator sets a base URL and an API key. Supermemory and mem9’s hosted endpoint sit here.

In-process library. A package the operator’s code links against. The memory engine runs inside the same OS process as the agent. Storage is local, typically a single embedded database file. SimpleMem and graymatter sit here.

Filesystem-native plus MCP. The user’s actual filesystem is the canonical store. Markdown files on disk are the artefact, a small index is a derivative, and an MCP server fronts the system to whichever agent the user is running. OpenContext, Tolaria, second-brain.

Desktop application. A complete user-facing program that includes the memory layer rather than exposing it as a separate component. The user launches the app, sees a UI, interacts with memory through the app’s chrome. llm-wiki, Tolaria again, Memex.

CLI tool. A command-line program the operator runs from a shell or script. The artefact is a binary that takes arguments, performs work, returns a result. OpenKB, Graphify, GitNexus.

Skill or hook framework. Memory expressed as agent-side artefacts: skill definitions, slash commands, hooks that fire between turns. There’s no separate process. The memory is in the shape of the agent’s behaviour. oh-my-kiro, Understand-Anything.

Six shapes. Most teams pick one as primary. A small number genuinely span more than one, and those are the most interesting cases in the corpus.

So, did Supermemory get it right?

Supermemory is the loudest exemplar of the managed API shape, and the most useful test of whether that shape is a good bet for the operator picking now. The honest answer is shaped: yes for the audience they’re built for, with caveats that the audience either doesn’t care about or hasn’t hit yet.

What Supermemory got right is real. The engineering is genuinely competent. The Cloudflare Workers + Durable Objects + Hyperdrive-PostgreSQL stack is well-chosen for the workload. The TypeScript and Python SDKs plus the integrations with Vercel AI SDK, Mastra, LangChain, LangGraph, OpenAI Agents SDK, Agno, VoltAgent, Cartesia, and Pipecat make Supermemory genuinely drop-in for an agent runtime that already exists. The Memory Router pattern, an OpenAI-compatible reverse proxy that injects memories into prompts and harvests memories from completions transparently, is the slickest integration story in the corpus. The connector ecosystem (Google Drive, Gmail, Notion, OneDrive, GitHub, web crawler) and the multi-modal ingestion (PDF, image OCR, video transcription, AST-aware code chunking) are weeks of work the operator doesn’t have to do. If you’re a team building a consumer product where memory is a feature and you don’t want to operate it, Supermemory is the most credible answer in the corpus. None of that is in dispute.

What gives me pause is structural rather than technical. Three things, none of which Supermemory’s team can fix without re-platforming.

The trust boundary places the user’s most intimate context inside someone else’s perimeter. Memory contains what the user has asked, what they’ve been told, what they care about, and the agent’s accumulated model of them. A managed API places that data inside the vendor’s trust boundary, which is reasonable for a fraction of users and structurally unacceptable for another fraction. For privacy-sensitive deployments, regulated industries, or single-developer power users who refuse to ship their context off the local machine, Supermemory’s shape isn’t a tradeoff to weigh, it’s a non-starter. That’s not a critique of Supermemory’s posture, which is reassuring. It’s a critique of the shape they bet on, which constrains who they can serve.

Costs scale with usage in ways the operator can’t architecturally prevent. Supermemory’s MCP error pathway includes “402, Memory limit reached. Upgrade at supermemory.ai”, and the platform tracks per-organisation document limits with overage billing. For an agent that calls memory on every turn (which is the recommended pattern for an agent that actually uses memory well), the per-call cost compounds quickly. The operator can budget but can’t architecturally cap. An in-process library has no such failure mode, the cost is the cost of a few embedded-database reads and a few cosine distances, in the agent’s own process, billed to nobody.

The engine is opaque. Supermemory’s repo contains a 1,464-line zod-openapi schema documenting the wire contract in considerable detail, versioning, soft-deletion, the relation enum, the static/dynamic profile split. From the schemas you can reverse-engineer most of what the engine does. You can’t reverse-engineer how. The embedding model, the chunking heuristics, the extraction prompts, the reranker, none of those are inspectable, and when the engine misbehaves on the operator’s specific corpus the only recourse is escalating to support and waiting. That’s the price of the genre, not a Supermemory-specific failing, but it’s a price the corpus shows you don’t have to pay if you pick a different shape.

Set against the other 18 systems, this looks less like Supermemory got it wrong and more like Supermemory committed hard to one shape on the matrix and is now constrained by the cell they’re in. If the audience they’re serving is comfortable with the trust boundary and the cost model, the engineering they’ve built on top of those decisions is genuinely good. If the audience they’re serving isn’t, no amount of engineering rescues a shape mismatch. The most honest answer to the title’s question is: Supermemory got their bet right for their audience, and the operator picking now should ask whether they’re in that audience before defaulting to the loudest option in the room.

mem9 is worth holding up as the counter-example. It ships the same paradigm as Supermemory (multi-index hybrid) in the same shape (managed API at api.mem9.ai), and also ships as a self-hostable Go binary the operator can run on their own infrastructure. The operator who wants to start managed and migrate to self-hosted later doesn’t have to re-platform, they change a base URL and a credential. That’s a structurally less constrained bet than Supermemory’s, and it’s the bet I expect the second-generation systems in this corpus to default to.

Why the shape often dominates the paradigm

The argument, plainly. Switching paradigm within a shape is a re-implementation. Switching shape is a re-platform.

If you start with Supermemory (managed API) and want to move to graymatter (in-process library), you’re not just changing your memory engine. You’re changing your data-residency posture, your billing model, your operational on-call shape, your trust boundary, and your dependency graph. These are infrastructure concerns, and infrastructure changes are expensive.

By contrast, swapping a flat-vector-RAG engine for a multi-index-hybrid engine within the same shape is mostly a matter of changing the call site and re-tuning the recall path. Same trust model, same operational shape, same billing posture. Different recipe inside.

The shape constrains the paradigm in practice too. A managed API is structurally suited to multi-index hybrid (the most popular paradigm in this column by a wide margin) and structurally awkward for filesystem-native (which presupposes the user’s filesystem). A skill framework can’t ship a heavyweight engine because it has no engine, the host agent does the work. The shape choice rules out a substantial fraction of the paradigm space before the operator gets to choose paradigm.

And the operational surface is the daily experience. Once a system is deployed, the operator interacts with the shape, not the paradigm. They manage subscriptions and API keys, or embedded database files and library versions, or a vault folder and an MCP configuration, or desktop application updates, or shell scripts and batch jobs, or skill files and hook scripts. Whichever shape they’ve chosen, the daily texture of their work is shaped by it. The paradigm is a property of the engine that mostly matters at recall time. The shape is a property of everything else.

Three trade-offs the shape forces, every time

The trade-offs that come with each shape compound across years. Three are worth naming because they show up in every shape and resolve differently in each.

Trust delegation. Where does the user’s context live? A managed API places that context inside the vendor’s trust boundary. Supermemory’s marketing is reassuring, the encryption posture is reasonable, the Cloudflare-native architecture is competent. None of that is the same as running the engine on hardware you own. For some users this trade is fine, they already trust OpenAI with far more. For others it’s a non-starter. An in-process library or a filesystem-native shape keeps the data on the operator’s machine. A skill framework keeps the data wherever the host agent already keeps it. The trade-off resolves differently per shape, and you can’t move it later without re-platforming.

Operational ownership. A managed API outsources operations to the vendor. An in-process library puts the operator on call. A filesystem-native shape splits the responsibility, the application owns its index, the user owns the vault, and reconciliation between them is the engineering. A desktop application owns its chrome and asks the user to handle the rest. A CLI is operational only when invoked. A skill framework piggybacks on the host agent’s operations. None of these are wrong. They’re different bets about who’s awake at 3am when something breaks.

Distribution shape. A managed API ships through a npm install or a pip install plus an API key. An in-process library ships through the same channels but with no key. A filesystem-native system ships as an installer plus a folder. A desktop application ships as a code-signed platform binary with auto-update. A CLI ships through a package manager. A skill framework ships through the host agent’s plugin protocol. The cost of a release is wildly different across the six. Supermemory pushes a Cloudflare Worker in seconds. Tolaria ships a calendar-versioned YYYY.M.D Tauri build to three operating systems with notarisation. Different shapes, different release cadences, different feedback loops with users.

These three trade-offs aren’t abstract. They’re the texture of working with a memory system day to day. The shape locks them in early, and the paradigm choice rides on top.

The systems that span more than one shape

A handful of systems in the corpus genuinely occupy more than one shape, and they’re the most instructive cases because they show what it actually costs to do so.

mem9 is the corpus’s first system that’s both a fully open self-hostable engine and a managed API endpoint at the same time. The README states the position with unusual precision. Switching between the hosted endpoint and the self-hosted server is “a base-URL and credential change, not a plugin rewrite”. Three architectural properties combine in mem9 that no other system in the corpus exhibits together: a managed externally-callable API, a multi-backend storage abstraction spanning TiDB, PostgreSQL, and a third backend called db9, and auto-provisioning of a fresh database per tenant. Each property is a direct consequence of being both self-hostable and managed. The cost is real, multi-backend abstraction, per-tenant provisioning, spend-limit middleware, control-plane and data-plane separation. The cost is also finite, and once paid the engine spans two cells of the matrix at once.

second-brain spans three shapes in a single Python codebase. It’s filesystem-native (SQLite plus the user’s folder), a desktop runtime (terminal REPL), and a skill framework (a Telegram bot acting as a hosted agent surface). One process, three surfaces, and a SQLite authorizer hook gating per-agent reads at the C layer to make multi-shape multi-agent isolation safe. It’s not the cleanest system in the corpus, but it’s the most ambitious about deployment-shape porosity.

graymatter ships a single ~10MB Go binary that becomes an in-process library, an MCP server, an HTTP server, a CLI, or a TUI dashboard depending on how you invoke it. Same engine, five surfaces. The library API is sixteen public symbols, of which three cover ninety-five percent of use. The whole production surface fits on one screen. This is the cleanest single-binary expression of the deployment-shape question I’ve seen.

These multi-shape systems are still rare. But they’re not anomalies. They’re the leading edge of a porosity that’s been latent in the field since the beginning, and they suggest the binary “managed-or-library” framing is probably the wrong one. A more useful question is, what is the shape of the engine, and which deployment surfaces are exposed?

The honest checklist for picking now

If you’re picking a memory system for a real project today, the corpus implies you should commit to shape before you commit to paradigm. The honest version of the choice is something like this.

Will the user’s context leave the user’s machine? If no, the managed API shape is out. This is a substantial fraction of single-developer power users and most regulated industries, and the constraint is binary, not negotiable.

Does the operator want to take operational responsibility for the engine? If no, the managed API is in and the in-process library and filesystem-native shapes are out, or significantly harder. The team’s capacity to be on call is the cap.

Will the agent call memory on every turn? If yes, the CLI tool shape is out, latency is too high. The in-process library is the strongest candidate, since the recall path is microseconds plus embedding time inside the agent’s own process.

Is the user a non-developer? If yes, the desktop application shape is the only one that fits without re-skilling them. CLIs and MCP configurations are non-starters for non-technical users.

Is the agent the host’s agent (Claude Code, Cursor, Kiro CLI), and does the host expose a hook-and-skill surface? If yes, the skill or hook framework shape offers the lightest deployment, the host does most of the work.

Is the substrate the user’s existing notes folder, and do they expect to keep editing it directly? If yes, the filesystem-native shape is the only one that respects the constraint. Bolting a database on top would break the user’s relationship with their own files.

Once the shape is fixed, the paradigm question becomes manageable. There are still real choices to make inside any column, multi-index hybrid versus knowledge-graph augmented versus Karpathy LLM Wiki, but the choice is bounded by what fits inside the shape you’ve already committed to. The corpus has 19 systems in 15 populated cells of the shape-by-paradigm matrix. The job is to pick the right cell and then pick from the systems in it, not to pick a paradigm in the abstract and discover later that no exemplar in your shape actually implements it.

Where this leaves the field

The corpus’s existing porosity, mem9 spans two cells, second-brain spans three, graymatter blurs the library/CLI/MCP boundaries, suggests the boundaries between shapes are conventions rather than constraints. The systems built next will probably commit less to a single shape and more to a deployment surface that exposes multiple shapes from one engine, the way mem9 ships one Go server that runs both as api.mem9.ai and as make build.

That’s not the canonical shape story this corpus inherited from the field’s first generation, where almost every system was clearly one thing. It’s plausibly the shape story of the second generation, where the engine is the same and the deployment surface is what changes.

For the operator picking now, the practical advice is short. Pick the shape because it fits the constraint that’s hardest to move later, where the data lives, who’s on call, who the user is. Pick the paradigm inside that shape from what’s available there. Be prepared to discover that the most popular paradigm in the abstract isn’t the most popular paradigm in your column, and that’s fine, popularity in the abstract isn’t a constraint your project actually has to satisfy.

The next piece is on write-time investment, the highest-ROI design decision across the 19 systems, and the one move that compounds at every subsequent read. It’s the principle that pulls SimpleMem’s online synthesis, Hindsight’s async consolidation, and LLM-Wiki’s two-step ingest into the same pattern. If you’ve ever wondered why the systems with the cleanest recall behaviour all spend disproportionate compute at write time, that piece is the answer.

Tagged

#memory #llm #agents #architecture #deployment

Share & discuss

Share on X Discuss on X

The X Article covers the same ground in a different form. The site version is the canonical one; the X version exists for the conversation in the replies.