Home< BlogDeep Dives
Jun 26, 2026
10 minutes read

Inside cognee 1.0: Memory-Native APIs for Production Agents

Vasilije Markovic
Vasilije MarkovicCo-Founder / CEO

The difficult thing with agent memory was never expanding storage for the embeddings — it was getting memory to improve as it's used, so an agent doesn't just collect more context, but becomes less likely to make the same mistake twice.

That's what we've been working on here at cognee. The API, SDKs, retrieval layer, migration path, and Cloud support in cognee 1.0 all have the same purpose: giving agents more usable memory.

We kept hearing the same questions from people moving agents into production: How do I give an agent memory in one line? How do I stop it from making the same mistake? Can I use memory from TypeScript, not just Python? Can I bring the memory I already have?

cognee 1.0 is our answer to all of that: four verbs, a feedback loop, and memory that can run wherever your agents do.

A memory-native API

The API is now built around how agents actually use memory: four verbs, the same in the Python SDK, HTTP API, and over MCP.

remember, recall, improve, and forget replace the older add, cognify, and search flow. The lower-level pipeline is still there for those who need it, but most agents can now work through the memory-native interface instead.

Here's how everything fits together.

remember()

remember() verb diagram

Hand cognee plain text and it pulls out the entities and relationships, turning a sentence into a connected memory graph. You don't write data into a schema — you just give it words, and it builds something an agent can reason over and links it into what it already knows.

This function stores new information and connects it to the memory graph. That can be a document, a conversation, a correction, a tool result, or a statement you want the agent to keep.

The session_id is used when memory belongs to a specific interaction. If feedback is produced through an agent integration, cognee handles this automatically: the feedback first lands in session memory, then becomes part of permanent memory. If you're adding feedback or corrections yourself, you pass the session ID along with the Q&A entry ID (qa_id) so cognee knows exactly which answer that signal belongs to.

recall()

recall() verb diagram

Ask a question in plain language and get an answer grounded in the graph, not guessed. cognee walks the relevant path — cognee → based in → Berlin — and returns "cognee is based in Berlin," traceable to the facts it stored rather than invented.

This function retrieves the most relevant memory for the current question or task. It can use graph structure, retrieval signals, and previous feedback to decide which memories should matter most.

Feedback comes after an answer has been judged. Once an agent's answer is confirmed, corrected, or rejected, that judgement becomes a signal attached to the memory used to produce the answer.

Confirmed memories can become stronger. Corrected or misleading ones can be pushed down. Over time, cognee keeps re-weighting memory from use, feedback, importance, and frequency.

improve()

improve() verb diagram

Memory that sharpens itself as it's used. cognee combines facts it already holds — Vasilije → cognee and cognee → Berlin — to infer a new connection, Vasilije → Berlin, so answers that would have needed a multi-hop lookup are already there. The connections it adds stay traceable to the facts behind them.

In cognee 1.0, the self-improvement loop runs by default (cognee.remember(conversation, self_improvement=True)).

Still, you can call improve() when you want more granular control: to trigger improvement manually, scope it to a dataset, or make the process explicit inside your own workflow.

forget()

forget() verb diagram

Delete a specific memory and its links cleanly. Tell cognee to forget Vasilije and that node plus its "founded" edge dissolve, while the rest of the graph stays intact — so you can honor removal requests or drop stale facts without rebuilding everything.

This function removes memory that's no longer useful. That can mean clearing a dataset, removing outdated information, or deleting memory tied to a specific scope.

These four verbs encompass the full memory lifecycle: commit what matters to memory, retrieve it when needed, learn from how it was used, and remove it when it should be gone.

Memory that improves, not just grows

This is the centerpiece of the release.

Most memory tools are built to append — this allows stale information to keep resurfacing even after an agent has been corrected. With cognee 1.0, memory is continuously re-weighted as agents use it.

The improvement loop is fed by three signals:

  • Feedback applies after an answer has been judged. The feedback is attached to the nodes, edges, and evidence that contributed to the answer, so corrected memories can rank lower and confirmed ones can rank higher.
  • Importance is set at ingestion, so the runbook outweighs the Slack dump.
  • Frequency accumulates with use, so the paths an agent actually relies on can reinforce themselves over time.
Retrieval, judgement, feedback, importance, and frequency all feed the loop

Retrieval, judgement, feedback, importance, and frequency all feed the loop, so memory gets better rather than just bigger.

The effect is fact reconciliation. If an answer is corrected, that correction should not vanish when the chat ends — it should become part of what future retrieval sees so that the agent eventually stops repeating the mistake.

That's what's necessary for production-ready memory. Everything that follows — the decorator, retrieval layer, SDKs, migration path, and Cloud support — exists to get this memory into your agents and keep it trustworthy.

Drop-in agent memory

The fastest way to put self-improving memory inside an agent is using a single decorator.

It wires memory in end-to-end, retrieving relevant context before your function runs, injecting it into the call, and persisting the trace afterward, so the agent's own history becomes queryable memory. One line, no other changes needed.

Retrieval you can trust

Memory is only as good as what it gives back.

cognee 1.0 combines graph traversal, vector search, and BM25 lexical retrieval. Hybrid results are fused with reciprocal rank fusion, and queries can be routed to the retrieval method that fits them best.

Answers now carry evidence references. They cite the chunks they were built from, so an agent can show its work and you can audit why it said what it said. You can check more information here.

TypeScript, not just Python

cognee 1.0 includes a TypeScript SDK with the same core memory verbs.

The core verbs are now available across Python, Rust, and TypeScript.

Import your existing memory

Switching tools should not mean abandoning your data. remember accepts provider sources directly, so existing memory can be brought into cognee instead of having to rebuild it from scratch.

And you can take it back out at any time, via the open COGX format or standard graph formats.

COGX is a versioned, portable memory archive. It's designed to preserve the graph structure, metadata, provenance, and retrieval-relevant state needed to move memory between systems.

No lock-in was our design constraint, not a marketing line.

From local to Cloud in one line

The same code can run against a managed instance. Point the SDK at cognee Cloud and every call routes there.

You can also push a local dataset to Cloud: this command exports your local graph as a COGX archive and lands it in Cloud with no re-extraction, so you can move from a laptop to production without rebuilding the memory pipeline. Cloud documentation can be found here.

Efficient at scale

We measured the token cost of answering a set of questions against a fixed synthetic corpus two ways: cognee (ingest once, then retrieve), and a full-context GPT-5.5 baseline (resend the whole corpus on every question).

Applying a single blended rate of about $9.84 per million tokens to both, at 50 questions GPT-5.5 runs up to ~$420 and cognee ~$60, with cognee using roughly 85% fewer tokens than GPT-5.5.

Token cost comparison: cognee vs GPT-5.5 full context at scale

What comes next

We think the next step is memory that's not just attached to agents, but actively shaped by how they work: corrections, usage patterns, task context, and domain-specific schemas.

The four verbs are the floor, not the ceiling. remember, recall, forget, and improve give agents a stable surface for memory. From there, the memory layer can become more adaptive, more portable, and more aware of the workflows it supports.

Start locally, connect through MCP, or use a free Cloud key.

What shipped in 1.0

AreaChangePR
APIMemory-native remember / recall / forget / improve#2591, #2596, #2601
APITyped return shapes for recall#2691
APIDataset scoping across operations#2797, #2777
APIforget with graph-only mode and flexible identifiers#2734, #2874
APICustom graph models to ground memory in your schema#2916
Self-improvementFeedback weights on graph nodes and edges#2266
Self-improvementFeedback weights influence retrieval scoring#2427
Self-improvementImportance-weighted ingestion#2438, #2447
Self-improvementFrequency weights via session API#2746
Agents@cognee.agent_memory decorator and quickstart#2533, #2678
Agentscognee.agents namespace and CLI with permissioning#3003
MigrationImport from Mem0, Zep, and Letta; export to portable COGX#3024
RetrievalHybrid retrieval with graph, vector search, and RRF#3014, #3049
RetrievalBM25 lexical retrieval#3010
RetrievalQuestion-decomposition retriever variants#2574
RetrievalEvidence references in completion answers#3016
RetrievalGlobal context index with graph bucketing#2848, #2886
SessionsSQL session cache with SQLite default and Postgres support; no Redis required#3028
SkillsGET /v1/skills list endpoint and publisher metadata#3071
StoragePostgres as a full graph backend with multi-user isolation#2584, #2825
StorageLadybug replacing Kuzu as the default embedded graph database#2755
OpsOne-click deploys for Modal, Railway, Fly, and Render#2296
OpsPython 3.10–3.14 support

See the numbers for yourself

Get started

Cognee is the fastest way to start building reliable Al agent memory.

Cognee Cloud
Latest
cognee 1.0: The Open-Source Memory Platform for AI Agents
cognee 1.0 is the first open-source memory platform built around a memory-native API — remember, recall, improve, forget — with full data ownership and deployment flexibility from managed cloud to edge.
cognee on BEAM: SOTA Results Without a Benchmark-Specific Memory System
cognee beat SOTA on BEAM's 100k-token setting by 6.5% and matched SOTA at 10M tokens using only default open-source features — no custom benchmark-specific architecture.
Just Postgres: Drop the Graph Database. Keep the Graph.
cognee 1.0 runs the full agent memory layer — graph, vectors, sessions, and metadata — on a single Postgres instance, eliminating the need for separate graph database, vector store, and Redis deployments.