Jun 26, 2026

10 minutes read

Inside cognee 1.0: Memory-Native APIs for Production Agents

Vasilije Markovic, Co-Founder / CEO

The difficult thing with agent memory was never expanding storage for the embeddings — it was getting memory to improve as it's used, so an agent doesn't just collect more context, but becomes less likely to make the same mistake twice.

That's what we've been working on here at cognee. The API, SDKs, retrieval layer, migration path, and Cloud support in cognee 1.0 all have the same purpose: giving agents more usable memory.

We kept hearing the same questions from people moving agents into production: How do I give an agent memory in one line? How do I stop it from making the same mistake? Can I use memory from TypeScript, not just Python? Can I bring the memory I already have?

cognee 1.0 is our answer to all of that: four verbs, a feedback loop, and memory that can run wherever your agents do.

A memory-native API

The API is now built around how agents actually use memory: four verbs, the same in the Python SDK, HTTP API, and over MCP.

import cognee

await cognee.remember("Cognee turns documents into agent memory.")

results = await cognee.recall("What does cognee do?")

await cognee.improve(dataset="main_dataset")

await cognee.forget(dataset="main_dataset")

remember, recall, improve, and forget replace the older add, cognify, and search flow. The lower-level pipeline is still there for those who need it, but most agents can now work through the memory-native interface instead.

Here's how everything fits together.

`remember()`

Hand cognee plain text and it pulls out the entities and relationships, turning a sentence into a connected memory graph. You don't write data into a schema — you just give it words, and it builds something an agent can reason over and links it into what it already knows.

This function stores new information and connects it to the memory graph. That can be a document, a conversation, a correction, a tool result, or a statement you want the agent to keep.

await cognee.remember("The customer prefers quarterly summaries.", session_id="customer_42")

The session_id is used when memory belongs to a specific interaction. If feedback is produced through an agent integration, cognee handles this automatically: the feedback first lands in session memory, then becomes part of permanent memory. If you're adding feedback or corrections yourself, you pass the session ID along with the Q&A entry ID (qa_id) so cognee knows exactly which answer that signal belongs to.

`recall()`

Ask a question in plain language and get an answer grounded in the graph, not guessed. cognee walks the relevant path — cognee → based in → Berlin — and returns "cognee is based in Berlin," traceable to the facts it stored rather than invented.

This function retrieves the most relevant memory for the current question or task. It can use graph structure, retrieval signals, and previous feedback to decide which memories should matter most.

results = await cognee.recall("What does this customer care about?", session_id="customer_42")
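To make the graph walk concrete, here's a minimal sketch of a triple store with a one-hop lookup. It's a toy illustration, not cognee's implementation: `TinyGraph`, `add_fact`, and `lookup` are hypothetical names, and real recall also draws on retrieval signals and feedback that this sketch ignores.

```python
from collections import defaultdict


class TinyGraph:
    """Toy triple store; hypothetical, not part of the cognee API."""

    def __init__(self) -> None:
        # subject -> list of (relation, object) edges
        self.edges: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add_fact(self, subject: str, relation: str, obj: str) -> None:
        self.edges[subject].append((relation, obj))

    def lookup(self, subject: str, relation: str) -> list[tuple[str, str, str]]:
        """Return every stored triple matching the subject and relation."""
        return [
            (subject, rel, obj)
            for rel, obj in self.edges.get(subject, [])
            if rel == relation
        ]


graph = TinyGraph()
graph.add_fact("cognee", "based in", "Berlin")
graph.add_fact("Vasilije", "founded", "cognee")

# Walk the path cognee -> based in -> ? and phrase each hit as a sentence.
path = graph.lookup("cognee", "based in")
answer = [f"{s} is {r} {o}" for s, r, o in path]
```

Because the answer is assembled from stored triples, every sentence it produces can be traced back to a fact in `path`. A lookup with no matching triple returns an empty list rather than a guess.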

Feedback comes after an answer has been judged. Once an agent's answer is confirmed, corrected, or rejected, that judgement becomes a signal attached to the memory used to produce the answer.

Confirmed memories can become stronger. Corrected or misleading ones can be pushed down. Over time, cognee keeps re-weighting memory from use, feedback, importance, and frequency.

`improve()`

Memory that sharpens itself as it's used. cognee combines facts it already holds — Vasilije → cognee and cognee → Berlin — to infer a new connection, Vasilije → Berlin, so answers that would have needed a multi-hop lookup are already there. The connections it adds stay traceable to the facts behind them.

In cognee 1.0, the self-improvement loop runs by default (cognee.remember(conversation, self_improvement=True)).

Still, you can call improve() when you want more granular control: to trigger improvement manually, scope it to a dataset, or make the process explicit inside your own workflow.

await cognee.improve(dataset="main_dataset")
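The multi-hop inference described above can be sketched as a two-hop composition over stored edges. This is an assumption-laden toy, not how cognee's improvement loop is implemented: `Edge`, `derived_from`, and `infer_two_hop` are hypothetical names, and relation labels are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    # Facts this edge was derived from; empty for directly stored facts.
    derived_from: tuple["Edge", ...] = ()


def infer_two_hop(edges: list[Edge]) -> list[Edge]:
    """Derive A -> C for every stored pair A -> B and B -> C."""
    known = {(e.source, e.target) for e in edges}
    inferred = []
    for first in edges:
        for second in edges:
            # Chain the two edges, skipping loops back to the start node.
            if first.target == second.source and first.source != second.target:
                pair = (first.source, second.target)
                if pair not in known:
                    known.add(pair)
                    inferred.append(
                        Edge(pair[0], pair[1], derived_from=(first, second))
                    )
    return inferred


facts = [Edge("Vasilije", "cognee"), Edge("cognee", "Berlin")]
new_edges = infer_two_hop(facts)
```

Keeping `derived_from` on each inferred edge is what makes the new connection traceable: the Vasilije → Berlin edge carries references to the two facts it was built from.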

`forget()`

Delete a specific memory and its links cleanly. Tell cognee to forget Vasilije and that node plus its "founded" edge dissolve, while the rest of the graph stays intact — so you can honor removal requests or drop stale facts without rebuilding everything.

This function removes memory that's no longer useful. That can mean clearing a dataset, removing outdated information, or deleting memory tied to a specific scope.

await cognee.forget(dataset="main_dataset")
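Removing a node together with its links comes down to filtering out every edge that touches it. Here's a minimal sketch on a plain set-based graph; `forget_node` is a hypothetical helper for illustration, not a cognee function.

```python
def forget_node(
    nodes: set[str],
    edges: set[tuple[str, str, str]],
    name: str,
) -> tuple[set[str], set[tuple[str, str, str]]]:
    """Return a copy of the graph without `name` and without any edge touching it."""
    remaining_nodes = nodes - {name}
    remaining_edges = {(s, r, o) for s, r, o in edges if name not in (s, o)}
    return remaining_nodes, remaining_edges


nodes = {"Vasilije", "cognee", "Berlin"}
edges = {("Vasilije", "founded", "cognee"), ("cognee", "based in", "Berlin")}

nodes_after, edges_after = forget_node(nodes, edges, "Vasilije")
```

Only the "founded" edge disappears with the node. The cognee → Berlin fact is untouched, so nothing else has to be rebuilt.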

These four verbs encompass the full memory lifecycle: commit what matters to memory, retrieve it when needed, learn from how it was used, and remove it when it should be gone.

Memory that improves, not just grows

This is the centerpiece of the release.

Most memory tools are built to append — this allows stale information to keep resurfacing even after an agent has been corrected. With cognee 1.0, memory is continuously re-weighted as agents use it.

The improvement loop is fed by three signals:

Feedback applies after an answer has been judged. The feedback is attached to the nodes, edges, and evidence that contributed to the answer, so corrected memories can rank lower and confirmed ones can rank higher.
Importance is set at ingestion, so the runbook outweighs the Slack dump.
Frequency accumulates with use, so the paths an agent actually relies on can reinforce themselves over time.

Retrieval, judgement, feedback, importance, and frequency all feed the loop, so memory gets better rather than just bigger.
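As a rough illustration of how the three signals could combine into one ranking weight, here's a minimal sketch. The scoring formula, the `Memory` fields, and the step size are all assumptions made for the example; the post doesn't specify how cognee computes its weights.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    importance: float = 1.0  # set at ingestion
    feedback: float = 0.0    # raised on confirmation, lowered on correction
    uses: int = 0            # incremented each time the memory is retrieved


def weight(m: Memory) -> float:
    """Hypothetical rule: importance scaled by feedback and log-damped usage."""
    return m.importance * math.exp(m.feedback) * (1.0 + math.log1p(m.uses))


def apply_feedback(m: Memory, confirmed: bool, step: float = 0.5) -> None:
    """Nudge the feedback signal up for a confirmed answer, down for a corrected one."""
    m.feedback += step if confirmed else -step


# Sample data invented for the sketch.
runbook = Memory("Restart the worker before the API.", importance=2.0)
slack = Memory("I think you restart the API first?", importance=0.5)

apply_feedback(runbook, confirmed=True)  # the answer built on it was confirmed
apply_feedback(slack, confirmed=False)   # the answer built on it was corrected
runbook.uses += 3                        # the agent keeps relying on the runbook

ranked = sorted([slack, runbook], key=weight, reverse=True)
```

The corrected Slack message isn't deleted. It simply ranks lower, so it's less likely to resurface in future retrieval.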

The effect is fact reconciliation. If an answer is corrected, that correction should not vanish when the chat ends — it should become part of what future retrieval sees so that the agent eventually stops repeating the mistake.

That's what's necessary for production-ready memory. Everything that follows — the decorator, retrieval layer, SDKs, migration path, and Cloud support — exists to get this memory into your agents and keep it trustworthy.

Drop-in agent memory

The fastest way to put self-improving memory inside an agent is using a single decorator.

It wires memory in end-to-end, retrieving relevant context before your function runs, injecting it into the call, and persisting the trace afterward, so the agent's own history becomes queryable memory. One line, no other changes needed.

from cognee import agent_memory

@agent_memory()
async def my_agent(prompt: str) -> str:
    # cognee injects relevant memory before this runs
    # and persists the trace after
    # run_llm is a placeholder for your own model call
    return await run_llm(prompt)

Retrieval you can trust

Memory is only as good as what it gives back.

cognee 1.0 combines graph traversal, vector search, and BM25 lexical retrieval. Hybrid results are fused with reciprocal rank fusion, and queries can be routed to the retrieval method that fits them best.
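Reciprocal rank fusion itself is simple enough to sketch in a few lines: each document scores the sum of 1 / (k + rank) over every ranked list it appears in. The version below is a generic illustration rather than cognee's code. The chunk IDs are made up, and k = 60 is a commonly used constant; whether cognee uses the same value is an assumption.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse ranked lists by summing 1 / (k + rank) for each document."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score, breaking ties by id so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical results from the three retrievers, best hit first.
graph_hits = ["chunk_a", "chunk_b", "chunk_c"]
vector_hits = ["chunk_b", "chunk_a", "chunk_d"]
bm25_hits = ["chunk_b", "chunk_d", "chunk_a"]

fused = reciprocal_rank_fusion([graph_hits, vector_hits, bm25_hits])
fused_ids = [doc_id for doc_id, _ in fused]
```

Because the fusion uses only ranks, it never has to compare raw scores from graph traversal, vector similarity, and BM25, which live on different scales.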

Answers now carry evidence references. They cite the chunks they were built from, so an agent can show its work and you can audit why it said what it said. More detail is available in the cognee documentation.

TypeScript, not just Python

cognee 1.0 includes a TypeScript SDK with the same core memory verbs.

import { remember, recall } from "cognee";

await remember("cognee turns documents into agent memory.");

const results = await recall("What does cognee do?");

The core verbs are now available across Python, Rust, and TypeScript.

Import your existing memory

Switching tools should not mean abandoning your data. remember accepts provider sources directly, so existing memory can be brought into cognee instead of having to rebuild it from scratch.

from cognee.migration import Mem0Source, ZepSource, LettaSource

await cognee.remember(Mem0Source("mem0_export.json"))

await cognee.remember(ZepSource("zep_export.json"))

await cognee.remember(LettaSource("letta_export.json"))

And you can take it back out at any time, via the open COGX format or standard graph formats.

await cognee.export("main_dataset", format="cogx")  # also graphml, json, cypher

COGX is a versioned, portable memory archive. It's designed to preserve the graph structure, metadata, provenance, and retrieval-relevant state needed to move memory between systems.

No lock-in was our design constraint, not a marketing line.

From local to Cloud in one line

The same code can run against a managed instance. Point the SDK at cognee Cloud and every call routes there.

await cognee.serve(
    url="https://your-instance.cognee.ai",
    api_key="ck_..."
)

You can also push a local dataset to Cloud. The command below exports your local graph as a COGX archive and lands it in Cloud with no re-extraction, so you can move from a laptop to production without rebuilding the memory pipeline. See the cognee Cloud documentation for details.

await cognee.push("main_dataset")  # local instance to Cloud

Efficient at scale

We measured the token cost of answering a set of questions against a fixed synthetic corpus two ways: cognee (ingest once, then retrieve), and a full-context GPT-5.5 baseline (resend the whole corpus on every question).

Applying a single blended rate of about $9.84 per million tokens to both, at 50 questions GPT-5.5 runs up to ~$420 and cognee ~$60, with cognee using roughly 85% fewer tokens than GPT-5.5.
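The token counts behind those dollar figures can be recovered with a few lines of arithmetic. The snippet uses only the rounded numbers quoted above, so its output is approximate.

```python
RATE_PER_MILLION = 9.84    # blended USD per million tokens, from the text
FULL_CONTEXT_COST = 420.0  # approximate USD at 50 questions, from the text
COGNEE_COST = 60.0         # approximate USD at 50 questions, from the text

# Convert dollars back into tokens at the shared rate.
full_context_tokens = FULL_CONTEXT_COST / RATE_PER_MILLION * 1_000_000
cognee_tokens = COGNEE_COST / RATE_PER_MILLION * 1_000_000

# Because both sides use the same rate, the saving is just the cost ratio.
savings = 1 - cognee_tokens / full_context_tokens

print(f"full context: {full_context_tokens / 1e6:.1f}M tokens")
print(f"cognee:       {cognee_tokens / 1e6:.1f}M tokens")
print(f"reduction:    {savings:.1%}")
```

That works out to about 42.7 million tokens for the full-context baseline against about 6.1 million for cognee. The reduction is about 85.7%, in line with the roughly 85% quoted, given that the dollar figures are rounded.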

Figure: Token cost comparison, cognee vs GPT-5.5 full context at scale

What comes next

We think the next step is memory that's not just attached to agents, but actively shaped by how they work: corrections, usage patterns, task context, and domain-specific schemas.

The four verbs are the floor, not the ceiling. remember, recall, forget, and improve give agents a stable surface for memory. From there, the memory layer can become more adaptive, more portable, and more aware of the workflows it supports.

Start locally, connect through MCP, or use a free Cloud key.

pip install cognee
npm i cognee

What shipped in 1.0

Area	Change	PR
API	Memory-native `remember` / `recall` / `forget` / `improve`	#2591, #2596, #2601
API	Typed return shapes for `recall`	#2691
API	Dataset scoping across operations	#2797, #2777
API	`forget` with graph-only mode and flexible identifiers	#2734, #2874
API	Custom graph models to ground memory in your schema	#2916
Self-improvement	Feedback weights on graph nodes and edges	#2266
Self-improvement	Feedback weights influence retrieval scoring	#2427
Self-improvement	Importance-weighted ingestion	#2438, #2447
Self-improvement	Frequency weights via session API	#2746
Agents	`@cognee.agent_memory` decorator and quickstart	#2533, #2678
Agents	`cognee.agents` namespace and CLI with permissioning	#3003
Migration	Import from Mem0, Zep, and Letta; export to portable COGX	#3024
Retrieval	Hybrid retrieval with graph, vector search, and RRF	#3014, #3049
Retrieval	BM25 lexical retrieval	#3010
Retrieval	Question-decomposition retriever variants	#2574
Retrieval	Evidence references in completion answers	#3016
Retrieval	Global context index with graph bucketing	#2848, #2886
Sessions	SQL session cache with SQLite default and Postgres support; no Redis required	#3028
Skills	`GET /v1/skills` list endpoint and publisher metadata	#3071
Storage	Postgres as a full graph backend with multi-user isolation	#2584, #2825
Storage	Ladybug replacing Kuzu as the default embedded graph database	#2755
Ops	One-click deploys for Modal, Railway, Fly, and Render	#2296
Ops	Python 3.10–3.14 support	—


