Feb 24, 2026
8 minute read

How Cognee Builds AI Memory

Vasilije Markovic, Co-Founder / CEO

I started cognee three years ago because I needed personalized, persistent memory for users in a B2C application. Existing tooling around vector databases required manual management of collections, embeddings, updates, and deletions. Unfortunately, there was no reliable abstraction for structured memory, no support for long-term evolution, and no principled way to connect semantic retrieval with reasoning.

Three years later, cognee has a production-grade Python SDK running over one million pipelines each month, adopted by more than 70 companies including Bayer, University of Wyoming, Dilbloom, and dltHub. The scope expanded well beyond the original idea of "better RAG." What emerged instead is a knowledge engine that treats memory as a first-class systems problem.

Our vision is shared by Pebblebed, 42CAP, Vermilion Cliffs Ventures, and angels from Google DeepMind, n8n, and Snowplow, who backed our latest $7.5M seed round.

As part of that announcement, I want to revisit what cognee is all about: what it excels at today, and what we plan to build over the next year.

The Core Pipeline

[Figure: How AI memory works]

Cognee's API has four operations. Everything is async.

add ingests data. It takes files, directories, raw text, URLs, or S3 URIs. It supports 38+ formats (PDF, CSV, JSON, audio, images, code). Content is normalized to plain text, hashed for deduplication, and organized into datasets with ownership and permissions.
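The hash-for-deduplication step is easy to picture. Here is a minimal sketch of normalize-then-hash dedup (illustrative only; cognee's actual normalization and hashing details may differ):

```python
import hashlib

def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivially different copies hash alike."""
    return " ".join(text.split()).lower()

def content_hash(text: str) -> str:
    """Stable content fingerprint used as a dedup key."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()

class Dataset:
    """Toy dataset that skips re-ingesting content it has already seen."""
    def __init__(self):
        self.items: dict[str, str] = {}

    def add(self, text: str) -> bool:
        h = content_hash(text)
        if h in self.items:
            return False  # duplicate, skipped
        self.items[h] = text
        return True

ds = Dataset()
print(ds.add("Cognee builds AI memory."))    # True: new content
print(ds.add("Cognee  builds AI memory. "))  # False: same content after normalization
```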

cognify builds the knowledge graph. This is the core operation. It runs a six-stage pipeline: classify documents, check permissions, extract chunks, use an LLM to extract entities and relationships, generate summaries, then embed everything into the vector store and commit edges to the graph. Only new or updated files are processed on re-runs.

memify refines the graph after ingestion. It prunes stale nodes, strengthens frequent connections, reweights edges based on usage signals, and adds derived facts. This is where cognee's self-improvement happens: memory is not static storage, it's an evolving structure that adapts based on feedback and interaction traces.
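A toy model of the reweight-and-prune idea behind memify (illustrative only, not cognee's actual algorithm):

```python
class EdgeMemory:
    """Toy memify-style refinement: edges gain weight when traversed
    and decay on each maintenance pass; stale edges get pruned."""
    def __init__(self, decay: float = 0.9, prune_below: float = 0.1):
        self.weights: dict[tuple[str, str], float] = {}
        self.decay = decay
        self.prune_below = prune_below

    def record_use(self, src: str, dst: str, signal: float = 1.0) -> None:
        """Strengthen an edge each time a query traverses it."""
        self.weights[(src, dst)] = self.weights.get((src, dst), 0.0) + signal

    def maintain(self) -> None:
        """Decay all edges, then drop the ones that fell below the threshold."""
        decayed = {e: w * self.decay for e, w in self.weights.items()}
        self.weights = {e: w for e, w in decayed.items() if w >= self.prune_below}

em = EdgeMemory()
em.record_use("invoice_42", "acme_corp")
for _ in range(30):   # many maintenance passes with no reinforcement
    em.maintain()
print(em.weights)     # {} : the unused edge has decayed away
```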

search queries across both vector and graph layers. Cognee ships 14 retrieval modes, from classic RAG to chain-of-thought graph traversal. More on that below.

Architecture: Three Stores, One Engine

Cognee is a graph-vector hybrid. It unifies three storage layers into a single memory engine:

  • Graph store for entities, relationships, and structural traversal. Default: Kuzu. Also supports Neo4j, FalkorDB, Amazon Neptune, Memgraph.
  • Vector store for embeddings and semantic similarity. Default: LanceDB. Also supports Qdrant, pgvector, Redis, DuckDB, Pinecone, ChromaDB.
  • Relational store for documents, chunks, and provenance tracking. Default: SQLite. Also supports PostgreSQL.

The defaults are all file-based (SQLite + LanceDB + Kuzu), so there's zero infrastructure to set up. pip install cognee and an OpenAI key gets you running.
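The zero-setup claim is easy to demo. A minimal quickstart following the add / cognify / search pattern from cognee's README (signatures may have shifted, so check the docs; set your OpenAI key in the environment first):

```python
import asyncio
import cognee

async def main():
    # Ingest raw text into the default file-based stores.
    await cognee.add("Cognee turns documents into a queryable knowledge graph.")
    # Build the knowledge graph (chunking, entity extraction, embedding).
    await cognee.cognify()
    # Query across the graph and vector layers.
    results = await cognee.search("What does cognee do?")
    for result in results:
        print(result)

asyncio.run(main())
```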

The fundamental unit is the DataPoint, a Pydantic model that carries content and metadata. You can define custom DataPoints to control which fields get embedded.
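The idea is that a metadata index_fields list picks which fields get embedded. Here is a plain-Python stand-in for that pattern (hypothetical field names; cognee's real DataPoint is a Pydantic model, so see the docs for the actual definition):

```python
from dataclasses import dataclass, field

@dataclass
class DataPoint:
    """Stand-in for cognee's Pydantic DataPoint base class."""
    metadata: dict = field(default_factory=lambda: {"index_fields": []})

    def embeddable_text(self) -> str:
        """Concatenate only the fields listed in index_fields for embedding."""
        return " ".join(str(getattr(self, f)) for f in self.metadata["index_fields"])

@dataclass
class Person(DataPoint):
    name: str = ""
    ssn: str = ""  # sensitive: deliberately NOT listed in index_fields
    metadata: dict = field(default_factory=lambda: {"index_fields": ["name"]})

p = Person(name="Ada Lovelace", ssn="000-00-0000")
print(p.embeddable_text())  # "Ada Lovelace" -- only indexed fields get embedded
```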

Entities, chunks, summaries, and relationships are all DataPoints. The graph and vector stores stay linked: every node in the graph has a corresponding embedding, so you can move between semantic similarity and relational traversal without losing coherence.

Session and Permanent Memory

Cognee separates memory into two layers. Session memory operates as short-term working memory for agents. It loads relevant embeddings and graph fragments into runtime context for fast reasoning. Permanent memory stores long-term knowledge artifacts: user data, interaction traces, external documents, and derived relationships. These artifacts are continuously cross-connected inside the graph while remaining linked to their vector representations.

In practice, this means you get conversational context that persists across sessions.
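A toy sketch of the two layers, with naive keyword matching standing in for cognee's embedding- and graph-based relevance:

```python
class Memory:
    """Toy two-layer memory: permanent facts persist; a session holds only
    the fragments relevant to the current conversation."""
    def __init__(self):
        self.permanent: list[str] = []

    def remember(self, fact: str) -> None:
        """Store a long-term knowledge artifact."""
        self.permanent.append(fact)

    def open_session(self, topic: str) -> list[str]:
        """Load relevant facts into short-term working context.
        (Keyword match here; cognee loads embeddings + graph fragments.)"""
        return [f for f in self.permanent if topic.lower() in f.lower()]

m = Memory()
m.remember("User prefers dark mode")
m.remember("User's billing plan is Pro")
m.remember("Dark mode was enabled on 2025-01-10")

session = m.open_session("dark mode")
print(session)  # only the two dark-mode facts become working context
```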

14 Search Modes

Not every query needs the same retrieval strategy. Cognee gives you 14 modes:

  • GRAPH_COMPLETION: graph-aware Q&A; vector hints find relevant triplets, and the LLM answers grounded in graph structure
  • RAG_COMPLETION: classic retrieve-then-generate over text chunks
  • GRAPH_COMPLETION_COT: chain-of-thought reasoning over multi-hop graph traversals
  • GRAPH_COMPLETION_CONTEXT_EXTENSION: iterative context expansion for open-ended queries
  • GRAPH_SUMMARY_COMPLETION: pre-computed summaries combined with graph context
  • TRIPLET_COMPLETION: triplet-based (subject-predicate-object) retrieval with LLM completion
  • NATURAL_LANGUAGE: translates natural language to Cypher and executes it against the graph
  • CYPHER: runs Cypher queries directly
  • CHUNKS: raw passage retrieval via vector similarity
  • CHUNKS_LEXICAL: token-based lexical chunk search (Jaccard similarity)
  • SUMMARIES: search over precomputed summaries
  • TEMPORAL: time-aware graph search with temporal entity extraction
  • CODING_RULES: code-focused retrieval from indexed codebases with rule associations
  • FEELING_LUCKY: the LLM auto-selects the best mode for your query
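CHUNKS_LEXICAL's Jaccard scoring is simple enough to sketch directly (a toy ranking, not cognee's implementation):

```python
def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity between two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

chunks = [
    "cognee builds a knowledge graph from your documents",
    "the vector store holds embeddings for semantic search",
    "graph traversal connects entities across documents",
]
query = "searching the knowledge graph"
ranked = sorted(chunks, key=lambda c: jaccard(query, c), reverse=True)
print(ranked[0])  # the knowledge-graph chunk ranks first
```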

The default, GRAPH_COMPLETION, is where cognee differs most from vanilla RAG. Instead of returning the top-k chunks by cosine similarity, it uses vector search as a hint to find relevant graph triplets, then traverses the graph to build structured context before generating an answer.
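Conceptually, that flow is: pick a seed entity (in cognee, via vector similarity), expand a few hops of triplets around it, and hand the structured context to the LLM. A toy version with the seed hardcoded:

```python
# Toy triplet store (subject, predicate, object)
triplets = [
    ("cognee", "ingests", "documents"),
    ("documents", "contain", "entities"),
    ("entities", "link_to", "summaries"),
    ("cognee", "built_by", "topoteretes"),
]

def expand(seed: str, hops: int = 2) -> list[tuple[str, str, str]]:
    """Collect triplets reachable from a seed entity within N hops."""
    frontier, found = {seed}, []
    for _ in range(hops):
        next_frontier = set()
        for s, p, o in triplets:
            if s in frontier and (s, p, o) not in found:
                found.append((s, p, o))
                next_frontier.add(o)
        frontier = next_frontier
    return found

# Vector search would select the seed entity; here we hardcode it.
context = expand("cognee")
prompt = "Answer using these facts:\n" + "\n".join(f"{s} {p} {o}" for s, p, o in context)
print(prompt)
```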

Multi-Tenancy and Isolation

Your apps and agents need logical separation, so memory graphs can be instantiated per user, per group, or as shared public graphs. This is not just namespace separation at the vector level: isolation happens at the graph and trace level, with dataset-level permissions (read, write, delete, share).
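A minimal sketch of dataset-level grants using Python's Flag enum (illustrative only, not cognee's implementation):

```python
from enum import Flag, auto

class Permission(Flag):
    READ = auto()
    WRITE = auto()
    DELETE = auto()
    SHARE = auto()

class DatasetACL:
    """Toy dataset-level ACL mirroring read/write/delete/share grants."""
    def __init__(self):
        self.grants: dict[str, Permission] = {}

    def grant(self, user: str, perms: Permission) -> None:
        """Add permissions to a user's existing grant."""
        self.grants[user] = self.grants.get(user, Permission(0)) | perms

    def allowed(self, user: str, perm: Permission) -> bool:
        """Check whether the user's grant includes the permission."""
        return perm in self.grants.get(user, Permission(0))

acl = DatasetACL()
acl.grant("alice", Permission.READ | Permission.WRITE)
print(acl.allowed("alice", Permission.WRITE))  # True
print(acl.allowed("bob", Permission.READ))     # False
```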

Multi-tenancy is supported across pgvector, Neo4j, Kuzu, and LanceDB.

Agent Framework Integrations

Cognee plugs into the agent frameworks you're already using. Each integration exposes add_tool and search_tool that you hand to your agent:

  • LangGraph
  • OpenAI Agents SDK
  • Claude Agent SDK (via MCP)

There's also a standalone MCP server for Cursor, Claude Desktop, and Cline.
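The general shape of such a tool pair can be sketched framework-agnostically (the names and the keyword matching below are stand-ins, not the real integration API, which wires cognee's async add/search into each framework's tool schema):

```python
# Toy in-process memory backing the two tools.
memory: list[str] = []

def add_tool(text: str) -> str:
    """Store a fact in the agent's memory."""
    memory.append(text)
    return f"stored: {text}"

def search_tool(query: str) -> list[str]:
    """Retrieve stored facts matching the query (keyword match as a stand-in)."""
    return [m for m in memory if any(tok in m.lower() for tok in query.lower().split())]

# An agent framework would register these as tools and call them like this:
add_tool("The deploy runs every Friday at 18:00 UTC")
print(search_tool("when is the deploy"))
```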

Benchmarks

We benchmarked cognee against Mem0, Graphiti, and LightRAG on 24 multi-hop HotPotQA questions, with 45 repeated runs on Modal Cloud. All evaluation code is open source.

  • Human-like correctness: 0.93 (+25% with CoT)
  • DeepEval correctness: 0.85 (+49% with CoT)
  • DeepEval F1: 0.84 (+314% with CoT)
  • DeepEval EM: 0.69 (+1618% with CoT)

Base RAG scores 0.4 on the same correctness metric. The biggest gains come from chain-of-thought graph traversal, where multi-hop reasoning over explicit relationships outperforms flat retrieval.

What Comes Next

Cognee has moved from being an abstraction layer over vector databases to becoming a system for structured, persistent, and adaptive memory. The focus is no longer just retrieval quality, but the engineering of context itself: how memory is represented, isolated, evolved, and made computationally usable for reasoning agents.

With the seed round, we're investing in three research directions.

Adaptive retrieval via trace optimization. We're moving from fixed graph traversal strategies to adaptive retrieval based on the concept of a retrieval trace, the ordered sequence of nodes visited during query resolution. Instead of applying a single traversal heuristic to all tasks, the system will learn task-dependent traversal policies. For constraint-satisfaction queries (e.g., logistics feasibility), the optimal trace should prioritize constraint nodes early. For explanatory queries (e.g., root-cause analysis), the trace should encourage broader exploration and multi-branch evidence aggregation. The objective is to learn traversal strategies that optimize performance for specific use-case classes rather than relying on static graph-walking logic.

Learning and inference-time optimization of graph traversals. Two complementary mechanisms. First, reinforcement learning will iteratively optimize traversal policies based on performance feedback, using correctness signals from labeled query-answer pairs to improve trace selection over time. Second, inference-time optimization will dynamically evaluate multiple candidate traces and select the most promising ones based on graph-derived metrics such as node centrality, structural diversity, or coverage. Together, these approaches introduce both offline policy learning and online trace selection to improve retrieval quality and efficiency.
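A toy version of online trace selection, scoring candidate traces by node coverage and type diversity (simple stand-ins for the centrality, diversity, and coverage metrics mentioned above):

```python
def trace_score(trace: list[str], node_type: dict[str, str]) -> float:
    """Heuristic score: prefer traces that cover more distinct nodes
    and more distinct node types."""
    coverage = len(set(trace))
    diversity = len({node_type[n] for n in trace})
    return coverage + diversity

node_type = {"a": "constraint", "b": "constraint", "c": "evidence", "d": "evidence"}
candidates = [
    ["a", "a", "b"],  # narrow: one node type, revisits a node
    ["a", "c", "d"],  # broad: two node types, three distinct nodes
]
best = max(candidates, key=lambda t: trace_score(t, node_type))
print(best)  # ['a', 'c', 'd']
```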

Neuroscience-inspired embedding design. We want to do more with embeddings. The algorithmic design will draw inspiration from cognitive neuroscience models of memory retrieval and reinforcement learning frameworks for planning and rollout-based inference, grounding the system in both empirical evaluation and biologically informed retrieval dynamics.

On the product side: cloud platform, a Rust engine for on-device memory, multi-database support, user database isolation, and 30+ new data source connectors shipping in Q1 and Q2.

Get Started

You don’t need any infrastructure to try cognee locally. The defaults (SQLite + LanceDB + Kuzu) run embedded with minimal resource overhead, and you can swap in Neo4j, Neptune, Qdrant, or pgvector when you're ready to scale.

Cognee Cloud: https://platform.cognee.ai/

GitHub: github.com/topoteretes/cognee (12,000+ stars, Apache 2.0)

Docs: docs.cognee.ai

Research: Optimizing the Interface Between Knowledge Graphs and LLMs for Complex Reasoning

Cognee is the fastest way to start building reliable AI agent memory.
