Dec 23, 2025

15 minutes read

Dec 23, 2025

15 minutes read

Claude Agent SDK × cognee: Persistent Memory via MCP (Without Prompt Bloat)

Hande KafkasGrowth Engineer

🧠 TL;DR: If you’re building with the Claude Agent SDK, your agents already call tools via MCP, but—this package adds persistent memory by exposing “add” and “search” as MCP tools, so your agent can save knowledge once and pull it back later (even in a fresh session). No need to constantly re-send background context, and this also works with Claude Code’s MCP setup, so you can trial the workflow in the terminal before wiring it into a full agent app.

If you want the full cognee MCP toolkit, you can point the Agent SDK at the existing cognee MCP server instead. This package is ideal when you want the simplest integration.

Here's a frustration developers building with LLMs are far too familiar with: you spend an hour explaining your codebase architecture, debugging a tricky issue, or walking through nuanced business requirements—and then the session ends.

Next run, it’s back to cold-start behavior—the agent starts from scratch, asking the same questions, making the same suggestions you already ruled out yesterday. So you either:

Paste yesterday’s context again,
Accept worse answers, or
Build a pile of ad-hoc “memory prompts” that slowly turns into an unmaintainable blob.

This isn’t a model problem—it’s an application architecture problem. LLM calls are stateless: even though they are brilliant pattern matchers, LLMs still live in an eternal present where every session starts from zero. And patches like “just use a bigger context window, bro” can help in the moment, but they can’t solve the core issue: recall that’s transient by design.

Anthropic has been pushing hard on a fix. Their Model Context Protocol (MCP) is a fundamental rethink of how AI systems interact with external tools and data sources, and their Claude Agent SDK provides the scaffolding to turn it into real agent workflows.

But the SDK doesn’t ship with memory out of the box—something your agent can write to and query across sessions, without you having to re-send the whole backstory. That’s why we’ve made it possible to plug cognee into it over MCP as a dedicated graph + vector memory layer built from your data, so Claude can save what matters and pull it back exactly when it needs it.

This post covers how the integration works: hooking cognee into the SDK, storing knowledge, then proving it sticks by querying it in a clean, new session.

Wait, Doesn't Claude Already Have Memory?

If you use Claude in Anthropic’s apps, you can search prior chats and enable memory features depending on your plan and settings. Anthropic describes chat search as RAG-based, and it’s scoped (for example, searches can be limited to a given project).

Manthan Gupta's reverse-engineering analysis reveals how it works: Claude uses on-demand tools like conversation_search and recent_chats for selective retrieval, plus a memory_user_edits tool for explicit "remember this" commands. User facts get injected as XML into the prompt.

This is fundamentally different from ChatGPT's approach of pre-computing summaries—it’s clever and definitely useful for something basic like an interactive chat. But—when you’re building your own agent app with the Claude Agent SDK, its limitations become obvious:

It's tied to claude.ai: The SDK for building your own agents doesn't include these memory tools.
It's conversation-centric: Claude's memory stores user facts and conversation snippets, not structured domain knowledge.
It's not queryable by structure: You can't ask a query that requires traversing relationships like "show me all entities related to healthcare."

So, while the SDK doesn’t ship with a domain memory layer for your app, it does give you a clean way to plug one in via MCP.

cognee is it: our engine provides persistent, shared application memory you control. It's the difference between the default “personal assistant” memory and a durable, structured knowledge system your whole enterprise can be built on.

The Hidden Headache of AI Without Memory

Here's a truth most AI product demos gloss over: context windows are not memory.

Yes, Claude can handle 200K tokens. That's impressive. But although large context windows are helpful, stuffing everything into a giant prompt creates its own problems:

Costs pile up: re-sending background knowledge increases token usage and cost.
Latency creeps in: bigger prompts tend to be slower.
Recall is a crapshoot: long-context performance can degrade depending on where the relevant detail appears (the "lost in the middle" phenomenon).
Nothing persists: when the session ends, you don’t have an indexed knowledge base—just yesterday’s transcript.

Real applications instead need:

Memory that outlives a session,
Retrieval that’s selective (no prompt bloat), and
Queries that can use structure, not just similarity.

MCP as the Connector; Agent SDK as the Runtime

Anthropic describes Model Context Protocol (MCP) as an open standard for connecting assistants to the systems where data lives—repos, tools, databases, internal services.

In the Claude Agent SDK (Python), you can:

Define tools (including MCP tools),
Build an in-process MCP server with create_sdk_mcp_server(),
register it via ClaudeAgentOptions.mcp_servers,
and allowlist tool names like mcp__calc__add.

That’s the hook you need to make “memory” a tool the agent can call intentionally.

cognee as the Agent’s Structured, Persistent Memory

Here at cognee, we approach AI memory as more than "stuff we paste into prompts." Our engine processes the ingested data and builds an actual knowledge representation from it—a graph database paired with vector embeddings—that your AI can query on demand.

The model is simple and dev-friendly:

.add: ingests data asynchronously and preps it for processing.
.cognify: chunks, extracts entities/relations, and builds a knowledge graph with embeddings. All stored data survives process restarts.
.search: queries using vector similarity + graph traversal.

When your AI needs to remember something, it isn’t skimming a flat pile of chat logs. It’s querying a structured store: vector search to find the right neighborhood of meaning, then graph traversal to follow the actual links—who’s involved, what they signed, when it ends, and what it’s connected to. That lets you ask questions like “show me everything healthcare-related” and then narrow to “only the contracts tied to healthcare companies.”

That difference matters because useful knowledge is rarely a single snippet of text. It’s a set of relationships you want to reuse and recombine. cognee preserves those relationships, so recall isn’t just “similar text,” it’s “the right facts, in the right shape.”

How the Integration Works

So, really, what this integration does is just give Claude a “memory toolbox” it can reach for when it needs to store or retrieve information. The setup requires a tiny bit of wiring:

Expose cognee-backed tools as MCP tools
Register them in the Agent SDK as an MCP server
Allowlist tool names so the agent can call them

The Agent SDK’s create_sdk_mcp_server() creates an in-process MCP server you can mount into ClaudeAgentOptions.

Here’s the shape:

from claude_agent_sdkimport (
    create_sdk_mcp_server,
    ClaudeAgentOptions,
    ClaudeSDKClient,
)

from cognee_integration_claude
import add_tool, search_tool

# 1) Create an MCP server with cognee tools
server = create_sdk_mcp_server(
    name="memory-tools",
    version="1.0.0",
    tools=[add_tool, search_tool],
)

# 2) Configure your agent + 3) allowlist the tool 
options = ClaudeAgentOptions(
    mcp_servers={"tools": server},
    allowed_tools=["mcp__tools__add_tool", "mcp__tools__search_tool"],
)

That's it. Your agent now has persistent memory—it decides when to store information in it and when to retrieve it based on the conversation context.

Testing, Testing: Building a Contract Tracker

Let’s do a simple test run by building an AI assistant that helps track business contracts. We’ll store a few contracts in session 1, then query in session 2 with no chat history carried over.

Session 1: Onboarding Information

async with ClaudeSDKClient(options=options) as client:
    await client.query("""
        Remember these contracts:
        - Meditech Solutions: Healthcare, £1.2M, Jan 2023 - Dec 2025
        - QuantumSoft: Technology, £5.5M, Aug 2024 - Aug 2028
        - Orion Retail Group: Retail, £850K, Mar 2024 - Mar 2026
    """)

    async for msg in client.receive_response():
        # Claude acknowledges storing the information
        print(msg)

At this point, the “memory” is not your conversation transcript. The data has been pushed into cognee, which extracts entities (company names, industries, dates, values), builds relationships between them, and stores everything persistently.

Session 2: A Week Later, Different Instance

We run a completely new agent instance—no shared state, no conversation history.

# Fresh agent, fresh session
async with ClaudeSDKClient(options=options) as client:
    await client.query(
        "What healthcare contracts do we have?"
    )

    async for msg in client.receive_response():
        print(msg)

Because the agent can call search_tool, it can retrieve:

Meditech Solutions

Industry: Healthcare

Value: £1.2M

Duration: January 2023 to December 2025

without you re-sending the original list.

Isolating Memories Across Users

If you’re building anything multi-user, memory needs to be treated like user data, defaulting to isolation.

cognee MCP is designed for multi-tenant use by multiple MCP-capable clients, and it explicitly supports “shared vs isolated” architecture modes (standalone vs API mode).

If you’re sessionizing tools at the SDK layer, the pattern is:

from cognee_integration_claude import get_sessionized_cognee_tools

# Alice gets her own isolated memory
alice_add, alice_search = get_sessionized_cognee_tools("alice-session")

# Bob gets his own isolated memory
bob_add, bob_search = get_sessionized_cognee_tools("bob-session")

# Configure separate agents
alice_server = create_sdk_mcp_server(
    name="alice-tools",
    version="1.0.0",
    tools=[alice_add, alice_search]
)

bob_server = create_sdk_mcp_server(
    name="bob-tools",
    version="1.0.0",
    tools=[bob_add, bob_search]
)

Practical checklist:

Use stable IDs (user ID / org ID) for scopes.
Keep dev/staging/prod memory fully separated.
Decide upfront what you store: facts, derived summaries, or both.

Vectors Are Fine—Until You Need Relationships

Standard Retrieval-Augmented Generation (RAG) systems use vector similarity to find relevant text chunks.

That works for queries like ”find me documents about X” but it struggles when the question is relationship-shaped:

“Which contracts end in Q1?”
“Show deals connected to healthcare vendors.”
“List all entities related to X, then filter by date.”

cognee’s design explicitly supports:

Vector search for semantic similarity,
Graph traversal for structure, and
Hybrid queries that use both.

This makes the difference between retrieving text that sounds related and retrieving information that’s related by explicit links (company → industry → contract → dates → value).

Async Design for Smooth Scaling

This integration is designed to stay predictable in production workloads that require lots of writes and reads in parallel—without you having to micromanage ordering or consistency. It features:

Queue-based ingestion (writes are serialized): If multiple add_tool calls happen close together, they’re queued so writes don’t collide. Items can be batched, then processed before cognify() builds or updates the graph.

Non-blocking searches: search_tool won’t race ahead of in-flight writes. If there’s pending ingestion work, searches wait until it’s safe, so results reflect the latest committed state rather than a half-updated snapshot.

Automatic graph construction: Once content is ingested, cognee handles the heavy lifting—entity extraction, relationship building, and embeddings—so you’re not wiring your own pipeline.

MCP-native tools: Because the surface area is MCP tools, the same “memory toolbox” can be reused anywhere MCP is supported, not only inside the Claude Agent SDK.

Copy/Paste Start: Install → Store → Query

Install the integration

pip install cognee-integration-claude

Or with uv:

uv add cognee-integration-claude

Configure your LLM key for cognee

Create a .env file:

LLM_API_KEY=your-openai-api-key

Check cognee's documentation for using providers other than OpenAI.

Note: The Claude Agent SDK uses a bundled CLI that handles authentication automatically. If you're running in an environment where you're already authenticated with Claude (like Cursor), you may not need additional API key configuration.

Minimal example (store → retrieve)

import asyncio
from claude_agent_sdk import (
    create_sdk_mcp_server,
    ClaudeAgentOptions,
    ClaudeSDKClient,
    AssistantMessage,
    TextBlock,
)
from cognee_integration_claude import add_tool, search_tool

async def main():
    server = create_sdk_mcp_server(
        name="cognee-tools",
        version="1.0.0",
        tools=[add_tool, search_tool]
    )

    options = ClaudeAgentOptions(
        mcp_servers={"tools": server},
        allowed_tools=["mcp__tools__add_tool", "mcp__tools__search_tool"],
    )

    # Store something
    async with ClaudeSDKClient(options=options) as client:
        await client.query("Remember: Project Alpha deadline is January 15th")
        async for msg in client.receive_response():
            if isinstance(msg, AssistantMessage):
                for block in msg.content:
                    if isinstance(block, TextBlock):
                        print(f"Claude: {block.text}")

    # Later, retrieve it
    async with ClaudeSDKClient(options=options) as client:
        await client.query("When is Project Alpha due?")
        async for msg in client.receive_response():
            if isinstance(msg, AssistantMessage):
                for block in msg.content:
                    if isinstance(block, TextBlock):
                        print(f"Claude: {block.text}")

asyncio.run(main())

A Few Blueprints for Memory-Powered Agents

Pre-loading Background Knowledge

Sometimes you don’t want your agent to “learn” everything through chat. You want a baseline memory—policies, product docs, historical records—ready from the first prompt.

A simple approach is to preload once at startup (or as a separate ingestion job), then let every agent query it:

import cognee

async def preload_knowledge():
    # Load your documents (PDFs, text files, database exports, etc.)
    documents = [
        "Company policy document content...",
        "Product catalog information...",
        "Historical transaction records...",
    ]

    for doc in documents:
        await cognee.add(doc)
    await cognee.cognify()

# Run at application startup—agents can now search this knowledge

A few practical tips that make this pattern behave well:

Treat ingestion as a pipeline step, not a request-time side effect (it keeps latency stable).
Add in batches and cognify once (cheaper and easier to reason about than cognifying after every document).
Be intentional about what goes in: stable “reference” knowledge belongs here; volatile scratch notes usually don’t.

Specialized Agent Teams

Following the patterns we've seen in LangGraph and Google ADK integrations, one of the cleanest ways to scale is to split responsibilities: one agent collects and stores, another reads and synthesizes. Same memory store, different permissions.

# Collector agent: only stores
collector_server = create_sdk_mcp_server(
    name="collector",
    version="1.0.0",
    tools=[add_tool]  # Can only add
)

# Analyst agent: only queries
analyst_server = create_sdk_mcp_server(
    name="analyst",
    version="1.0.0",
    tools=[search_tool]  # Can only search
)

This separation buys you a lot:

Less accidental pollution: the “analyst” can’t write half-baked conclusions into memory.
Clear auditability: one place to check what’s being stored and why.
Safer multi-agent workflows: the collector can be noisy; the analyst stays focused.

In practice: collectors pull from sources (tickets, emails, logs, docs). Analysts query the graph, aggregate, and generate outputs—without needing write access.

Visualizing What Your AI Knows

When answers look “almost right,” the fastest way to debug is to inspect the memory structure itself. cognee can render an interactive graph so you can see what got extracted and how it’s connected:

import asyncio
import cognee

async def visualize():
    await cognee.visualize_graph("knowledge_graph.html")

asyncio.run(visualize())
# Opens an interactive graph visualization in your browser

This is especially useful for catching issues like:

entities that were merged incorrectly (two companies treated as one),
missing relationships (a contract node exists but isn’t linked to an industry),
or overly broad concepts that are swallowing everything.

Seeing the graph turns “why did it answer that?” from a guessing game into something you can actually reason about.

Agents Don’t Need Bigger Prompts; They Need Memory

Anthropic’s push toward MCP is a pretty loud signal of where things are going: agents that don’t just call APIs, but operate inside a real working environment—tools, data, and other agents all in the loop.

In that world, memory isn’t a nice-to-have; it’s the difference between a system that demos well and a system that scales and improves over time. Without persistent memory, every run starts from scratch: the same context gets re-sent, the same decisions get re-litigated, and the same mistakes get repeated—because nothing actually sticks.

That’s the reason behind this integration. cognee turns “memory” into something concrete inside the Claude Agent SDK: a first-class MCP tool Claude can write to and query on demand, with hardwired structure (graph) and meaning (vectors).

If you’re building support agents that need to carry conversations forward to actually resolve issues, research agents whose value comes from stacking domain knowledge, or workflow automation you’d rather have sharpen with every run than spin in the same loop, persistent memory is the upgrade that makes the whole system feel real—and, more importantly, real-iable.

Cognee is the fastest way to start building reliable Al agent memory.

Latest

IntegrationsFeb 6, 2026