We present a practical approach to optimizing the interface between knowledge graphs and LLMs for complex reasoning. The paper, "Optimizing the Knowledge Graph–LLM Interface," reports systematic hyperparameter optimization using Cognee's modular framework across multiple QA benchmarks, with reproducible settings.
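As an illustration of the kind of sweep the paper describes, the minimal sketch below grid-searches a few retrieval settings and keeps the best-scoring configuration. The parameter names (chunk_size, top_k, search_type) and the run_benchmark helper are hypothetical placeholders rather than Cognee's actual configuration keys or API; the paper documents the real parameter space and optimization procedure.

```python
from itertools import product

# Hypothetical parameter grid; the paper documents the real search space.
PARAM_GRID = {
    "chunk_size": [256, 512, 1024],   # how source documents are split before graph building
    "top_k": [5, 10, 20],             # how many retrieved items feed the answer prompt
    "search_type": ["graph_completion", "graph_completion_cot"],  # answering strategy
}

def run_benchmark(config: dict) -> dict:
    """Hypothetical helper: build a memory with `config`, answer the QA set,
    and score the answers. Replace with a real evaluation harness."""
    return {"f1": 0.0, "em": 0.0}  # placeholder scores

def grid_search() -> tuple[dict, float]:
    """Exhaustively try every combination and keep the best F1."""
    best_config, best_f1 = None, -1.0
    keys = list(PARAM_GRID)
    for values in product(*(PARAM_GRID[k] for k in keys)):
        config = dict(zip(keys, values))
        scores = run_benchmark(config)
        if scores["f1"] > best_f1:
            best_config, best_f1 = config, scores["f1"]
    return best_config, best_f1
```

Grid search is only the simplest strategy; the paper's actual optimization procedure may be more targeted, but the structure of the loop (configure, run the benchmark, compare metrics) is the same.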
AI Memory Benchmark Results
Understanding how different AI memory systems retain and use context across interactions is crucial for LLM performance.
We updated our benchmark to include a comprehensive evaluation of the Cognee AI memory system alongside LightRAG, Mem0, and Graphiti (previous result).
This page provides a detailed comparison of performance metrics to help developers select the best AI memory solution for their applications.
Key performance metrics
Results for Cognee:
  • Human-like correctness: 0.93
  • DeepEval correctness: 0.85
  • DeepEval F1: 0.84
  • DeepEval EM: 0.69
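For reference, F1 and EM (exact match) follow the standard SQuAD-style, token-overlap definitions. The sketch below shows that computation with simplified answer normalization; DeepEval's internal implementation may differ in detail.

```python
import re
import string
from collections import Counter

def normalize(text: str) -> list[str]:
    """Lowercase, strip punctuation and articles, split into tokens."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()

def exact_match(prediction: str, reference: str) -> float:
    """1.0 only if the normalized answers are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))

def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred, ref = normalize(prediction), normalize(reference)
    common = Counter(pred) & Counter(ref)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

# A partially correct answer earns F1 credit but no EM credit.
print(exact_match("Paris, France", "Paris"))         # 0.0 (not an exact match)
print(round(token_f1("Paris, France", "Paris"), 2))  # 0.67 (partial token overlap)
```

Because EM gives no credit for partially overlapping answers, it is a stricter metric than F1, which is why the EM score (0.69) sits below the F1 score (0.84).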
Benchmark comparison
Optimized Cognee configurations
Cognee Graph Completion with chain-of-thought (CoT) prompting delivers substantial gains over the previous, non-optimized configuration:
  • Human-like Correctness: +25% (0.738 → 0.925)
  • DeepEval Correctness: +49% (0.569 → 0.846)
  • DeepEval F1: +314% (0.203 → 0.841)
  • DeepEval EM: +1618% (0.04 → 0.687)
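The percentage gains above are relative improvements over the previous scores, e.g. (0.925 − 0.738) / 0.738 ≈ 25%. The short script below reproduces all four figures from the numbers in the list; the printed values round to the +25%, +49%, +314%, and +1618% gains quoted above.

```python
def relative_gain(before: float, after: float) -> float:
    """Relative improvement as a percentage of the previous score."""
    return (after - before) / before * 100

for metric, before, after in [
    ("Human-like correctness", 0.738, 0.925),
    ("DeepEval correctness", 0.569, 0.846),
    ("DeepEval F1", 0.203, 0.841),
    ("DeepEval EM", 0.040, 0.687),
]:
    print(f"{metric}: +{relative_gain(before, after):.1f}%")
# Human-like correctness: +25.3%
# DeepEval correctness: +48.7%
# DeepEval F1: +314.3%
# DeepEval EM: +1617.5%
```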
Comprehensive metrics comparison
What's next?
We're expanding these benchmarks: adding datasets, task types, and stronger correctness/faithfulness metrics. We're also evaluating additional memory systems and publishing reproducible configurations.
Questions or need help optimizing your AI system?
Last updated: August 4th, 2025