Vectors + Graphs in Practice: Field Notes from the cognee Backend
Here at cognee, we run a vector store and a graph database side-by-side. That's not redundancy; it's by design.
Vectors deliver high-recall semantic candidates (what feels similar), while graphs provide the structure to trace relationships across entities and time (how things relate). Together, they let us cast a wide net quickly and then explain how an answer was assembled, with clear paths and provenance.
Below, we recap the strengths of each, show how they pair in GraphRAG-style pipelines, and share practical notes from our engineers who have shipped with both vector stores and graph databases.
Two Tools, Two Superpowers
Vector databases store high-dimensional embeddings and return nearest neighbors in that space, so "privacy policy update" can match "changes to data handling" even if the wording doesn't overlap. At scale, you don't compare against everything; you use an approximate index. HNSW remains a go-to choice because it delivers strong recall with sub-linear lookups.
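To make the "approximate index" part concrete, here is a minimal sketch of HNSW-based nearest-neighbor search using the hnswlib library. The dimensionality and the random vectors are stand-ins for real embeddings, and the library choice is ours for illustration, not a statement about cognee's internals.

```python
# Minimal HNSW sketch with hnswlib; random vectors stand in for real embeddings.
import numpy as np
import hnswlib

dim = 384                                   # illustrative embedding size
corpus = np.random.rand(10_000, dim).astype(np.float32)

index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=len(corpus), ef_construction=200, M=16)
index.add_items(corpus, np.arange(len(corpus)))
index.set_ef(64)                            # higher ef -> better recall, slower queries

query = np.random.rand(dim).astype(np.float32)
labels, distances = index.knn_query(query, k=5)   # sub-linear lookup instead of a full scan
print(labels[0], distances[0])
```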
Graph databases store entities and their relationships, along with their properties. That layout makes multi-hop questions feel natural: "supplier → ships_to → plant → produces → product". Fraud rings, recommendations, and supply chains are classic graph problems where traversals beat JOIN storms.
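As a toy illustration of why multi-hop questions feel natural on a graph, here is a tiny in-memory example with networkx; in production the traversal runs as a query inside the graph database, and the node and edge names here are made up.

```python
# Toy multi-hop traversal over a hand-built supply-chain graph (illustrative only).
import networkx as nx

g = nx.DiGraph()
g.add_edge("AcmeSupplies", "Plant-7", relation="ships_to")
g.add_edge("Plant-7", "Widget-X", relation="produces")

# Which products ultimately depend on a given supplier? Follow the hops.
for plant in g.successors("AcmeSupplies"):
    for product in g.successors(plant):
        print(f"AcmeSupplies -{g['AcmeSupplies'][plant]['relation']}-> {plant} "
              f"-{g[plant][product]['relation']}-> {product}")
```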
The typical cognee flow is hybrid. Vectors cast a semantic net to collect promising snippets; the graph ties those snippets to entities, edges, and constraints, giving us explainable reasoning paths and clean provenance.
How They Pair in GraphRAG
Running vectors and graphs in parallel only pays off when they're woven into the same retrieval loop. In GraphRAG, the two layers complement each other: vectors pull in potential semantic matches, and the graph layer ties those candidates into entities, edges, and timelines. The result is a synergy of speed and structure: fast recall from vectors, grounded reasoning from graphs.
Here's how it works (a minimal sketch follows the list):
- Cast the net (vectors). Embed the query and retrieve nearest neighbors to get semantically relevant candidates, even when phrasing differs.
- Map to structure (graph). Link candidates to entities and relationships, apply constraints (e.g., type or time), and surface the nodes/edges that matter.
- Traverse and assemble. Follow multi-hop paths to connect facts across snippets, documents, and contexts.
- Explain and ground. Return the answer with the reasoning path and provenance intact.
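Here is a minimal sketch of that loop in Python. The `embed_fn`, `vector_index`, `graph`, and `llm` objects are hypothetical stand-ins for whatever components you wire in; this shows the shape of the hand-off between layers, not cognee's internal API.

```python
# Hypothetical GraphRAG loop; all injected objects are stand-ins, not cognee's API.
def graphrag_answer(question, embed_fn, vector_index, graph, llm, k=10):
    # 1. Cast the net (vectors): semantic candidates, regardless of phrasing.
    query_vec = embed_fn(question)
    candidates = vector_index.search(query_vec, limit=k)

    # 2. Map to structure (graph): link candidates to entities and apply
    #    constraints such as type or time windows.
    nodes = graph.resolve_entities(candidates)

    # 3. Traverse and assemble: multi-hop paths that connect the facts.
    paths = graph.paths_between(nodes, max_hops=3)

    # 4. Explain and ground: return the answer with its reasoning path
    #    and provenance intact.
    answer = llm.answer(question, snippets=candidates, paths=paths)
    return answer, paths
```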
Tools We Ship With: Pros & Trade-offs
Our team has shipped production systems using a mix of vector stores and graph databases. Each tool comes with its own use cases, strengths, and trade-offs, and we've learned where they shine through hands-on experience. Below are some of the platforms we run in the cognee backend, along with the insights that shape our choices.
pgvector (Postgres)
- What's great: Co-locate vectors with relational metadata; high concurrency; familiar SQL and tooling keep developer friction low; strong performance and reliability in practice
- Trade-offs: Setup is heavier than a file-based library, but once automated it's negligible
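For a sense of what "vectors next to relational metadata" looks like, here is a hedged sketch of a pgvector cosine-distance query through psycopg; the connection string, table, and column names are assumptions, and the vector extension and table must already exist.

```python
# Illustrative pgvector query; table/column names and DSN are assumptions.
import psycopg

query_vec = [0.12, -0.03, 0.88]                       # toy 3-dim vector
vec_literal = "[" + ",".join(map(str, query_vec)) + "]"

with psycopg.connect("postgresql://localhost/cognee_demo") as conn:
    rows = conn.execute(
        """
        SELECT id, title, embedding <=> %s::vector AS cosine_distance
        FROM documents
        ORDER BY embedding <=> %s::vector
        LIMIT 5
        """,
        (vec_literal, vec_literal),
    ).fetchall()

for doc_id, title, dist in rows:
    print(doc_id, title, dist)
```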
LanceDB
- What's great: Per-user isolation with almost no ops; embedded and file-backed, so each user/workspace maps cleanly to its own file
- Trade-offs: Reads can be less straightforward than with server-based options; asynchronous writes generally require explicit locking; file-based design limits concurrency and simultaneous users
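The per-user isolation story is easiest to see in code: each workspace is just a directory on disk. A small sketch, with illustrative paths, table names, and toy 3-dimensional vectors:

```python
# Embedded, file-backed LanceDB usage; paths and data are illustrative.
import lancedb

db = lancedb.connect("./workspaces/user_42")   # one directory per user/workspace
table = db.create_table(
    "notes",
    data=[
        {"vector": [0.1, 0.2, 0.3], "text": "privacy policy update"},
        {"vector": [0.9, 0.1, 0.0], "text": "quarterly revenue report"},
    ],
)

hits = table.search([0.1, 0.2, 0.25]).limit(1).to_list()
print(hits[0]["text"])
```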
Qdrant
- What's great: Managed vector service with strong filtering; fast core engine; straightforward to operate; free cloud tier useful for tests and demos
- Trade-offs: Setup is heavier than a file-based library like LanceDB (clusters, accounts, networking), though the cloud offering hides most of that
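A short sketch of a filtered search against Qdrant with its Python client; the cluster URL, API key, collection name, payload field, and toy vector are placeholders for illustration.

```python
# Filtered vector search via qdrant-client; all identifiers are placeholders.
from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue

client = QdrantClient(url="https://your-cluster.qdrant.io", api_key="...")

hits = client.search(
    collection_name="documents",
    query_vector=[0.1, 0.2, 0.3],          # toy vector; use your embedding here
    query_filter=Filter(
        must=[FieldCondition(key="tenant", match=MatchValue(value="acme"))]
    ),
    limit=5,
)
for hit in hits:
    print(hit.id, hit.score, hit.payload)
```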
Neo4j
- What's great: Mature tooling; rich features/support; intuitive Cypher/UX; flexible property-graph model (add properties to nodes and relationships dynamically); reliable and customizable
- Trade-offs: Deployment is heavier than file-based options like Kùzu; scaling to multi-user setups and true multi-database isolation typically requires the Enterprise edition; costs can rise in production
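The earlier supplier-to-product hop maps directly to Cypher. A minimal sketch with the official Neo4j Python driver, where the URI, credentials, labels, and relationship types are assumptions:

```python
# Multi-hop Cypher query via the Neo4j Python driver; schema is illustrative.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

cypher = """
MATCH (s:Supplier {name: $supplier})-[:SHIPS_TO]->(p:Plant)-[:PRODUCES]->(prod:Product)
RETURN p.name AS plant, prod.name AS product
"""

with driver.session() as session:
    for record in session.run(cypher, supplier="AcmeSupplies"):
        print(record["plant"], record["product"])

driver.close()
```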
Kùzu
- What's great: File-based graph DB that's easy to set up; well-suited for per-user isolation or split graphs; fast and reliable within file-based constraints
- Trade-offs: Dynamic properties are more cumbersome since you must define node and relationship tables; reading/inspecting data is less convenient and UI tooling is limited; lower concurrency than server-based alternatives (per-user DB files can mitigate this)
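The "define your tables first" trade-off is easiest to show directly. A small sketch of Kùzu's embedded usage with an illustrative schema and paths:

```python
# Embedded Kùzu usage; note the explicit node/relationship table definitions.
import kuzu

db = kuzu.Database("./graphs/user_42")        # one database directory per user
conn = kuzu.Connection(db)

conn.execute("CREATE NODE TABLE Supplier(name STRING, PRIMARY KEY (name))")
conn.execute("CREATE NODE TABLE Plant(name STRING, PRIMARY KEY (name))")
conn.execute("CREATE REL TABLE ShipsTo(FROM Supplier TO Plant)")

conn.execute("CREATE (:Supplier {name: 'AcmeSupplies'})")
conn.execute("CREATE (:Plant {name: 'Plant-7'})")
conn.execute(
    "MATCH (s:Supplier {name: 'AcmeSupplies'}), (p:Plant {name: 'Plant-7'}) "
    "CREATE (s)-[:ShipsTo]->(p)"
)

result = conn.execute("MATCH (s:Supplier)-[:ShipsTo]->(p:Plant) RETURN s.name, p.name")
while result.has_next():
    print(result.get_next())
```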
Memgraph
- What's great: Slick, well-designed UI that makes exploration and development feel smooth
- Trade-offs: Handling of node and edge IDs differs from many other graph systems; this can introduce friction for migrations and interoperability with tools that assume a specific ID model, and may require adapter code when integrating with existing pipelines
A Tiny Decision Cheat-Sheet
To save you the trouble of parsing long comparisons, here's a short set of rules of thumb we've found helpful when deciding which engine to reach for in practice:
- Per-user isolation, minimal ops: LanceDB (vectors) + Kùzu (graphs). Both file-based; easy to spin up N isolated stores
- Shared infra, heavy concurrency, SQL tooling: Postgres + pgvector for vectors; optionally pair with Neo4j for sophisticated graph workloads
- Managed, straight-to-cloud vector search: Qdrant Cloud for vectors; Neo4j for graphs, or Kùzu if an embedded option is fine
Compose the Stack You Need
The honest take from our team is that every option above comes with sharp edges, and that's perfectly normal. What matters is flexibility: being able to shape the stack around your data, your workload, and your team. With vectors and graphs working together, you have the building blocks to create memory that not only performs but continues to evolve.
The list above covers just some of the frameworks we've used cognee with; you'll find many more in our community repository. If your preferred database isn't there yet, you can easily write an adapter using our docs for vector or graph integration.
In short, there's no outright "winner" between vectors and graphs. The best engine is the one you make fit your workload, and the real advantage comes from adapting, extending, and keeping the loop open.