Vectors + Graphs in Practice: Field Notes from the cognee Backend
Here at cognee, we run a vector store and a graph database side-by-side. That's not redundancy; it's by design.
Vectors deliver high-recall semantic candidates (what feels similar), while graphs provide the structure to trace relationships across entities and time (how things relate). Together, they let us cast a wide net quickly and then explain how an answer was assembled, with clear paths and provenance.
Below, we recap the strengths of each, show how they pair in GraphRAG-style pipelines, and share practical notes from our engineers who have shipped with both vector stores and graph databases.
Two Tools, Two Superpowers
Vector databases store high-dimensional embeddings and return nearest neighbors in that space, so "privacy policy update" can match "changes to data handling" even if the wording doesn't overlap. At scale, you don't compare against everything; you use an approximate index. HNSW remains a go-to choice because it delivers strong recall with sub-linear lookups.
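To make the "approximate index" part concrete, here is a minimal sketch of HNSW-based nearest-neighbor search using the hnswlib library. The dimensionality and the random vectors are stand-ins for real embeddings, and the library choice is ours for illustration, not a statement about cognee's internals.

```python
# Minimal HNSW sketch with hnswlib; random vectors stand in for real embeddings.
import numpy as np
import hnswlib

dim = 384                                   # illustrative embedding size
corpus = np.random.rand(10_000, dim).astype(np.float32)

index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=len(corpus), ef_construction=200, M=16)
index.add_items(corpus, np.arange(len(corpus)))
index.set_ef(64)                            # higher ef -> better recall, slower queries

query = np.random.rand(dim).astype(np.float32)
labels, distances = index.knn_query(query, k=5)   # sub-linear lookup instead of a full scan
print(labels[0], distances[0])
```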
Graph databases store entities and their relationships, along with their properties. That layout makes multi-hop questions feel natural: "supplier → ships_to → plant → produces → product". Fraud rings, recommendations, and supply chains are classic graph problems where traversals beat JOIN storms.
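As a toy illustration of why multi-hop questions feel natural on a graph, here is a tiny in-memory example with networkx; in production the traversal runs as a query inside the graph database, and the node and edge names here are made up.

```python
# Toy multi-hop traversal over a hand-built supply-chain graph (illustrative only).
import networkx as nx

g = nx.DiGraph()
g.add_edge("AcmeSupplies", "Plant-7", relation="ships_to")
g.add_edge("Plant-7", "Widget-X", relation="produces")

# Which products ultimately depend on a given supplier? Follow the hops.
for plant in g.successors("AcmeSupplies"):
    for product in g.successors(plant):
        print(f"AcmeSupplies -{g['AcmeSupplies'][plant]['relation']}-> {plant} "
              f"-{g[plant][product]['relation']}-> {product}")
```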
The typical cognee flow is hybrid. Vectors cast a semantic net to collect promising snippets; the graph ties those snippets to entities, edges, and constraints, giving us explainable reasoning paths and clean provenance.
How They Pair in GraphRAG
Running vectors and graphs in parallel only pays off when they're woven into the same retrieval loop. In GraphRAG, the two layers complement each other: vectors pull in potential semantic matches, and the graph layer ties those candidates into entities, edges, and timelines. The result is a synergy of speed and structure: fast recall from vectors, grounded reasoning from graphs.
Here's how it works (a minimal sketch follows the list):
- Cast the net (vectors). Embed the query and retrieve nearest neighbors to get semantically relevant candidates, even when phrasing differs.
- Map to structure (graph). Link candidates to entities and relationships, apply constraints (e.g., type or time), and surface the nodes/edges that matter.
- Traverse and assemble. Follow multi-hop paths to connect facts across snippets, documents, and contexts.
- Explain and ground. Return the answer with the reasoning path and provenance intact.
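Here is a minimal sketch of that loop in Python. The `embed_fn`, `vector_index`, `graph`, and `llm` objects are hypothetical stand-ins for whatever components you wire in; this shows the shape of the hand-off between layers, not cognee's internal API.

```python
# Hypothetical GraphRAG loop; all injected objects are stand-ins, not cognee's API.
def graphrag_answer(question, embed_fn, vector_index, graph, llm, k=10):
    # 1. Cast the net (vectors): semantic candidates, regardless of phrasing.
    query_vec = embed_fn(question)
    candidates = vector_index.search(query_vec, limit=k)

    # 2. Map to structure (graph): link candidates to entities and apply
    #    constraints such as type or time windows.
    nodes = graph.resolve_entities(candidates)

    # 3. Traverse and assemble: multi-hop paths that connect the facts.
    paths = graph.paths_between(nodes, max_hops=3)

    # 4. Explain and ground: return the answer with its reasoning path
    #    and provenance intact.
    answer = llm.answer(question, snippets=candidates, paths=paths)
    return answer, paths
```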
Tools We Ship With: Pros & Trade-offs
Our team has shipped production systems using a mix of vector stores and graph databases. Each tool comes with its own use cases, strengths, and trade-offs, and we've learned where they shine through hands-on experience. Below are some of the platforms we run in the cognee backend, along with the insights that shape our choices.
pgvector (Postgres)
- What's great: Co-locate vectors with relational metadata; high concurrency; familiar SQL and tooling keep developer friction low; strong performance and reliability in practice
- Trade-offs: Setup is heavier than a file-based library, but once automated it's negligible
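For a sense of what "vectors next to relational metadata" looks like, here is a hedged sketch of a pgvector cosine-distance query through psycopg; the connection string, table, and column names are assumptions, and the vector extension and table must already exist.

```python
# Illustrative pgvector query; table/column names and DSN are assumptions.
import psycopg

query_vec = [0.12, -0.03, 0.88]                       # toy 3-dim vector
vec_literal = "[" + ",".join(map(str, query_vec)) + "]"

with psycopg.connect("postgresql://localhost/cognee_demo") as conn:
    rows = conn.execute(
        """
        SELECT id, title, embedding <=> %s::vector AS cosine_distance
        FROM documents
        ORDER BY embedding <=> %s::vector
        LIMIT 5
        """,
        (vec_literal, vec_literal),
    ).fetchall()

for doc_id, title, dist in rows:
    print(doc_id, title, dist)
```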
LanceDB
- What's great: Per-user isolation with almost no ops; embedded and file-backed, so each user/workspace maps cleanly to its own file
- Trade-offs: Reads can be less straightforward than with server-based options; asynchronous writes generally require explicit locking; file-based design limits concurrency and simultaneous users
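The per-user isolation story is easiest to see in code: each workspace is just a directory on disk. A small sketch, with illustrative paths, table names, and toy 3-dimensional vectors:

```python
# Embedded, file-backed LanceDB usage; paths and data are illustrative.
import lancedb

db = lancedb.connect("./workspaces/user_42")   # one directory per user/workspace
table = db.create_table(
    "notes",
    data=[
        {"vector": [0.1, 0.2, 0.3], "text": "privacy policy update"},
        {"vector": [0.9, 0.1, 0.0], "text": "quarterly revenue report"},
    ],
)

hits = table.search([0.1, 0.2, 0.25]).limit(1).to_list()
print(hits[0]["text"])
```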
Qdrant
- What's great: Managed vector service with strong filtering; fast core engine; straightforward to operate; free cloud tier useful for tests and demos
- Trade-offs: Setup is heavier than a file-based library like LanceDB (clusters, accounts, networking), though the cloud offering hides most of that
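A short sketch of a filtered search against Qdrant with its Python client; the cluster URL, API key, collection name, payload field, and toy vector are placeholders for illustration.

```python
# Filtered vector search via qdrant-client; all identifiers are placeholders.
from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue

client = QdrantClient(url="https://your-cluster.qdrant.io", api_key="...")

hits = client.search(
    collection_name="documents",
    query_vector=[0.1, 0.2, 0.3],          # toy vector; use your embedding here
    query_filter=Filter(
        must=[FieldCondition(key="tenant", match=MatchValue(value="acme"))]
    ),
    limit=5,
)
for hit in hits:
    print(hit.id, hit.score, hit.payload)
```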
Neo4j
- What's great: Mature tooling; rich features/support; intuitive Cypher/UX; flexible property-graph model (add properties to nodes and relationships dynamically); reliable and customizable
- Trade-offs: Deployment is heavier than file-based options like Kùzu; scaling to multi-user setups and true multi-database isolation typically requires the Enterprise edition; costs can rise in production
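The earlier supplier-to-product hop maps directly to Cypher. A minimal sketch with the official Neo4j Python driver, where the URI, credentials, labels, and relationship types are assumptions:

```python
# Multi-hop Cypher query via the Neo4j Python driver; schema is illustrative.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

cypher = """
MATCH (s:Supplier {name: $supplier})-[:SHIPS_TO]->(p:Plant)-[:PRODUCES]->(prod:Product)
RETURN p.name AS plant, prod.name AS product
"""

with driver.session() as session:
    for record in session.run(cypher, supplier="AcmeSupplies"):
        print(record["plant"], record["product"])

driver.close()
```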
Kùzu
- What's great: File-based graph DB that's easy to set up; well-suited for per-user isolation or split graphs; fast and reliable within file-based constraints
- Trade-offs: Dynamic properties are more cumbersome since you must define node and relationship tables; reading/inspecting data is less convenient and UI tooling is limited; lower concurrency than server-based alternatives (per-user DB files can mitigate this)
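The "define your tables first" trade-off is easiest to show directly. A small sketch of Kùzu's embedded usage with an illustrative schema and paths:

```python
# Embedded Kùzu usage; note the explicit node/relationship table definitions.
import kuzu

db = kuzu.Database("./graphs/user_42")        # one database directory per user
conn = kuzu.Connection(db)

conn.execute("CREATE NODE TABLE Supplier(name STRING, PRIMARY KEY (name))")
conn.execute("CREATE NODE TABLE Plant(name STRING, PRIMARY KEY (name))")
conn.execute("CREATE REL TABLE ShipsTo(FROM Supplier TO Plant)")

conn.execute("CREATE (:Supplier {name: 'AcmeSupplies'})")
conn.execute("CREATE (:Plant {name: 'Plant-7'})")
conn.execute(
    "MATCH (s:Supplier {name: 'AcmeSupplies'}), (p:Plant {name: 'Plant-7'}) "
    "CREATE (s)-[:ShipsTo]->(p)"
)

result = conn.execute("MATCH (s:Supplier)-[:ShipsTo]->(p:Plant) RETURN s.name, p.name")
while result.has_next():
    print(result.get_next())
```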
Memgraph
- What's great: Slick, well-designed UI that makes exploration and development feel smooth
- Trade-offs: Handling of node and edge IDs differs from many other graph systems; this can introduce friction for migrations and interoperability with tools that assume a specific ID model, and may require adapter code when integrating with existing pipelines
A Tiny Decision Cheat-Sheet
To save you the trouble of parsing long comparisons, here's a short set of rules of thumb we've found helpful when deciding which engine to reach for in practice:
- Per-user isolation, minimal ops: LanceDB (vectors) + Kùzu (graphs). Both file-based; easy to spin up N isolated stores
- Shared infra, heavy concurrency, SQL tooling: Postgres + pgvector for vectors; optionally pair with Neo4j for sophisticated graph workloads
- Managed, straight-to-cloud vector search: Qdrant Cloud for vectors; Neo4j for graphs, or Kùzu if an embedded option is fine
Compose the Stack You Need
The honest take from our team is that every option above comes with sharp edges, and that's perfectly normal. What matters is flexibility: being able to shape the stack around your data, your workload, and your team. With vectors and graphs working together, you have the building blocks to create memory that not only performs but continues to evolve.
The list above covers just some of the frameworks we've used cognee with; you'll find many more in our community repository. If your preferred database isn't there yet, you can easily write an adapter using our docs for vector or graph integration.
In short, there's no outright "winner" between vectors and graphs. The best engine is the one you make fit your workload, and the real advantage comes from adapting, extending, and keeping the loop open.