GraphRAG
One idea makes this click:
It's not about the graph.
It's about what you pre-compute from it.
First, the problem
Standard RAG finds chunks similar to your question:
Query: "What did Sarah say about the budget?"
Answer: Sarah said the budget needs revision and was too tight for Q3.
✓ Works great for specific questions.
But what about this?
"What are the main themes in the dataset?"
Vector search finds chunks similar to the query.
But no single chunk contains "the main themes."
This isn't a retrieval problem. It's a summarization problem.
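A minimal sketch of why similarity search falls short here. The three chunks, the stopword list, and the bag-of-words `embed` function are all assumptions made up for illustration; a real system would use a learned embedding model, but the shape of the failure is the same.

```python
import math
import re
from collections import Counter

# Hypothetical chunks, invented for illustration only.
CHUNKS = [
    "Sarah said the budget needs revision and was too tight for Q3.",
    "The team moved the product launch to next spring.",
    "Engineering asked for two more hires before the launch.",
]

# Small illustrative stopword list (an assumption, not a standard one).
STOPWORDS = {"the", "a", "in", "for", "and", "to", "was", "what", "did", "are", "about"}

def embed(text):
    """Bag-of-words counts as a stand-in for a real embedding model."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, chunks, k=1):
    """Return the k (score, chunk) pairs most similar to the query."""
    q = embed(query)
    scored = sorted(((cosine(q, embed(c)), c) for c in chunks), reverse=True)
    return scored[:k]

specific = retrieve("What did Sarah say about the budget?", CHUNKS)
broad = retrieve("What are the main themes in the dataset?", CHUNKS)

print(specific[0])  # the Sarah chunk, with a positive score
print(broad[0])     # some chunk with score 0.0: nothing resembles "main themes"
```

The specific question lands on the chunk that shares its words. The broad question matches nothing, because its answer is spread across every chunk rather than stored in one.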
The obvious idea
"What if we built a knowledge graph?"
Extract entities → Extract relationships → Build graph → Traverse at query time
Many systems do this: KAPING, G-Retriever, LlamaIndex, LangChain...
But does it solve our problem?
Try it yourself
Picture a knowledge graph you can browse node by node. Can you answer:
"What are the main themes?"
You can see connections. But "main themes"? You'd have to visit every node.
The problem with graph traversal
Knowledge graphs help with local questions.
They don't help with global questions.
The GraphRAG insight
From the paper:
"In contrast with related work that exploits the structured retrieval and traversal affordances of graph indexes, we focus on a previously unexplored quality of graphs in this context: their inherent modularity and the ability of community detection algorithms to partition graphs into modular communities of closely-related nodes."
The innovation isn't the graph itself.
It's using communities to organize summaries.
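To make "modularity" concrete, here is a small sketch that scores a partition of this walkthrough's toy graph using Newman's modularity formula. The split into a Dr. Chen group and a Dr. Webb group is hand-picked for illustration, not the output of a community detection algorithm; the paper uses the Leiden algorithm to find partitions, which this sketch does not implement.

```python
from collections import defaultdict

# Relationships from the toy example, treated as undirected, unweighted edges.
EDGES = [
    ("Dr. Chen", "Nexus Labs"),
    ("Dr. Chen", "Miller"),
    ("Miller", "FDA"),
    ("Dr. Chen", "Trial 2024"),
    ("Nexus Labs", "Seattle"),
    ("Dr. Webb", "Stanford"),
    ("Dr. Webb", "Dr. Chen"),
]

def modularity(edges, communities):
    """Newman modularity: Q = sum over communities of L_c/m - (d_c/(2m))^2,
    where L_c is the number of internal edges and d_c the total degree."""
    m = len(edges)
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    member = {node: i for i, group in enumerate(communities) for node in group}
    q = 0.0
    for i, group in enumerate(communities):
        internal = sum(1 for u, v in edges if member[u] == i and member[v] == i)
        total_degree = sum(degree[n] for n in group)
        q += internal / m - (total_degree / (2 * m)) ** 2
    return q

# Hand-picked partition (an assumption for illustration).
chen_group = {"Dr. Chen", "Nexus Labs", "Miller", "FDA", "Trial 2024", "Seattle"}
webb_group = {"Dr. Webb", "Stanford"}

q_split = modularity(EDGES, [chen_group, webb_group])
q_single = modularity(EDGES, [chen_group | webb_group])

print(f"two communities: Q = {q_split:.4f}")
print(f"one community:   Q = {q_single:.4f}")
```

The two-group split scores higher than lumping every node together, which is the signal a community detection algorithm searches for. This split is not claimed to be the best possible partition of the graph.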
The key: Community Summaries
Summarizing each community is the magic. It happens at index time, not query time.
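The index-time versus query-time split can be sketched as follows. The community assignment is hand-written here, and `summarize` is a hypothetical stand-in for an LLM call; the only point is where the expensive call happens.

```python
# Assumed output of community detection over the toy graph (hand-assigned here).
COMMUNITY_FACTS = {
    "chen": [
        "Dr. Chen leads Nexus Labs",
        "Dr. Chen collaborates with Miller",
        "Miller works at FDA",
        "Dr. Chen conducted Trial 2024",
        "Nexus Labs is in Seattle",
    ],
    "webb": [
        "Dr. Webb works at Stanford",
        "Dr. Webb is rival of Dr. Chen",
    ],
}

summarizer_calls = []  # records every call so we can see when the cost is paid

def summarize(facts):
    """Hypothetical stand-in for an LLM summarization call."""
    summarizer_calls.append(len(facts))
    return f"{len(facts)} facts: " + "; ".join(facts)

def build_index(community_facts):
    """Index time: one summarizer call per community, results stored for reuse."""
    return {cid: summarize(facts) for cid, facts in community_facts.items()}

def query_time_context(index):
    """Query time: read the stored summaries; no summarizer call happens here."""
    return list(index.values())

INDEX = build_index(COMMUNITY_FACTS)
calls_after_indexing = len(summarizer_calls)

# Any number of queries can reuse the same pre-computed summaries.
first = query_time_context(INDEX)
second = query_time_context(INDEX)
calls_after_queries = len(summarizer_calls)

print(calls_after_indexing, calls_after_queries)
```

The call count stays the same after the queries run, because the summaries were computed once when the index was built.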
See the difference
Compare what summaries add. Graph facts alone:
• Dr. Chen leads Nexus Labs
• Dr. Chen collaborates with Miller
• Miller works at FDA
• Dr. Chen conducted Trial 2024
• Nexus Labs is in Seattle
• Dr. Webb works at Stanford
• Dr. Webb is rival of Dr. Chen
...just facts. No synthesis.
With community summaries:
• Centers on Dr. Chen's Alzheimer's research at Nexus Labs, including FDA collaboration and the 2024 clinical trial.
• Dr. Webb's competing research at Stanford, presenting alternative findings.
✓ Now you can instantly see the themes.
This is Query-Focused Summarization
From the paper:
"RAG fails on global questions directed at an entire text corpus, such as 'What are the main themes in the dataset?', since this is inherently a query-focused summarization (QFS) task, rather than an explicit retrieval task."
— Abstract
The graph is just scaffolding. The summaries are the product.
The hierarchy matters
Communities form a hierarchy, and each level trades scope for detail:
C0: fewest communities, broadest scope
C1: sub-communities of C0
C2: sub-communities of C1, most granular in this example
At C0 in this example: 2 summaries, 97% fewer tokens than full text
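One way to picture the hierarchy as a data structure is sketched below. The community ids, parent links, and summary strings are placeholders invented for illustration. Keeping an undivided community as its own stand-in at deeper levels is a design choice of this sketch, not something the source specifies.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Community:
    cid: str
    level: int              # 0 = C0 (broadest); higher numbers are more granular
    parent: Optional[str]   # id of the enclosing community one level up
    summary: str

# Hypothetical hierarchy; ids and summary text are placeholders.
HIERARCHY = [
    Community("A", 0, None, "summary of A"),
    Community("B", 0, None, "summary of B"),
    Community("A1", 1, "A", "summary of A1"),
    Community("A2", 1, "A", "summary of A2"),
    Community("B1", 1, "B", "summary of B1"),
    Community("A1a", 2, "A1", "summary of A1a"),
    Community("A1b", 2, "A1", "summary of A1b"),
]

def communities_at(level, hierarchy=HIERARCHY):
    """Pick the communities that cover the corpus at the requested level.

    A community with no children stands in for itself at deeper levels,
    so the selection always covers everything (an assumption of this sketch).
    """
    has_children = {c.parent for c in hierarchy if c.parent is not None}
    return [
        c for c in hierarchy
        if c.level == level or (c.level < level and c.cid not in has_children)
    ]

for level in (0, 1, 2):
    ids = [c.cid for c in communities_at(level)]
    print(f"C{level}: {len(ids)} summaries -> {ids}")
```

Lower levels return fewer, broader summaries, so a query reads less text; higher levels return more summaries with finer detail.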
Global Search: Map-Reduce
For "what are the main themes?", use all community summaries:
Query:
"What are the main themes in the dataset?"
MAP: Ask each community summary
REDUCE: Synthesize top points
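The map and reduce steps above can be sketched as follows. `map_step` is a hypothetical stand-in for an LLM call that returns a partial answer with a helpfulness score; here the score is just a word count, a placeholder and not a real relevance judgment. The word budget stands in for a context-window token limit.

```python
from concurrent.futures import ThreadPoolExecutor

# The two community summaries from the toy example.
SUMMARIES = {
    "chen": "Centers on Dr. Chen's Alzheimer's research at Nexus Labs, "
            "including FDA collaboration and the 2024 clinical trial.",
    "webb": "Dr. Webb's competing research at Stanford, presenting alternative findings.",
}

def map_step(query, cid, summary):
    """Hypothetical stand-in for an LLM call on one community summary.

    A real map step would condition on the query; this placeholder ignores it
    and scores by word count, capped at 100.
    """
    score = min(100, len(summary.split()))
    return {"community": cid, "point": summary, "score": score}

def reduce_step(partials, word_budget=100):
    """Keep the highest-scoring points that fit within the word budget."""
    kept, used = [], 0
    for p in sorted(partials, key=lambda p: p["score"], reverse=True):
        if p["score"] == 0:
            continue  # unhelpful partial answers are dropped
        words = len(p["point"].split())
        if used + words > word_budget:
            break
        kept.append(p)
        used += words
    return kept

def global_search(query, summaries, word_budget=100):
    """MAP over every community summary in parallel, then REDUCE."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda item: map_step(query, *item), summaries.items()))
    return reduce_step(partials, word_budget)

QUERY = "What are the main themes in the dataset?"
top_points = global_search(QUERY, SUMMARIES)
tight_budget = global_search(QUERY, SUMMARIES, word_budget=20)

for p in top_points:
    print(p["score"], p["community"])
```

In a full system, the kept points would go into one final prompt that writes the answer. The map calls are independent of each other, which is why they can run in parallel.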
Local Search: Entity-centric
For specific questions, start with entities, then traverse:
Answer: Dr. Chen led Alzheimer's research at Nexus Labs, conducting the 2024 clinical trial in collaboration with Miller from the FDA.
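A minimal sketch of the entity-first pattern, using the toy facts from this walkthrough. Substring matching stands in for real entity linking, and the `max_hops` parameter is an assumption of this sketch; the collected facts would normally be handed to an LLM to phrase the answer.

```python
from collections import defaultdict, deque

# Toy facts from the walkthrough as (subject, relation, object) triples.
TRIPLES = [
    ("Dr. Chen", "leads", "Nexus Labs"),
    ("Dr. Chen", "collaborates with", "Miller"),
    ("Miller", "works at", "FDA"),
    ("Dr. Chen", "conducted", "Trial 2024"),
    ("Nexus Labs", "is in", "Seattle"),
    ("Dr. Webb", "works at", "Stanford"),
    ("Dr. Webb", "is rival of", "Dr. Chen"),
]

def local_search(query, triples, max_hops=1):
    """Start from entities named in the query, then walk outward breadth-first."""
    by_entity = defaultdict(list)
    for triple in triples:
        by_entity[triple[0]].append(triple)
        by_entity[triple[2]].append(triple)

    # Naive entity matching: an entity is a seed if its name appears in the query.
    seeds = [e for e in by_entity if e.lower() in query.lower()]
    seen = set(seeds)
    frontier = deque((e, 0) for e in seeds)
    facts = []
    while frontier:
        entity, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for subj, rel, obj in by_entity[entity]:
            fact = f"{subj} {rel} {obj}"
            if fact not in facts:
                facts.append(fact)
            neighbor = obj if subj == entity else subj
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return facts

one_hop = local_search("What did Dr. Chen work on?", TRIPLES, max_hops=1)
two_hop = local_search("What did Dr. Chen work on?", TRIPLES, max_hops=2)

for fact in one_hop:
    print(fact)
```

With one hop, only facts that mention Dr. Chen directly come back. Two hops add facts such as Miller's FDA affiliation, which is what lets an answer connect the collaboration to the FDA.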
The essence of GraphRAG
✗ What GraphRAG is NOT:
Just using a knowledge graph for retrieval (many systems do this)
✓ What GraphRAG IS:
Using graph modularity to partition data, then pre-computing summaries at each level to enable Query-Focused Summarization at scale.
Why this works
Summaries are pre-computed
Pay the cost once at index time, not every query
Hierarchy gives you flexibility
Broad themes (C0) or specific details (C2, C3)
Map-reduce scales
Query communities in parallel, aggregate results
The tradeoffs
Comprehensiveness: 72% win rate
Diversity: 62% win rate
Query tokens (C0): 97% fewer than full text
Directness: naive RAG wins
The full picture
Remember this
GraphRAG = pre-computed hierarchical summaries, using graphs as scaffolding for Query-Focused Summarization
The graph enables the summaries.
The summaries enable global understanding.