Turning PDFs into Evidence-Based Answers: How We Built a Trustworthy Evidence Graph for UWYO
Teachers and policymakers don't lack research; they lack answers they can trust. Policy briefs, randomized trials, demonstration projects, meta-analyses: all informative, but mostly locked in a mess of PDFs, written in different terminologies, and rarely aligned on definitions, measures, or timelines.
For an overworked state teacher trying to make evidence-informed decisions that truly support their students, even a seemingly simple question like:
"Which interventions demonstrably improve K–5 behavior outcomes, and over what timeframe?"
has no quick answer. The evidence is out there; it's just scattered and inconsistent.
That's the challenge the special education team from the University of Wyoming (UWYO) brought our way: turn a fragmented set of research papers and validated practice guides into a living, explainable knowledge system that special education teachers can query in plain English and verify in a few clicks.
This is how we used cognee to convert unstructured documents into evidence-based answers with citations teachers can trust.
With UWYO's permission, we're sharing this case at a high level, but certain specifics of the solution and the agent implementation are intentionally generalized.
UWYO's Evidence Problem: Rich Research, Disconnected Reality
UWYO had the right ingredients: instructional program evaluations, RCTs (randomized controlled trials), demonstration projects, and EBP (evidence-based practice) guides covering a wide range of effective instructional practices that support and improve educational outcomes for students with unique needs. But their workflow broke down at scale due to:
- Competing dialects: the same construct labeled five different ways across sources.
- Entity mismatch: populations, contexts, and outcomes overlapped but didnât align cleanly.
- Traceability gaps: hard to show quickly and exactly where a claim came from.
- Manual QA only: review cycles didn't scale as the corpus grew.
Each document on its own was valuable. Together, they were hard to navigate and even harder to defend in policy conversations.
From PDFs to Explainable Answers with cognee
Our task wasn't to figure out how to "store their PDFs" in a more organized way. It was to make the evidence navigable: to structure, connect, and explain it, enabling natural-language questions with answers that are grounded and easy to verify in an instant.
So, instead of forcing everything into rigid tables, we used a hybrid approach:
- Semantic understanding to normalize language and catch nuance.
- A knowledge graph to make relationships explicit, auditable, and queryable.
The result: a system for UWYO's agent framework that "speaks education," harmonizes terminology without losing specificity, and shows its work down to page and figure.
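As an illustration of how that hybrid layer gets built, the core cognee calls look roughly like the sketch below. The sample sentence is invented, and exact signatures vary between cognee releases, so treat this as a sketch of the flow rather than the production pipeline.

```python
import asyncio

import cognee


async def build_evidence_memory():
    # Ingest source material; cognee also accepts file paths.
    # (The sentence below is invented, for illustration only.)
    await cognee.add(
        "Tiered PBIS reduced K-5 behavior incidents over 6 months "
        "(Study A, pp. 12-15)."
    )
    # Extract entities and relationships into cognee's knowledge graph
    # and vector store (the semantic and structural layers described above).
    await cognee.cognify()


asyncio.run(build_evidence_memory())
```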
The Mission: Connected Understanding
We set out to model how interventions, populations, contexts, outcomes, and time relate, then answer natural-language questions with grounded responses and click-through citations.
To get there, we had to normalize language, build a reference and citation system, and scale ingestion to cover all relevant literature for this project. Think of it as standing up a specialized teammate for special education decisions, not a general-purpose chatbot.
Navigating the Evidence Chaos
Real-world research is inherently messy. We designed the pipeline to handle a range of data formats, including:
- Documents and sheets with narratives, tables, appendices, figures, etc.
- Evaluation reports with varied metrics and outcome scales.
Provenance was the top priority: every fact must link back to its source page or section.
The Solution: A Brain for a Domain-Aware Evidence Agent
We built a four-stage flow stakeholders can understand and trust.
1) Collect & Clean
- Parse PDFs reliably.
- Segment content at the page/section/figure level.
- Clean and format while preserving original structure and references (see the sketch below).
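To make stage 1 concrete: a minimal page-level segmenter that keeps provenance attached, using the open-source pypdf library. The file name is hypothetical, and UWYO's actual parser handles tables, figures, and sections beyond what this sketch shows.

```python
from pypdf import PdfReader


def segment_pdf(path: str) -> list[dict]:
    """Split a PDF into page-level chunks, each tagged with its source."""
    reader = PdfReader(path)
    chunks = []
    for page_number, page in enumerate(reader.pages, start=1):
        text = page.extract_text() or ""
        chunks.append({
            "source": path,       # provenance: which document
            "page": page_number,  # provenance: which page
            "text": text.strip(),
        })
    return chunks


# Hypothetical file name, for illustration only.
chunks = segment_pdf("study_a.pdf")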
2) Harmonize the Language
- Align core concepts: interventions, populations, settings, outcomes, measures, timepoints.
- Map synonyms and inconsistent labels to shared meanings without flattening nuance (see the sketch below).
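A stripped-down version of that mapping is a canonical-label lookup that keeps the original wording for audit. The labels below are invented examples, not UWYO's actual vocabulary, which was built with domain experts.

```python
# Invented example labels mapped to canonical constructs.
CANONICAL_TERMS = {
    "office discipline referrals": "behavior incident rate",
    "odr count": "behavior incident rate",
    "disciplinary events": "behavior incident rate",
    "time on task": "classroom engagement",
    "academic engagement": "classroom engagement",
}


def normalize(label: str) -> dict:
    key = label.strip().lower()
    return {
        "canonical": CANONICAL_TERMS.get(key, key),  # fall back to the original
        "original": label,                           # preserved, never flattened
    }


print(normalize("ODR count"))
# -> {'canonical': 'behavior incident rate', 'original': 'ODR count'}
```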
3) Connect the Dots
- Build a living knowledge map linking interventions ↔ outcomes ↔ contexts ↔ populations ↔ measures ↔ time.
- Capture directionality (improves / no effect / mixed) and strength of evidence.
- Keep citations attached to every relationship for instant traceability (see the edge sketch below).
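Conceptually, every relationship in the map is an edge that carries its own evidence. Here is a minimal sketch of such an edge; the field names are ours for illustration, not the production schema.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    IMPROVES = "improves"
    NO_EFFECT = "no effect"
    MIXED = "mixed"


@dataclass(frozen=True)
class EvidenceEdge:
    intervention: str
    outcome: str
    population: str        # e.g. "K-5"
    timepoint_months: int  # explicit time window: 6, 12, 24, ...
    direction: Direction   # improves / no effect / mixed
    strength: str          # e.g. "RCT", "meta-analysis"
    citation: str          # source + page/figure, for instant traceability


edge = EvidenceEdge(
    intervention="Tiered PBIS",
    outcome="behavior incident rate",
    population="K-5",
    timepoint_months=6,
    direction=Direction.IMPROVES,
    strength="meta-analysis",
    citation="Meta-analysis B, Table 3, pp. 45-46",
)
```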
4) Ask in Natural Language
- Enable naturally phrased questions, accessible to everyone (a query sketch follows this list).
- Example: "What improves K–5 behavior outcomes by 6 and 24 months?"
- Return concise answers + supporting citations + exact page links.
- Example: Evidence indicates Tiered PBIS (Positive Behavioral Interventions and Supports) and teacher-mediated SEL (social-emotional learning) programs produce short-term gains (≤6 months) on behavior incident rates and classroom engagement; effects persist at 24 months when programs include fidelity monitoring and staff coaching. See the sources: [Study A (pp. 12–15)], [Meta-analysis B (Table 3, pp. 45–46)], [Demonstration Project C (pp. 7–9)].
- Provide expandable "why" context: which studies, which populations, which measures.
- Example: Effects are strongest in elementary (K–5) settings with whole-class delivery, teacher PD ≥12 hrs, and monthly fidelity checks.
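Querying that memory through cognee looks roughly like the sketch below. The search signature and available search types differ between cognee releases, so check the docs for your version.

```python
import asyncio

import cognee


async def ask():
    # Natural-language question over the already-cognified evidence.
    results = await cognee.search(
        query_text="What improves K-5 behavior outcomes by 6 and 24 months?"
    )
    for result in results:
        # Each result is grounded in the provenance captured at ingestion.
        print(result)


asyncio.run(ask())
```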
What Changed for UWYO
- Faster answers: Teachers and analysts ask plain-English questions and get structured, defensible responses in seconds.
- Evidence-based transparency: Every claim is backed by click-through citations and page snippets.
- Agent-ready memory: UWYO's agent framework can pull grounded facts instead of guessing from generic embeddings.
- Shared vocabulary: Harmonized terms reduce confusion across teams and documents.
What Weâve Learned
The Challenging Bits
- Outcome normalization: the same idea can hide behind many labels; harmonization is non-optional.
- Time matters: short- vs long-term effects need explicit modeling across 6/12/24-month windows.
- Explainability is non-negotiable: in education contexts, "the model said so" is not an answer.
What Worked
- Domain-first modeling: mirror how education decisions actually get made.
- Provenance everywhere: citations turn answers into evidence.
- Hybrid retrieval: semantics for recall; structure for precision and trust (see the sketch after this list).
- Human-in-the-loop: aim expert review at the highest-leverage normalization and QA points.
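To illustrate that hybrid pattern: semantic search proposes candidates, and structured graph fields accept or reject them. Everything below is a self-contained toy; vector_search is a stand-in for a real vector store, and the corpus is invented.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    claim: str
    population: str
    timepoint_months: int


def vector_search(question: str, top_k: int) -> list[Edge]:
    # Stand-in for a real vector store query: semantics gives broad recall.
    corpus = [
        Edge("Tiered PBIS improves behavior incident rates", "K-5", 6),
        Edge("Teacher-mediated SEL improves engagement", "K-5", 24),
        Edge("Peer tutoring improves reading fluency", "6-8", 12),
    ]
    return corpus[:top_k]


def hybrid_retrieve(question: str, population: str, months: int) -> list[Edge]:
    candidates = vector_search(question, top_k=50)
    # Structure gives precision: keep only edges whose graph fields
    # match the question's explicit constraints.
    return [e for e in candidates
            if e.population == population and e.timepoint_months == months]


print(hybrid_retrieve("behavior outcomes", "K-5", 6))
```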
Build With Us
If your organization wrestles with scattered research and policy documents, we can help you stand up a purpose-built evidence agent for your domain, one that makes your evidence explorable and explainable.
- Email us: info@topoteretes.com
- Explore the docs: https://docs.cognee.ai
The future of evidence-informed decision-making isn't "better search." It's structured understanding, delivered by focused agents with citations you can trust.