Cut Through the Noise: Build Your Smart News Agent with cognee
These days, it can feel like every time you open a news app, you’re blasted with a barrage of headlines—endless scrolling through recycled stories, five-day-old “breaking” news, targeted posts, and other fluff that buries what you actually care about. In a world inundated with misinformation and stale takes, how do you focus on what’s truly important and current?
Well, we think we’ve found a way.
We’ve put together a simple script that turns cognee into a personal news-curation agent. It pulls from predefined sources—Reddit threads, company blogs, media outlets, and research feeds—creates a structured knowledge graph from the data, and lets you query it for concise, relevant summaries of the latest events.
Spend less time wading through clutter—this script delivers on-point updates drawn from fresh, integrated, and pertinent sources. In this post, we’ll cover how it combines open tools like praw (for Reddit) and feedparser (for RSS) with cognee’s ingestion to customize a digest for any topic.
Want to stay informed without misinformation fatigue? Here's a step-by-step guide to configuring the script.
Prep Time: Loading Libraries and Clearing the Deck
If you're new to cognee, our engine is designed to ingest raw data from various sources, transform it into queryable knowledge graphs, and keep those graphs current over time for accurate retrieval.
So, to set up the scraper agent, we'll first need to import the necessary libraries, load the environment variables (for secure API keys), and ensure we're starting with a clean slate in cognee. This prevents any old data from interfering and keeps your knowledge graph focused on the latest scrapes.
Here's a sample setup (adjust based on your env):
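A minimal sketch of that setup, assuming the python-dotenv package for loading .env and cognee's prune helpers for the reset. The variable names in REQUIRED_SETTINGS and the helper names missing_settings and start_clean are illustrative assumptions; match them to whatever your own .env file and LLM provider expect.

```python
import os

# Names this sketch expects in the environment. These are assumptions:
# rename them to match your own .env file.
REQUIRED_SETTINGS = ("LLM_API_KEY", "REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET")


def missing_settings(env, required=REQUIRED_SETTINGS):
    """Return the required names that are absent or empty in `env`."""
    return [name for name in required if not env.get(name)]


async def start_clean():
    """Load .env, check the settings, then wipe cognee's stored data."""
    # Third-party imports stay local so the helper above is usable on its own.
    from dotenv import load_dotenv
    import cognee

    load_dotenv()  # copies KEY=value pairs from .env into os.environ
    missing = missing_settings(os.environ)
    if missing:
        raise RuntimeError(f"Missing settings: {', '.join(missing)}")

    await cognee.prune.prune_data()                 # drop previously ingested data
    await cognee.prune.prune_system(metadata=True)  # reset the stores and their metadata
```

Run it once at the top of the script with asyncio.run(start_clean()). Checking the settings before pruning means a missing key fails fast, before any existing data is wiped.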
We use the publicly available praw package to scrape Reddit posts—it's reliable for pulling community discussions, which often surface timely debates and news not covered elsewhere.
You'll need your Reddit client ID and secret for the API, and here’s how to find them:
1- Head to Reddit App Preferences: 👉 https://www.reddit.com/prefs/apps
2- Scroll down and click “Create another app…”
3- Fill out the form:
- Name: Anything (e.g., My Reddit Agent)
- Type: Choose “script” (for personal use)
- Description: Optional
- Redirect URI: Must be valid—use http://localhost:8080 (or any dummy URL)
- About URL: Optional
4- Submit.
5- You’ll now see your new app listed.
- The Client ID is the short string just under the app name.
- The Client Secret is the long string labeled “secret.”
- Store these credentials in your .env file for secure access.
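Once the credentials are in .env, creating a read-only client could look like the sketch below. praw.Reddit accepts client_id, client_secret, and user_agent; the environment variable names, the user-agent string, and the helper names are assumptions made for illustration.

```python
import os


def reddit_client_kwargs(env, user_agent="news-agent/0.1 (personal script)"):
    """Collect the keyword arguments praw.Reddit needs from an env mapping."""
    return {
        "client_id": env["REDDIT_CLIENT_ID"],          # short string under the app name
        "client_secret": env["REDDIT_CLIENT_SECRET"],  # string labeled "secret"
        "user_agent": user_agent,                      # Reddit asks for a descriptive value
    }


def make_reddit_client(env=None):
    """Build a read-only Reddit client; no username or password is needed for reading."""
    import praw  # third-party; imported here so the helper above has no dependencies

    return praw.Reddit(**reddit_client_kwargs(os.environ if env is None else env))
```

Keeping the lookup in a separate function makes a missing variable surface as a plain KeyError before any network call is made.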
Source Curation: Pulling Relevant Feeds
Next, define the subreddits that matter most to you. We'll take AI as the example topic here, selecting communities that cover everything from machine learning breakthroughs to generative AI tools.
This is, of course, fully customizable—swap in subs for any topic, like finance or sports, to build a specialized agent for your needs.
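As one way to express this, the sketch below keeps the subreddit list as plain configuration. The names in AI_SUBREDDITS are illustrative picks for the AI example, not necessarily the ones the original script used, and normalize_subreddits is a hypothetical helper that tidies whatever list you swap in.

```python
# Illustrative picks for an AI-focused agent; replace them with your own topic.
AI_SUBREDDITS = ["MachineLearning", "artificial", "LocalLLaMA", "OpenAI"]


def normalize_subreddits(names):
    """Strip an optional 'r/' prefix and drop case-insensitive duplicates, keeping order."""
    seen = set()
    cleaned = []
    for name in names:
        short = name.strip().strip("/")
        if short.lower().startswith("r/"):
            short = short[2:]
        key = short.lower()
        if short and key not in seen:
            seen.add(key)
            cleaned.append(short)
    return cleaned
```

For a finance agent, for example, you would pass your own list through the same helper: normalize_subreddits(["r/investing", "r/stocks"]).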
With praw set up, we iterate through your subreddits, fetch recent posts (e.g., top/hot/new), save them temporarily, and add them to cognee's dataset. This step turns raw discussions into structured data ready for knowledge building.
Here's a sample code block for ingestion (expand as needed):
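A hedged sketch of that loop, assuming a praw client has already been created and that cognee.add is given plain text. It passes each post straight to cognee rather than writing temporary files first, which is a simplification of the flow described above. The dataset name "ai_news", the limit, and both function names are assumptions.

```python
def post_to_text(subreddit, title, body, url):
    """Flatten one Reddit post into a plain-text document for ingestion."""
    lines = [f"Source: r/{subreddit}", f"Title: {title}", f"Link: {url}"]
    if body and body.strip():
        lines.append("")
        lines.append(body.strip())
    return "\n".join(lines)


async def ingest_subreddits(reddit, subreddits, limit=25, dataset_name="ai_news"):
    """Add the current hot posts of each subreddit to a cognee dataset."""
    import cognee  # third-party; imported lazily

    added = 0
    for name in subreddits:
        # .hot() can be swapped for .new() or .top(time_filter="day")
        for post in reddit.subreddit(name).hot(limit=limit):
            text = post_to_text(name, post.title, post.selftext, post.url)
            await cognee.add(text, dataset_name=dataset_name)
            added += 1
    return added
```

Prefixing each document with its source and link keeps the provenance inside the text, so it can show up later in summaries.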
If you’re happy with Reddit as the source of info you need, you can jump ahead to building the memory. If you’d like a more expansive corpus of news, we’ve got that covered up next.
Broadening the Scope with Industry Insights
The most up-to-date news is often found on the blogs of industry-leading companies and in research papers.
We use RSS feeds for efficient pulling and parse them via feedparser.
Sticking with our example of AI-relevant companies and research, we need to define the websites to scrape. To ensure you actually pull everything, you can even define sitemaps!
Here’s a selection of pages we’ve chosen for our use case:
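The sketch below shows one possible shape for that configuration. The URLs are placeholders on example.com and example.org, not real feeds and not the pages the original script used; FEEDS and valid_feeds are hypothetical names.

```python
from urllib.parse import urlparse

# Placeholder feeds: swap in the real RSS/Atom URLs of the blogs and research feeds you follow.
FEEDS = {
    "Example AI lab blog": "https://example.com/blog/rss.xml",
    "Example research feed": "https://example.org/papers/feed.atom",
}


def valid_feeds(feeds):
    """Keep only the entries whose URL is an absolute http(s) address."""
    kept = {}
    for label, url in feeds.items():
        parts = urlparse(url)
        if parts.scheme in ("http", "https") and parts.netloc:
            kept[label] = url
    return kept
```

Filtering the mapping up front catches typos in the feed list before any network request is made.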
Then, iterate and pull:
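A minimal sketch of the pull, assuming a dict that maps a label to a feed URL. feedparser.parse returns a result whose entries expose fields such as title, link, and summary. The dataset name "ai_news", the entry cap, and the helper names are assumptions, and the HTML cleanup is deliberately rough.

```python
import html
import re


def strip_html(fragment):
    """Remove tags and collapse whitespace; a rough cleanup, not a full HTML parser."""
    no_tags = re.sub(r"<[^>]+>", " ", fragment or "")
    return re.sub(r"\s+", " ", html.unescape(no_tags)).strip()


def entry_to_text(source, title, link, summary, published=""):
    """Flatten one feed entry into a plain-text document for ingestion."""
    header = [f"Source: {source}", f"Title: {title}", f"Link: {link}"]
    if published:
        header.append(f"Published: {published}")
    return "\n".join(header) + "\n\n" + strip_html(summary)


async def ingest_feeds(feeds, max_entries=20, dataset_name="ai_news"):
    """Parse each feed and add its newest entries to a cognee dataset."""
    import feedparser  # third-party; imported lazily
    import cognee      # third-party; imported lazily

    for label, url in feeds.items():
        parsed = feedparser.parse(url)
        for entry in parsed.entries[:max_entries]:
            text = entry_to_text(
                label,
                entry.get("title", ""),
                entry.get("link", ""),
                entry.get("summary", ""),
                entry.get("published", ""),
            )
            await cognee.add(text, dataset_name=dataset_name)
```

Feed summaries often arrive as HTML fragments, so stripping the tags first keeps markup out of the text that is ingested.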
Going beyond Reddit to coverage from diverse, authoritative sources should help filter out misinformation.
Building the Knowledge Graph of News
Now for what cognee was made for: transforming ingested data into a structured knowledge graph for smart querying and meaningful insights.
First, call .cognify(). This chunks the data, extracts entities and their relationships, and generates summaries—turning your news sources into an interconnected, queryable memory layer.
You can run .visualize_graph() to see the semantic connections (e.g., linking a Reddit post on GPT-5 to an OpenAI blog update).
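Put together, the two calls could be sketched as follows. The assumption here is that visualize_graph accepts an output path for the HTML file it writes; check this against your installed cognee version. The engine parameter and the file name are hypothetical conveniences that let the function run against a stand-in during testing.

```python
import asyncio


async def build_news_graph(html_path="news_graph.html", engine=None):
    """Run cognify on everything added so far, then render the graph to HTML."""
    if engine is None:
        import cognee as engine  # third-party; imported lazily

    await engine.cognify()                   # chunk, extract entities and relations, summarize
    await engine.visualize_graph(html_path)  # assumed to write the visualization to html_path
    return html_path


def main():
    path = asyncio.run(build_news_graph())
    print(f"Graph written to {path}")
```

Call main() after the ingestion steps have finished; cognify only sees data that has already been added.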
The result is a current-events repository that keeps pace with reality and gives you context-aware updates without outdated noise.
Getting Your AI News Summary
With the heavy lifting done, query your agent using your preferred LLM. cognee's .search() pulls from the graph, considering all recent and past events for a balanced, relevant response.
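A hedged sketch of such a query, assuming the graph has already been built and that SearchType.GRAPH_COMPLETION is the search mode you want. The wording of the question and the helper names digest_question and news_digest are illustrative.

```python
import asyncio


def digest_question(topic, max_bullets=5):
    """Build the natural-language question sent to the knowledge graph."""
    return (
        f"Summarize the most recent and important news about {topic} "
        f"in at most {max_bullets} bullet points, and mention the source of each item."
    )


async def news_digest(topic, max_bullets=5):
    """Ask cognee for a short digest drawn from the ingested sources."""
    import cognee  # third-party; imported lazily
    from cognee.api.v1.search import SearchType

    return await cognee.search(
        query_text=digest_question(topic, max_bullets),
        query_type=SearchType.GRAPH_COMPLETION,
    )


def main():
    for item in asyncio.run(news_digest("AI")):
        print(item)
```

Passing both arguments by keyword avoids depending on their positional order. Asking for sources in the question makes it easier to trace each bullet back to a post or article.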
The final result is a quick, bullet-point summary like this:
This isn't just a list—it's synthesized knowledge that highlights what's important, drawing from your scraped sources to deliver timely, misinformation-resistant insights.


