Introduction
OpenClaw is widely used to manage "Content Pipelines." A primary example is the Multi-Source Tech News Digest, which aggregates data from over 100 sources — including RSS feeds, Twitter/X, and GitHub releases. The system deduplicates articles based on title similarity and applies a "quality score" (Priority Source +3, Recency +2) before delivering a summary to the user's Discord or WhatsApp.
Tech professionals face an information overload problem. Dozens of newsletters, hundreds of RSS feeds, constant Twitter updates. The digest solves this: one morning briefing, curated and deduplicated, delivered to your preferred channel. You get the signal without the noise. This article walks through how to build one.
Architecture
The digest runs as a Heartbeat task, typically in the early morning (the example day below fires at 6:00 AM so the briefing is waiting when the user wakes). Flow:
- Fetch all configured sources (RSS, API, scrapers)
- Parse and normalize to common format
- Deduplicate by title similarity
- Apply quality score
- Select top N items
- Summarize via LLM (or use titles + links)
- Deliver to configured channel
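The flow above can be sketched as a single pipeline function. This is a minimal sketch, not OpenClaw's actual API: the `fetchers`, `dedupe`, `score`, `summarize`, and `deliver` callables are assumed hooks you would wire to your own implementations.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

@dataclass
class Item:
    title: str
    link: str
    source: str

def run_digest(
    fetchers: Iterable[Callable[[], list]],
    dedupe: Callable[[list], list],
    score: Callable[[Item], float],
    summarize: Callable[[list], str],
    deliver: Callable[[str], None],
    top_n: int = 12,
) -> Optional[str]:
    # Tier 1: deterministic and cheap -- fetch, normalize, dedupe, score.
    items = [item for fetch in fetchers for item in fetch()]
    items = dedupe(items)
    top = sorted(items, key=score, reverse=True)[:top_n]
    if not top:
        return None          # all sources quiet: zero LLM calls
    digest = summarize(top)  # Tier 2: the only step that touches the LLM
    deliver(digest)
    return digest
```

Because the early-exit happens before `summarize`, a quiet news day genuinely costs nothing beyond the fetches.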
Two-tier processing applies: Tier 1 fetches and deduplicates (deterministic, cheap). Tier 2 (LLM) runs only when there are items to summarize. If all sources are quiet, zero LLM calls. See two-tier processing for cost savings.
Source Aggregation
Common sources:
- RSS: TechCrunch, Hacker News, Ars Technica, GitHub blog, The Verge, Wired, etc. Most tech publications offer RSS. Use a library like feedparser (Python) or rss-parser (Node).
- Twitter/X: Follow key accounts; filter by keywords. Use the API (with rate limits) or unofficial scrapers. Store tweets as "articles" with title = first 100 chars, link = tweet URL.
- GitHub: Releases from watched repos; trending projects. GitHub API provides release notes. Great for "what's new in framework X."
- News APIs: Google News, NewsAPI for broader coverage. Paid APIs offer more volume; free tiers may suffice for personal use.
Store the source list in a config file or in memory, so you can add or remove sources without code changes. A typical power user runs 50–150 sources. Start with 20; expand as you find value.
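A source list kept in a plain config file keeps code and sources decoupled. The format is your choice; here is an illustrative JSON sketch (the URLs, source types, and `priority` field are assumptions, not a fixed schema):

```python
import json

# An illustrative sources.json; entries and priorities are examples only.
EXAMPLE_CONFIG = """
{
  "sources": [
    {"type": "rss", "name": "Hacker News", "url": "https://news.ycombinator.com/rss", "priority": 3},
    {"type": "rss", "name": "TechCrunch", "url": "https://techcrunch.com/feed/", "priority": 1},
    {"type": "github_releases", "name": "openclaw", "repo": "openclaw/openclaw", "priority": 3}
  ]
}
"""

def load_sources(raw: str) -> list:
    """Parse the config; each entry tells the fetcher what to pull and how to rank it."""
    return json.loads(raw)["sources"]
```

Adding a feed is then a one-line config change rather than a code deploy.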
Deduplication
Same story appears across multiple sources. "OpenAI Announces New Model" on TechCrunch, The Verge, and Hacker News — one story, three entries. Deduplication strategies:
- Title similarity: Embed titles; cluster by cosine similarity; keep one per cluster. Use a lightweight embedding model (e.g., sentence-transformers) or simple TF-IDF. Threshold ~0.85 similarity = same story.
- URL canonicalization: Strip tracking params; detect same article across domains (many sites syndicate). If two URLs point to the same canonical article, keep one.
- Key phrase overlap: "OpenAI announces X" vs "OpenAI launches X" → same story. Extract key phrases (entity + action); match across items.
Two-tier processing: deduplication is deterministic (Tier 1); summarization uses LLM (Tier 2). Deduplication typically cuts 200+ raw items down to 30–50 unique stories.
Quality Scoring
Not all sources are equal. Scoring bonuses:
- Priority Source +3: Hacker News, official blogs (OpenAI, Anthropic, Google AI), and tier-1 publications get a bonus.
- Recency +2: Items from the last 6 hours score higher than yesterday's.
- Engagement +1: Optional; if the source provides engagement metrics (HN points, retweets), factor them in.
Formula example: score = base(1) + priority_bonus(0–3) + recency(0–2) + engagement(0–1). Sort by total score. Take top 10–15. Pass to summarization.
Tuning: Adjust weights based on your preferences. If you care more about recency, boost that. If you trust HN over random blogs, boost priority. The scoring is fully customizable.
Delivery
Output format: a bullet list with title, source, link, and a one-sentence summary. Deliver via Discord webhook, WhatsApp, or Telegram. The user gets the morning briefing in a channel they already check, without opening a dozen news apps.
Example output:
📰 Tech Digest — Feb 18, 2026
• OpenAI announces GPT-6 API (TechCrunch) — General availability with 50% cost cut. https://...
• Anthropic Claude 4 model card (Official) — New reasoning capabilities, 200K context. https://...
• OpenClaw 2026.2.17 security patches (GitHub) — CVE fixes, Docker hardening. https://...
...
Optional: LLM-generated "theme of the day" — "Today's digest is dominated by AI model releases and security updates." Adds personality; costs a few hundred tokens.
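Formatting and Discord delivery can be sketched in a few lines of stdlib Python. The item-dict keys here are assumptions; the webhook payload shape (a JSON body with a `content` field, capped at 2,000 characters) is standard Discord webhook behavior.

```python
import json
import urllib.request
from typing import Optional

def format_digest(date: str, items: list, theme: Optional[str] = None) -> str:
    """items: dicts with 'title', 'source', 'summary', 'link' keys (assumed shape)."""
    lines = [f"📰 Tech Digest — {date}"]
    if theme:
        lines.append(theme)
    for it in items:
        lines.append(f"• {it['title']} ({it['source']}) — {it['summary']} {it['link']}")
    return "\n".join(lines)

def deliver_discord(webhook_url: str, content: str) -> None:
    """POST the digest to a Discord webhook; content is truncated to the 2000-char limit."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps({"content": content[:2000]}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```

For long digests you would split into multiple messages or use Discord embeds instead of truncating.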
A Day in the Life
6:00 AM: Heartbeat triggers. Tier 1 fetches 120 items from 100 sources. Deduplication reduces to 45 unique stories. Quality scoring ranks them. Top 12 selected.
6:02 AM: Tier 2 (LLM) receives the 12 items. Generates one-sentence summary per item. Formats as bullet list. Adds optional theme.
6:05 AM: Message sent to user's Discord. User wakes at 7, checks phone, sees digest. Skims in 2 minutes. Clicks 2 links for deep dives. Rest ignored. Total time: 5 minutes vs. 45 minutes of manual browsing.
Cost and Optimization
Cost: ~$5–15/month depending on source count and summarization depth. Tier 1 (fetch + dedupe) is negligible. Tier 2: ~12 items per digest × 2 digests/day × 30 days ≈ 720 LLM calls. At ~500 tokens/call, that's ~360K tokens/month; at $3/M tokens, ~$1.10. Add theme generation, extra delivery channels, or more items, and you're in the $5–15 range.
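The token math works out as follows (all figures from the text; the $3/M rate is an assumed blended price):

```python
items_per_digest = 12
digests_per_day = 2
days = 30
tokens_per_call = 500          # rough tokens per summarization call
price_per_million = 3.00       # assumed USD per million tokens

calls = items_per_digest * digests_per_day * days   # 720 summarization calls
tokens = calls * tokens_per_call                    # 360,000 tokens/month
cost = tokens / 1_000_000 * price_per_million       # about $1.08/month
```

The base summarization cost is close to a dollar; the $5–15 range comes from everything layered on top.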
Optimization: Reduce summarization to titles + links only (no LLM) for ultra-low cost. Or summarize only top 5. Or run every other day. The architecture is flexible.
See OpenClaw pricing for full cost guidance.
Wrapping Up
The Multi-Source Tech News Digest is a canonical OpenClaw use case — proactive, multi-source, intelligent filtering. One Heartbeat task, 100+ sources, one morning briefing. See Heartbeat Engine and Content Factory for related patterns.