
AI-news aggregation: the `brf` tool CLI (agent-side) plus the `orchestrator` package (host-side cron + Slack receiver).

Project description

Blog Research Feed

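An AI-news curation agent that runs automatically every day at 9am: a Claude instance is hosted in Anthropic Managed Agents, a daily cron pulls yesterday's content (RSS / X / Firecrawl / YouTube / podcast), and the agent decides for itself which items to dig into, writes a report, and posts it to Slack.

Architecture (one-sentence version)

The brf CLI is preinstalled in the container (via the environment's pip packages). Secrets go through the Files API: each session uploads a temporary .env that is mounted at /workspace/.env, and the agent does its own work with bash and brf | jq | brf pipelines. See ARCHITECTURE.md for a detailed ASCII diagram.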

 GitHub Action (cron 09:00 UTC)
   ↓ python -m orchestrator.daily
 Orchestrator:
   1. build .env payload from host env vars
   2. files.upload(.env)
   3. sessions.create(resources=[file mount at /workspace/.env])
   4. stream events for logs, exit on idle/terminated
   5. files.delete(uploaded .env)
 Inside container (Anthropic cloud):
   $ brf fetch rss --since YESTERDAY > /tmp/rss.json
   $ brf firecrawl scrape --url <interesting> > /tmp/scrape.json
   $ brf report slack --message-file /tmp/report.md
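
The orchestrator steps above amount to an upload / run / clean-up lifecycle. The following is a minimal sketch of that shape, not the project's actual code: `upload_file`, `run_session`, and `delete_file` are hypothetical callables standing in for the Files API and sessions calls, whose real signatures are not shown in this README. The one design point it illustrates is that step 5 sits in a `finally`, so the temporary .env is deleted even when the session fails.

```python
from typing import Callable


def run_daily(
    env_payload: str,
    upload_file: Callable[[str], str],    # hypothetical: uploads text, returns a file id
    run_session: Callable[[str], None],   # hypothetical: runs until idle/terminated
    delete_file: Callable[[str], None],   # hypothetical: deletes the uploaded file
) -> None:
    """Upload the .env, run one session, and always clean up the upload."""
    file_id = upload_file(env_payload)
    try:
        run_session(file_id)
    finally:
        # Executes on success and on failure, so the secret file never lingers.
        delete_file(file_id)
```

Passing the three operations in as callables keeps the sketch independent of any particular SDK version and makes the clean-up behaviour easy to exercise with fakes.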

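v0 → v1 refactor: 7 custom tools plus orchestrator dispatch were replaced by a CLI, bash pipes, and a Files API secret mount. Benefit: the agent can compose brf | jq | brf without a round trip to the host for every tool call. Cost: the secrets live inside the container (this relies on container isolation and prompt-injection defenses; the trade-off is discussed in ARCHITECTURE §3).

Repository tour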

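| Path | Purpose |
| --- | --- |
| brf/ | CLI bundle (pip install -e . provides the brf command) |
| brf/main.py | Click entry point; subcommands fetch / firecrawl / report / daily |
| brf/ (inside the container) | Pure tool CLI: fetch/firecrawl/report. Installed into the agent container via pip |
| orchestrator/daily.py (host) | Cron-side orchestrator: builds the .env, uploads it via the Files API, opens the session, streams logs |
| brf/rss.py | feedparser + sources.opml; skips dead links flagged in SOURCES_HEALTH |
| brf/x_client.py | X API v2; handles 402 no_credits gracefully |
| brf/firecrawl_client.py | Firecrawl SDK v2 scrape/search |
| brf/youtube.py | youtube-transcript-api + oEmbed to fetch transcripts |
| brf/podcast.py | RSS → mp3 → OpenAI Whisper (25MB cap) |
| brf/slack.py | Markdown → Slack mrkdwn + Block Kit |
| agent/ | Managed Agent configuration (system_prompt.md + agent.yaml + environment.yaml) |
| scripts/create_agent.py | One-off create/update of the agent; prints the IDs for GitHub Secrets |
| .github/workflows/daily.yml | GitHub Action cron |
| docs/managed_agents/ | Complete Anthropic Managed Agents documentation (17 articles, local copy) |
| docs/agent_sdk/ | Claude Agent SDK documentation (16 articles, local copy) |
| sources.opml / sources.md | Source list (60 RSS feeds + 30 sites without a feed + 45 X accounts) |
| SOURCES_HEALTH.md | Source health audit (which links are dead, which need Firecrawl to fetch the full text) |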
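
The brf/rss.py entry above combines an OPML source list with a skip list of dead feeds. Here is a rough sketch of that filtering step using only the standard library; the actual module uses feedparser and its own SKIP_FEEDS contents, neither of which is shown here, so the URL in `SKIP_FEEDS` and the function name `live_feed_urls` are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical skip list; the real one lives in brf/rss.py and is not shown here.
SKIP_FEEDS = {"https://dead.example.com/feed.xml"}


def live_feed_urls(opml_text: str, skip: set[str] = SKIP_FEEDS) -> list[str]:
    """Return the feed URLs found in an OPML document, minus the skip list."""
    root = ET.fromstring(opml_text)
    urls = []
    # OPML stores each subscription as an <outline> element with an xmlUrl
    # attribute; grouping outlines have no xmlUrl and are ignored.
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if url and url not in skip:
            urls.append(url)
    return urls
```

Filtering before any network request means a dead feed costs nothing at run time, which is presumably why the skip list is hard-coded next to the fetcher.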

One-time setup

1. Prepare API keys

You need:

  • ANTHROPIC_API_KEY: Anthropic API
  • FIRECRAWL_API_KEY: Firecrawl (https://firecrawl.dev)
  • X_BEARER_TOKEN: X API v2 (https://developer.x.com) ⚠️ currently 0 credits
  • OPENAI_API_KEY: Whisper transcripts (~$0.006/min)
  • SLACK_WEBHOOK_URL: Slack incoming webhook; create one in the channel you want to post to

Copy .env.example to .env and fill in the values.
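
Before the first run it can help to confirm that every key listed above is actually set in the .env file. The helper below is an illustrative sketch, not part of the brf package: `missing_keys` is a hypothetical name, and the parser handles only simple KEY=VALUE lines with optional quotes.

```python
REQUIRED_KEYS = [
    "ANTHROPIC_API_KEY",
    "FIRECRAWL_API_KEY",
    "X_BEARER_TOKEN",
    "OPENAI_API_KEY",
    "SLACK_WEBHOOK_URL",
]


def missing_keys(env_text: str, required: list[str] = REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty in KEY=VALUE text."""
    values = {}
    for line in env_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and malformed lines
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return [key for key in required if not values.get(key)]
```

A typical use would be `missing_keys(open(".env").read())`, treating a non-empty result as a setup error.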

2. Create the Managed Agent + Environment

pip install -e .
python scripts/create_agent.py

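It will:

  1. Call client.beta.environments.create() to create the cloud environment (with a pip install of brf from git plus apt jq/ffmpeg). This step builds the container image on Anthropic's side and is fairly slow (30s-2min).
  2. Call client.beta.agents.create() to register the agent (system prompt + built-in toolset, no custom tools).
  3. Print agent_id + env_id to stdout. You do not need to do anything with these two IDs: on every start the orchestrator looks up the live IDs from the Anthropic API by name (taken from agent/agent.yaml / environment.yaml).

To change the agent configuration, rerun python scripts/create_agent.py --update (this creates a new version). Changing environment.yaml (adding packages, etc.) automatically creates a new env; the orchestrator picks it up on its next run, and no secret needs updating.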
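
The name-based lookup described above can be illustrated with plain data. This is only a sketch under assumptions: the records are modelled as dicts with `name` and `id` keys, `resolve_id` is a hypothetical helper, and how the actual orchestrator lists resources or handles several matches is not described in this README, so the sketch simply refuses to guess.

```python
def resolve_id(records: list[dict], name: str) -> str:
    """Return the id of the single record whose name matches; fail loudly otherwise."""
    matches = [r["id"] for r in records if r.get("name") == name]
    if not matches:
        raise LookupError(f"no resource named {name!r}")
    if len(matches) > 1:
        raise LookupError(f"{len(matches)} resources named {name!r}; expected one")
    return matches[0]
```

Resolving by name at start-up is what lets the IDs change (new agent version, new env) without anyone touching GitHub Secrets.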

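3. Configure GitHub Secrets

In the repository, go to Settings → Secrets → Actions and add 5 secrets:

  • ANTHROPIC_API_KEY: used by the orchestrator to call the sessions/files APIs
  • FIRECRAWL_API_KEY · X_BEARER_TOKEN · OPENAI_API_KEY · SLACK_WEBHOOK_URL: on every run the orchestrator bundles these 4 into a .env and uploads it to the container; inside the container, brf reads /workspace/.env automatically

(agent_id / env_id do not need to be configured; they are looked up via the name: field in the yaml files.)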
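
Bundling the four container secrets into a .env payload might look roughly like the sketch below. It is an assumption-laden illustration rather than the code in orchestrator/daily.py: `build_env_payload` is a hypothetical name, values are written unquoted, and failing when a variable is unset is a choice made here, not something the README specifies. ANTHROPIC_API_KEY is left out on purpose because, per the list above, only the orchestrator on the host uses it.

```python
import os
from typing import Mapping

# The four secrets the README says are shipped to the container.
CONTAINER_KEYS = [
    "FIRECRAWL_API_KEY",
    "X_BEARER_TOKEN",
    "OPENAI_API_KEY",
    "SLACK_WEBHOOK_URL",
]


def build_env_payload(env: Mapping[str, str] = os.environ) -> str:
    """Render the container secrets as KEY=VALUE lines; raise if any is unset."""
    missing = [key for key in CONTAINER_KEYS if not env.get(key)]
    if missing:
        raise KeyError(f"missing host env vars: {', '.join(missing)}")
    # Assumes values contain no newlines or characters that need quoting.
    return "".join(f"{key}={env[key]}\n" for key in CONTAINER_KEYS)
```

Using an explicit allow-list rather than dumping the whole host environment keeps unrelated variables, including the Anthropic key, out of the container.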

4. Trial run

Trigger manually: repository Actions → "daily ai news" → Run workflow.

Or do a local dry run:

python -m orchestrator.daily --dry-run

CLI smoke tests

Each subcommand can be run on its own (handy for debugging):

brf fetch rss --since 2026-05-18                  # print all RSS items since yesterday to stdout as JSON
brf fetch x-user --handle karpathy --since 2026-05-18
brf firecrawl scrape --url https://cognition.ai/blog
brf firecrawl search --query "frontier model release 2026" --limit 10
brf fetch youtube-transcript --url https://www.youtube.com/watch?v=xxx
brf fetch podcast-transcript --url https://feeds.transistor.fm/the-cognitive-revolution
echo "# Test report" | brf report slack --message-file /dev/stdin

Every subcommand prints a {...} JSON object to stdout and errors to stderr, which makes piping easy.
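
The stdout/stderr split is what keeps a pipeline such as brf | jq | brf parseable: the next stage only ever sees JSON. Below is a small sketch of that convention, assuming a hypothetical wrapper named `emit`; the real brf commands are Click commands whose internals are not shown here.

```python
import json
import sys
from typing import Callable, TextIO


def emit(
    fetch: Callable[[], dict],
    out: TextIO = sys.stdout,
    err: TextIO = sys.stderr,
) -> int:
    """Run fetch(); write its result as one JSON line, or report the error."""
    try:
        result = fetch()
    except Exception as exc:  # broad on purpose: any failure goes to stderr
        print(f"error: {exc}", file=err)
        return 1  # non-zero exit code, nothing written to stdout
    json.dump(result, out, ensure_ascii=False)
    out.write("\n")
    return 0
```

Returning an exit code instead of calling `sys.exit` directly keeps the function easy to test; a command-line entry point would pass the return value to `sys.exit`.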

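Known limitations

See ARCHITECTURE.md §7 for details:

  • The X API currently has 0 credits: fetch_x_user returns {status: "no_credits"}, and the agent skips that tool as instructed by the system prompt
  • Whisper has a 25MB per-file cap, so long podcasts are rejected (status too_large)
  • 12+ feed URLs are dead (see SOURCES_HEALTH.md) and are hard-coded in SKIP_FEEDS in brf/rss.py
  • The SSE stream does not reconnect automatically after a disconnect (a single task that finishes within 30 min usually does not hit this)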
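
The Whisper size limit above lends itself to a pre-flight check that returns a status dict instead of raising, in the same spirit as the no_credits status. The sketch below is illustrative only: `check_audio_size` is a hypothetical name, the extra `bytes` and `limit` fields are invented for the example, and it assumes the 25MB cap means 25 MiB.

```python
MAX_WHISPER_BYTES = 25 * 1024 * 1024  # assumption: "25MB" read as 25 MiB


def check_audio_size(num_bytes: int, limit: int = MAX_WHISPER_BYTES) -> dict:
    """Return a status dict: ok when within the cap, too_large otherwise."""
    if num_bytes > limit:
        return {"status": "too_large", "bytes": num_bytes, "limit": limit}
    return {"status": "ok", "bytes": num_bytes}
```

Checking the size before uploading avoids paying for a transfer that the API would reject anyway, and a status field gives the agent something it can branch on in a pipeline.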

Development

# Install dependencies (feedparser pulls in the old sgmllib3k library, which needs an older setuptools to build)
pip install --upgrade "setuptools<72" wheel
pip install -e .

# Run tests (currently just a syntax check of brf/main.py)
python -c "import ast; [ast.parse(open(f).read()) for f in ['brf/main.py']]"

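History

  • 2026-05-19 v0: scaffolded the complete pipeline; all modules implemented and reviewed. See HANDOFF.md (design decisions from before v0) and SOURCES_HEALTH.md (source audit).
  • 2026-05-19 v1: architecture refactor. Removed the 7 custom tools and the orchestrator dispatch logic (~180 lines); switched to mounting .env into the container via the Files API, with the agent piping the brf CLI directly in bash. See the commit message and ARCHITECTURE.md.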

