
AI-news aggregation: the `brf` CLI (agent-side) plus the `orchestrator` package (host-side cron + Slack receiver).

Project description

Blog Research Feed

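An AI-news curation agent that runs automatically at 9am every day. Claude is hosted in Anthropic Managed Agents, and a daily cron job pulls the previous day's content (RSS / X / Firecrawl / YouTube / podcasts). The agent decides on its own what to dig into, writes a report, and finally posts it to Slack.

Architecture (one-sentence version)

The brf CLI is preinstalled in the container (via the environment's pip packages); for each session, secrets are uploaded through the Files API as a temporary .env mounted at /workspace/.env; the agent then does the work itself with bash and brf | jq | brf pipelines. See ARCHITECTURE.md for the detailed ASCII diagram.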

 GitHub Action (cron 09:00 UTC)
   ↓ python -m orchestrator.daily
 Orchestrator:
   1. build .env payload from host env vars
   2. files.upload(.env)
   3. sessions.create(resources=[file mount at /workspace/.env])
   4. stream events for logs, exit on idle/terminated
   5. files.delete(uploaded .env)
 Inside container (Anthropic cloud):
   $ brf fetch rss --since YESTERDAY > /tmp/rss.json
   $ brf firecrawl scrape --url <interesting> > /tmp/scrape.json
   $ brf report slack --message-file /tmp/report.md
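Step 1 of the orchestrator (building the .env payload from host env vars) needs only the standard library. The sketch below is an illustration under stated assumptions, not the code in orchestrator/daily.py. `build_env_payload` and `CONTAINER_KEYS` are hypothetical names. The key list mirrors the four secrets that the setup steps say get packed into .env, and the sketch assumes brf accepts plain, unquoted KEY=value lines.

```python
from collections.abc import Iterable, Mapping

# Hypothetical: the four secrets forwarded into the container.
# ANTHROPIC_API_KEY is deliberately excluded; only the host-side orchestrator uses it.
CONTAINER_KEYS = (
    "FIRECRAWL_API_KEY",
    "X_BEARER_TOKEN",
    "OPENAI_API_KEY",
    "SLACK_WEBHOOK_URL",
)


def build_env_payload(env: Mapping[str, str], keys: Iterable[str] = CONTAINER_KEYS) -> bytes:
    """Render KEY=value lines for the temporary /workspace/.env upload.

    Fails before any session is opened if a key is missing or malformed.
    (Hypothetical helper; assumes unquoted KEY=value lines are acceptable.)
    """
    keys = list(keys)
    missing = [k for k in keys if not env.get(k)]
    if missing:
        raise ValueError(f"missing env vars: {', '.join(missing)}")
    lines = []
    for k in keys:
        value = env[k]
        if "\n" in value or "\r" in value:
            # A line break would let one secret smuggle extra KEY=value lines in.
            raise ValueError(f"{k} contains a line break")
        lines.append(f"{k}={value}")
    return ("\n".join(lines) + "\n").encode("utf-8")
```

Failing early on a missing key is cheaper than discovering it inside the container after a session has already been billed.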

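v0 → v1 refactor: replaced 7 custom tools + orchestrator dispatch with CLI + bash pipes + a Files API secret mount. Upside: the agent can compose brf | jq | brf without a round trip to the host for every tool call. Cost: the secrets live inside the container (this relies on container isolation + prompt-injection defenses; see ARCHITECTURE §3 for the trade-off).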

Repository guide

Path                          Purpose
brf/                          CLI bundle (pip install -e . installs the brf command)
brf/main.py                   Click entry point; subcommands fetch / firecrawl / report / daily
brf/ (in container)           Pure tool CLI: fetch / firecrawl / report, pip-installed into the agent container
orchestrator/daily.py (host)  Cron-side orchestrator: builds .env, uploads it via the Files API, opens the session, streams logs
brf/rss.py                    feedparser + sources.opml; skips dead links flagged in SOURCES_HEALTH
brf/x_client.py               X API v2; handles 402 no_credits gracefully
brf/firecrawl_client.py       Firecrawl SDK v2 scrape/search
brf/youtube.py                youtube-transcript-api + oEmbed to fetch transcripts
brf/podcast.py                RSS → mp3 → OpenAI Whisper (25MB cap)
brf/slack.py                  Markdown → Slack mrkdwn + Block Kit
agent/                        Managed Agent config (system_prompt.md + agent.yaml + environment.yaml)
scripts/create_agent.py       One-time creation/update of the agent; prints the IDs for GitHub Secrets
.github/workflows/daily.yml   GitHub Action cron
docs/managed_agents/          Complete Anthropic Managed Agents docs (17 pages, local copy)
docs/agent_sdk/               Claude Agent SDK docs (16 pages, local copy)
sources.opml / sources.md     Subscription list (60 RSS feeds + 30 sites without feeds + 45 X accounts)
SOURCES_HEALTH.md             Source health audit (which links are dead, which need Firecrawl to get full text)
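brf/rss.py reads sources.opml and skips known-dead feeds. One way to produce that feed list with only the standard library is sketched below. It assumes each feed is an `<outline>` element carrying the standard OPML `xmlUrl` attribute. `feed_urls` and the `SKIP_FEEDS` contents are hypothetical, and the real module may work differently (for example, by handing each URL straight to feedparser).

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder; the real SKIP_FEEDS in brf/rss.py lists the 12+ dead URLs.
SKIP_FEEDS = frozenset({"https://dead.example.com/feed.xml"})


def feed_urls(opml_text: str, skip: frozenset[str] = SKIP_FEEDS) -> list[str]:
    """Return each distinct xmlUrl in an OPML document, in order, minus dead feeds.

    OPML nests <outline> elements (categories contain feeds), so iter()
    walks the whole tree rather than only the top level.
    """
    root = ET.fromstring(opml_text)
    seen: set[str] = set()
    urls: list[str] = []
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        # Category outlines and feedless sites (htmlUrl only) have no xmlUrl.
        if not url or url in skip or url in seen:
            continue
        seen.add(url)
        urls.append(url)
    return urls
```

Keeping the skip list as a constant in code, rather than editing sources.opml, preserves the original subscription list for the health audit.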

One-time setup

1. Prepare API keys

Required:

  • ANTHROPIC_API_KEY: Anthropic API
  • FIRECRAWL_API_KEY: Firecrawl (https://firecrawl.dev)
  • X_BEARER_TOKEN: X API v2 (https://developer.x.com) ⚠️ currently 0 credits
  • OPENAI_API_KEY: Whisper transcription (~$0.006/min)
  • SLACK_WEBHOOK_URL: Slack incoming webhook; create one in the channel you want to post to

Copy .env.example to .env and fill in the values.
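Before the first scheduled run, you can check that SLACK_WEBHOOK_URL works by posting a test message directly. The stdlib sketch below relies only on Slack's documented incoming-webhook contract: an HTTP POST with a JSON body whose `text` field is the message. The helper names are hypothetical. This is not brf/slack.py, which additionally converts Markdown to mrkdwn and Block Kit.

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST to a Slack incoming webhook.

    Hypothetical helper: the simplest webhook payload is {"text": ...}.
    """
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_to_slack(webhook_url: str, text: str, timeout: float = 10.0) -> int:
    """Send the message and return the HTTP status code.

    urlopen raises urllib.error.HTTPError on non-2xx responses.
    """
    req = build_slack_request(webhook_url, text)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

For example, `post_to_slack(os.environ["SLACK_WEBHOOK_URL"], "brf webhook test")` should make a message appear in the target channel.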

2. Create the Managed Agent + Environment

pip install -e .
python scripts/create_agent.py

It will:

  1. Call client.beta.environments.create() to create the cloud environment (pip install of brf from git + apt jq/ffmpeg). This step builds the container image on Anthropic's side and is fairly slow (30s to 2 min).
  2. Call client.beta.agents.create() to register the agent (system prompt + built-in toolset, no custom tools).
  3. Print agent_id + env_id to stdout. You don't need to do anything with these two IDs: on every start, the orchestrator looks up the live IDs from the Anthropic API by name (taken from agent/agent.yaml / environment.yaml).

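To change the agent configuration, rerun python scripts/create_agent.py --update (this creates a new version). Changing environment.yaml (adding packages, etc.) automatically creates a new env; the orchestrator picks up the new one on its next run, and no secret needs to be updated.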
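The name-based lookup means the orchestrator only needs a stable name, not an ID. The sketch below shows the resolution step as a pure function. It assumes the list call returns records with `id`, `name`, and a sortable `created_at`, and that the newest record wins when several versions share a name. Both the record shape and the newest-wins rule are assumptions; check the Managed Agents API reference for the real fields. `Resource` and `resolve_id` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Resource:
    # Hypothetical record shape for one agent or environment in a listing.
    id: str
    name: str
    created_at: datetime


def resolve_id(resources: list[Resource], name: str) -> str:
    """Return the id of the newest resource whose name matches exactly.

    Assumption: newer resources supersede older ones with the same name,
    e.g. after environment.yaml changes and a new env is built.
    """
    matches = [r for r in resources if r.name == name]
    if not matches:
        raise LookupError(f"no resource named {name!r}; run scripts/create_agent.py first")
    return max(matches, key=lambda r: r.created_at).id
```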

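3. Configure GitHub Secrets

In the repository, go to Settings → Secrets → Actions and add 5 secrets:

  • ANTHROPIC_API_KEY: used by the orchestrator to call the sessions/files API
  • FIRECRAWL_API_KEY · X_BEARER_TOKEN · OPENAI_API_KEY · SLACK_WEBHOOK_URL: on every run, the orchestrator packs these 4 into a .env and uploads it to the container; inside the container, brf automatically reads /workspace/.env

(agent_id / env_id don't need to be configured: they are looked up by the name: field in the yaml files.)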
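On the container side, "brf automatically reads /workspace/.env" amounts to parsing KEY=value lines into the process environment. The minimal stdlib sketch below assumes simple unquoted or quoted values. `load_env_file` is a hypothetical name. brf may instead use python-dotenv's `load_dotenv()`, which similarly leaves existing variables alone by default and handles more edge cases.

```python
import os
from pathlib import Path

DEFAULT_ENV_PATH = Path("/workspace/.env")


def load_env_file(path: Path = DEFAULT_ENV_PATH, override: bool = False) -> dict[str, str]:
    """Parse simple KEY=value lines and export them into os.environ.

    Blank lines, '#' comments, and lines without '=' are skipped; matching
    surrounding quotes are stripped. Existing variables win unless
    override=True. Returns the parsed mapping (empty if the file is absent).
    Hypothetical helper, not brf's actual loader.
    """
    if not path.exists():
        return {}
    parsed: dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        parsed[key] = value
        if override or key not in os.environ:
            os.environ[key] = value
    return parsed
```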

4. Trial run

Manual trigger: in the repository, go to Actions → "daily ai news" → Run workflow.

Or do a local dry run:

python -m orchestrator.daily --dry-run

CLI smoke tests

Every fetch subcommand can be run on its own (handy for debugging):

brf fetch rss --since 2026-05-18                  # pull all of yesterday's RSS items to stdout as JSON
brf fetch x-user --handle karpathy --since 2026-05-18
brf firecrawl scrape --url https://cognition.ai/blog
brf firecrawl search --query "frontier model release 2026" --limit 10
brf fetch youtube-transcript --url https://www.youtube.com/watch?v=xxx
brf fetch podcast-transcript --url https://feeds.transistor.fm/the-cognitive-revolution
echo "# Test report" | brf report slack --message-file /dev/stdin

Every subcommand prints {...} JSON to stdout and errors to stderr, which makes it easy to pipe.
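Because stdout carries only JSON and stderr carries errors, the same contract is easy to consume from Python, for example in a local debugging script. The sketch below assumes brf signals hard failures with a non-zero exit status, which the text above does not state. Soft failures such as `{status: "no_credits"}` would still arrive as ordinary JSON for the caller to check. `run_json_cli` is a hypothetical helper, and the shape of the returned JSON depends on the subcommand.

```python
import json
import subprocess
import sys


def run_json_cli(cmd: list[str], timeout: float = 300.0):
    """Run a command that prints JSON on stdout and diagnostics on stderr.

    Returns the decoded JSON. On a non-zero exit, raises RuntimeError with
    stderr attached, so a partial or empty stdout is never parsed.
    (Assumes the CLI uses its exit status to signal hard failures.)
    """
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    if proc.returncode != 0:
        raise RuntimeError(f"{cmd[0]} exited {proc.returncode}: {proc.stderr.strip()}")
    if proc.stderr:
        # Forward warnings without treating them as failures.
        print(proc.stderr, end="", file=sys.stderr)
    return json.loads(proc.stdout)
```

Usage: `items = run_json_cli(["brf", "fetch", "rss", "--since", "2026-05-18"])`.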

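Known limitations

See ARCHITECTURE.md §7 for details:

  • The X API currently has 0 credits: fetch_x_user returns {status: "no_credits"}, and the agent skips that tool as instructed by its system prompt
  • Whisper has a 25MB per-file cap, so long podcasts are rejected (status too_large)
  • 12+ feed URLs are dead (see SOURCES_HEALTH.md) and are hardcoded in SKIP_FEEDS in brf/rss.py
  • The SSE stream does not reconnect automatically after a disconnect (a single run that finishes within 30 min usually doesn't hit this)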
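Both the X and Whisper limits surface as a `status` value instead of a crash, so the run can continue with other sources. A minimal sketch of a Whisper size pre-check follows. It assumes the result dict mirrors the `{status: ...}` values quoted above. The function names and the extra `bytes` / `limit` fields are hypothetical.

```python
import os

# The README gives a 25MB cap. Whether "MB" means 10**6 or 2**20 bytes is not
# pinned down here, so the smaller bound is used as the safe pre-check.
WHISPER_MAX_BYTES = 25 * 1000 * 1000


def audio_status(path: str, max_bytes: int = WHISPER_MAX_BYTES) -> dict:
    """Classify a downloaded mp3 before spending a Whisper call on it."""
    size = os.path.getsize(path)
    if size > max_bytes:
        return {"status": "too_large", "bytes": size, "limit": max_bytes}
    return {"status": "ok", "bytes": size}


def should_skip(result: dict) -> bool:
    """Treat the known soft-failure statuses as 'skip this source, keep going'."""
    return result.get("status") in {"no_credits", "too_large"}
```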

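Development

# Install dependencies (feedparser pulls in the old sgmllib3k library, which needs an old setuptools to build)
pip install --upgrade "setuptools<72" wheel
pip install -e .

# Run tests (currently only a syntax check: parses brf/main.py without importing it)
python -c "import ast, pathlib; ast.parse(pathlib.Path('brf/main.py').read_text(encoding='utf-8'))"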

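History

  • 2026-05-19 v0: scaffolded the complete pipeline; all modules implemented + reviewed. See HANDOFF.md (design decisions made before v0) and SOURCES_HEALTH.md (source audit).
  • 2026-05-19 v1: architecture refactor. Removed the 7 custom tools and the orchestrator dispatch logic (~180 lines); switched to mounting .env into the container via the Files API, with the agent piping the brf CLI directly in bash. See the commit message and ARCHITECTURE.md for details.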



Download files


Source Distribution

blog_research_feed-0.1.1.tar.gz (24.9 kB)

Uploaded Source

Built Distribution


blog_research_feed-0.1.1-py3-none-any.whl (29.5 kB)

Uploaded Python 3

File details

Details for the file blog_research_feed-0.1.1.tar.gz.

File metadata

  • Download URL: blog_research_feed-0.1.1.tar.gz
  • Upload date:
  • Size: 24.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for blog_research_feed-0.1.1.tar.gz
Algorithm Hash digest
SHA256 77e9d71ee517b443224c58c5869245af9a1cff96b6df9731aff75251d7b1510a
MD5 3e8c9c8d01215d3cb02009dc5f28097a
BLAKE2b-256 b074365ef9feb386144af658a051578fadff1fb9018bd0d30afc2b449b000c9a


File details

Details for the file blog_research_feed-0.1.1-py3-none-any.whl.

File metadata

File hashes

Hashes for blog_research_feed-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 54740fd717c8e4f1cd2ffe88199fb4f9a25251a89f2d9ae09d7b527a3e5c94e3
MD5 37a2545038120873b9ab85dc9f4144ca
BLAKE2b-256 23334e7dffeee2b9fe4af5d586ac8262d95a073a5c53fdbf2c316e879ae97117

