Chat-first paper distillation: turn arXiv papers into an Obsidian-ready knowledge base.

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

These details have not been verified by PyPI

Project description

paper-distiller

Chat-first paper distillation. Turn arXiv papers into an Obsidian-ready knowledge base — via REPL, one-shot commands, or natural language.

License: MIT

paper-distiller is a command-line tool that searches academic paper sources (arXiv + Semantic Scholar), downloads PDFs, has an LLM distill each one into a structured markdown note, and writes everything to a folder that opens directly in Obsidian.

v1.0 ships a single paper-distiller-chat command with three modes:

Mode	When to use
`paper-distiller-chat` (no args)	Interactive REPL — slash commands + natural-language input
`paper-distiller-chat distill`	One-shot: search a topic, distill N papers
`paper-distiller-chat ask`	One-shot: ask a research question, multi-round QA loop
`paper-distiller-chat resume`	One-shot: continue a paused/errored QA session

Output is plain markdown with YAML frontmatter and [[wikilink]] cross-references — no proprietary format, no lock-in. Graph view, Dataview, tags, and full-text search all work out of the box.

Install

pip install paper-distiller

Requires Python 3.10+. From source:

git clone https://github.com/jesson-hh/paper-distiller
cd paper-distiller
pip install -e ".[dev]"

Configure

paper-distiller needs an OpenAI-compatible LLM endpoint. Cheapest reliable option: Aliyun Bailian's qwen-plus (~¥0.02 per paper).

cp examples/example.env .env
# Edit .env — set PD_API_KEY, PD_BASE_URL, PD_MODEL

Env var	Required	Default	Purpose
`PD_API_KEY`	✓	—	Any OpenAI-compatible API key
`PD_BASE_URL`	✓	—	API endpoint base URL
`PD_MODEL`	✓	—	Model identifier
`PD_PROVIDER_NAME`		`unspecified`	Logging tag only
`PD_PDF_TIMEOUT`		`60`	PDF download timeout (seconds)
`PD_MIN_SURVEY`		`2`	Min articles before composing a session survey
`PD_SS_API_KEY`		(none)	Optional — higher Semantic Scholar rate limit

Provider quick reference

Provider	`PD_BASE_URL`	`PD_MODEL`
Aliyun Bailian (recommended)	`https://dashscope.aliyuncs.com/compatible-mode/v1`	`qwen-plus`
Aliyun Bailian (coding plan)	`https://coding.dashscope.aliyuncs.com/v1`	`qwen3.5-plus`
DeepSeek	`https://api.deepseek.com/v1`	`deepseek-chat`
OpenRouter	`https://openrouter.ai/api/v1`	`qwen/qwen3.5-plus`
Local Ollama	`http://localhost:11434/v1`	`qwen2.5`

Use it

Interactive REPL (recommended)

paper-distiller-chat --vault /path/to/your/vault

You see a welcome banner with provider + vault info, then a prompt. Type slash commands or natural language:

> /help
[command list]

> /vault
Vault: /path/to/your/vault
  articles: 47
  surveys: 6
  ...

> /distill diffusion models --n 3
[live status table during execution]

> 帮我研究下扩散模型在长周期金融时序生成上的最新进展
[intent-router] Intent: ask  | confidence 9
  question: 扩散模型在长周期金融时序生成上的最新进展
Missing: max_rounds, per_round, max_cost_cny
Apply defaults (max_rounds=3, per_round=2, max_cost_cny=5.0) and run? [Y/n]
> Y
[live status table for 3-round QA loop]

> /quit
  (bye)

10 slash commands available: /distill, /ask, /resume, /sessions, /vault, /provider, /agents, /show, /help, /quit.

Natural-language input goes through an LLM intent-router that classifies into one of distill/ask/resume/show and proposes defaults for any missing parameters. You confirm before any expensive operation runs.

One-shot mode (good for scripts / cron)

Distill N papers on a topic:

paper-distiller-chat distill --vault /path/to/your/vault \
    --topic "diffusion models for finance" --n 5

Answer a question across multiple rounds:

paper-distiller-chat ask --vault /path/to/your/vault \
    --question "What are recent advances in long-horizon time-series diffusion?" \
    --max-rounds 3 --per-round 2 --max-cost-cny 5

Resume a paused / errored session:

paper-distiller-chat resume --vault /path/to/your/vault \
    --session-id 20260519-1635-a3f7

Use --dry-run on any subcommand to validate config without spending API budget.

Helpful flags

paper-distiller-chat [--vault PATH]
                     {distill | ask | resume}
                     [subcommand-specific flags]

paper-distiller-chat distill --help etc. show every flag for that subcommand.

What you get — a sample distilled article

---
title: "Convergence Rates of Conditional Flow Matching..."
category: articles
slug: cnf-convergence
tags: [generative-models, theory, distribution-estimation, arxiv-2024]
refs: [arxiv:2410.12345]
depth: full-pdf
---

# CFM 的样本复杂度上界

> **场合**: arxiv preprint, 2024 Oct
> **主题**: 给 CFM 训练给出第一个匹配 nonparametric minimax rate 的有限样本界
> **领域**: 统计 / 生成模型理论

## 一句话
作者证明 CFM 训练在 $\beta$-平滑目标密度下达到 $n^{-\beta/(2\beta+d)}$ 的 $W_2$ 收敛速度…

## 方法
核心是把 vector-field 估计误差 decompose 成 (1) approximation error 由 $\beta$-Hölder ball
覆盖控制 (2) statistical error 用 local Rademacher 处理 (3) discretization error 显式给…

## 与已有 wiki 的关联
对 [[cnf-convergence-distribution-learning]] 的分析路线是个自然的强化…

## 我的 take
最有意思的是 time-singularity 在 CFM 训练里其实从未出现…

Open the vault in Obsidian and this article cross-links automatically with everything else you've distilled.

Vault layout

paper-distiller writes into a vault with these subdirectories (auto-created on first run):

Directory	Auto-written by tool	Description
`articles/`	✓	One file per paper
`surveys/`	✓	Multi-article surveys + `qa-…` final answer docs
`techniques/`, `directions/`, `open-problems/`, `authors/`	—	Reserved for human-curated notes

QA sessions persist resume state at <vault>/.paper_distiller/qa-sessions/<sid>/state.json.

How it works

paper-distiller v1.0 is built around an async DAG of sub-agents:

Single-pass (distill):
  arxiv-searcher  ss-searcher          (parallel)
        └────┬────┘
        candidate-merger
              │
        candidate-ranker (LLM)
              │
        paper-processor × N            (parallel: fetch PDF → extract → distill LLM)
              │
        vault-writer
              │
        survey-composer (LLM, optional)

Multi-round (ask):
  ┌──────────────────────────────────────────────────────┐
  │  progress-reflector (LLM)                             │
  │      ↓                                                │
  │  [stop check: max_rounds / llm_done / llm_brake / ...] │
  │      ↓                                                │
  │  search → dedup → rank → distill × N → write          │
  └────────────────────────────────────────────────────────┘
                          ↓
                  answer-synthesizer (LLM) → surveys/qa-<slug>-<date>.md

11 agents, 4 stop reasons in QA mode, all wired together by a topological-level scheduler. For module structure, full data flow, and internal contracts, see docs/ARCHITECTURE.md.

Cost

Aliyun Bailian qwen-plus pricing — roughly ¥2.1/M input tokens, ¥12.7/M output tokens.

Operation	Typical cost
1 paper distilled	~~¥0.02 (~~$0.003)
5-paper single-pass + survey	~~¥0.7 (~~$0.10)
3-round QA session @ 2 papers/round	~¥1.5–3
5-round QA session @ 3 papers/round	~¥4–8

paper-distiller-chat ask enforces --max-cost-cny (default ¥20). The cost number is for the circuit breaker — not billing-accurate.

Customize the output

All 6 LLM prompts are plain markdown — edit them to change tone, structure, or output language. No Python changes needed.

src/paper_distiller/prompts/{filter,article,survey}.md — distill mode
src/paper_distiller/agents/prompts/route.md — intent router
src/paper_distiller/qa/prompts/{reflect,answer}.md — QA mode

Defaults produce Chinese-primary notes with this 5-section structure: 一句话 / 问题动因 / 方法 / 关键结果 / 我的 take.

Optional companion: semantic search via vault-mcp

paper-distiller does NOT ship its own semantic-search engine for your vault. To search by meaning (not keywords) from Claude Code, Cursor, or any MCP-aware agent, pair it with vault-mcp.

See docs/vault-mcp-recommendation.md for setup and rationale.

Status & roadmap

v1.0.0 — beta. Chat-first architecture stable; 168 tests passing on Python 3.10 / 3.11 / 3.12.

Migration from v0.5

v0.5.x	v1.0
`paper-distiller --topic X --n N`	`paper-distiller-chat distill --topic X --n N`
`paper-distiller-qa --question Y --max-rounds R`	`paper-distiller-chat ask --question Y --max-rounds R`
(no resume command)	`paper-distiller-chat resume --session-id <sid>`
(no interactive mode)	`paper-distiller-chat` (no subcommand)

Flag names and defaults are otherwise preserved. See CHANGELOG for full details.

Coming

v1.1 — citation-graph traversal: given a seed article, follow references / cited-by edges and rank them for inclusion.
v1.2 — broaden sources beyond arxiv + SS: integrate browser-session scraping for ACM, IEEE, 知乎 etc.
Later — per-vault paper-distiller.toml for custom category schemas; LEANN in-pipeline crosslink retrieval for vaults > 500 entries.

Known limitations

arxiv.org and Semantic Scholar occasionally rate-limit (HTTP 429); QA sessions exit with error: search failed (resumable via paper-distiller-chat resume <sid>).
Scanned-only PDFs fall through to abstract-only mode (PyMuPDF doesn't OCR — by design we flag rather than silently distill wrong text).

Contributing

Issues and PRs welcome.

git clone https://github.com/jesson-hh/paper-distiller
cd paper-distiller
pip install -e ".[dev]"
pytest -v

CI runs the same matrix on every PR. For a tour of the codebase, see docs/ARCHITECTURE.md.

License

MIT — see LICENSE.

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

xuyi1030jesson

These details have not been verified by PyPI

Release history Release notifications | RSS feed

1.12.0

May 21, 2026

1.11.0

May 21, 2026

1.10.0

May 21, 2026

1.9.0

May 21, 2026

1.8.0

May 21, 2026

1.7.0

May 20, 2026

1.6.1

May 20, 2026

1.6.0

May 20, 2026

1.5.0

May 19, 2026

1.4.0

May 19, 2026

This version

1.3.0

May 19, 2026

1.2.0

May 19, 2026

1.1.0

May 19, 2026

1.0.0

May 19, 2026

0.5.1

May 19, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

paper_distiller-1.3.0.tar.gz (886.8 kB view details)

Uploaded May 19, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

paper_distiller-1.3.0-py3-none-any.whl (81.4 kB view details)

Uploaded May 19, 2026 Python 3

File details

Details for the file paper_distiller-1.3.0.tar.gz.

File metadata

Download URL: paper_distiller-1.3.0.tar.gz
Upload date: May 19, 2026
Size: 886.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for paper_distiller-1.3.0.tar.gz
Algorithm	Hash digest
SHA256	`9f0db0ffb2d5787fd2210e613099d20641b7691a06430ac9db73e72b3bb18849`
MD5	`e168ff091417ff222dd3dfb4110ba458`
BLAKE2b-256	`717a977a899958a1cf2ba52957fed057f6d903a6e9b0ebbdd83ef5b1427bbea9`

See more details on using hashes here.

Provenance

The following attestation bundles were made for paper_distiller-1.3.0.tar.gz:

Publisher: release.yml on jesson-hh/paper-distiller

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: paper_distiller-1.3.0.tar.gz
- Subject digest: 9f0db0ffb2d5787fd2210e613099d20641b7691a06430ac9db73e72b3bb18849
- Sigstore transparency entry: 1573084840
- Sigstore integration time: May 19, 2026
Source repository:
- Permalink: jesson-hh/paper-distiller@7b51a271aec9451f7e0dcce22baf76755f9ff8b0
- Branch / Tag: refs/tags/v1.3.0
- Owner: https://github.com/jesson-hh
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@7b51a271aec9451f7e0dcce22baf76755f9ff8b0
- Trigger Event: push

File details

Details for the file paper_distiller-1.3.0-py3-none-any.whl.

File metadata

Download URL: paper_distiller-1.3.0-py3-none-any.whl
Upload date: May 19, 2026
Size: 81.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for paper_distiller-1.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b5ab2611dd5ced056d35dceb8ce42a5779536a02c0b18ee58a0e75893263ba19`
MD5	`63f484994c52c17315916d628e0dfd29`
BLAKE2b-256	`518853f81b956833c447fbad8e29354fc902a33a79c5b718a6f2781370df35ce`

See more details on using hashes here.

Provenance

The following attestation bundles were made for paper_distiller-1.3.0-py3-none-any.whl:

Publisher: release.yml on jesson-hh/paper-distiller

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: paper_distiller-1.3.0-py3-none-any.whl
- Subject digest: b5ab2611dd5ced056d35dceb8ce42a5779536a02c0b18ee58a0e75893263ba19
- Sigstore transparency entry: 1573084848
- Sigstore integration time: May 19, 2026
Source repository:
- Permalink: jesson-hh/paper-distiller@7b51a271aec9451f7e0dcce22baf76755f9ff8b0
- Branch / Tag: refs/tags/v1.3.0
- Owner: https://github.com/jesson-hh
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@7b51a271aec9451f7e0dcce22baf76755f9ff8b0
- Trigger Event: push

paper-distiller 1.3.0

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

paper-distiller

Install

Configure

Provider quick reference

Use it

Interactive REPL (recommended)

One-shot mode (good for scripts / cron)

Helpful flags

What you get — a sample distilled article

Vault layout

How it works

Cost

Customize the output

Optional companion: semantic search via vault-mcp

Status & roadmap

Migration from v0.5

Coming

Known limitations

Contributing

License

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance