CLI tool for automated literature research workflows.
litresearch
CLI tool that automates literature research from research questions to curated, ranked, and exported paper sets with structured reports.
Overview
- Generates search facets and academic queries from one or more research questions
- Discovers candidates from Semantic Scholar and OpenAlex
- Screens and analyzes papers with an LLM through LiteLLM
- Supports citation graph expansion for frequently referenced works
- Ranks papers and exports reports, references, JSON data, PDFs, and metrics
- Supports robust resume via a saved state.json checkpoint
What's New in v1.0.0
Multi-source discovery (S2 + OpenAlex)
- Use discovery_sources = ["s2", "openalex"] for broader coverage.
- Candidates are deduplicated across sources, and source provenance is tracked.
Citation graph expansion
- Optional expansion stage adds highly cross-referenced papers after ranking.
- Configure with expand_citations and min_cross_refs.
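The expansion idea can be sketched as counting how many ranked papers cite each outside paper and keeping those that reach min_cross_refs. This is a hypothetical illustration: expand_by_cross_refs and its input shape (ranked paper ID mapped to the IDs it cites) are assumptions, and the real stage may count references differently.

```python
from collections import Counter


def expand_by_cross_refs(
    ranked: dict[str, list[str]], min_cross_refs: int = 3
) -> list[str]:
    """Return IDs cited by at least `min_cross_refs` distinct ranked papers.

    `ranked` maps each ranked paper ID to the IDs it cites (hypothetical shape).
    Papers that are already ranked are not added again.
    """
    counts: Counter[str] = Counter()
    for refs in ranked.values():
        # set() so a paper citing the same work twice only counts once.
        counts.update(set(refs))
    # Sorted by ID for deterministic output.
    return sorted(
        pid for pid, n in counts.items()
        if n >= min_cross_refs and pid not in ranked
    )
```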
Zotero export
- Export top papers to Zotero user or group libraries.
- Supports collection assignment, tags, and PDF attachment when available.
PDF injection
- Bring your own PDFs with --inject-pdfs or inject_pdf_dir.
- Files are matched by {paper_id}.pdf or by DOI-based filenames.
Run metrics and telemetry
- Every run writes metrics.json with stage timings and aggregate counts.
- Includes a per-source breakdown plus PDF availability and usage metrics.
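Stage timings and counters of the kind metrics.json records can be collected with a small context manager. RunMetrics, its methods, and the JSON layout below are hypothetical; the actual metrics.json schema may differ.

```python
import json
import time
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


class RunMetrics:
    """Hypothetical collector; the real metrics.json layout may differ."""

    def __init__(self) -> None:
        self.stage_seconds: dict[str, float] = {}
        self.counts: dict[str, int] = {}

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record elapsed wall time even if the stage raises.
            self.stage_seconds[name] = time.perf_counter() - start

    def incr(self, key: str, n: int = 1) -> None:
        self.counts[key] = self.counts.get(key, 0) + n

    def write(self, path: Path) -> None:
        payload = {"stage_seconds": self.stage_seconds, "counts": self.counts}
        path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
```

Each pipeline stage would run inside `with metrics.stage("discovery"):` and bump counters such as per-source candidate counts as it goes.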
Resume behavior improvements
- Improved resume reliability from state.json checkpoints.
- Safer state persistence with atomic writes.
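Atomic writes usually mean writing to a temporary file in the same directory and then renaming it over the target, so an interrupted run never leaves a half-written state.json. Here is a minimal sketch of that standard pattern, assuming JSON state; atomic_write_json is a hypothetical helper, not litresearch's code.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, data: object) -> None:
    """Replace `path` with new JSON in one step; readers never see a partial file."""
    path = Path(path)
    # Create the temp file next to the target so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=f".{path.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # push bytes to disk before the rename
        os.replace(tmp_name, path)  # atomic rename on POSIX filesystems
    except BaseException:
        # Remove the leftover temp file, then re-raise the original error.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```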
Token-budgeted PDF extraction
- A configurable extraction strategy (pdf_extraction_mode, pdf_token_budget) keeps extracted PDF text within a token budget to fit LLM context limits.
- Falls back gracefully when PDFs are unavailable or extraction is limited.
Installation
uv pip install litresearch
For local development:
uv sync
uv run nox
Quickstart
- Set an LLM API key for a LiteLLM-supported provider:
export OPENAI_API_KEY=your_key_here
# or
export ANTHROPIC_API_KEY=your_key_here
- Optionally set a Semantic Scholar key for better rate limits:
export S2_API_KEY=your_key_here
- Copy the example config and tune defaults:
cp litresearch.toml.example litresearch.toml
- Run the pipeline:
litresearch run "What is the impact of large language models on software engineering?"
- Inspect the output directory:
output/
  report.md
  paper_analyses.md
  references.bib
  references.ris
  data.json
  metrics.json
  papers/
  state.json
Usage
Run one or more research questions:
litresearch run \
"How do large language models affect developer productivity?" \
"What evidence exists about code quality impacts?"
Override settings from the CLI:
litresearch run \
"How do LLMs affect software engineering?" \
--model anthropic/claude-sonnet-4-20250514 \
--top-n 10 \
--threshold 50 \
--output-dir runs/llm-se \
--overwrite
Resume an interrupted run:
litresearch resume output/state.json
Inject local PDFs for papers you already have:
litresearch run "Your research question" --inject-pdfs /path/to/pdfs
Inspect current configuration:
litresearch config
Configuration
Settings are resolved in this order of precedence (highest first):
- CLI flags
- Environment variables
- litresearch.toml
- Built-in defaults
Supported environment variables:
- OPENAI_API_KEY
- ANTHROPIC_API_KEY
- OPENROUTER_API_KEY
- S2_API_KEY
- ZOTERO_API_KEY
- S2_TIMEOUT
- S2_REQUESTS_PER_SECOND
- SCREENING_SELECTION_MODE
- SCREENING_TOP_PERCENT
- SCREENING_TOP_K
- SCREENING_THRESHOLD
Start from the full example config:
cp litresearch.toml.example litresearch.toml
Key options include:
default_model = "openai/gpt-4o-mini"
llm_timeout = 120
max_retries = 3
retry_base_delay = 1.0
discovery_sources = ["s2"]
screening_selection_mode = "top_percent"
screening_top_percent = 0.3
screening_threshold = 60
top_n = 20
max_results_per_query = 20
expand_citations = false
min_cross_refs = 3
zotero_export = false
s2_timeout = 10
s2_requests_per_second = 1.0
pdf_extraction_mode = "budget"
pdf_token_budget = 4000
pdf_first_pages = 4
pdf_last_pages = 2
abstract_fallback = true
# inject_pdf_dir = "/path/to/pdfs"
output_dir = "output"
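max_retries and retry_base_delay suggest a retry loop with exponential backoff. Here is a minimal sketch, assuming the delay doubles on each retry (base, 2x base, 4x base, ...) and that max_retries counts retries after the first attempt; call_with_retries is hypothetical, and the real policy (jitter, which errors are retried) may differ.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    max_retries: int = 3,
    retry_base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying failures with delays of base, 2*base, 4*base, ..."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            sleep(retry_base_delay * 2**attempt)
    raise AssertionError("unreachable")  # the loop always returns or raises
```

The sleep parameter is injectable only so the schedule can be checked without actually waiting.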
Screening selection modes:
- top_percent (default): deep-analyze the top share (screening_top_percent) of screened papers globally
- top_k: deep-analyze the top K (screening_top_k) screened papers globally
- threshold: deep-analyze papers scoring >= screening_threshold
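The three modes can be expressed as one selection function over global screening scores. This is a sketch: select_for_analysis is a hypothetical name, the rounding rule for top_percent is an assumption, and the top_k default of 10 is made up because the example config does not show screening_top_k.

```python
def select_for_analysis(
    scores: dict[str, float],
    mode: str = "top_percent",
    top_percent: float = 0.3,
    top_k: int = 10,  # hypothetical default; not shown in the example config
    threshold: float = 60,
) -> list[str]:
    """Choose paper IDs for deep analysis from global screening scores."""
    # Highest score first; sorted() is stable, so ties keep their input order.
    ranked = sorted(scores, key=lambda pid: scores[pid], reverse=True)
    if mode == "top_percent":
        # Round to the nearest count but keep at least one paper (assumed rule).
        return ranked[: max(1, round(len(ranked) * top_percent))]
    if mode == "top_k":
        return ranked[:top_k]
    if mode == "threshold":
        return [pid for pid in ranked if scores[pid] >= threshold]
    raise ValueError(f"unknown screening selection mode: {mode!r}")
```

Because top_percent and top_k rank globally, they always pass a predictable number of papers to the more expensive analysis step, while threshold passes however many clear the bar.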
Semantic Scholar tuning:
- s2_timeout: request timeout in seconds
- s2_requests_per_second: global request-rate cap shared across all S2 endpoints
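A global cap shared by all S2 endpoints can be enforced with a single limiter that every request passes through. The RateLimiter class below is a hypothetical sketch (a fixed minimum interval between requests, made thread-safe with a lock), not litresearch's implementation; clock and sleep are injectable so the behavior is easy to test.

```python
import threading
import time
from typing import Callable


class RateLimiter:
    """At most `requests_per_second` calls to wait() proceed per second, across all callers."""

    def __init__(
        self,
        requests_per_second: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = 1.0 / requests_per_second
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait(self) -> None:
        # Holding the lock while sleeping serializes callers, which is the point
        # of a single global cap.
        with self._lock:
            now = self._clock()
            if now < self._next_allowed:
                self._sleep(self._next_allowed - now)
                now = self._next_allowed
            self._next_allowed = now + self.interval
```

Every S2 client call would invoke `limiter.wait()` on one shared instance before sending its request.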
Discovery tuning:
- discovery_sources: choose "s2", "openalex", or both
- openalex_email: optional contact email that places OpenAlex requests in the polite pool for better rate limits
Citation expansion tuning:
- expand_citations: enable or disable the expansion stage
- min_cross_refs: minimum number of citation-graph cross-references a paper needs to be added
Zotero export tuning:
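- zotero_export: enable the Zotero export integration
- zotero_library_id, zotero_library_type, zotero_collection_key, zotero_tag: target library, library type (user or group), collection, and tag for exported items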
Output Files
- report.md: main literature review report with research questions, search summary, top papers, and synthesis
- paper_analyses.md: detailed per-paper analysis for all analyzed papers
- references.bib: BibTeX for ranked papers when citation data is available
- references.ris: RIS export for citation managers
- data.json: machine-readable export of the pipeline state
- metrics.json: per-stage timings and aggregate run metrics
- papers/: downloaded open-access PDFs for ranked papers
- state.json: resumable pipeline checkpoint
Development
uv run nox
uv run litresearch --help
Status
v1.0.0 delivers a production-ready core workflow for automated literature research,
including multi-source discovery, ranking, export, and operational telemetry.
Download files
File details
Details for the file litresearch-1.0.0.tar.gz.
File metadata
- Download URL: litresearch-1.0.0.tar.gz
- Upload date:
- Size: 182.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 771a12b8fbb7d2bd314c515629ac6286ac36d906042b53c75cd5400745793f56 |
| MD5 | f9fdd2b24e014d1f7c0c2adaf9b70879 |
| BLAKE2b-256 | 933ba03cef8a8a83728a907ea0624bb1d4fe945115cf3af442ac09220d51005c |
Provenance
The following attestation bundles were made for litresearch-1.0.0.tar.gz:
- Publisher: release.yml on spignotti/litresearch
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: litresearch-1.0.0.tar.gz
  - Subject digest: 771a12b8fbb7d2bd314c515629ac6286ac36d906042b53c75cd5400745793f56
  - Sigstore transparency entry: 1277680703
  - Sigstore integration time:
  - Permalink: spignotti/litresearch@b6ec2df442cda369ae2ca48900f6afbdc090d0c5
  - Branch / Tag: refs/tags/v1.0.0
  - Owner: https://github.com/spignotti
  - Access: public
  - Token Issuer: https://token.actions.githubusercontent.com
  - Runner Environment: github-hosted
  - Publication workflow: release.yml@b6ec2df442cda369ae2ca48900f6afbdc090d0c5
  - Trigger Event: push
File details
Details for the file litresearch-1.0.0-py3-none-any.whl.
File metadata
- Download URL: litresearch-1.0.0-py3-none-any.whl
- Upload date:
- Size: 36.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a9836a8ffdb42de9835e9b169fcc78bd74f6227d0a753c8f9f6ed819374cd4d8 |
| MD5 | 3843a181d1887e98970fd8428052e2df |
| BLAKE2b-256 | 791a15189d36060f2d520c2c18ed0bced9ad8b59eb03531b511be8dc8b95dac8 |
Provenance
The following attestation bundles were made for litresearch-1.0.0-py3-none-any.whl:
- Publisher: release.yml on spignotti/litresearch
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: litresearch-1.0.0-py3-none-any.whl
  - Subject digest: a9836a8ffdb42de9835e9b169fcc78bd74f6227d0a753c8f9f6ed819374cd4d8
  - Sigstore transparency entry: 1277680725
  - Sigstore integration time:
  - Permalink: spignotti/litresearch@b6ec2df442cda369ae2ca48900f6afbdc090d0c5
  - Branch / Tag: refs/tags/v1.0.0
  - Owner: https://github.com/spignotti
  - Access: public
  - Token Issuer: https://token.actions.githubusercontent.com
  - Runner Environment: github-hosted
  - Publication workflow: release.yml@b6ec2df442cda369ae2ca48900f6afbdc090d0c5
  - Trigger Event: push