litscout
Automated literature search, screening, and prioritization pipeline powered by LLMs.
What It Does
litscout is an automated literature discovery and screening pipeline for academic researchers. Each run works through the steps below, using an LLM for query generation and relevance screening:
- Generate smart search queries based on your research angle
- Search academic databases (OpenAlex, Semantic Scholar, arXiv, PubMed, CORE) for candidate papers
- Download PDFs (with Elsevier ScienceDirect API fallback for paywalled papers)
- Screen papers using an LLM for relevance to your research angle
- Keep medium/high relevance papers, discard the rest
- Repeat until sufficient coverage is achieved
- Generate a final Markdown report summarizing everything found
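The loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not litscout's actual implementation: `Paper`, `RunState`, `run_pipeline`, and `fake_round` are hypothetical names, and `search_round` stands in for query generation, database search, PDF download, and LLM screening combined.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    title: str
    relevance: str  # "high", "medium", or "low"


@dataclass
class RunState:
    kept: list = field(default_factory=list)
    discarded: list = field(default_factory=list)
    iterations: int = 0


def run_pipeline(search_round, target_papers=20, max_iterations=0):
    """Repeat search-and-screen rounds until enough papers are kept.

    search_round(i) is a hypothetical callable returning already-screened
    papers for round i. max_iterations=0 means no iteration cap.
    """
    state = RunState()
    while len(state.kept) < target_papers:
        if max_iterations and state.iterations >= max_iterations:
            break
        state.iterations += 1
        for paper in search_round(state.iterations):
            # Keep medium/high relevance papers, discard the rest.
            if paper.relevance in ("high", "medium"):
                state.kept.append(paper)
            else:
                state.discarded.append(paper)
    return state


def fake_round(i):
    # Canned results: one paper of each relevance level per round.
    return [
        Paper(f"high-{i}", "high"),
        Paper(f"medium-{i}", "medium"),
        Paper(f"low-{i}", "low"),
    ]


state = run_pipeline(fake_round, target_papers=5, max_iterations=10)
print(state.iterations, len(state.kept), len(state.discarded))
```

In this sketch the target is checked between rounds, so a round can overshoot it. With no iteration cap and a round that never returns relevant papers, the loop would not terminate, which is one reason a real pipeline needs a sufficiency check or a user prompt.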
Installation
From PyPI
pip install litscout
From Source
git clone https://github.com/your-username/litscout.git
cd litscout
pip install -e .
Quick Start
1. Initialize a Project
litscout init
This scaffolds a new litscout project in your current directory with all the necessary config files and directories.
2. Configure API Keys
# Edit .env with your API keys
# At minimum: LLM_BASE_URL, LLM_API_KEY, LLM_MODEL
3. Configure Sources
# Edit input/settings.yaml to enable your sources
# At least one source with role 'search_and_pdf' must be enabled
4. Write Research Angle
# Edit input/research.md with your research focus
5. Run the Pipeline
litscout run
6. Check Results
ls output/kept_papers/ # Downloaded PDFs
ls output/reports/ # Final Markdown reports
cat output/manifest.json # Full log of all papers
CLI Commands
| Command | Description |
|---|---|
| litscout init | Scaffold a new litscout project directory |
| litscout run | Run the literature search and screening pipeline |
| litscout report | Regenerate the Markdown report from existing manifest.json |
| litscout clean | Clean output and temp directories |
| litscout status | Show quick summary of current project state |
| litscout --help | Show help message and exit |
litscout init
Scaffolds a new litscout project in the current directory (or specified path):
litscout init # Use current directory
litscout init ./my-project # Use specified directory
Creates:
- input/research.md: Your research angle
- input/settings.yaml: Source and target settings
- .env: API keys
- config.yaml: Advanced technical settings
- output/, temp/: Output and temp directories
litscout run
Runs the full literature search and screening pipeline:
litscout run # Use default config.yaml
litscout run --config my.yaml # Use custom config path
litscout report
Regenerates the markdown report from an existing manifest.json without re-running the pipeline:
litscout report
litscout report --config my.yaml
litscout clean
Cleans output and temp directories:
litscout clean # Ask for confirmation
litscout clean --confirm # Skip confirmation
litscout status
Shows a quick summary of the current project state:
litscout status
litscout status --config my.yaml
Output includes:
- Active sources and their roles
- Target papers
- Iterations run
- Papers kept (high/medium breakdown)
- Papers discarded
- Last updated timestamp
Configuring Search Sources
Edit input/settings.yaml to enable the sources you have access to:
| Source | Role | Key Required? | Coverage | Best For |
|---|---|---|---|---|
| OpenAlex | Search + PDF | No (email optional) | 250M+ works, all disciplines | General academic research |
| Semantic Scholar | Search + PDF | No (optional for speed) | 200M+ papers, AI-ranked | CS, biomedical, broad coverage |
| Elsevier | PDF only | Yes (institutional) | Paywalled Elsevier journals | University-subscribed content |
| arXiv | Search + PDF | No | 2.4M+ preprints | Physics, math, CS, quantitative biology |
| PubMed | Search + PDF | No (optional for speed) | 36M+ citations | Biomedical and life sciences |
| CORE | Search + PDF | Yes (free) | 300M+ metadata, 40M+ full texts | Open access aggregation |
Default enabled sources: OpenAlex (search+pdf), Elsevier (pdf-only)
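The split between search-capable and PDF-only sources suggests a fallback chain for downloads. The sketch below shows one way such a chain could work; the order in which litscout actually tries sources is not documented here, and `fetch_with_fallback`, `openalex_stub`, and `elsevier_stub` are hypothetical names used only for illustration.

```python
from typing import Callable, Optional

# A fetcher takes a DOI and returns PDF bytes, or None if it has no copy.
Fetcher = Callable[[str], Optional[bytes]]


def fetch_with_fallback(doi: str, fetchers: list[tuple[str, Fetcher]]):
    """Try each (source_name, fetcher) pair in order; return the first hit."""
    for name, fetch in fetchers:
        try:
            pdf = fetch(doi)
        except Exception:
            # A failing source should not abort the whole chain.
            continue
        if pdf is not None:
            return name, pdf
    return None, None


def openalex_stub(doi):
    return None  # pretend no open-access copy exists


def elsevier_stub(doi):
    return b"%PDF-1.7 stub"  # pretend the paywalled copy was fetched


source, pdf = fetch_with_fallback(
    "10.1000/example",
    [("openalex", openalex_stub), ("elsevier", elsevier_stub)],
)
print(source)
```

Listing the open sources first and the PDF-only source last keeps keyed, rate-limited APIs as a last resort.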
Configuration Files
.env (API Keys)
Required:
- LLM_BASE_URL: OpenAI-compatible endpoint URL
- LLM_API_KEY: Your LLM API key
- LLM_MODEL: Model name (e.g., qwen3.6-plus)
Optional (enable sources as needed):
- OPENALEX_EMAIL: For faster OpenAlex rate limits
- S2_API_KEY: Semantic Scholar (optional, for guaranteed 1 req/sec)
- ELSEVIER_API_KEY: Elsevier ScienceDirect (required for pdf_only role)
- ELSEVIER_INST_TOKEN: Elsevier institutional token (for off-campus access)
- PUBMED_API_KEY: PubMed (optional, for 10 req/sec vs 3 req/sec)
- CORE_API_KEY: CORE (required if enabled)
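Before a first run it can help to confirm that the three required keys are set. The check below is a small sketch that reads from a plain mapping (or `os.environ` by default); `missing_required` is a hypothetical helper, not part of litscout, and the endpoint URL is a placeholder.

```python
import os

REQUIRED_KEYS = ("LLM_BASE_URL", "LLM_API_KEY", "LLM_MODEL")


def missing_required(env=None):
    """Return the required keys that are unset or blank in env."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


example_env = {
    "LLM_BASE_URL": "https://llm.example.invalid/v1",  # placeholder URL
    "LLM_API_KEY": "",  # left blank on purpose
    "LLM_MODEL": "qwen3.6-plus",
}
print(missing_required(example_env))
```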
input/settings.yaml (User Settings)
target_papers: 20     # Stop when this many papers are kept
max_iterations: 0     # 0 = unlimited
auto_stop: false      # true = stop automatically; false = ask user
sources:
  openalex:
    enabled: true
    role: search_and_pdf
  semantic_scholar:
    enabled: false
    role: search_and_pdf
  elsevier:
    enabled: true
    role: pdf_only
  arxiv:
    enabled: false
    role: search_and_pdf
  pubmed:
    enabled: false
    role: search_and_pdf
  core:
    enabled: false
    role: search_and_pdf
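The rule that at least one source with role search_and_pdf must be enabled is easy to check once the settings are loaded into a dictionary. The following is a hedged sketch: `validate_sources` is a hypothetical helper, the settings are written inline as a dict to avoid a YAML dependency, and the set of valid roles is assumed to be only the two that appear in this README.

```python
VALID_ROLES = {"search_and_pdf", "pdf_only"}  # assumption: the two roles shown above


def validate_sources(settings: dict) -> list[str]:
    """Return names of enabled search-capable sources, or raise ValueError."""
    sources = settings.get("sources", {})
    for name, cfg in sources.items():
        if cfg.get("role") not in VALID_ROLES:
            raise ValueError(f"{name}: unknown role {cfg.get('role')!r}")
    searchable = [
        name
        for name, cfg in sources.items()
        if cfg.get("enabled") and cfg.get("role") == "search_and_pdf"
    ]
    if not searchable:
        raise ValueError("enable at least one source with role 'search_and_pdf'")
    return searchable


settings = {
    "target_papers": 20,
    "sources": {
        "openalex": {"enabled": True, "role": "search_and_pdf"},
        "elsevier": {"enabled": True, "role": "pdf_only"},
        "arxiv": {"enabled": False, "role": "search_and_pdf"},
    },
}
print(validate_sources(settings))
```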
config.yaml (Technical Settings — rarely needs editing)
| Setting | Description | Default |
|---|---|---|
| api.max_tokens | Max tokens for LLM responses | 16384 |
| api.temperature | LLM temperature (0.0-1.0) | 0.3 |
| api.max_concurrent_requests | Concurrent LLM requests | 3 |
| search.queries_per_iteration | Queries per round | 5 |
| search.results_per_query | Max results per query | 20 |
| search.year_range | Papers from last N years | 5 |
| download.concurrency | Max simultaneous downloads | 5 |
| download.timeout | Download timeout (seconds) | 60 |
| download.max_pdf_size_mb | Skip PDFs larger than this (MB) | 50 |
| screening.batch_size | Papers per LLM screening call | 10 |
| screening.max_tokens_per_batch | Token budget per batch | 200000 |
| sufficiency.min_high_relevance | Min high-relevance papers | 5 |
| sufficiency.min_medium_relevance | Min medium-relevance papers | 8 |
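To show how screening.batch_size and screening.max_tokens_per_batch might interact, here is a minimal batching sketch. It assumes each paper comes with a precomputed token count; `make_batches` is a hypothetical function, and litscout's real batching logic may differ.

```python
def make_batches(papers, batch_size=10, max_tokens_per_batch=200_000):
    """Group (paper_id, token_count) pairs into screening batches.

    A batch is closed when adding the next paper would exceed either the
    paper-count limit or the token budget. A single paper larger than the
    budget still gets a batch of its own rather than being dropped.
    """
    batches, current, used = [], [], 0
    for paper_id, tokens in papers:
        over_count = len(current) >= batch_size
        over_budget = used + tokens > max_tokens_per_batch
        if current and (over_count or over_budget):
            batches.append(current)
            current, used = [], 0
        current.append(paper_id)
        used += tokens
    if current:
        batches.append(current)
    return batches


papers = [("a", 90_000), ("b", 80_000), ("c", 50_000), ("d", 250_000), ("e", 10_000)]
print(make_batches(papers))
# → [['a', 'b'], ['c'], ['d'], ['e']]
```

The greedy, order-preserving split keeps the example short; it does not try to pack batches optimally.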
Adding New Sources
litscout uses a plugin-based source architecture. To add a new source:
- Create a new file in litscout/sources/ (e.g., my_source.py)
- Subclass ScholarSource from litscout.sources.base
- Implement the required methods:
  - name(): Return the source identifier
  - search(query, limit, year_min, credentials): Search for papers
  - fetch_pdf(paper, credentials, session): Fetch PDF content
- Register it in litscout/sources/__init__.py
Example:
from litscout.sources.base import PaperMetadata, ScholarSource


class MySource(ScholarSource):
    @classmethod
    def name(cls) -> str:
        # Identifier used to refer to this source
        return "my_source"

    async def search(self, query, limit, year_min, credentials):
        # Implement search logic
        return []

    async def fetch_pdf(self, paper, credentials, session):
        # Implement PDF fetch logic
        return None
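To try the three-method interface without installing litscout, the sketch below defines stand-in versions of `ScholarSource` and `PaperMetadata` and a toy source that searches a hard-coded list. The stand-in classes, the `title` and `year` fields, and `InMemorySource` are all assumptions made for illustration; the real base class in litscout.sources.base may define more fields or behave differently.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class PaperMetadata:  # stand-in; the real class's fields may differ
    title: str
    year: int


class ScholarSource(ABC):  # stand-in for litscout.sources.base.ScholarSource
    @classmethod
    @abstractmethod
    def name(cls) -> str: ...

    @abstractmethod
    async def search(self, query, limit, year_min, credentials): ...

    @abstractmethod
    async def fetch_pdf(self, paper, credentials, session): ...


class InMemorySource(ScholarSource):
    """Toy source that searches a hard-coded list instead of a remote API."""

    CATALOG = [
        PaperMetadata("Graph neural networks for chemistry", 2023),
        PaperMetadata("Graph theory basics", 2015),
        PaperMetadata("Protein folding survey", 2024),
    ]

    @classmethod
    def name(cls) -> str:
        return "in_memory"

    async def search(self, query, limit, year_min, credentials):
        # Case-insensitive title match, filtered by publication year.
        hits = [
            p for p in self.CATALOG
            if query.lower() in p.title.lower() and p.year >= year_min
        ]
        return hits[:limit]

    async def fetch_pdf(self, paper, credentials, session):
        return None  # this toy source has no PDFs to offer


results = asyncio.run(
    InMemorySource().search("graph", limit=5, year_min=2020, credentials={})
)
print([p.title for p in results])
# → ['Graph neural networks for chemistry']
```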
Output Format
The final report is a Markdown file with:
- Header: Generation timestamp, iteration count, paper statistics
- Research Angle: Your original research prompt
- Summary Table: All kept papers with relevance and brief descriptions
- Detailed Evaluations: Full analysis for each kept paper
- Coverage Analysis: Gaps identified by the LLM
- Search Queries Used: All queries across all iterations
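As an illustration of the Summary Table section, the sketch below renders a list of kept papers as a Markdown table. The column names and the `title`, `relevance`, and `summary` keys are assumptions; the actual report layout and manifest schema are not specified in this README.

```python
def summary_table(papers):
    """Render kept papers as a Markdown table, high relevance first."""
    order = {"high": 0, "medium": 1}
    rows = sorted(papers, key=lambda p: (order.get(p["relevance"], 2), p["title"]))
    lines = ["| Title | Relevance | Summary |", "|---|---|---|"]
    for p in rows:
        # Escape pipes so a title or summary cannot break the table layout.
        title = p["title"].replace("|", "\\|")
        summary = p["summary"].replace("|", "\\|")
        lines.append(f"| {title} | {p['relevance']} | {summary} |")
    return "\n".join(lines)


kept = [
    {"title": "B paper", "relevance": "medium", "summary": "Partly on topic"},
    {"title": "A | B study", "relevance": "high", "summary": "Directly on topic"},
]
print(summary_table(kept))
```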
Graceful Shutdown
Press Ctrl+C to gracefully stop the pipeline. It will:
- Finish the current iteration
- Save the manifest
- Generate a final report
- Clean up temporary files
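One common way to get this behavior is to have the SIGINT handler set a flag that the main loop checks only between iterations. The sketch below shows that pattern under stated assumptions: `request_stop`, `run`, and the callback parameters are hypothetical names, not litscout's API, and the demo simulates Ctrl+C by calling the handler directly.

```python
import signal

stop_requested = False


def request_stop(signum, frame):
    """SIGINT handler: record the request instead of raising KeyboardInterrupt."""
    global stop_requested
    stop_requested = True


def run(iterations, do_iteration, save_manifest, write_report, cleanup):
    previous = signal.signal(signal.SIGINT, request_stop)
    completed = 0
    try:
        for i in range(iterations):
            do_iteration(i)  # always runs to completion
            completed += 1
            if stop_requested:  # checked only between iterations
                break
    finally:
        save_manifest()
        write_report()
        cleanup()
        signal.signal(signal.SIGINT, previous)  # restore the old handler
    return completed


log = []


def do_iteration(i):
    log.append(f"iteration {i}")
    if i == 1:
        # Simulate Ctrl+C arriving in the middle of the second iteration.
        request_stop(signal.SIGINT, None)


completed = run(
    5,
    do_iteration,
    lambda: log.append("manifest"),
    lambda: log.append("report"),
    lambda: log.append("cleanup"),
)
print(completed, log)
```

Note that `signal.signal` can only be called from the main thread, so this pattern belongs in the CLI entry point rather than in worker threads.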
Note on Language Support
Currently supports English-language papers only. Japanese language support is planned for a future release.
License
MIT License — see LICENSE for details.
Acknowledgments
- Semantic Scholar (Allen Institute for AI) — Free academic paper search API
- OpenAlex — Open bibliographic database
- Elsevier — ScienceDirect API for paywalled paper access
- arXiv — Open-access preprint repository
- PubMed / NCBI — Biomedical literature database
- CORE — Open access aggregator
Contributing
Contributions are welcome! See CONTRIBUTING.md for details.