Automated literature search, screening, and prioritization pipeline powered by LLMs.

litscout

Requires Python 3.11+.

What It Does

litscout is an automated literature discovery and screening pipeline for academic researchers. It uses an LLM to generate queries and screen papers, and runs the following loop:

  1. Generate search queries from your research angle
  2. Search academic databases (OpenAlex, Semantic Scholar, arXiv, PubMed, CORE) for candidate papers
  3. Download PDFs (with Elsevier ScienceDirect API fallback for paywalled papers)
  4. Screen papers using an LLM for relevance to your research angle
  5. Keep medium/high relevance papers, discard the rest
  6. Repeat until sufficient coverage is achieved
  7. Generate a final Markdown report summarizing everything found
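
The loop in steps 1 to 6 can be sketched in a few lines of Python. This is only an illustration of the control flow, not litscout's implementation: `Paper`, `generate_queries`, `search_and_screen`, and `run_pipeline` are hypothetical names, and the search and screening steps are replaced by stubs that return canned results.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    relevance: str  # "high", "medium", or "low"


def generate_queries(angle: str, iteration: int) -> list[str]:
    # Stand-in for the LLM query generator (step 1).
    return [f"{angle} (round {iteration}, query {n})" for n in range(2)]


def search_and_screen(query: str) -> list[Paper]:
    # Stand-in for steps 2-4: search, download, and LLM screening.
    return [Paper(f"High: {query}", "high"), Paper(f"Low: {query}", "low")]


def run_pipeline(angle: str, target_papers: int, max_iterations: int) -> list[Paper]:
    kept: list[Paper] = []
    iteration = 0
    # Step 6: repeat until enough papers are kept or the iteration cap is hit.
    while len(kept) < target_papers and iteration < max_iterations:
        iteration += 1
        for query in generate_queries(angle, iteration):
            for paper in search_and_screen(query):
                # Step 5: keep medium/high relevance, discard the rest.
                if paper.relevance in {"high", "medium"}:
                    kept.append(paper)
    return kept


kept = run_pipeline("protein folding", target_papers=3, max_iterations=5)
print(len(kept), "papers kept")
```

The sketch uses a positive iteration cap for simplicity. In litscout itself, `max_iterations: 0` means unlimited.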

Installation

From PyPI

pip install litscout

From Source

git clone https://github.com/your-username/litscout.git
cd litscout
pip install -e .

Quick Start

1. Initialize a Project

litscout init

This scaffolds a new litscout project in your current directory with all the necessary config files and directories.

2. Configure API Keys

# Edit .env with your API keys
# At minimum: LLM_BASE_URL, LLM_API_KEY, LLM_MODEL

3. Configure Sources

# Edit input/settings.yaml to enable your sources
# At least one source with role 'search_and_pdf' must be enabled

4. Write Research Angle

# Edit input/research.md with your research focus

5. Run the Pipeline

litscout run

6. Check Results

ls output/kept_papers/   # Downloaded PDFs
ls output/reports/       # Final Markdown reports
cat output/manifest.json # Full log of all papers
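
For a quick tally instead of reading the raw JSON, a short script can count papers per relevance label. The manifest schema is not documented here, so the layout below (a top-level `papers` list whose entries carry a `relevance` field) is an assumption. Adjust the keys to match your actual file.

```python
import json
from collections import Counter
from pathlib import Path


def summarize(data: dict) -> Counter:
    """Count papers per relevance label in a parsed manifest."""
    # ASSUMPTION: a top-level "papers" list whose entries have a "relevance" key.
    return Counter(p.get("relevance", "unknown") for p in data.get("papers", []))


manifest = Path("output/manifest.json")
if manifest.exists():
    print(summarize(json.loads(manifest.read_text(encoding="utf-8"))))
```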

CLI Commands

| Command | Description |
| --- | --- |
| litscout init | Scaffold a new litscout project directory |
| litscout run | Run the literature search and screening pipeline |
| litscout report | Regenerate the Markdown report from existing manifest.json |
| litscout clean | Clean output and temp directories |
| litscout status | Show quick summary of current project state |
| litscout --help | Show help message and exit |

litscout init

Scaffolds a new litscout project in the current directory (or in a specified path):

litscout init              # Use current directory
litscout init ./my-project # Use specified directory

Creates:

  • input/research.md — Your research angle
  • input/settings.yaml — Source and target settings
  • .env — API keys
  • config.yaml — Advanced technical settings
  • output/, temp/ — Output and temp directories

litscout run

Runs the full literature search and screening pipeline:

litscout run                    # Use default config.yaml
litscout run --config my.yaml   # Use custom config path

litscout report

Regenerates the Markdown report from an existing manifest.json without re-running the pipeline:

litscout report
litscout report --config my.yaml

litscout clean

Cleans output and temp directories:

litscout clean           # Ask for confirmation
litscout clean --confirm # Skip confirmation

litscout status

Shows a quick summary of the current project state:

litscout status
litscout status --config my.yaml

Output includes:

  • Active sources and their roles
  • Target papers
  • Iterations run
  • Papers kept (high/medium breakdown)
  • Papers discarded
  • Last updated timestamp

Configuring Search Sources

Edit input/settings.yaml to enable the sources you have access to:

| Source | Role | Key Required? | Coverage | Best For |
| --- | --- | --- | --- | --- |
| OpenAlex | Search + PDF | No (email optional) | 250M+ works, all disciplines | General academic research |
| Semantic Scholar | Search + PDF | No (optional for speed) | 200M+ papers, AI-ranked | CS, biomedical, broad coverage |
| Elsevier | PDF only | Yes (institutional) | Paywalled Elsevier journals | University-subscribed content |
| arXiv | Search + PDF | No | 2.4M+ preprints | Physics, math, CS, quantitative biology |
| PubMed | Search + PDF | No (optional for speed) | 36M+ citations | Biomedical and life sciences |
| CORE | Search + PDF | Yes (free) | 300M+ metadata, 40M+ full texts | Open access aggregation |

Default enabled sources: OpenAlex (search+pdf), Elsevier (pdf-only)

Configuration Files

.env (API Keys)

Required:

  • LLM_BASE_URL — OpenAI-compatible endpoint URL
  • LLM_API_KEY — Your LLM API key
  • LLM_MODEL — Model name (e.g., qwen3.6-plus)

Optional (enable sources as needed):

  • OPENALEX_EMAIL — For faster OpenAlex rate limits
  • S2_API_KEY — Semantic Scholar (optional, for guaranteed 1 req/sec)
  • ELSEVIER_API_KEY — Elsevier ScienceDirect (required for pdf_only role)
  • ELSEVIER_INST_TOKEN — Elsevier institutional token (for off-campus access)
  • PUBMED_API_KEY — PubMed (optional, for 10 req/sec vs 3 req/sec)
  • CORE_API_KEY — CORE (required if enabled)
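
Before a long run, it can help to confirm that the three required keys are set. The check below is a minimal sketch, not part of litscout: `missing_keys` is a hypothetical helper, and it inspects the process environment, so it assumes the values from .env have already been exported or loaded.

```python
import os

REQUIRED_KEYS = ("LLM_BASE_URL", "LLM_API_KEY", "LLM_MODEL")


def missing_keys(env: dict[str, str]) -> list[str]:
    """Return required keys that are absent or empty in env."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


missing = missing_keys(dict(os.environ))
if missing:
    print("Missing required settings:", ", ".join(missing))
```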

input/settings.yaml (User Settings)

target_papers: 20          # Stop when this many papers are kept
max_iterations: 0          # 0 = unlimited
auto_stop: false           # true = stop automatically; false = ask user

sources:
  openalex:
    enabled: true
    role: search_and_pdf
  semantic_scholar:
    enabled: false
    role: search_and_pdf
  elsevier:
    enabled: true
    role: pdf_only
  arxiv:
    enabled: false
    role: search_and_pdf
  pubmed:
    enabled: false
    role: search_and_pdf
  core:
    enabled: false
    role: search_and_pdf
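
The Quick Start requires at least one enabled source with role `search_and_pdf`. The sketch below shows that rule applied to a settings dictionary shaped like the YAML above. It uses a plain dict rather than parsing the file, and `enabled_search_sources` is a hypothetical helper, not a litscout function.

```python
settings = {
    "sources": {
        "openalex": {"enabled": True, "role": "search_and_pdf"},
        "elsevier": {"enabled": True, "role": "pdf_only"},
        "arxiv": {"enabled": False, "role": "search_and_pdf"},
    }
}


def enabled_search_sources(settings: dict) -> list[str]:
    """Names of enabled sources that can run searches."""
    return [
        name
        for name, cfg in settings.get("sources", {}).items()
        if cfg.get("enabled") and cfg.get("role") == "search_and_pdf"
    ]


searchable = enabled_search_sources(settings)
if not searchable:
    raise SystemExit("Enable at least one source with role 'search_and_pdf'.")
print(searchable)  # ['openalex']
```

With the default settings, only OpenAlex qualifies, because Elsevier is enabled as `pdf_only`.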

config.yaml (Technical Settings — rarely needs editing)

| Setting | Description | Default |
| --- | --- | --- |
| api.max_tokens | Max tokens for LLM responses | 16384 |
| api.temperature | LLM temperature (0.0-1.0) | 0.3 |
| api.max_concurrent_requests | Concurrent LLM requests | 3 |
| search.queries_per_iteration | Queries per round | 5 |
| search.results_per_query | Max results per query | 20 |
| search.year_range | Papers from last N years | 5 |
| download.concurrency | Max simultaneous downloads | 5 |
| download.timeout | Download timeout (seconds) | 60 |
| download.max_pdf_size_mb | Skip PDFs larger than this | 50 |
| screening.batch_size | Papers per LLM screening call | 10 |
| screening.max_tokens_per_batch | Token budget per batch | 200000 |
| sufficiency.min_high_relevance | Min high-relevance papers | 5 |
| sufficiency.min_medium_relevance | Min medium-relevance papers | 8 |
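
To get a feel for how the search and screening defaults interact, the arithmetic below estimates an upper bound on the work done in one iteration. It assumes a single search source, that every query returns its full quota, and that no duplicates are removed. How litscout combines results across sources is not specified here, and the token budget may split batches further.

```python
import math

queries_per_iteration = 5   # search.queries_per_iteration
results_per_query = 20      # search.results_per_query
batch_size = 10             # screening.batch_size

# Upper bound: every query returns its full quota and nothing is deduplicated.
max_candidates = queries_per_iteration * results_per_query
max_screening_calls = math.ceil(max_candidates / batch_size)

print(max_candidates, max_screening_calls)  # 100 10
```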

Adding New Sources

litscout uses a plugin-based source architecture. To add a new source:

  1. Create a new file in litscout/sources/ (e.g., my_source.py)
  2. Subclass ScholarSource from litscout.sources.base
  3. Implement the required methods:
    • name() — Return the source identifier
    • search(query, limit, year_min, credentials) — Search for papers
    • fetch_pdf(paper, credentials, session) — Fetch PDF content
  4. Register it in litscout/sources/__init__.py

Example:

from litscout.sources.base import PaperMetadata, ScholarSource

class MySource(ScholarSource):
    @classmethod
    def name(cls) -> str:
        return "my_source"

    async def search(self, query, limit, year_min, credentials):
        # Implement search logic
        return []

    async def fetch_pdf(self, paper, credentials, session):
        # Implement PDF fetch logic
        return None
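
To try the shape of such a plugin without installing litscout, the sketch below substitutes a stand-in base class. `ScholarSourceStandIn` is hypothetical and only mirrors the three methods listed above. The real `ScholarSource` may define more, and the argument values passed to `search` are illustrative.

```python
import asyncio
from abc import ABC, abstractmethod


class ScholarSourceStandIn(ABC):
    """Hypothetical stand-in for litscout.sources.base.ScholarSource."""

    @classmethod
    @abstractmethod
    def name(cls) -> str: ...

    @abstractmethod
    async def search(self, query, limit, year_min, credentials): ...

    @abstractmethod
    async def fetch_pdf(self, paper, credentials, session): ...


class MySource(ScholarSourceStandIn):
    @classmethod
    def name(cls) -> str:
        return "my_source"

    async def search(self, query, limit, year_min, credentials):
        return []  # no results in this stub

    async def fetch_pdf(self, paper, credentials, session):
        return None  # no PDF available in this stub


async def main() -> list:
    source = MySource()
    return await source.search(
        "graph neural networks", limit=5, year_min=2020, credentials={}
    )


results = asyncio.run(main())
print(MySource.name(), results)  # my_source []
```

Because the base methods are abstract, forgetting to implement one of them makes `MySource()` fail at instantiation rather than midway through a pipeline run.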

Output Format

The final report is a Markdown file with:

  1. Header: Generation timestamp, iteration count, paper statistics
  2. Research Angle: Your original research prompt
  3. Summary Table: All kept papers with relevance and brief descriptions
  4. Detailed Evaluations: Full analysis for each kept paper
  5. Coverage Analysis: Gaps identified by the LLM
  6. Search Queries Used: All queries across all iterations

Graceful Shutdown

Press Ctrl+C to gracefully stop the pipeline. It will:

  1. Finish the current iteration
  2. Save the manifest
  3. Generate a final report
  4. Clean up temporary files

Note on Language Support

Currently supports English-language papers only. Japanese language support is planned for a future release.

License

MIT License — see LICENSE for details.

Acknowledgments

  • Semantic Scholar (Allen Institute for AI) — Free academic paper search API
  • OpenAlex — Open bibliographic database
  • Elsevier — ScienceDirect API for paywalled paper access
  • arXiv — Open-access preprint repository
  • PubMed / NCBI — Biomedical literature database
  • CORE — Open access aggregator

Contributing

Contributions are welcome! See CONTRIBUTING.md for details.
