paperscout

A multi-purpose tool for searching, downloading, and managing academic papers.

Features

  • Search papers across multiple sources using pluggable backends
  • Download papers by DOI or arXiv ID
  • Programmatic API and CLI
  • Backend architecture for easy addition of new sources
  • Smart title matching - searches across title and abstract

Installation

pip install paperscout

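With optional backend support

# install a single backend extra (quoted so shells such as zsh do not expand the brackets)
pip install "paperscout[arxiv]"
# or install all optional dependencies
pip install "paperscout[all]"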

Available backend extras:

  • arxiv - arXiv backend using arxivy
  • dblp - DBLP backend using dblpcli
  • s2cli - Semantic Scholar backend using s2cli
  • acl-anthology - ACL Anthology backend

Backend Architecture

Each source is implemented as its own backend module:

paperscout/
├── backends/
│   ├── base.py              # Base backend class (abstract)
│   ├── arxiv.py             # arXiv backend using arxivy
│   ├── dblp.py              # DBLP backend using dblpcli
│   ├── s2cli.py             # Semantic Scholar backend using s2cli
│   └── acl_anthology.py     # ACL Anthology backend
├── similarity.py            # Title similarity functions
├── search.py                # Searcher using backends
├── client.py                # Client using searchers
└── formatter.py             # Rich formatting for CLI output

To add a new backend:

  1. Create a new module in backends/ (e.g., pubmed.py)
  2. Implement the BaseBackend abstract class
  3. Register it in PaperSearcher.BACKENDS
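
The three steps above can be sketched as follows. This is a minimal, self-contained illustration, not the package's actual code: the BaseBackend class here is a stand-in (the real abstract interface may define different methods), PubMedBackend and its in-memory corpus are hypothetical, and the BACKENDS dictionary only imitates what PaperSearcher.BACKENDS might look like.

```python
from abc import ABC, abstractmethod


class BaseBackend(ABC):
    """Stand-in for paperscout's abstract base class (real interface may differ)."""

    name: str

    @abstractmethod
    def search(self, query: str, limit: int = 10) -> list[dict]:
        """Return up to `limit` papers matching `query`."""


class PubMedBackend(BaseBackend):
    """Hypothetical backend; a real one would query the PubMed API."""

    name = "pubmed"

    # Tiny in-memory corpus so the sketch runs without network access.
    _CORPUS = [
        {"title": "CRISPR screening in human cells", "year": 2020},
        {"title": "Deep learning for protein folding", "year": 2021},
    ]

    def search(self, query: str, limit: int = 10) -> list[dict]:
        needle = query.lower()
        hits = [p for p in self._CORPUS if needle in p["title"].lower()]
        return hits[:limit]


# Stand-in for PaperSearcher.BACKENDS: a mapping from source name to class.
BACKENDS: dict[str, type[BaseBackend]] = {}
BACKENDS[PubMedBackend.name] = PubMedBackend

backend = BACKENDS["pubmed"]()
print(backend.search("protein"))
```

Keeping the registry keyed by source name is what would let a --source value select the backend class without any further branching.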

CLI Usage

Commands

Command    Description
search     Search for papers across sources
match      Get the best matching paper by title
download   Download a paper
verify     Verify the entries of a BibTeX file
Search Papers

# Search all backends (default - ACL Anthology has priority)
paperscout search "Attention Is All You Need"

# Search specific source
paperscout search "Transformer" --source arxiv
paperscout search "Attention" --source acl_anthology

# Limit results
paperscout search "quantum computing" --limit 20

# Output as JSON
paperscout search "query" --json

# Save to file
paperscout search "query" --output results.txt

Get Best Match

# Get the single best matching paper
paperscout match "Attention Is All You Need"

# Specify source
paperscout match "Transformer" --source arxiv

# JSON output
paperscout match "query" --json

Download Paper

# Download by arXiv ID
paperscout download arXiv:2301.12345

# Download by DOI
paperscout download 10.1234/example.doi

# Specify output directory
paperscout download arXiv:2301.12345 --output ./papers

# JSON output
paperscout download arXiv:2301.12345 --json
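
Since download accepts either an arXiv ID or a DOI, the tool has to tell the two apart. How paperscout does this is not documented here; the sketch below shows one plausible approach using regular expressions. The function name classify_identifier and both patterns are assumptions for illustration, and old-style arXiv IDs (such as hep-th/9901001) are deliberately not handled.

```python
import re

# Modern arXiv IDs look like 2301.12345, optionally with a version suffix (v2).
ARXIV_RE = re.compile(r"^(?:arxiv:)?(\d{4}\.\d{4,5})(v\d+)?$", re.IGNORECASE)
# DOIs start with "10.", a registrant code of 4-9 digits, a slash, then a suffix.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")


def classify_identifier(identifier: str) -> tuple[str, str]:
    """Return (kind, normalized_id), where kind is 'arxiv' or 'doi'."""
    text = identifier.strip()
    match = ARXIV_RE.match(text)
    if match:
        # Drop the "arXiv:" prefix but keep any version suffix.
        return "arxiv", match.group(1) + (match.group(2) or "")
    if DOI_RE.match(text):
        return "doi", text
    raise ValueError(f"Unrecognized paper identifier: {identifier!r}")


print(classify_identifier("arXiv:2301.12345"))     # ('arxiv', '2301.12345')
print(classify_identifier("10.1234/example.doi"))  # ('doi', '10.1234/example.doi')
```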

Python API

Using the Client

from paperscout import PaperFinderClient

# Create client
client = PaperFinderClient()

# Search for papers (ACL Anthology has priority)
results = client.search("Attention Is All You Need", source="all", limit=10)
for paper in results:
    print(f"{paper['title']} - {paper['authors']}")

# Search specific source only
results = client.search("Transformer", source="arxiv", limit=5)

# Download a paper
client.download("arXiv:2301.12345", output_dir="./papers")

Using Backends Directly

from paperscout.backends.acl_anthology import ACLAnthologyBackend
from paperscout.backends.arxiv import ArxivBackend
from paperscout.backends.dblp import DblpBackend
from paperscout.backends.s2cli import SemanticScholarBackend

# Use ACL Anthology backend
backend = ACLAnthologyBackend()
results = backend.search("Attention Is All You Need", limit=10)

# Use arXiv backend
arxiv_backend = ArxivBackend()
papers = arxiv_backend.search("quantum computing", limit=5)

# Download through a backend (the arXiv backend here, to match the arXiv ID)
arxiv_backend.download("arXiv:2301.12345")

Searching by Similarity

from paperscout.search import PaperSearcher
from paperscout.similarity import _title_similarity

searcher = PaperSearcher()

# Search all backends with exact match priority
results = searcher.search("Transformer", exact_match_first=True)

# Calculate title similarity
similarity = _title_similarity(
    "Attention Is All You Need",
    "Attention Is All You Need"
)
print(similarity)  # 1.0 for exact match
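
For intuition about what a title similarity score can look like, here is a minimal sketch built on the standard library's difflib.SequenceMatcher. It is an assumption for illustration only: the algorithm behind paperscout's _title_similarity is not described here, and normalize_title and title_similarity are hypothetical names.

```python
import re
from difflib import SequenceMatcher


def normalize_title(title: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(cleaned.split())


def title_similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0.0, 1.0] between two titles."""
    return SequenceMatcher(None, normalize_title(a), normalize_title(b)).ratio()


# Case and punctuation differences vanish after normalization.
print(title_similarity("Attention Is All You Need", "attention is all you need!"))  # 1.0
# Unrelated titles score lower.
print(title_similarity("Attention Is All You Need", "Quantum Computing"))
```

Normalizing before comparing keeps trivial formatting differences between sources from lowering the score of an otherwise exact match.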

Sources

Source            Backend                 Description
all               Multiple                Searches all backends (ACL Anthology has priority)
arxiv             ArxivBackend            Preprints in physics, math, CS, and more
dblp              DblpBackend             Computer science bibliography
semantic_scholar  SemanticScholarBackend  Academic paper search
acl_anthology     ACLAnthologyBackend     ACL/EMNLP/NAACL papers
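
One way to read "ACL Anthology has priority" for the all source is that results are merged in a fixed source order, and a paper found by several backends is reported from the highest-priority one. The sketch below illustrates that idea under those assumptions; merge_by_priority, the priority list, and the sample results are hypothetical, and the package's actual merging logic may differ.

```python
def normalize(title: str) -> str:
    """Case- and whitespace-insensitive key used to detect duplicates."""
    return " ".join(title.lower().split())


def merge_by_priority(results_by_source: dict[str, list[dict]],
                      priority: list[str]) -> list[dict]:
    """Merge per-source results, keeping the first copy of each title.

    Sources are visited in `priority` order, so a paper found in several
    sources is reported from the highest-priority one.
    """
    seen: set[str] = set()
    merged: list[dict] = []
    for source in priority:
        for paper in results_by_source.get(source, []):
            key = normalize(paper["title"])
            if key not in seen:
                seen.add(key)
                merged.append({**paper, "source": source})
    return merged


# Hypothetical results; real backends return richer metadata.
results = {
    "arxiv": [{"title": "Attention Is All You Need"}, {"title": "BERT"}],
    "acl_anthology": [{"title": "attention is all you need"}],
}
priority = ["acl_anthology", "arxiv"]
for paper in merge_by_priority(results, priority):
    print(paper["source"], "-", paper["title"])
```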

BibTeX Support

Export Papers as BibTeX

# Search and export as BibTeX
paperscout search "Attention Is All You Need" --bibtex

# Save to file
paperscout match "Transformer" --bibtex --output citations.bib

The same export is available from Python:

from paperscout import Paper

paper = Paper(...)
bibtex = paper.to_bibtex()
# or with custom key
bibtex = paper.to_bibtex(entry_key="vaswani2017")
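
To show the kind of string such an export produces, here is a standalone sketch that renders a minimal BibTeX entry from basic metadata. It does not use or reproduce Paper.to_bibtex: the function signature, the chosen fields, and the default entry type are assumptions for illustration.

```python
def to_bibtex(entry_key: str, title: str, authors: list[str], year: int,
              entry_type: str = "article") -> str:
    """Render a minimal BibTeX entry from basic paper metadata."""
    fields = {
        "title": title,
        "author": " and ".join(authors),  # BibTeX separates authors with "and"
        "year": str(year),
    }
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@{entry_type}{{{entry_key},\n{body}\n}}"


print(to_bibtex(
    "vaswani2017",
    "Attention Is All You Need",
    ["Ashish Vaswani", "Noam Shazeer"],
    2017,
    entry_type="inproceedings",
))
```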

Verify BibTeX Files

Verify that papers in a BibTeX file actually exist in academic databases:

# Basic verification
paperscout verify references.bib

# With specific backends
paperscout verify references.bib --source arxiv,dblp

# Enrich entries with discovered metadata
paperscout verify references.bib --enrich

# Output annotated .bib and JSON report
paperscout verify references.bib --output verified.bib --json report.json

Verification is also available from Python:

from paperscout import PaperFinderClient

client = PaperFinderClient()
results = client.verify_bibtex("references.bib")

for result in results:
    print(f"{result.bib_key}: {result.verdict} ({result.confidence:.0%})")

The verifier outputs:

  • Console report: Rich table showing verification status for each entry
  • Annotated .bib: Original file with verification comments
  • JSON report: Structured data for downstream processing
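
For downstream processing, the per-entry results can be summarized in a few lines. The sketch below assumes result objects with the bib_key, verdict, and confidence attributes used in the loop above; the VerificationResult dataclass is a stand-in, and the verdict labels ("verified", "not_found") and the 0.8 threshold are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class VerificationResult:
    """Stand-in mirroring the attributes used above; the real class may differ."""
    bib_key: str
    verdict: str       # verdict labels used below are hypothetical
    confidence: float  # between 0.0 and 1.0


def summarize(results: list[VerificationResult], threshold: float = 0.8):
    """Count verdicts and list entries whose confidence falls below `threshold`."""
    counts = Counter(r.verdict for r in results)
    needs_review = [r.bib_key for r in results if r.confidence < threshold]
    return counts, needs_review


results = [
    VerificationResult("vaswani2017", "verified", 0.98),
    VerificationResult("smith2020", "not_found", 0.35),
    VerificationResult("devlin2019", "verified", 0.75),
]
counts, needs_review = summarize(results)
print(dict(counts))   # {'verified': 2, 'not_found': 1}
print(needs_review)   # ['smith2020', 'devlin2019']
```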

CLI Output

The CLI formats its output with Rich for readability:

  • Tables for multiple papers showing source, year, title, and authors
  • Panels for single paper details with full metadata
  • Color coding for better visual separation
  • Truncation for long titles and abstracts

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

Ivan Vykopal - ivan.vykopal@gmail.com

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.
