paperscout

A multi-purpose tool for searching, downloading, and managing academic papers.

Features

  • Search papers across multiple sources using pluggable backends
  • Download papers by DOI or arXiv ID
  • Programmatic API and command-line interface
  • Backend architecture for easy addition of new sources
  • Smart title matching - searches across title and abstract

Installation

pip install paperscout

With optional backend support

pip install paperscout[arxiv]
# or install all optional dependencies
pip install paperscout[all]

Available backend extras:

  • arxiv - arXiv backend using arxivy
  • dblp - DBLP backend using dblpcli
  • s2cli - Semantic Scholar backend using s2cli
  • acl-anthology - ACL Anthology backend

Backend Architecture

The package uses a backend architecture where each source has its own backend implementation:

paperscout/
├── backends/
│   ├── base.py              # Base backend class (abstract)
│   ├── arxiv.py             # arXiv backend using arxivy
│   ├── dblp.py              # DBLP backend using dblpcli
│   ├── s2cli.py             # Semantic Scholar backend using s2cli
│   └── acl_anthology.py     # ACL Anthology backend
├── similarity.py            # Title similarity functions
├── search.py                # Searcher using backends
├── client.py                # Client using searchers
└── formatter.py             # Rich formatting for CLI output

To add a new backend:

  1. Create a new module in backends/ (e.g., pubmed.py)
  2. Implement the BaseBackend abstract class
  3. Register it in PaperSearcher.BACKENDS
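
The three steps above can be sketched as follows. The real BaseBackend interface is not shown in this README, so the snippet defines a stand-in abstract class. The method names search and download mirror the calls used elsewhere in this document, but their exact signatures, the name attribute, the PubMedBackend class, and the shape of the BACKENDS registry are assumptions made for illustration only.

```python
from abc import ABC, abstractmethod


class BaseBackend(ABC):
    """Stand-in for paperscout's abstract base class (assumed interface)."""

    name: str = "base"

    @abstractmethod
    def search(self, query: str, limit: int = 10) -> list[dict]:
        """Return up to `limit` paper records matching `query`."""

    @abstractmethod
    def download(self, identifier: str, output_dir: str = ".") -> str:
        """Fetch the paper and return the path of the saved file."""


class PubMedBackend(BaseBackend):
    """Hypothetical backend; a real one would call the source's API."""

    name = "pubmed"

    # In-memory sample records stand in for a remote service.
    _PAPERS = [
        {"title": "A Study of Protein Folding", "year": 2020},
        {"title": "Protein Structure Prediction", "year": 2021},
    ]

    def search(self, query: str, limit: int = 10) -> list[dict]:
        # Case-insensitive substring match on the title.
        needle = query.lower()
        hits = [p for p in self._PAPERS if needle in p["title"].lower()]
        return hits[:limit]

    def download(self, identifier: str, output_dir: str = ".") -> str:
        raise NotImplementedError("downloading is left out of this sketch")


# Step 3: register the backend under its source name (assumed registry shape).
BACKENDS: dict[str, type[BaseBackend]] = {"pubmed": PubMedBackend}

backend = BACKENDS["pubmed"]()
print(backend.search("protein", limit=1))
```

Keeping the registry as a plain mapping from source name to class means a searcher can instantiate backends lazily, which is one plausible reason for the registration step.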

CLI Usage

Commands

Command    Description
search     Search for papers across sources
match      Get the best matching paper by title
download   Download a paper
verify     Verify the entries of a BibTeX file against academic databases

Search Papers

# Search all backends (default - ACL Anthology has priority)
paperscout search "Attention Is All You Need"

# Search specific source
paperscout search "Transformer" --source arxiv
paperscout search "Attention" --source acl_anthology

# Limit results
paperscout search "quantum computing" --limit 20

# Output as JSON
paperscout search "query" --json

# Save to file
paperscout search "query" --output results.txt

Get Best Match

# Get the single best matching paper
paperscout match "Attention Is All You Need"

# Specify source
paperscout match "Transformer" --source arxiv

# JSON output
paperscout match "query" --json

Download Paper

The identifier type is auto-detected, so there is usually no need to specify --source:

# Download by arXiv ID (auto-detected)
paperscout download arXiv:2301.12345

# Download by bare arXiv ID
paperscout download 2301.12345

# Download by DOI (auto-detected, tries multiple backends)
paperscout download 10.18653/v1/P18-1001

# Download by ACL Anthology ID
paperscout download P18-1001

# Specify source explicitly (if needed)
paperscout download P18-1001 --source acl_anthology

# Specify output directory for the PDF
paperscout download arXiv:2301.12345 --output ./papers

# JSON output
paperscout download arXiv:2301.12345 --json
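
To illustrate what identifier auto-detection could look like, here is a minimal sketch based on regular expressions. The patterns and the detect_identifier function are hypothetical and are not taken from paperscout; in particular, the ACL pattern only covers old-style IDs such as P18-1001.

```python
import re

# Illustrative patterns only; paperscout's actual rules may differ.
ARXIV_RE = re.compile(r"^(?:arxiv:)?\d{4}\.\d{4,5}(?:v\d+)?$", re.IGNORECASE)
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
ACL_RE = re.compile(r"^[A-Z]\d{2}-\d{4}$")  # old-style ACL Anthology IDs only


def detect_identifier(identifier: str) -> str:
    """Classify an identifier as 'arxiv', 'doi', 'acl_anthology', or 'unknown'."""
    text = identifier.strip()
    if ARXIV_RE.match(text):
        return "arxiv"
    if DOI_RE.match(text):
        return "doi"
    if ACL_RE.match(text):
        return "acl_anthology"
    return "unknown"


for example in ["arXiv:2301.12345", "2301.12345", "10.18653/v1/P18-1001", "P18-1001"]:
    print(example, "->", detect_identifier(example))
```

A DOI such as 10.18653/v1/P18-1001 embeds an ACL Anthology ID, which is one reason a DOI download may need to try more than one backend.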

Python API

Using the Client

from paperscout import PaperFinderClient

# Create client
client = PaperFinderClient()

# Search for papers (ACL Anthology has priority)
results = client.search("Attention Is All You Need", source="all", limit=10)
for paper in results:
    print(f"{paper['title']} - {paper['authors']}")

# Search specific source only
results = client.search("Transformer", source="arxiv", limit=5)

# Download a paper (identifier type auto-detected)
client.download("arXiv:2301.12345", output_dir="./papers")
client.download("10.18653/v1/P18-1001")        # DOI
client.download("P18-1001", source="acl_anthology")  # ACL Anthology ID

Using Backends Directly

from paperscout.backends.acl_anthology import ACLAnthologyBackend
from paperscout.backends.arxiv import ArxivBackend
from paperscout.backends.dblp import DblpBackend
from paperscout.backends.s2cli import SemanticScholarBackend

# Use ACL Anthology backend
backend = ACLAnthologyBackend()
results = backend.search("Attention Is All You Need", limit=10)

# Use arXiv backend
arxiv_backend = ArxivBackend()
papers = arxiv_backend.search("quantum computing", limit=5)

# Download with the backend that matches the identifier
arxiv_backend.download("arXiv:2301.12345")

Searching by Similarity

from paperscout.search import PaperSearcher
from paperscout.similarity import _title_similarity

searcher = PaperSearcher()

# Search all backends with exact match priority
results = searcher.search("Transformer", exact_match_first=True)

# Calculate title similarity
similarity = _title_similarity(
    "Attention Is All You Need",
    "Attention Is All You Need"
)
print(similarity)  # 1.0 for exact match
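
For intuition about how a title similarity score can behave, the sketch below uses difflib.SequenceMatcher from the Python standard library. This is an assumed technique chosen for illustration; the README does not say how _title_similarity is implemented, and the normalize and title_similarity helpers here are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(title.lower().split())


def title_similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the normalized titles are identical."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


same = title_similarity("Attention Is All You Need", "attention is all  you need")
other = title_similarity(
    "Attention Is All You Need",
    "BERT: Pre-training of Deep Bidirectional Transformers",
)
print(same)   # 1.0, because the titles are identical after normalization
print(other)  # a lower score for an unrelated title
```

Normalizing before comparing keeps capitalization and spacing differences between sources from lowering the score.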

Sources

Source            Backend                 Description
all               Multiple                Searches all backends (ACL Anthology has priority)
arxiv             ArxivBackend            Preprints in physics, math, CS, and more
dblp              DblpBackend             Computer science bibliography
semantic_scholar  SemanticScholarBackend  Academic paper search
acl_anthology     ACLAnthologyBackend     ACL/EMNLP/NAACL papers
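
One way to read "ACL Anthology has priority" for the all source is that results from several backends are merged and, when the same title appears more than once, the copy from the higher-priority source is kept. The sketch below shows that idea under stated assumptions: merge_results, the record shape, and the sample titles are hypothetical, and paperscout's actual merging logic may differ.

```python
def normalize(title: str) -> str:
    """Lowercase and collapse whitespace so duplicate titles compare equal."""
    return " ".join(title.lower().split())


def merge_results(results_by_source: dict, priority: list) -> list:
    """Merge per-source results, keeping the first copy of each title in priority order."""
    seen = set()
    merged = []
    # Sources named in `priority` come first; the rest keep their original order.
    ordered = [s for s in priority if s in results_by_source]
    ordered += [s for s in results_by_source if s not in priority]
    for source in ordered:
        for paper in results_by_source[source]:
            key = normalize(paper["title"])
            if key in seen:
                continue  # a higher-priority source already supplied this title
            seen.add(key)
            merged.append({**paper, "source": source})
    return merged


# Made-up sample data, not real search output.
results = {
    "arxiv": [{"title": "Example Paper A"}, {"title": "Example Paper B"}],
    "acl_anthology": [{"title": "example paper a"}],
}
merged = merge_results(results, priority=["acl_anthology"])
for paper in merged:
    print(paper["source"], "-", paper["title"])
```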

BibTeX Support

Export Papers as BibTeX

# Search and export as BibTeX
paperscout search "Attention Is All You Need" --bibtex

# Save to file
paperscout match "Transformer" --bibtex --output citations.bib

The same export is available from Python:

from paperscout import Paper

paper = Paper(...)
bibtex = paper.to_bibtex()
# or with custom key
bibtex = paper.to_bibtex(entry_key="vaswani2017")
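
To show what a BibTeX export involves, here is a minimal standalone sketch that renders a plain dictionary as an @article entry. It is not paperscout's implementation: the to_bibtex and make_key functions, the record fields, the default key scheme (first author's surname plus year), and the sample paper are all assumptions for illustration.

```python
from typing import Optional


def make_key(paper: dict) -> str:
    """Build a key such as 'doe2020' from the first author's surname and the year."""
    surname = paper["authors"][0].split()[-1].lower()
    return f"{surname}{paper['year']}"


def to_bibtex(paper: dict, entry_key: Optional[str] = None) -> str:
    """Render a paper record as an @article entry."""
    key = entry_key or make_key(paper)
    fields = {
        "title": paper["title"],
        "author": " and ".join(paper["authors"]),  # BibTeX separates authors with "and"
        "year": str(paper["year"]),
    }
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@article{{{key},\n{body}\n}}"


# Made-up sample record.
sample = {"title": "An Example Paper", "authors": ["Jane Doe", "John Roe"], "year": 2020}
print(to_bibtex(sample))
print(to_bibtex(sample, entry_key="custom2020"))
```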

Verify BibTeX Files

Verify that papers in a BibTeX file actually exist in academic databases:

# Basic verification
paperscout verify references.bib

# With specific backends
paperscout verify references.bib --source arxiv,dblp

# Enrich entries with discovered metadata
paperscout verify references.bib --enrich

# Output annotated .bib and JSON report
paperscout verify references.bib --output verified.bib --json report.json

The same check can be run from Python:

from paperscout import PaperFinderClient

client = PaperFinderClient()
results = client.verify_bibtex("references.bib")

for result in results:
    print(f"{result.bib_key}: {result.verdict} ({result.confidence:.0%})")

The verifier outputs:

  • Console report: Rich table showing verification status for each entry
  • Annotated .bib: Original file with verification comments
  • JSON report: Structured data for downstream processing

CLI Output

The CLI uses rich formatting for better readability:

  • Tables for multiple papers showing source, year, title, and authors
  • Panels for single paper details with full metadata
  • Color coding for better visual separation
  • Truncation for long titles and abstracts

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

Ivan Vykopal - ivan.vykopal@gmail.com

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.
