paperscout

A multi-purpose tool for searching, downloading, and managing academic papers.

Features

  • Search papers across multiple sources using pluggable backends
  • Download papers by DOI or arXiv ID
  • Programmatic Python API and command-line interface (CLI)
  • Backend architecture for easy addition of new sources
  • Smart title matching - searches across title and abstract

Installation

pip install paperscout

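With optional backend support

# Install a single backend extra (quoted so shells such as zsh do not expand the brackets)
pip install "paperscout[arxiv]"
# or install all optional dependencies
pip install "paperscout[all]"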

Available backend extras:

  • arxiv - arXiv backend using arxivy
  • dblp - DBLP backend using dblpcli
  • s2cli - Semantic Scholar backend using s2cli
  • acl-anthology - ACL Anthology backend

Backend Architecture

The package uses a backend architecture where each source has its own backend implementation:

paperscout/
├── backends/
│   ├── base.py              # Base backend class (abstract)
│   ├── arxiv.py             # arXiv backend using arxivy
│   ├── dblp.py              # DBLP backend using dblpcli
│   ├── s2cli.py             # Semantic Scholar backend using s2cli
│   └── acl_anthology.py     # ACL Anthology backend
├── similarity.py            # Title similarity functions
├── search.py                # Searcher using backends
├── client.py                # Client using searchers
└── formatter.py             # Rich formatting for CLI output

To add a new backend:

  1. Create a new module in backends/ (e.g., pubmed.py)
  2. Implement the BaseBackend abstract class
  3. Register it in PaperSearcher.BACKENDS
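
The three steps above can be sketched as follows. This is a minimal illustration, not paperscout's actual code: the BaseBackend defined here is a stand-in whose method names and signatures (search, download) are assumed from the usage examples in this README, PubMedBackend is hypothetical, and the module-level BACKENDS dict stands in for PaperSearcher.BACKENDS.

```python
from abc import ABC, abstractmethod


class BaseBackend(ABC):
    """Stand-in for paperscout's abstract base; the real interface may differ."""

    name: str = "base"

    @abstractmethod
    def search(self, query: str, limit: int = 10) -> list[dict]:
        """Return up to `limit` paper records matching `query`."""

    @abstractmethod
    def download(self, identifier: str, output_dir: str = ".") -> str:
        """Fetch the paper and return the path it was saved to."""


class PubMedBackend(BaseBackend):
    """Hypothetical backend that serves results from an in-memory list."""

    name = "pubmed"

    # Placeholder records so the sketch runs without network access.
    _PAPERS = [
        {"title": "Example Paper One", "authors": ["A. Author"], "source": "pubmed"},
        {"title": "Another Example Paper", "authors": ["B. Author"], "source": "pubmed"},
    ]

    def search(self, query: str, limit: int = 10) -> list[dict]:
        # Case-insensitive substring match on the title only.
        needle = query.lower()
        hits = [p for p in self._PAPERS if needle in p["title"].lower()]
        return hits[:limit]

    def download(self, identifier: str, output_dir: str = ".") -> str:
        raise NotImplementedError("downloading is not part of this sketch")


# Step 3: a registry mapping source names to backend classes
# (assumed shape; stands in for PaperSearcher.BACKENDS).
BACKENDS: dict[str, type[BaseBackend]] = {PubMedBackend.name: PubMedBackend}

backend = BACKENDS["pubmed"]()
print([p["title"] for p in backend.search("example", limit=1)])
```

Keeping the registry keyed by source name means a --source value can be resolved to a backend class with a single dictionary lookup.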

CLI Usage

Commands

Command    Description
search     Search for papers across sources
match      Get the best matching paper by title
download   Download a paper
Search Papers

# Search all backends (the default; ACL Anthology results take priority)
paperscout search "Attention Is All You Need"

# Search specific source
paperscout search "Transformer" --source arxiv
paperscout search "Attention" --source acl_anthology

# Limit results
paperscout search "quantum computing" --limit 20

# Output as JSON
paperscout search "query" --json

# Save to file
paperscout search "query" --output results.txt

Get Best Match

# Get the single best matching paper
paperscout match "Attention Is All You Need"

# Specify source
paperscout match "Transformer" --source arxiv

# JSON output
paperscout match "query" --json

Download Paper

# Download by arXiv ID
paperscout download arXiv:2301.12345

# Download by DOI
paperscout download 10.1234/example.doi

# Specify output directory
paperscout download arXiv:2301.12345 --output ./papers

# JSON output
paperscout download arXiv:2301.12345 --json
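
The download command accepts both arXiv IDs and DOIs. How paperscout tells them apart is not documented here; the sketch below shows one plausible way to classify the two identifier forms used in the examples above. The regular expressions and the classify_identifier helper are assumptions for illustration, and old-style arXiv IDs (such as hep-th/9901001) are deliberately not handled.

```python
import re

# Patterns are assumptions covering only the two forms shown in this README.
ARXIV_RE = re.compile(r"^(?:arxiv:)?(\d{4}\.\d{4,5})(v\d+)?$", re.IGNORECASE)
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")


def classify_identifier(identifier: str) -> tuple[str, str]:
    """Return (kind, normalized_id), where kind is 'arxiv' or 'doi'."""
    text = identifier.strip()
    match = ARXIV_RE.match(text)
    if match:
        # Drop the optional 'arXiv:' prefix, keep any version suffix.
        return "arxiv", match.group(1) + (match.group(2) or "")
    if DOI_RE.match(text):
        return "doi", text
    raise ValueError(f"unrecognized paper identifier: {identifier!r}")


print(classify_identifier("arXiv:2301.12345"))
print(classify_identifier("10.1234/example.doi"))
```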

Python API

Using the Client

from paperscout import PaperFinderClient

# Create client
client = PaperFinderClient()

# Search for papers (ACL Anthology has priority)
results = client.search("Attention Is All You Need", source="all", limit=10)
for paper in results:
    print(f"{paper['title']} - {paper['authors']}")

# Search specific source only
results = client.search("Transformer", source="arxiv", limit=5)

# Download a paper
client.download("arXiv:2301.12345", output_dir="./papers")

Using Backends Directly

from paperscout.backends.acl_anthology import ACLAnthologyBackend
from paperscout.backends.arxiv import ArxivBackend
from paperscout.backends.dblp import DblpBackend
from paperscout.backends.s2cli import SemanticScholarBackend

# Use ACL Anthology backend
backend = ACLAnthologyBackend()
results = backend.search("Attention Is All You Need", limit=10)

# Use arXiv backend
arxiv_backend = ArxivBackend()
papers = arxiv_backend.search("quantum computing", limit=5)

# Download through a backend (here the arXiv backend, since the ID is an arXiv ID)
arxiv_backend.download("arXiv:2301.12345")

Searching by Similarity

from paperscout.search import PaperSearcher
from paperscout.similarity import _title_similarity

searcher = PaperSearcher()

# Search all backends with exact match priority
results = searcher.search("Transformer", exact_match_first=True)

# Calculate title similarity
similarity = _title_similarity(
    "Attention Is All You Need",
    "Attention Is All You Need"
)
print(similarity)  # 1.0 for exact match
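
The implementation of _title_similarity is not shown in this README. As a rough illustration of how title similarity and best-match selection can work, the sketch below uses the standard library's difflib.SequenceMatcher on normalized titles. The function names (normalize_title, title_similarity, best_match) are hypothetical, and the actual scoring in paperscout may differ.

```python
import re
from difflib import SequenceMatcher


def normalize_title(title: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(cleaned.split())


def title_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized titles."""
    return SequenceMatcher(None, normalize_title(a), normalize_title(b)).ratio()


def best_match(query: str, papers: list[dict]):
    """Return the paper whose title is most similar to `query`, or None if empty."""
    return max(papers, key=lambda p: title_similarity(query, p["title"]), default=None)


papers = [
    {"title": "Attention, please!"},
    {"title": "attention is all you need."},
]
print(title_similarity("Attention Is All You Need", papers[1]["title"]))  # 1.0 after normalization
print(best_match("Attention Is All You Need", papers)["title"])
```

Normalizing before comparing means that differences in case and punctuation do not lower the score.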

Sources

Source            Backend                 Description
all               Multiple                Searches all backends (ACL Anthology has priority)
arxiv             ArxivBackend            Preprints in physics, math, CS, and more
dblp              DblpBackend             Computer science bibliography
semantic_scholar  SemanticScholarBackend  Academic paper search
acl_anthology     ACLAnthologyBackend     ACL/EMNLP/NAACL papers
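
When the source is all, results from several backends have to be combined, with ACL Anthology taking priority. The sketch below shows one way such a merge could work: walk the sources in priority order and skip titles that have already been seen. Only ACL Anthology's precedence is stated in this README; the order of the remaining sources, the merge_results helper, and deduplication by normalized title are assumptions.

```python
# Assumed priority order; only ACL Anthology's precedence is documented.
SOURCE_PRIORITY = ["acl_anthology", "arxiv", "dblp", "semantic_scholar"]


def merge_results(results_by_source: dict[str, list[dict]], limit: int = 10) -> list[dict]:
    """Merge per-source results, keeping the first copy of each title in priority order."""
    merged: list[dict] = []
    seen_titles: set[str] = set()
    for source in SOURCE_PRIORITY:
        for paper in results_by_source.get(source, []):
            # Compare titles case-insensitively with whitespace collapsed.
            key = " ".join(paper["title"].lower().split())
            if key in seen_titles:
                continue  # already supplied by an earlier (higher-priority) entry
            seen_titles.add(key)
            merged.append({**paper, "source": source})
    return merged[:limit]


results = {
    "arxiv": [{"title": "Attention Is All You Need"}, {"title": "Another Paper"}],
    "acl_anthology": [{"title": "attention is all you need"}],
}
for paper in merge_results(results):
    print(paper["source"], "|", paper["title"])
```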

CLI Output

The CLI uses rich formatting for better readability:

  • Tables for multiple papers showing source, year, title, and authors
  • Panels for single paper details with full metadata
  • Color coding for better visual separation
  • Truncation for long titles and abstracts
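
As an illustration of the truncation behavior listed above, the following sketch shortens long titles at a word boundary using the standard library's textwrap.shorten. The truncate helper and the default width are hypothetical; the formatter in paperscout may use the rich library's own overflow handling instead.

```python
import textwrap


def truncate(text: str, width: int = 60) -> str:
    """Collapse whitespace and cut at a word boundary, appending '...' if shortened."""
    return textwrap.shorten(text, width=width, placeholder="...")


print(truncate("Attention Is All You Need", width=15))
print(truncate("Short title"))
```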

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

Ivan Vykopal - ivan.vykopal@gmail.com

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.
