paperscout

A multi-purpose tool for searching, downloading, and managing academic papers.

Features

Search papers across multiple sources using pluggable backends
Download papers by DOI or arXiv ID
Programmatic API and CLI
Backend architecture for easy addition of new sources
Smart title matching: searches across both title and abstract

Installation

pip install paperscout

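With optional backend support

# install a single backend extra (quote the brackets so shells such as zsh do not expand them)
pip install "paperscout[arxiv]"
# or install all optional dependencies
pip install "paperscout[all]"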

Available backend extras:

arxiv - arXiv backend using arxivy
dblp - DBLP backend using dblpcli
s2cli - Semantic Scholar backend using s2cli
acl-anthology - ACL Anthology backend

Backend Architecture

The package uses a backend architecture where each source has its own backend implementation:

paperscout/
├── backends/
│   ├── base.py              # Base backend class (abstract)
│   ├── arxiv.py             # arXiv backend using arxivy
│   ├── dblp.py              # DBLP backend using dblpcli
│   ├── s2cli.py             # Semantic Scholar backend using s2cli
│   └── acl_anthology.py     # ACL Anthology backend
├── similarity.py            # Title similarity functions
├── search.py                # Searcher using backends
├── client.py                # Client using searchers
└── formatter.py             # Rich formatting for CLI output

To add a new backend:

Create a new module in backends/ (e.g., pubmed.py)
Implement the BaseBackend abstract class
Register it in PaperSearcher.BACKENDS
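
The three steps above can be sketched as follows. This is a minimal, self-contained illustration, not the package's actual code: the `BaseBackend` class here is a stand-in whose method signatures are assumed from the usage shown elsewhere in this README, `PubMedBackend` and its in-memory records are hypothetical, and the plain `BACKENDS` dict stands in for `PaperSearcher.BACKENDS`.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class BaseBackend(ABC):
    """Stand-in for the abstract backend; the real interface may differ."""

    name: str = "base"

    @abstractmethod
    def search(self, query: str, limit: int = 10) -> List[dict]:
        """Return up to `limit` paper records matching `query`."""

    @abstractmethod
    def download(self, identifier: str, output_dir: str = ".") -> str:
        """Fetch the paper and return the path it was saved to."""


class PubMedBackend(BaseBackend):
    """Hypothetical backend that searches an in-memory list instead of a real API."""

    name = "pubmed"

    def __init__(self) -> None:
        # Placeholder records so the sketch runs without network access.
        self._records = [
            {"title": "Example paper on gene expression", "source": self.name},
            {"title": "Example paper on protein folding", "source": self.name},
        ]

    def search(self, query: str, limit: int = 10) -> List[dict]:
        needle = query.lower()
        hits = [r for r in self._records if needle in r["title"].lower()]
        return hits[:limit]

    def download(self, identifier: str, output_dir: str = ".") -> str:
        # A real backend would fetch the PDF; this sketch only builds the target path.
        return f"{output_dir}/{identifier}.pdf"


# Stand-in for the PaperSearcher.BACKENDS registry (assumed to map names to classes).
BACKENDS: Dict[str, Type[BaseBackend]] = {}
BACKENDS[PubMedBackend.name] = PubMedBackend

backend = BACKENDS["pubmed"]()
print(backend.search("protein"))
```

Keeping the registry keyed by source name is one way the `--source` option could select a backend; whether the package does exactly this is not confirmed here.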

CLI Usage

Commands

Command	Description
`search`	Search for papers across sources
`match`	Get the best matching paper by title
`download`	Download a paper
`verify`	Verify the entries of a BibTeX file against academic databases

Search Papers

# Search all backends (default - ACL Anthology has priority)
paperscout search "Attention Is All You Need"

# Search specific source
paperscout search "Transformer" --source arxiv
paperscout search "Attention" --source acl_anthology

# Limit results
paperscout search "quantum computing" --limit 20

# Output as JSON
paperscout search "query" --json

# Save to file
paperscout search "query" --output results.txt

Get Best Match

# Get the single best matching paper
paperscout match "Attention Is All You Need"

# Specify source
paperscout match "Transformer" --source arxiv

# JSON output
paperscout match "query" --json

Download Paper

The identifier type is auto-detected, so there is usually no need to specify --source:

# Download by arXiv ID (auto-detected)
paperscout download arXiv:2301.12345

# Download by bare arXiv ID
paperscout download 2301.12345

# Download by DOI (auto-detected, tries multiple backends)
paperscout download 10.18653/v1/P18-1001

# Download by ACL Anthology ID
paperscout download P18-1001

# Specify source explicitly (if needed)
paperscout download P18-1001 --source acl_anthology

# Specify output directory for the PDF
paperscout download arXiv:2301.12345 --output ./papers

# JSON output
paperscout download arXiv:2301.12345 --json
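
To illustrate how this kind of auto-detection might work, here is a minimal sketch that classifies the identifier shapes used in the examples above. The function name `detect_identifier_type` and the regular expressions are assumptions made for illustration; the package's own detection logic may differ, and the ACL pattern only covers legacy IDs of the form `P18-1001`.

```python
import re

# Modern arXiv IDs: four digits, a dot, four or five digits, optional version suffix.
ARXIV_RE = re.compile(r"^(?:arxiv:)?(\d{4}\.\d{4,5})(v\d+)?$", re.IGNORECASE)
# DOIs start with "10.", a registrant code, then a slash and a suffix.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
# Legacy ACL Anthology IDs: one letter, two-digit year, dash, four digits.
ACL_RE = re.compile(r"^[A-Z]\d{2}-\d{4}$")


def detect_identifier_type(identifier: str) -> str:
    """Return 'arxiv', 'doi', 'acl_anthology', or 'unknown' for an identifier."""
    text = identifier.strip()
    if ARXIV_RE.match(text):
        return "arxiv"
    if DOI_RE.match(text):
        return "doi"
    if ACL_RE.match(text):
        return "acl_anthology"
    return "unknown"


for example in ["arXiv:2301.12345", "2301.12345", "10.18653/v1/P18-1001", "P18-1001"]:
    print(example, "->", detect_identifier_type(example))
```

Anchoring each pattern with `^` and `$` keeps a DOI that happens to contain an ACL-style ID from being misread as an ACL Anthology identifier.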

Python API

Using the Client

from paperscout import PaperFinderClient

# Create client
client = PaperFinderClient()

# Search for papers (ACL Anthology has priority)
results = client.search("Attention Is All You Need", source="all", limit=10)
for paper in results:
    print(f"{paper['title']} - {paper['authors']}")

# Search specific source only
results = client.search("Transformer", source="arxiv", limit=5)

# Download a paper (identifier type auto-detected)
client.download("arXiv:2301.12345", output_dir="./papers")
client.download("10.18653/v1/P18-1001")        # DOI
client.download("P18-1001", source="acl_anthology")  # ACL Anthology ID

Using Backends Directly

from paperscout.backends.acl_anthology import ACLAnthologyBackend
from paperscout.backends.arxiv import ArxivBackend
from paperscout.backends.dblp import DblpBackend
from paperscout.backends.s2cli import SemanticScholarBackend

# Use ACL Anthology backend
backend = ACLAnthologyBackend()
results = backend.search("Attention Is All You Need", limit=10)

# Use arXiv backend
arxiv_backend = ArxivBackend()
papers = arxiv_backend.search("quantum computing", limit=5)

# Every backend exposes download(); use the one that matches the identifier
arxiv_backend.download("arXiv:2301.12345")

Searching by Similarity

from paperscout.search import PaperSearcher
from paperscout.similarity import _title_similarity

searcher = PaperSearcher()

# Search all backends with exact match priority
results = searcher.search("Transformer", exact_match_first=True)

# Calculate title similarity
similarity = _title_similarity(
    "Attention Is All You Need",
    "Attention Is All You Need"
)
print(similarity)  # 1.0 for exact match
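
For readers curious what a title similarity score could look like, the sketch below uses the standard library's `difflib.SequenceMatcher` on normalized titles. This is an assumed approach chosen for illustration, not necessarily what `_title_similarity` does internally; the helper names `normalize_title`, `title_similarity`, and `rank_by_title` are hypothetical.

```python
import re
from difflib import SequenceMatcher
from typing import List


def normalize_title(title: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    no_punct = re.sub(r"[^a-z0-9\s]", " ", title.lower())
    return " ".join(no_punct.split())


def title_similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; identical normalized titles score 1.0."""
    return SequenceMatcher(None, normalize_title(a), normalize_title(b)).ratio()


def rank_by_title(query: str, papers: List[dict]) -> List[dict]:
    """Sort paper records so the closest title match comes first."""
    return sorted(papers, key=lambda p: title_similarity(query, p["title"]), reverse=True)


papers = [
    {"title": "A Survey of Transformers"},
    {"title": "Attention is all you need!"},
]
best = rank_by_title("Attention Is All You Need", papers)[0]
print(best["title"], title_similarity("Attention Is All You Need", best["title"]))
```

Normalizing before comparing means differences in case or trailing punctuation do not lower the score, which is the behavior one would usually want when matching citation titles.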

Sources

Source	Backend	Description
`all`	Multiple	Searches all backends (ACL Anthology has priority)
`arxiv`	ArxivBackend	Preprints in physics, math, CS, and more
`dblp`	DblpBackend	Computer science bibliography
`semantic_scholar`	SemanticScholarBackend	Academic paper search
`acl_anthology`	ACLAnthologyBackend	ACL/EMNLP/NAACL papers
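
The `all` source combines results from several backends with ACL Anthology taking priority. One way such a merge could work is sketched below; the function `merge_results`, the title-based de-duplication, and the record shape are assumptions for illustration rather than the package's documented behavior.

```python
from typing import Dict, List


def merge_results(results_by_source: Dict[str, List[dict]], priority: List[str]) -> List[dict]:
    """Merge per-source results, listing priority sources first and dropping duplicate titles."""
    merged: List[dict] = []
    seen = set()
    # Priority sources come first; remaining sources keep their given order.
    ordered = list(priority) + [s for s in results_by_source if s not in priority]
    for source in ordered:
        for paper in results_by_source.get(source, []):
            key = " ".join(paper["title"].lower().split())
            if key in seen:
                continue  # a higher-priority source already supplied this title
            seen.add(key)
            merged.append({**paper, "source": source})
    return merged


results = {
    "arxiv": [{"title": "Attention Is All You Need"}, {"title": "Placeholder Title"}],
    "acl_anthology": [{"title": "attention is all  you need"}],
}
for paper in merge_results(results, priority=["acl_anthology"]):
    print(paper["source"], "|", paper["title"])
```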

BibTeX Support

Export Papers as BibTeX

# Search and export as BibTeX
paperscout search "Attention Is All You Need" --bibtex

# Save to file
paperscout match "Transformer" --bibtex --output citations.bib

from paperscout import Paper

paper = Paper(...)
bibtex = paper.to_bibtex()
# or with custom key
bibtex = paper.to_bibtex(entry_key="vaswani2017")
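
As a rough picture of what a `to_bibtex()` method produces, the sketch below renders a record as a BibTeX entry. The `PaperRecord` dataclass is a hypothetical stand-in, not the package's `Paper` class; its fields, the `@inproceedings` entry type, and the default key scheme (first author's surname plus year) are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PaperRecord:
    """Hypothetical stand-in for a paper record; not paperscout.Paper."""

    title: str
    authors: List[str]
    year: int
    venue: str = ""
    doi: str = ""

    def default_key(self) -> str:
        # Assumed key scheme: lowercase surname of the first author plus the year.
        surname = self.authors[0].split()[-1].lower() if self.authors else "unknown"
        return f"{surname}{self.year}"

    def to_bibtex(self, entry_key: Optional[str] = None) -> str:
        key = entry_key or self.default_key()
        fields = {
            "title": self.title,
            "author": " and ".join(self.authors),
            "year": str(self.year),
            "booktitle": self.venue,
            "doi": self.doi,
        }
        # Empty fields are skipped so the entry stays valid and compact.
        lines = [f"  {name} = {{{value}}}," for name, value in fields.items() if value]
        return "@inproceedings{" + key + ",\n" + "\n".join(lines) + "\n}"


record = PaperRecord(
    title="Attention Is All You Need",
    authors=["Ashish Vaswani", "Noam Shazeer"],
    year=2017,
)
print(record.to_bibtex())
```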

Verify BibTeX Files

Verify that papers in a BibTeX file actually exist in academic databases:

# Basic verification
paperscout verify references.bib

# With specific backends
paperscout verify references.bib --source arxiv,dblp

# Enrich entries with discovered metadata
paperscout verify references.bib --enrich

# Output annotated .bib and JSON report
paperscout verify references.bib --output verified.bib --json report.json

from paperscout import PaperFinderClient

client = PaperFinderClient()
results = client.verify_bibtex("references.bib")

for result in results:
    print(f"{result.bib_key}: {result.verdict} ({result.confidence:.0%})")

The verifier outputs:

Console report: Rich table showing verification status for each entry
Annotated .bib: Original file with verification comments
JSON report: Structured data for downstream processing
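
The sketch below shows one way a verdict and confidence could be derived for a single entry and then serialized for a JSON report. The field names `bib_key`, `verdict`, and `confidence` come from the example above; the verdict labels, the thresholds, and the `verify_entry` function are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass
from difflib import SequenceMatcher
from typing import List


@dataclass
class VerificationResult:
    bib_key: str
    verdict: str
    confidence: float


def verify_entry(bib_key: str, title: str, candidate_titles: List[str],
                 threshold: float = 0.9) -> VerificationResult:
    """Compare a BibTeX title against titles returned by a search backend."""
    best = max(
        (SequenceMatcher(None, title.lower(), c.lower()).ratio() for c in candidate_titles),
        default=0.0,  # no candidates at all means nothing was found
    )
    if best >= threshold:
        verdict = "verified"      # assumed label
    elif best >= 0.6:
        verdict = "uncertain"     # assumed label and cutoff
    else:
        verdict = "not_found"     # assumed label
    return VerificationResult(bib_key, verdict, best)


result = verify_entry("vaswani2017", "Attention Is All You Need",
                      ["Attention is all you need"])
print(f"{result.bib_key}: {result.verdict} ({result.confidence:.0%})")
print(json.dumps(asdict(result)))
```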

CLI Output

The CLI uses rich formatting for better readability:

Tables for multiple papers showing source, year, title, and authors
Panels for single paper details with full metadata
Color coding for better visual separation
Truncation for long titles and abstracts

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

Ivan Vykopal - ivan.vykopal@gmail.com

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

Acknowledgments

Uses arxivy for arXiv API interactions
Uses acl-anthology for ACL Anthology data

Release history

Version	Release date
0.3.0 (this version)	Jul 15, 2026
0.2.2	Jun 1, 2026
0.2.1	May 31, 2026
0.2.0	May 29, 2026
0.1.2	Mar 1, 2026
0.1.1	Mar 1, 2026
0.1.0	Mar 1, 2026

Download files

Source Distribution

paperscout-0.3.0.tar.gz (54.7 kB), uploaded Jul 15, 2026, Source

Built Distribution

paperscout-0.3.0-py3-none-any.whl (45.1 kB), uploaded Jul 15, 2026, Python 3

File details

Details for the file paperscout-0.3.0.tar.gz.

File metadata

Download URL: paperscout-0.3.0.tar.gz
Upload date: Jul 15, 2026
Size: 54.7 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for paperscout-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`06f2274bda15ab70ad2e65d4befa1115ade87f84b2bbb2f1ebb73f684f0907bc`
MD5	`1ebaf7cbd349ef3afcc92b2725b49255`
BLAKE2b-256	`dbd89c450ba9fccf170bc893227f088fc7e1c579ba62d12ed2771792daff56eb`
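
A downloaded archive can be checked against the SHA256 digest listed above using only the standard library. The sketch below is one way to do that; the helper names `sha256_of` and `matches_digest` are hypothetical, and the local file path is assumed to be wherever the archive was saved.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex: str) -> bool:
    """Return True when the file's SHA256 equals the published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# Digest copied from the table above; the path is an assumed download location.
EXPECTED = "06f2274bda15ab70ad2e65d4befa1115ade87f84b2bbb2f1ebb73f684f0907bc"
archive = Path("paperscout-0.3.0.tar.gz")
if archive.exists():
    print("digest matches:", matches_digest(archive, EXPECTED))
```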

Provenance

The following attestation bundles were made for paperscout-0.3.0.tar.gz:

Publisher: publish.yml on ivanvykopal/paperscout

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: paperscout-0.3.0.tar.gz
- Subject digest: 06f2274bda15ab70ad2e65d4befa1115ade87f84b2bbb2f1ebb73f684f0907bc
- Sigstore transparency entry: 2172882036
- Sigstore integration time: Jul 15, 2026
Source repository:
- Permalink: ivanvykopal/paperscout@b222fd308cb05d297229991439f4bae05e10e622
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/ivanvykopal
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@b222fd308cb05d297229991439f4bae05e10e622
- Trigger Event: workflow_dispatch

File details

Details for the file paperscout-0.3.0-py3-none-any.whl.

File metadata

Download URL: paperscout-0.3.0-py3-none-any.whl
Upload date: Jul 15, 2026
Size: 45.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for paperscout-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`644e4260d0d63c8272da9deffd9fac82abf20ba87590ae2d70fcd05118b50b73`
MD5	`7a57bae7d648ebd4453a5f37446cc4d9`
BLAKE2b-256	`4bb75a44cddd20e6cb7fc48e0944eece2149776d7f3ef60c00306255ab719a42`

Provenance

The following attestation bundles were made for paperscout-0.3.0-py3-none-any.whl:

Publisher: publish.yml on ivanvykopal/paperscout

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: paperscout-0.3.0-py3-none-any.whl
- Subject digest: 644e4260d0d63c8272da9deffd9fac82abf20ba87590ae2d70fcd05118b50b73
- Sigstore transparency entry: 2172882054
- Sigstore integration time: Jul 15, 2026
Source repository:
- Permalink: ivanvykopal/paperscout@b222fd308cb05d297229991439f4bae05e10e622
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/ivanvykopal
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@b222fd308cb05d297229991439f4bae05e10e622
- Trigger Event: workflow_dispatch
