paperscout
A multi-purpose tool for searching, downloading, and managing academic papers.
Features
- Search papers across multiple sources using pluggable backends
- Download papers by DOI or arXiv ID
- Programmatic API and CLI
- Backend architecture for easy addition of new sources
- Smart title matching - searches across title and abstract
Installation
pip install paperscout
With optional backend support (the quotes keep shells such as zsh from expanding the brackets):
pip install "paperscout[arxiv]"
# or install all optional dependencies
pip install "paperscout[all]"
Available backend extras:
- arxiv - arXiv backend using arxivy
- dblp - DBLP backend using dblpcli
- s2cli - Semantic Scholar backend using s2cli
- acl-anthology - ACL Anthology backend
Backend Architecture
The package uses a backend architecture where each source has its own backend implementation:
paperscout/
├── backends/
│ ├── base.py # Base backend class (abstract)
│ ├── arxiv.py # arXiv backend using arxivy
│ ├── dblp.py # DBLP backend using dblpcli
│ ├── s2cli.py # Semantic Scholar backend using s2cli
│ └── acl_anthology.py # ACL Anthology backend
├── similarity.py # Title similarity functions
├── search.py # Searcher using backends
├── client.py # Client using searchers
└── formatter.py # Rich formatting for CLI output
To add a new backend:
- Create a new module in backends/ (e.g., pubmed.py)
- Implement the BaseBackend abstract class
- Register it in PaperSearcher.BACKENDS
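The three steps above can be sketched as follows. This is a minimal, self-contained illustration, not paperscout's actual code: the BaseBackend stand-in, the search/download signatures, the dict-shaped results, and the layout of the BACKENDS registry are all assumptions made for the example, so check backends/base.py before adapting it.

```python
from abc import ABC, abstractmethod


class BaseBackend(ABC):
    """Stand-in for paperscout's abstract base class (the real interface may differ)."""

    name: str = "base"

    @abstractmethod
    def search(self, query: str, limit: int = 10) -> list[dict]:
        """Return up to `limit` papers matching `query`."""

    @abstractmethod
    def download(self, identifier: str, output_dir: str = ".") -> str:
        """Fetch the paper and return the path of the saved file."""


class PubMedBackend(BaseBackend):
    """Hypothetical backend; a real one would query the PubMed API here."""

    name = "pubmed"

    # Canned records stand in for a remote index so the sketch runs offline.
    _PAPERS = [
        {"title": "A study of protein folding", "authors": ["A. Author"], "year": 2020},
        {"title": "Protein structure prediction", "authors": ["B. Author"], "year": 2021},
    ]

    def search(self, query: str, limit: int = 10) -> list[dict]:
        needle = query.lower()
        hits = [p for p in self._PAPERS if needle in p["title"].lower()]
        return hits[:limit]

    def download(self, identifier: str, output_dir: str = ".") -> str:
        raise NotImplementedError("PDF download is not part of this sketch")


# Step 3: register the backend under the name users pass as the source.
# In paperscout this mapping is PaperSearcher.BACKENDS; its exact shape is assumed here.
BACKENDS: dict[str, type[BaseBackend]] = {PubMedBackend.name: PubMedBackend}

backend = BACKENDS["pubmed"]()
results = backend.search("protein", limit=1)
```

Keeping the registry keyed by source name would let the CLI resolve a --source value with a single dictionary lookup, which is one plausible reason for registering backends this way.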
CLI Usage
Commands
| Command | Description |
|---|---|
| search | Search for papers across sources |
| match | Get the best matching paper by title |
| download | Download a paper |
Search Papers
# Search all backends (default - ACL Anthology has priority)
paperscout search "Attention Is All You Need"
# Search specific source
paperscout search "Transformer" --source arxiv
paperscout search "Attention" --source acl_anthology
# Limit results
paperscout search "quantum computing" --limit 20
# Output as JSON
paperscout search "query" --json
# Save to file
paperscout search "query" --output results.txt
Get Best Match
# Get the single best matching paper
paperscout match "Attention Is All You Need"
# Specify source
paperscout match "Transformer" --source arxiv
# JSON output
paperscout match "query" --json
Download Paper
The identifier type is auto-detected, so --source is usually unnecessary:
# Download by arXiv ID (auto-detected)
paperscout download arXiv:2301.12345
# Download by bare arXiv ID
paperscout download 2301.12345
# Download by DOI (auto-detected, tries multiple backends)
paperscout download 10.18653/v1/P18-1001
# Download by ACL Anthology ID
paperscout download P18-1001
# Specify source explicitly (if needed)
paperscout download P18-1001 --source acl_anthology
# Specify output directory for the PDF
paperscout download arXiv:2301.12345 --output ./papers
# JSON output
paperscout download arXiv:2301.12345 --json
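To make the auto-detection idea concrete, here is a rough sketch of how the identifier forms shown above could be told apart with regular expressions. The function name detect_identifier_type and the patterns are illustrative assumptions; the README does not show paperscout's real detection rules, and this sketch only covers old-style ACL Anthology IDs such as P18-1001.

```python
import re

# Illustrative patterns only; paperscout's own rules may be stricter or broader.
ARXIV_RE = re.compile(r"^(?:arxiv:)?\d{4}\.\d{4,5}(?:v\d+)?$", re.IGNORECASE)
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
ACL_RE = re.compile(r"^[A-Z]\d{2}-\d{4}$")


def detect_identifier_type(identifier: str) -> str:
    """Classify an identifier as 'arxiv', 'doi', 'acl_anthology', or 'unknown'."""
    value = identifier.strip()
    if ARXIV_RE.match(value):
        return "arxiv"
    if DOI_RE.match(value):
        return "doi"
    if ACL_RE.match(value):
        return "acl_anthology"
    return "unknown"


for example in ["arXiv:2301.12345", "2301.12345", "10.18653/v1/P18-1001", "P18-1001"]:
    print(example, "->", detect_identifier_type(example))
```

An identifier that falls through to "unknown" is the kind of case where passing --source explicitly would still be needed.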
Python API
Using the Client
from paperscout import PaperFinderClient
# Create client
client = PaperFinderClient()
# Search for papers (ACL Anthology has priority)
results = client.search("Attention Is All You Need", source="all", limit=10)
for paper in results:
    print(f"{paper['title']} - {paper['authors']}")
# Search specific source only
results = client.search("Transformer", source="arxiv", limit=5)
# Download a paper (identifier type auto-detected)
client.download("arXiv:2301.12345", output_dir="./papers")
client.download("10.18653/v1/P18-1001") # DOI
client.download("P18-1001", source="acl_anthology") # ACL Anthology ID
Using Backends Directly
from paperscout.backends.acl_anthology import ACLAnthologyBackend
from paperscout.backends.arxiv import ArxivBackend
from paperscout.backends.dblp import DblpBackend
from paperscout.backends.s2cli import SemanticScholarBackend
# Use ACL Anthology backend
backend = ACLAnthologyBackend()
results = backend.search("Attention Is All You Need", limit=10)
# Use arXiv backend
arxiv_backend = ArxivBackend()
papers = arxiv_backend.search("quantum computing", limit=5)
# Download with the backend that matches the identifier
arxiv_backend.download("arXiv:2301.12345")
Searching by Similarity
from paperscout.search import PaperSearcher
from paperscout.similarity import _title_similarity
searcher = PaperSearcher()
# Search all backends with exact match priority
results = searcher.search("Transformer", exact_match_first=True)
# Calculate title similarity
similarity = _title_similarity(
    "Attention Is All You Need",
    "Attention Is All You Need",
)
print(similarity) # 1.0 for exact match
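The README does not say how the similarity score is computed. As a hedged illustration of one common approach, the sketch below normalizes both titles and compares them with the standard library's difflib.SequenceMatcher. The helper names normalize_title, title_similarity, and best_match are made up for this example and are not paperscout functions.

```python
from difflib import SequenceMatcher


def normalize_title(title: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in title.lower())
    return " ".join(kept.split())


def title_similarity(a: str, b: str) -> float:
    """Return a score in [0, 1]; 1.0 means the normalized titles are identical."""
    return SequenceMatcher(None, normalize_title(a), normalize_title(b)).ratio()


def best_match(query: str, titles: list[str]) -> str:
    """Pick the candidate title most similar to the query."""
    return max(titles, key=lambda t: title_similarity(query, t))


candidates = ["Quantum computing", "Attention Is All You Need"]
print(best_match("attention is all you need!", candidates))
```

Normalizing before comparing means differences in case or trailing punctuation do not lower the score, which matters when the same paper is titled slightly differently across sources.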
Sources
| Source | Backend | Description |
|---|---|---|
| all | Multiple | Searches all backends (ACL Anthology has priority) |
| arxiv | ArxivBackend | Preprints in physics, math, CS, and more |
| dblp | DblpBackend | Computer science bibliography |
| semantic_scholar | SemanticScholarBackend | Academic paper search |
| acl_anthology | ACLAnthologyBackend | ACL/EMNLP/NAACL papers |
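What "ACL Anthology has priority" means in practice is not spelled out. One plausible reading is that when several backends return the same paper, the ACL Anthology copy is kept. The sketch below shows that reading only. The merge_results function, the order of the remaining backends, and the title-based deduplication key are all assumptions for illustration.

```python
# Only "ACL Anthology first" comes from the table; the rest of the order is assumed.
PRIORITY = ["acl_anthology", "arxiv", "dblp", "semantic_scholar"]


def merge_results(results_by_source: dict[str, list[dict]], limit: int = 10) -> list[dict]:
    """Merge per-backend results, keeping the highest-priority copy of each title."""
    merged: list[dict] = []
    seen: set[str] = set()
    for source in PRIORITY:
        for paper in results_by_source.get(source, []):
            # Case- and whitespace-insensitive key used to spot duplicates.
            key = " ".join(paper["title"].lower().split())
            if key in seen:
                continue  # a higher-priority backend already supplied this paper
            seen.add(key)
            merged.append({**paper, "source": source})
    return merged[:limit]


merged = merge_results({
    "arxiv": [{"title": "Attention Is All You Need"}, {"title": "BERT"}],
    "acl_anthology": [{"title": "attention is all  you need"}],
})
for paper in merged:
    print(paper["source"], "-", paper["title"])
```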
BibTeX Support
Export Papers as BibTeX
# Search and export as BibTeX
paperscout search "Attention Is All You Need" --bibtex
# Save to file
paperscout match "Transformer" --bibtex --output citations.bib
from paperscout import Paper
paper = Paper(...)
bibtex = paper.to_bibtex()
# or with custom key
bibtex = paper.to_bibtex(entry_key="vaswani2017")
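Because the Paper constructor arguments are elided above, the following sketch uses a hypothetical PaperRecord dataclass to show what a to_bibtex-style method might do, including how a default key such as vaswani2017 could be derived. The field names, the @misc entry type, and the key scheme are assumptions, not paperscout's actual behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PaperRecord:
    """Hypothetical stand-in for paperscout's Paper; the real fields may differ."""

    title: str
    authors: list[str]
    year: int
    doi: str = ""

    def default_key(self) -> str:
        """Build a key from the first author's surname and the year."""
        surname = self.authors[0].split()[-1].lower() if self.authors else "unknown"
        return f"{surname}{self.year}"

    def to_bibtex(self, entry_key: Optional[str] = None) -> str:
        """Render the record as a single BibTeX entry, skipping empty fields."""
        key = entry_key or self.default_key()
        fields = {
            "title": self.title,
            "author": " and ".join(self.authors),
            "year": str(self.year),
            "doi": self.doi,
        }
        body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items() if value)
        return f"@misc{{{key},\n{body}\n}}"


paper = PaperRecord(
    title="Attention Is All You Need",
    authors=["Ashish Vaswani", "Noam Shazeer"],
    year=2017,
)
bibtex = paper.to_bibtex()
print(bibtex)
```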
Verify BibTeX Files
Verify that papers in a BibTeX file actually exist in academic databases:
# Basic verification
paperscout verify references.bib
# With specific backends
paperscout verify references.bib --source arxiv,dblp
# Enrich entries with discovered metadata
paperscout verify references.bib --enrich
# Output annotated .bib and JSON report
paperscout verify references.bib --output verified.bib --json report.json
from paperscout import PaperFinderClient
client = PaperFinderClient()
results = client.verify_bibtex("references.bib")
for result in results:
    print(f"{result.bib_key}: {result.verdict} ({result.confidence:.0%})")
The verifier outputs:
- Console report: Rich table showing verification status for each entry
- Annotated .bib: Original file with verification comments
- JSON report: Structured data for downstream processing
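For downstream processing, results with the bib_key, verdict, and confidence attributes used in the loop above could be summarized as in this sketch. The VerificationResult stand-in, the verdict strings, and the 0.8 review threshold are invented for illustration; the real result type and verdict values may differ.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class VerificationResult:
    """Stand-in with the three attributes shown above; verdict strings are made up."""

    bib_key: str
    verdict: str
    confidence: float


def summarize(results: list[VerificationResult], threshold: float = 0.8) -> dict:
    """Count verdicts and collect entries whose confidence falls below `threshold`."""
    return {
        "counts": dict(Counter(r.verdict for r in results)),
        "needs_review": [r.bib_key for r in results if r.confidence < threshold],
    }


sample = [
    VerificationResult("vaswani2017", "verified", 0.98),
    VerificationResult("smith2020", "not_found", 0.10),
    VerificationResult("lee2019", "verified", 0.72),
]
report = summarize(sample)
print(report)
```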
CLI Output
The CLI uses rich formatting for better readability:
- Tables for multiple papers showing source, year, title, and authors
- Panels for single paper details with full metadata
- Color coding for better visual separation
- Truncation for long titles and abstracts
License
This project is licensed under the MIT License - see the LICENSE file for details.
Author
Ivan Vykopal - ivan.vykopal@gmail.com
Contributing
Contributions are welcome! Please feel free to submit issues or pull requests.
Acknowledgments
- Uses arxivy for arXiv API interactions
- Uses acl-anthology for ACL Anthology data