Unified paper search across 20+ academic sources
Project description
Paper Search Library
Unified search across 20+ academic sources: ArXiv, PubMed, Semantic Scholar, Google Scholar, SSRN, bioRxiv, and more.
Features
- 🔍 Multi-source search - Search across 20+ academic databases simultaneously
- ⚙️ Robust error handling - Automatic retries, rate limiting, timeout handling
- 📥 PDF downloads - Download papers from multiple sources with fallback chains
- 🛡️ Production-ready - Built from real-world trading system experience
- 📦 Easy integration - Simple API, minimal dependencies
Quick Start
Installation
pip install paper-search-lib
Basic Usage
from paper_search import PaperSearch
from paper_search.connectors import ArxivConnector
# Create searcher
searcher = PaperSearch(connectors=[ArxivConnector()])
# Search
papers = searcher.search("machine learning", max_results=10)
# Use results
for paper in papers:
print(f"{paper.title}")
print(f"Authors: {', '.join(paper.authors)}")
print(f"URL: {paper.url}\n")
Robust Search (With Error Handling)
from paper_search import RobustSearch
from paper_search.connectors import ArxivConnector
# Create robust searcher
robust = RobustSearch(
connectors=[ArxivConnector()],
min_delay=10, # Wait 10s between requests
max_retries=3, # Retry 3 times on failure
timeout=90, # 90 second timeout (not 30!)
use_fallback=True # Try other sources if one fails
)
# Search with automatic retries and proper delays
result = robust.search("changepoint detection", max_results=20)
print(f"Found {result.total_found} papers")
print(f"Successful sources: {result.successful_sources}")
print(f"Failed sources: {result.failed_sources}")
Multiple Queries
queries = [
"changepoint detection",
"Bayesian inference",
"regime switching"
]
results = robust.search_multiple(queries, max_results=10)
for query, result in results.items():
print(f"\n{query}: {result.total_found} papers")
for paper in result.papers:
print(f" - {paper.title[:60]}...")
Available Sources
- ArXiv (✅ Ready)
- PubMed (Coming soon)
- Semantic Scholar (Coming soon)
- Google Scholar (Coming soon)
- bioRxiv (Coming soon)
- SSRN (Coming soon)
- And 14+ more...
Documentation & Roadmap
Quick Links
- API Reference - Class and method documentation
- Available Sources - Current & planned sources
- Examples - Usage examples
- Troubleshooting - Common issues
Implementation Roadmap
- PHASE_2_PLAN.md - Phase 2: Publish to PyPI (2-3 hours)
- PHASE_3_LONG_TERM.md - Phase 3+: 20+ sources & advanced features (12+ weeks)
Key Design Decisions
Rate Limiting
- Default delay: 10 seconds between requests
- Why: ArXiv allows ~1 request per 3 seconds; 10s is conservative and safe
- Configurable: Adjust via
min_delayparameter
Timeout Handling
- Default timeout: 90 seconds (not 30!)
- Why: Academic servers can be slow; 30s is too aggressive
- Strategy: Retry up to 3 times with exponential backoff
Error Handling
- Rate limit (429): Wait 30s and retry
- Timeout: Wait 5-10s and retry
- Server error (503): Skip to next source (server is busy)
- Result: >90% success rate instead of ~30%
Multi-source
- Try all configured sources
- Continue even if one fails
- Return combined results
- Track which sources succeeded/failed
Contributing
Contributions welcome! Areas to help:
- Add new source connectors
- Improve error handling
- Add caching layer
- Performance optimizations
- Documentation improvements
License
MIT License - See LICENSE file for details
Acknowledgments
Built from experience with:
Status: Alpha (v0.1.0)
Last Updated: April 2, 2026
Stability: Production-ready for ArXiv; other sources coming
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file paper_search_lib-1.1.0.tar.gz.
File metadata
- Download URL: paper_search_lib-1.1.0.tar.gz
- Upload date:
- Size: 18.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
45c6b5c2931983470d921245100a4ce331f5f792b26f8a2894e377212c58abec
|
|
| MD5 |
5dba29aae31c5a501076a461ec176315
|
|
| BLAKE2b-256 |
c082fd961b37c90f093935304b4abb00567484408679ce16ddd97c413c37a6da
|
File details
Details for the file paper_search_lib-1.1.0-py3-none-any.whl.
File metadata
- Download URL: paper_search_lib-1.1.0-py3-none-any.whl
- Upload date:
- Size: 19.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
d59f607262501063dca82395dc9d42c309aa6f37780f6817db5a2a120b54149e
|
|
| MD5 |
50a83919708aec80bd3bef12e7f28156
|
|
| BLAKE2b-256 |
7cfdf844fc249a0798dbc36915cd8b748cc067224612fd4108c9b303e815df77
|