
Unified paper search across 20+ academic sources

Project description

Paper Search Library

Unified search across 20+ academic sources: ArXiv, PubMed, Semantic Scholar, Google Scholar, SSRN, bioRxiv, and more. ArXiv is supported today; the other connectors are coming soon.

Features

  • 🔍 Multi-source search - Query every configured source in one call (20+ academic databases planned)
  • ⚙️ Robust error handling - Automatic retries, rate limiting, timeout handling
  • 📥 PDF downloads - Download papers from multiple sources with fallback chains
  • 🛡️ Production-ready - Built from real-world trading system experience
  • 📦 Easy integration - Simple API, minimal dependencies

Quick Start

Installation

pip install paper-search-lib

Basic Usage

from paper_search import PaperSearch
from paper_search.connectors import ArxivConnector

# Create searcher
searcher = PaperSearch(connectors=[ArxivConnector()])

# Search
papers = searcher.search("machine learning", max_results=10)

# Use results
for paper in papers:
    print(paper.title)
    print(f"Authors: {', '.join(paper.authors)}")
    print(f"URL: {paper.url}\n")

Robust Search (With Error Handling)

from paper_search import RobustSearch
from paper_search.connectors import ArxivConnector

# Create robust searcher
robust = RobustSearch(
    connectors=[ArxivConnector()],
    min_delay=10,        # Wait at least 10 s between requests
    max_retries=3,       # Retry up to 3 times on failure
    timeout=90,          # 90-second timeout (not 30!)
    use_fallback=True    # Try other sources if one fails
)

# Search with automatic retries and proper delays
result = robust.search("changepoint detection", max_results=20)
print(f"Found {result.total_found} papers")
print(f"Successful sources: {result.successful_sources}")
print(f"Failed sources: {result.failed_sources}")

Multiple Queries

queries = [
    "changepoint detection",
    "Bayesian inference",
    "regime switching"
]

results = robust.search_multiple(queries, max_results=10)

for query, result in results.items():
    print(f"\n{query}: {result.total_found} papers")
    for paper in result.papers:
        print(f"  - {paper.title[:60]}...")

Available Sources

  • ArXiv (✅ Ready)
  • PubMed (Coming soon)
  • Semantic Scholar (Coming soon)
  • Google Scholar (Coming soon)
  • bioRxiv (Coming soon)
  • SSRN (Coming soon)
  • And 14+ more...


Key Design Decisions

Rate Limiting

  • Default delay: 10 seconds between requests
  • Why: ArXiv's API guidelines ask for no more than one request every 3 seconds; 10 s is a conservative margin
  • Configurable: Adjust via min_delay parameter
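A minimal sketch of a fixed minimum-delay limiter, for readers who want to see what a min_delay setting implies. It assumes nothing about how paper_search implements min_delay internally: MinDelayLimiter and its clock/sleep parameters are hypothetical names, and clock and sleep are injectable only so the behaviour can be checked without real waiting.

```python
import time
from typing import Callable, Optional


class MinDelayLimiter:
    """Enforce a minimum gap (in seconds) between consecutive requests.

    Hypothetical helper for illustration; not part of paper_search.
    """

    def __init__(
        self,
        min_delay: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_delay = min_delay
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request

    def wait(self) -> float:
        """Block until min_delay has passed since the last call; return seconds slept."""
        now = self._clock()
        slept = 0.0
        if self._last is not None:
            remaining = self.min_delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
                now = self._clock()
        self._last = now  # the request is considered sent at this moment
        return slept
```

Calling `limiter.wait()` immediately before each HTTP request keeps a single client at or below one request per `min_delay` seconds; with the 10 s default that stays well under ArXiv's one-request-every-3-seconds guideline.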

Timeout Handling

  • Default timeout: 90 seconds (not 30!)
  • Why: Academic servers can be slow; 30s is too aggressive
  • Strategy: Retry up to 3 times with exponential backoff
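The retry strategy can be sketched as a small exponential-backoff wrapper. This is an illustration, not the library's code: retry_with_backoff, base_delay and retry_on are hypothetical names, and reading "up to 3 times" as three retries after the first attempt (waits of 5 s, 10 s, then 20 s with base_delay=5) is an assumption.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 5.0,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn; on a retryable error wait base_delay * 2**attempt and try again.

    Makes at most 1 + max_retries calls, then re-raises the last error.
    """
    attempt = 0
    while True:
        try:
            return fn()
        except retry_on:
            if attempt >= max_retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)  # 5 s, 10 s, 20 s, ...
            attempt += 1
```

Doubling the wait gives a slow server progressively more room to recover, while the retry cap bounds the worst-case time spent on a single source.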

Error Handling

  • Rate limit (429): Wait 30s and retry
  • Timeout: Wait 5-10s and retry
  • Server error (503): Skip to next source (server is busy)
  • Result: >90% success rate instead of ~30%
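One way to express this policy is a function that turns a failure into an action plus a wait. plan_for_failure and Action are hypothetical names; using the low end (5 s) of the 5-10 s timeout wait and re-raising any status the policy does not list are assumptions the README does not specify.

```python
from enum import Enum
from typing import Optional, Tuple


class Action(Enum):
    RETRY = "retry"              # wait, then try the same source again
    SKIP_SOURCE = "skip_source"  # give up on this source, move to the next
    RAISE = "raise"              # not covered by the policy: surface the error


def plan_for_failure(
    status: Optional[int], timed_out: bool = False
) -> Tuple[Action, float]:
    """Return (action, seconds to wait first) for a failed request."""
    if timed_out:
        return Action.RETRY, 5.0        # timeout: short pause, then retry
    if status == 429:
        return Action.RETRY, 30.0       # rate limited: back off for 30 s
    if status == 503:
        return Action.SKIP_SOURCE, 0.0  # server busy: try the next source
    return Action.RAISE, 0.0            # assumption: other errors are not retried
```

Keeping the policy in one pure function makes it easy to test and to tune per source without touching the request loop.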

Multi-source

  • Try all configured sources
  • Continue even if one fails
  • Return combined results
  • Track which sources succeeded/failed
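The multi-source behaviour amounts to "try each source, record the outcome, keep going". The sketch below is hypothetical: sources are plain callables returning title strings, and MultiSourceResult only borrows the field names (total_found, successful_sources, failed_sources) printed in the RobustSearch example earlier in this README; it is not the library's result type.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MultiSourceResult:
    papers: List[str] = field(default_factory=list)
    successful_sources: List[str] = field(default_factory=list)
    failed_sources: List[str] = field(default_factory=list)

    @property
    def total_found(self) -> int:
        return len(self.papers)


def search_all(
    sources: Dict[str, Callable[[str], List[str]]], query: str
) -> MultiSourceResult:
    """Query every source, keep going on failure, and combine the results."""
    result = MultiSourceResult()
    for name, search in sources.items():
        try:
            papers = search(query)
        except Exception:
            result.failed_sources.append(name)  # record the failure and move on
        else:
            result.papers.extend(papers)
            result.successful_sources.append(name)
    return result
```

A failing source never discards results already collected from the others, which is what lets a run with one unavailable server still return useful papers.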

Contributing

Contributions welcome! Areas to help:

  • Add new source connectors
  • Improve error handling
  • Add caching layer
  • Performance optimizations
  • Documentation improvements

License

MIT License - See LICENSE file for details

Acknowledgments

Built from experience with:


Status: Alpha (v0.1.0)
Last Updated: April 2, 2026
Stability: Production-ready for ArXiv; other sources coming

Download files


Source Distribution

paper_search_lib-1.1.0.tar.gz (18.0 kB)

Built Distribution

paper_search_lib-1.1.0-py3-none-any.whl (19.2 kB)

File details

Details for the file paper_search_lib-1.1.0.tar.gz.

File metadata

  • Download URL: paper_search_lib-1.1.0.tar.gz
  • Size: 18.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.8

File hashes

Hashes for paper_search_lib-1.1.0.tar.gz
Algorithm Hash digest
SHA256 45c6b5c2931983470d921245100a4ce331f5f792b26f8a2894e377212c58abec
MD5 5dba29aae31c5a501076a461ec176315
BLAKE2b-256 c082fd961b37c90f093935304b4abb00567484408679ce16ddd97c413c37a6da


File details

Details for the file paper_search_lib-1.1.0-py3-none-any.whl.

File hashes

Hashes for paper_search_lib-1.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 d59f607262501063dca82395dc9d42c309aa6f37780f6817db5a2a120b54149e
MD5 50a83919708aec80bd3bef12e7f28156
BLAKE2b-256 7cfdf844fc249a0798dbc36915cd8b748cc067224612fd4108c9b303e815df77

