Skip to main content

Unified paper search across 20+ academic sources

Project description

Paper Search Library

Unified search across 20+ academic sources: ArXiv, PubMed, Semantic Scholar, Google Scholar, SSRN, bioRxiv, and more.

Features

  • 🔍 Multi-source search - Search across 20+ academic databases simultaneously
  • ⚙️ Robust error handling - Automatic retries, rate limiting, timeout handling
  • 📥 PDF downloads - Download papers from multiple sources with fallback chains
  • 🛡️ Production-ready - Built from real-world trading system experience
  • 📦 Easy integration - Simple API, minimal dependencies

Quick Start

Installation

pip install paper-search-lib

Basic Usage

from paper_search import PaperSearch
from paper_search.connectors import ArxivConnector

# Create searcher
searcher = PaperSearch(connectors=[ArxivConnector()])

# Search
papers = searcher.search("machine learning", max_results=10)

# Use results
for paper in papers:
    print(f"{paper.title}")
    print(f"Authors: {', '.join(paper.authors)}")
    print(f"URL: {paper.url}\n")

Robust Search (With Error Handling)

from paper_search import RobustSearch
from paper_search.connectors import ArxivConnector

# Create robust searcher
robust = RobustSearch(
    connectors=[ArxivConnector()],
    min_delay=10,        # Wait 10s between requests
    max_retries=3,       # Retry 3 times on failure
    timeout=90,          # 90 second timeout (not 30!)
    use_fallback=True    # Try other sources if one fails
)

# Search with automatic retries and proper delays
result = robust.search("changepoint detection", max_results=20)
print(f"Found {result.total_found} papers")
print(f"Successful sources: {result.successful_sources}")
print(f"Failed sources: {result.failed_sources}")

Multiple Queries

queries = [
    "changepoint detection",
    "Bayesian inference",
    "regime switching"
]

results = robust.search_multiple(queries, max_results=10)

for query, result in results.items():
    print(f"\n{query}: {result.total_found} papers")
    for paper in result.papers:
        print(f"  - {paper.title[:60]}...")

Available Sources

  • ArXiv (✅ Ready)
  • PubMed (Coming soon)
  • Semantic Scholar (Coming soon)
  • Google Scholar (Coming soon)
  • bioRxiv (Coming soon)
  • SSRN (Coming soon)
  • And 14+ more...

Documentation & Roadmap

Quick Links

Implementation Roadmap

Key Design Decisions

Rate Limiting

  • Default delay: 10 seconds between requests
  • Why: ArXiv allows ~1 request per 3 seconds; 10s is conservative and safe
  • Configurable: Adjust via min_delay parameter

Timeout Handling

  • Default timeout: 90 seconds (not 30!)
  • Why: Academic servers can be slow; 30s is too aggressive
  • Strategy: Retry up to 3 times with exponential backoff

Error Handling

  • Rate limit (429): Wait 30s and retry
  • Timeout: Wait 5-10s and retry
  • Server error (503): Skip to next source (server is busy)
  • Result: >90% success rate instead of ~30%

Multi-source

  • Try all configured sources
  • Continue even if one fails
  • Return combined results
  • Track which sources succeeded/failed

Contributing

Contributions welcome! Areas to help:

  • Add new source connectors
  • Improve error handling
  • Add caching layer
  • Performance optimizations
  • Documentation improvements

License

MIT License - See LICENSE file for details

Acknowledgments

Built from experience with:


Status: Alpha (v0.1.0)
Last Updated: April 2, 2026
Stability: Production-ready for ArXiv; other sources coming

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

paper_search_lib-1.0.0.tar.gz (16.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

paper_search_lib-1.0.0-py3-none-any.whl (17.4 kB view details)

Uploaded Python 3

File details

Details for the file paper_search_lib-1.0.0.tar.gz.

File metadata

  • Download URL: paper_search_lib-1.0.0.tar.gz
  • Upload date:
  • Size: 16.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.8

File hashes

Hashes for paper_search_lib-1.0.0.tar.gz
Algorithm Hash digest
SHA256 3ff53272e678909b4d3fac72ef0d983e953bab75c8edc7cd31edcf52148ebbfb
MD5 d1988a51ad276bad376a53adebe0b85f
BLAKE2b-256 7317bd4cfbe7075a21bef0a780bf8227d16513e91105f32cda90c747e4c04552

See more details on using hashes here.

File details

Details for the file paper_search_lib-1.0.0-py3-none-any.whl.

File metadata

File hashes

Hashes for paper_search_lib-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 325eae0082d7b2991c28f235a01c07661951bab14ddf0faaff297c71247c28a0
MD5 f535bbad1f1e900453c0847706cd26a2
BLAKE2b-256 edb3372c1f667561725ecb161610e8bda7036be64a94c0afb9d6717694fba60e

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page