Blazing fast ArXiv paper search — 928K papers in 40ms

Project description

arxiv-search-kit

Offline ArXiv paper search over 928K CS papers. SPECTER2 embeddings + LanceDB vector index + BM25 hybrid retrieval.

40ms per search on GPU. 99% precision@10. No API keys. No rate limits.

Install

pip install arxiv-search-kit[gpu]   # with CUDA
pip install arxiv-search-kit[cpu]   # CPU only

Quick start

from arxiv_search_kit import ArxivClient

client = ArxivClient()  # auto-downloads 4GB index on first run

# basic search, filtered by arXiv category
papers = client.search("attention mechanism transformers", categories=["cs.CL", "cs.LG"])

# find related papers
related = client.find_related("1706.03762")  # Attention Is All You Need

# search with context paper (biases results toward your paper's neighborhood)
papers = client.search(
    "self-supervised learning",
    context_paper_id="2010.11929",  # ViT
)

# batch search (returns all unique papers across queries)
papers = client.batch_search([
    "vision transformers",
    "neural radiance fields",
    "RLHF alignment",
], max_results=10)

# conference-aware search
papers = client.search("object detection", conference="CVPR", year=2024)

# sort by citations (calls Semantic Scholar API)
papers = client.search("diffusion models", sort_by="citations", min_citations=50)
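
The batch search above is described as returning all unique papers across queries. A minimal sketch of that kind of merge, assuming each result carries an `arxiv_id`; the `merge_unique` helper and the dict records are hypothetical stand-ins, not the package's own code:

```python
def merge_unique(result_lists, key=lambda paper: paper["arxiv_id"]):
    """Flatten several result lists, keeping the first hit for each id."""
    seen = set()
    merged = []
    for results in result_lists:
        for paper in results:
            paper_id = key(paper)
            if paper_id not in seen:
                seen.add(paper_id)
                merged.append(paper)
    return merged


# Hypothetical per-query results; the second query repeats one paper.
query_a = [{"arxiv_id": "2010.11929"}, {"arxiv_id": "1706.03762"}]
query_b = [{"arxiv_id": "1706.03762"}, {"arxiv_id": "2003.08934"}]
unique_papers = merge_unique([query_a, query_b])
```

Keeping first-seen order means an earlier query's ranking wins when two queries return the same paper; whether the package orders merged results this way is not stated here.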

What you get back

paper = papers[0]
paper.arxiv_id      # "2401.12345"
paper.title          # "..."
paper.abstract       # "..."
paper.authors        # [Author(name="...", affiliation="..."), ...]
paper.categories     # ["cs.CV", "cs.LG"]
paper.published      # datetime
paper.pdf_url        # "https://arxiv.org/pdf/2401.12345"
paper.to_bibtex()    # BibTeX string
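
To give a feel for what a BibTeX export of such a record could look like, here is a rough sketch. `PaperRecord` is a hypothetical stand-in with authors as plain strings (the real objects expose `Author(name=..., affiliation=...)`), and the key format and field choice are assumptions rather than the output of `paper.to_bibtex()`:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class PaperRecord:
    """Hypothetical stand-in for the paper object returned by a search."""
    arxiv_id: str
    title: str
    authors: list      # plain name strings in this sketch
    published: datetime
    categories: list


def to_bibtex(paper):
    """Render a @misc entry; key is surname + year + id digits (an assumption)."""
    surname = paper.authors[0].split()[-1].lower() if paper.authors else "anon"
    key = f"{surname}{paper.published.year}{paper.arxiv_id.replace('.', '')}"
    fields = [
        ("title", "{" + paper.title + "}"),  # extra braces preserve capitalization
        ("author", " and ".join(paper.authors)),
        ("year", str(paper.published.year)),
        ("eprint", paper.arxiv_id),
        ("archivePrefix", "arXiv"),
        ("primaryClass", paper.categories[0] if paper.categories else ""),
    ]
    # Skip empty fields so the entry stays valid BibTeX.
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields if value)
    return f"@misc{{{key},\n{body}\n}}"


example = PaperRecord(
    arxiv_id="1706.03762",
    title="Attention Is All You Need",
    authors=["Ashish Vaswani", "Noam Shazeer"],
    published=datetime(2017, 6, 12),
    categories=["cs.CL"],
)
print(to_bibtex(example))
```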

Citation graph (via Semantic Scholar)

citations = client.get_citations("1706.03762")
references = client.get_references("1706.03762")

# enrich search results with citation counts
client.enrich(papers)
papers[0].citation_count  # 95421
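
For readers curious what a citation lookup against Semantic Scholar involves, the sketch below builds a request for the public Graph API using an `ARXIV:` paper id. How the package itself calls the API is not documented here, so the helper names (`s2_paper_url`, `fetch_citation_count`) and the chosen fields are assumptions:

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

S2_BASE = "https://api.semanticscholar.org/graph/v1/paper"


def s2_paper_url(arxiv_id, fields=("title", "citationCount")):
    """Build a Graph API URL for one paper, addressed by its arXiv id."""
    paper_ref = quote(f"ARXIV:{arxiv_id}", safe=":/")
    query = urlencode({"fields": ",".join(fields)})
    return f"{S2_BASE}/{paper_ref}?{query}"


def fetch_citation_count(arxiv_id, timeout=10.0):
    """Fetch the citation count over the network; unauthenticated calls are rate limited."""
    url = s2_paper_url(arxiv_id, fields=("citationCount",))
    with urlopen(url, timeout=timeout) as response:
        return json.load(response).get("citationCount")


url = s2_paper_url("1706.03762")
```

Only `s2_paper_url` runs without network access. Local search needs no API key, but citation features depend on this external service and its limits.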

Coverage

928K papers across all major CS + stat.ML categories:

cs.CV (144K), cs.LG (129K), cs.CL (78K), cs.AI (36K), cs.RO (38K), cs.CR (32K), stat.ML (20K), and 40+ more subcategories.

Maps conferences to categories: CVPR, NeurIPS, ICML, ICLR, ACL, EMNLP, AAAI, CHI, KDD, SIGIR, and more.
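
Conference-aware search presumably resolves a venue name to the arXiv categories where its papers tend to appear. A small illustration of that idea follows; the table entries and the `categories_for` helper are illustrative assumptions, since the package's actual mapping is not listed here:

```python
# Illustrative mapping only; the package's real table may differ.
CONFERENCE_CATEGORIES = {
    "CVPR": ["cs.CV"],
    "NEURIPS": ["cs.LG", "stat.ML"],
    "ICML": ["cs.LG", "stat.ML"],
    "ACL": ["cs.CL"],
    "EMNLP": ["cs.CL"],
    "SIGIR": ["cs.IR"],
}


def categories_for(conference):
    """Return the categories for a venue name, or an empty list if unknown."""
    return list(CONFERENCE_CATEGORIES.get(conference.strip().upper(), []))


cvpr_categories = categories_for("cvpr")
```

The resulting list could then be passed as the `categories` filter shown in the quick start.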

How it works

Pre-built index: 928K papers embedded with SPECTER2, stored in LanceDB
At query time: embed query with SPECTER2, hybrid retrieval (vector + BM25), graph-based re-ranking via Personalized PageRank
Index auto-downloads from HuggingFace on first use (~4GB)
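
The query-time steps above can be sketched in plain Python. This is not the package's implementation: reciprocal rank fusion is swapped in as one common way to combine vector and BM25 rankings (the fusion method actually used is not stated), and the ids and citation graph are toy data:

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked id lists into one list, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score; break ties by id so the result is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def personalized_pagerank(graph, seeds, alpha=0.85, iterations=50):
    """Power iteration for PageRank that teleports only to the seed nodes.

    graph maps each node to the list of nodes it links to (e.g. cites).
    """
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    seeds = [s for s in seeds if s in nodes]
    if not seeds:
        return {n: 0.0 for n in nodes}
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        nxt = {n: (1.0 - alpha) * teleport[n] for n in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                share = alpha * rank[node] / len(targets)
                for target in targets:
                    nxt[target] += share
            else:
                # Dangling node: hand its mass back to the seeds.
                for seed in seeds:
                    nxt[seed] += alpha * rank[node] / len(seeds)
        rank = nxt
    return rank


vector_hits = ["p1", "p2", "p3"]  # hypothetical ids from the vector index
bm25_hits = ["p3", "p1", "p4"]    # hypothetical ids from BM25
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])

citations = {"p1": ["p3"], "p2": ["p3"], "p3": ["p4"], "p4": []}
ppr = personalized_pagerank(citations, seeds=fused[:2])
reranked = sorted(fused, key=lambda d: (-ppr.get(d, 0.0), d))
```

Seeding the walk with the top fused hits biases the re-ranking toward papers connected to what already matched, which is the general idea behind a personalized variant of PageRank.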

Building your own index

Only needed if you want to customize the paper set or update to latest papers.

pip install arxiv-search-kit[index]

# download metadata from ArXiv OAI-PMH (takes ~2 hours)
python -m arxiv_search_kit.scripts.build_index download --output metadata.jsonl

# build index (needs GPU, takes ~45 min)
python -m arxiv_search_kit.scripts.build_index all --output-dir ./my_index --device cuda

License

MIT

Release history

0.2.4 (Apr 11, 2026)
0.2.3 (Apr 11, 2026)
0.2.2 (Apr 11, 2026)
0.2.1 (Apr 11, 2026)
0.2.0 (Apr 9, 2026)
0.1.9 (Apr 7, 2026)
0.1.8 (Apr 6, 2026)
0.1.7 (Apr 6, 2026)
0.1.5 (Apr 3, 2026)
0.1.4 (Apr 3, 2026)
0.1.3 (Apr 3, 2026)
0.1.2 (Mar 28, 2026)
0.1.1 (Mar 24, 2026), this version
0.1.0 (Mar 24, 2026)

Download files

Source Distribution

arxiv_search_kit-0.1.1.tar.gz (35.0 kB)

Uploaded Mar 24, 2026 Source

Built Distribution

arxiv_search_kit-0.1.1-py3-none-any.whl (44.8 kB)

Uploaded Mar 24, 2026 Python 3

File details

Details for the file arxiv_search_kit-0.1.1.tar.gz.

File metadata

Download URL: arxiv_search_kit-0.1.1.tar.gz
Upload date: Mar 24, 2026
Size: 35.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for arxiv_search_kit-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`d5e57fadb6d78cb10cf6ae90cc6c3d9663760cb0abe9c152d4a42705041b9daf`
MD5	`bbf03ac9f43d2e432171e81a9c7606b5`
BLAKE2b-256	`0483f4758ee4311f19c2bc419f0766f9f4d91cea7956d39346fdae6e3680fcd3`

File details

Details for the file arxiv_search_kit-0.1.1-py3-none-any.whl.

File metadata

Download URL: arxiv_search_kit-0.1.1-py3-none-any.whl
Upload date: Mar 24, 2026
Size: 44.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for arxiv_search_kit-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6edd7c0f8326e8adeebcfe66c693ffbbab147eced1baf779f6bb7fd928e799ea`
MD5	`54f8f2d1a97b5ce2da67fe2de188575d`
BLAKE2b-256	`936ba2bbcec9e4f6e3eb568607737ad8902bfac12b163c3e5ac0bfd7a36b45ac`
