Skip to main content

Python wrapper for the arXiv API

Project description

arxiv.py

PyPI PyPI - Python Version GitHub Workflow Status (branch) Full package documentation

Python wrapper for the arXiv API.

arXiv is a project by the Cornell University Library that provides open access to 1,000,000+ articles in Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance, and Statistics.

Usage

Install the package:

$ pip install arxiv   # Or `uv add arxiv` or similar.

In your Python code, include the line:

import arxiv

Examples

Fetching results

import arxiv

# Construct the default API client.
client = arxiv.Client()

# Search for the 10 most recent articles matching the keyword "quantum."
search = arxiv.Search(
  query = "quantum",
  max_results = 10,
  sort_by = arxiv.SortCriterion.SubmittedDate
)

results = client.results(search)

# `results` is a generator; you can iterate over its elements one by one...
for r in client.results(search):
  print(r.title)
# ...or exhaust it into a list. Careful: this is slow for large results sets.
all_results = list(results)
print([r.title for r in all_results])

# For advanced query syntax documentation, see the arXiv API User Manual:
# https://arxiv.org/help/api/user-manual#query_details
search = arxiv.Search(query = "au:del_maestro AND ti:checkerboard")
first_result = next(client.results(search))
print(first_result)

# Search for the paper with ID "1605.08386v1"
search_by_id = arxiv.Search(id_list=["1605.08386v1"])
# Reuse client to fetch the paper, then print its title.
first_result = next(client.results(search_by_id))
print(first_result.title)

[!TIP] arxivql may simplify constructing complex query strings.

Fetching results with a custom client

import arxiv

big_slow_client = arxiv.Client(
  page_size = 1000,
  delay_seconds = 10.0,
  num_retries = 5
)

# Prints 1000 titles before needing to make another request.
for result in big_slow_client.results(arxiv.Search(query="quantum")):
  print(result.title)

Downloading a paper

import arxiv
from urllib.request import urlretrieve

paper = next(arxiv.Client().results(arxiv.Search(id_list=["1605.08386v1"])))

# Download the PDF.
urlretrieve(paper.pdf_url, "paper.pdf")

# Download the source tarball.
urlretrieve(paper.source_url(), "paper.tar.gz")

Logging

To inspect this package's network behavior and API logic, configure a DEBUG-level logger.

>>> import logging, arxiv
>>> logging.basicConfig(level=logging.DEBUG)
>>> client = arxiv.Client()
>>> paper = next(client.results(arxiv.Search(id_list=["1605.08386v1"])))
INFO:arxiv.arxiv:Requesting 100 results at offset 0
INFO:arxiv.arxiv:Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=&id_list=1605.08386v1&sortBy=relevance&sortOrder=descending&start=0&max_results=100
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): export.arxiv.org:443
DEBUG:urllib3.connectionpool:https://export.arxiv.org:443 "GET /api/query?search_query=&id_list=1605.08386v1&sortBy=relevance&sortOrder=descending&start=0&max_results=100&user-agent=arxiv.py%2F1.4.8 HTTP/1.1" 200 979

Types

Client

A Client specifies a reusable strategy for fetching results from arXiv's API. For most use cases the default client should suffice.

Clients configurations specify pagination and retry logic. Reusing a client allows successive API calls to use the same connection pool and ensures they abide by the rate limit you set.

Search

A Search specifies a search of arXiv's database. Use Client.results to get a generator yielding Results.

Result

The Result objects yielded by Client.results include metadata about each paper.

The meaning of the underlying raw data is documented in the arXiv API User Manual: Details of Atom Results Returned.

Development

This project uses UV for development, while maintaining compatibility with traditional pip installation for end users.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

arxiv-4.0.0.tar.gz (198.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

arxiv-4.0.0-py3-none-any.whl (13.0 kB view details)

Uploaded Python 3

File details

Details for the file arxiv-4.0.0.tar.gz.

File metadata

  • Download URL: arxiv-4.0.0.tar.gz
  • Upload date:
  • Size: 198.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.9

File hashes

Hashes for arxiv-4.0.0.tar.gz
Algorithm Hash digest
SHA256 1d30a1dba5054e0df9b1d63f8e190b58e6a59d0c2f4ccec344ce1de5bafe546d
MD5 d81098ad46cbc567ebc34f9beb1a4ec8
BLAKE2b-256 fb4888c8e9c42712760ca9e74e52f6c4a388ee9e9939e341bfd8da295a9d1b17

See more details on using hashes here.

File details

Details for the file arxiv-4.0.0-py3-none-any.whl.

File metadata

  • Download URL: arxiv-4.0.0-py3-none-any.whl
  • Upload date:
  • Size: 13.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.9

File hashes

Hashes for arxiv-4.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 fc7e65e74d0fba21df2c521df24119d015fcb839dd5c4feb683ee35548c932c4
MD5 12fe15b6482a100f041c08a8ed5eb761
BLAKE2b-256 af504d01d219958b19b5aaca6ae74820b181baea438cd034d5b3c04b4cf4f75e

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page