
Python wrapper for the arXiv API: https://arxiv.org/help/api/


arxiv.py



arXiv is a project by the Cornell University Library that provides open access to 1,000,000+ articles in Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance, and Statistics.

Usage

Installation

$ pip install arxiv

In your Python script, import the package:

import arxiv

Examples

Fetching results

import arxiv

# Construct the default API client.
client = arxiv.Client()

# Search for the 10 most recent articles matching the keyword "quantum".
search = arxiv.Search(
  query = "quantum",
  max_results = 10,
  sort_by = arxiv.SortCriterion.SubmittedDate
)

results = client.results(search)

# `results` is a generator; you can iterate over its elements one by one...
for r in client.results(search):
  print(r.title)
# ...or exhaust it into a list. Careful: this is slow for large result sets.
all_results = list(results)
print([r.title for r in all_results])

# For advanced query syntax documentation, see the arXiv API User Manual:
# https://arxiv.org/help/api/user-manual#query_details
search = arxiv.Search(query = "au:del_maestro AND ti:checkerboard")
first_result = next(client.results(search))
print(first_result)

# Search for the paper with ID "1605.08386v1"
search_by_id = arxiv.Search(id_list=["1605.08386v1"])
# Reuse client to fetch the paper, then print its title.
first_result = next(client.results(search_by_id))
print(first_result.title)
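
The advanced query above joins field-prefixed terms with AND. As a minimal sketch, the string can be assembled programmatically; build_query is a hypothetical helper, not part of this package, and it assumes every term should be combined with AND:

```python
def build_query(**fields: str) -> str:
    """Join field-prefixed terms with AND, e.g. "au:... AND ti:..."."""
    # Each keyword argument name is used as an arXiv field prefix
    # (for example `au` for author and `ti` for title, as in the example above).
    return " AND ".join(f"{prefix}:{term}" for prefix, term in fields.items())


query = build_query(au="del_maestro", ti="checkerboard")
print(query)  # au:del_maestro AND ti:checkerboard
```

The resulting string can be passed as arxiv.Search(query=query). Terms that need OR, ANDNOT, or grouping are outside the scope of this sketch; see the arXiv API User Manual for the full syntax.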

Downloading papers

To download a PDF of the paper with ID "1605.08386v1", run a Search and then use Result.download_pdf():

import arxiv

paper = next(arxiv.Client().results(arxiv.Search(id_list=["1605.08386v1"])))
# Download the PDF to the PWD with a default filename.
paper.download_pdf()
# Download the PDF to the PWD with a custom filename.
paper.download_pdf(filename="downloaded-paper.pdf")
# Download the PDF to a specified directory with a custom filename.
paper.download_pdf(dirpath="./mydir", filename="downloaded-paper.pdf")

The same interface is available for downloading .tar.gz files of the paper source:

import arxiv

paper = next(arxiv.Client().results(arxiv.Search(id_list=["1605.08386v1"])))
# Download the archive to the PWD with a default filename.
paper.download_source()
# Download the archive to the PWD with a custom filename.
paper.download_source(filename="downloaded-paper.tar.gz")
# Download the archive to a specified directory with a custom filename.
paper.download_source(dirpath="./mydir", filename="downloaded-paper.tar.gz")
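
When downloading to a custom directory, it may help to create the directory first. The sketch below assumes the download methods do not create dirpath themselves, which may vary by version; download_pdf_into is a hypothetical helper, not part of this package:

```python
import os


def download_pdf_into(paper, dirpath: str, filename: str):
    """Ensure `dirpath` exists, then delegate to paper.download_pdf()."""
    # Assumption: the library does not create missing directories for us.
    os.makedirs(dirpath, exist_ok=True)
    # Returns whatever Result.download_pdf returns.
    return paper.download_pdf(dirpath=dirpath, filename=filename)


# Usage, with `paper` obtained as in the examples above:
# download_pdf_into(paper, "./mydir", "downloaded-paper.pdf")
```

The same wrapper pattern should apply to download_source, since it takes the same dirpath and filename arguments.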

Fetching results with a custom client

import arxiv

big_slow_client = arxiv.Client(
  page_size = 1000,
  delay_seconds = 10.0,
  num_retries = 5
)

# Prints 1000 titles before needing to make another request.
for result in big_slow_client.results(arxiv.Search(query="quantum")):
  print(result.title)
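
To get a feel for how page_size and delay_seconds interact, here is a rough back-of-the-envelope sketch. It assumes the delay is applied once between consecutive page requests and ignores retries and network time; estimate_requests is a hypothetical helper, not part of this package:

```python
import math


def estimate_requests(total_results: int, page_size: int, delay_seconds: float):
    """Return (number of page requests, minimum total seconds spent waiting)."""
    pages = math.ceil(total_results / page_size)
    # No wait is needed before the first request, only between pages.
    min_wait = max(pages - 1, 0) * delay_seconds
    return pages, min_wait


pages, wait = estimate_requests(2500, page_size=1000, delay_seconds=10.0)
print(pages, wait)  # 3 20.0
```

Under these assumptions, larger pages mean fewer requests and less time spent waiting, at the cost of larger individual responses.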

Logging

To inspect this package's network behavior and API logic, configure a DEBUG-level logger.

>>> import logging, arxiv
>>> logging.basicConfig(level=logging.DEBUG)
>>> client = arxiv.Client()
>>> paper = next(client.results(arxiv.Search(id_list=["1605.08386v1"])))
INFO:arxiv.arxiv:Requesting 100 results at offset 0
INFO:arxiv.arxiv:Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=&id_list=1605.08386v1&sortBy=relevance&sortOrder=descending&start=0&max_results=100
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): export.arxiv.org:443
DEBUG:urllib3.connectionpool:https://export.arxiv.org:443 "GET /api/query?search_query=&id_list=1605.08386v1&sortBy=relevance&sortOrder=descending&start=0&max_results=100&user-agent=arxiv.py%2F1.4.8 HTTP/1.1" 200 979
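
The session above sets DEBUG on the root logger, so urllib3's messages appear as well. If only this package's messages are wanted, one option is to raise the level on its logger alone. This sketch assumes the package logs under a name starting with "arxiv", as the output above suggests:

```python
import logging

# Keep the root logger quiet...
logging.basicConfig(level=logging.WARNING)
# ...and enable DEBUG only for loggers named "arxiv" or "arxiv.*".
logging.getLogger("arxiv").setLevel(logging.DEBUG)
```

Child loggers such as arxiv.arxiv inherit the level from their parent, while urllib3 stays at the root logger's level.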

Types

Client

A Client specifies a reusable strategy for fetching results from arXiv's API. For most use cases the default client should suffice.

Client configurations specify pagination and retry logic. Reusing a client allows successive API calls to use the same connection pool and ensures they abide by the rate limit you set.

Search

A Search specifies a search of arXiv's database. Use Client.results to get a generator yielding Results.

Result

The Result objects yielded by Client.results carry metadata about each paper.

The meaning of the underlying raw data is documented in the arXiv API User Manual: Details of Atom Results Returned.

Result also exposes helper methods for downloading papers: Result.download_pdf and Result.download_source.
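
As an illustration of working with Result metadata, the sketch below formats a one-line summary. It assumes the result exposes title, published (a datetime), authors (objects with a name attribute), and entry_id; check the package documentation for the fields available in your version. summarize is a hypothetical helper, not part of this package:

```python
def summarize(result) -> str:
    """Format a one-line summary from a Result-like object."""
    authors = ", ".join(author.name for author in result.authors)
    # `published` is assumed to be a datetime, formatted here as YYYY-MM-DD.
    return f"{result.title} ({result.published:%Y-%m-%d}) by {authors} <{result.entry_id}>"


# Usage, with `client` and `search` as in the examples above:
# for r in client.results(search):
#     print(summarize(r))
```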

