Skip to main content

Python wrapper for the arXiv API: http://arxiv.org/help/api/

Project description

arxiv.py Python 3.6 PyPI GitHub Workflow Status (branch)

Python wrapper for the arXiv API.

Quick links

About arXiv

arXiv is a project by the Cornell University Library that provides open access to 1,000,000+ articles in Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance, and Statistics.

Usage

Installation

$ pip install arxiv

In your Python script, include the line

import arxiv

Search

A Search specifies a search of arXiv's database.

arxiv.Search(
  query: str = "",
  id_list: List[str] = [],
  max_results: float = float('inf'),
  sort_by: SortCriterion = SortCriterion.Relevanvce,
  sort_order: SortOrder = SortOrder.Descending
)
  • query: an arXiv query string. Advanced query formats are documented in the arXiv API User Manual.
  • id_list: list of arXiv record IDs (typically of the format "0710.5765v1"). See the arXiv API User's Manual for documentation of the interaction between query and id_list.
  • max_results: The maximum number of results to be returned in an execution of this search. To fetch every result available, set max_results=float('inf') (default); to fetch up to 10 results, set max_results=10. The API's limit is 300,000 results.
  • sort_by: The sort criterion for results: relevance, lastUpdatedDate, or submittedDate.
  • sort_order: The sort order for results: 'descending' or 'ascending'.

To fetch arXiv records matching a Search, use search.results() or (Client).results(search) to get a generator yielding Results.

Example: fetching results

Print the titles fo the 10 most recent articles related to the keyword "quantum:"

import arxiv

search = arxiv.Search(
  query = "quantum",
  max_results = 10,
  sort_by = arxiv.SortCriterion.SubmittedDate
)

for result in search.results():
  print(result.title)

Fetch and print the title of the paper with ID "1605.08386v1:"

import arxiv

search = arxiv.Search(id_list=["1605.08386v1"])
paper = next(search.results())
print(paper.title)

Result

The Result objects yielded by (Search).results() include metadata about each paper and some helper functions for downloading their content.

The meaning of the underlying raw data is documented in the arXiv API User Manual: Details of Atom Results Returned.

  • result.entry_id: A url http://arxiv.org/abs/{id}.
  • result.updated: When the result was last updated.
  • result.published: When the result was originally published.
  • result.title: The title of the result.
  • result.authors: The result's authors, as arxiv.Authors.
  • result.summary: The result abstract.
  • result.comment: The authors' comment if present.
  • result.journal_ref: A journal reference if present.
  • result.doi: A URL for the resolved DOI to an external resource if present.
  • result.primary_category: The result's primary arXiv category. See arXiv: Category Taxonomy.
  • result.categories: All of the result's categories. See arXiv: Category Taxonomy.
  • result.links: Up to three URLs associated with this result, as arxiv.Links.
  • result.pdf_url: A URL for the result's PDF if present. Note: this URL also appears among result.links.

They also expose helper methods for downloading papers: (Result).download_pdf() and (Result).download_source().

Example: downloading papers

To download a PDF of the paper with ID "1605.08386v1," run a Search and then use (Result).download_pdf():

import arxiv

paper = next(arxiv.Search(id_list=["1605.08386v1"]).results())
# Download the PDF to the PWD with a default filename.
paper.download_pdf()
# Download the PDF to the PWD with a custom filename.
paper.download_pdf(filename="downloaded-paper.pdf")
# Download the PDF to a specified directory with a custom filename.
paper.download_pdf(dirpath="./mydir", filename="downloaded-paper.pdf")

The same interface is available for downloading .tar.gz files of the paper source:

import arxiv

paper = next(arxiv.Search(id_list=["1605.08386v1"]).results())
# Download the archive to the PWD with a default filename.
paper.download_source()
# Download the archive to the PWD with a custom filename.
paper.download_source(filename="downloaded-paper.tar.gz")
# Download the archive to a specified directory with a custom filename.
paper.download_source(dirpath="./mydir", filename="downloaded-paper.tar.gz")

Client

A Client specifies a strategy for fetching results from arXiv's API; it obscures pagination and retry logic.

For most use cases the default client should suffice. You can construct it explicitly with arxiv.Client(), or use it via the (Search).results() method.

arxiv.Client(
  page_size: int = 100,
  delay_seconds: int = 3,
  num_retries: int = 3
)
  • page_size: the number of papers to fetch from arXiv per page of results. Smaller pages can be retrieved faster, but may require more round-trips. The API's limit is 2000 results.
  • delay_seconds: the number of seconds to wait between requests for pages. arXiv's Terms of Use ask that you "make no more than one request every three seconds."
  • num_retries: The number of times the client will retry a request that fails, either with a non-200 HTTP status code or with an unexpected number of results given the search parameters.

Example: fetching results with a custom client

(Search).results() uses the default client settings. If you want to use a client you've defined instead of the defaults, use (Client).results(...):

import arxiv

big_slow_client = arxiv.Client(
  page_size = 1000,
  delay_seconds = 10,
  num_retries = 5
)

# Prints 1000 titles before needing to make another request.
for result in big_slow_client.results(arxiv.Search(query="quantum")):
  print(result.title)

Example: logging

To inspect this package's network behavior and API logic, configure an INFO-level logger.

>>> import logging, arxiv
>>> logging.basicConfig(level=logging.INFO)
>>> paper = next(arxiv.Search(id_list=["1605.08386v1"]).results())
INFO:arxiv.arxiv:Requesting 100 results at offset 0
INFO:arxiv.arxiv:Requesting page of results
INFO:arxiv.arxiv:Got first page; 1 of inf results available

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

arxiv-1.4.2.tar.gz (13.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

arxiv-1.4.2-py3-none-any.whl (11.9 kB view details)

Uploaded Python 3

File details

Details for the file arxiv-1.4.2.tar.gz.

File metadata

  • Download URL: arxiv-1.4.2.tar.gz
  • Upload date:
  • Size: 13.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.20.0 setuptools/41.4.0 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/2.7.17

File hashes

Hashes for arxiv-1.4.2.tar.gz
Algorithm Hash digest
SHA256 87d5045c342ad8e24e2007bdb4d78f43b3d58357274e20d8a1db70c429e8b0ab
MD5 e9f9759f3688af719edd249be5c0ef75
BLAKE2b-256 96a3587e756cdbbbdf95d5869961516bafde96868b6d6b79e939a63be1041606

See more details on using hashes here.

File details

Details for the file arxiv-1.4.2-py3-none-any.whl.

File metadata

  • Download URL: arxiv-1.4.2-py3-none-any.whl
  • Upload date:
  • Size: 11.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.20.0 setuptools/41.4.0 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/2.7.17

File hashes

Hashes for arxiv-1.4.2-py3-none-any.whl
Algorithm Hash digest
SHA256 6b29d9ff152a1d7bd20781a397ea03ed864b213470337eb170b059e431f1c3fc
MD5 8e4585f3dc630cbffcd8dbefb0e688fa
BLAKE2b-256 26464df59ed734dab7428df9e800f9a6a94d5b93b29e872633501b01d53fe6e4

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page