Python wrapper for the arXiv API

Supported Python versions: 2.7, 3.6

About arXiv

arXiv is a project by the Cornell University Library that provides open access to 1,000,000+ articles in Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance, and Statistics.



Installation

$ pip install arxiv

Verify the installation with

$ python test

In your Python script, include the line

import arxiv


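Query

arxiv.query(query="",
            id_list=[],
            max_results=10,
            start=0,
            sort_by="relevance",
            sort_order="descending",
            prune=True,
            iterative=False,
            max_chunk_results=1000)
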
Argument           Type             Default
query              string           ""
id_list            list of strings  []
max_results        int              10
start              int              0
sort_by            string           "relevance"
sort_order         string           "descending"
prune              boolean          True
iterative          boolean          False
max_chunk_results  int              1000

  • query: an arXiv query string. The format is documented in the arXiv API user manual.

    • Note: multi-field queries must be space-delimited. au:balents_leon AND cat:cond-mat.str-el is valid; au:balents_leon+AND+cat:cond-mat.str-el is not valid.
  • id_list: list of arXiv record IDs (typically of the format "0710.5765v1").

  • max_results: the maximum number of results returned by the query.

  • start: the offset of the first returned object from the arXiv query results.

  • sort_by: the arXiv field by which the result should be sorted.

  • sort_order: the sorting order, i.e. "ascending", "descending" or None.

  • prune: when True, received abstract objects will be simplified.

  • iterative: when True, query() will return an iterator. Otherwise, query() iterates internally and returns the full list of results.

  • max_chunk_results: the maximum number of abstracts to be retrieved by a single internal request to the arXiv API.
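
As a small illustration of the space-delimited rule for multi-field queries noted above, the sketch below joins field:value terms with " AND ". The helper name build_query is hypothetical and not part of the arxiv package; it only produces the string you would pass as the query argument.

```python
def build_query(terms):
    """Join (field, value) pairs into a space-delimited arXiv query string.

    Hypothetical helper for illustration only; not part of the arxiv package.
    """
    # Each term becomes "field:value"; terms are joined with " AND ",
    # using plain spaces rather than "+" as the delimiter.
    return " AND ".join("{}:{}".format(field, value) for field, value in terms)


query_string = build_query([("au", "balents_leon"), ("cat", "cond-mat.str-el")])
print(query_string)
```

The resulting string can then be passed as query, for example arxiv.query(query=query_string).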

Query examples:

import arxiv

# Keyword queries
arxiv.query(query="quantum", max_results=100)
# Multi-field queries
arxiv.query(query="au:balents_leon AND cat:cond-mat.str-el")
# Get single record by ID
arxiv.query(id_list=["1707.08567"])
# Get multiple records by ID
arxiv.query(id_list=["1707.08567", "1707.08567"])

# Get iterator over query results
result = arxiv.query(query="quantum", max_chunk_results=10, iterative=True)
for paper in result():
    print(paper)

For a more detailed description of the interaction between query and id_list, see this section of the arXiv documentation.
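
To picture how start, max_results and max_chunk_results might interact, here is a minimal sketch of splitting one query into several chunked requests. The function plan_chunks is hypothetical and the chunking scheme is an assumption made for illustration; the package may schedule its internal requests differently.

```python
def plan_chunks(start, max_results, max_chunk_results):
    """Yield (offset, size) pairs that together cover max_results records.

    Assumed behaviour for illustration only; not taken from the arxiv package.
    """
    remaining = max_results
    offset = start
    while remaining > 0:
        # Never ask for more than one chunk's worth in a single request.
        size = min(remaining, max_chunk_results)
        yield offset, size
        offset += size
        remaining -= size


# With the documented defaults, everything fits in one request.
single = list(plan_chunks(start=0, max_results=10, max_chunk_results=1000))
# A larger query with a small chunk size is split into several requests.
several = list(plan_chunks(start=0, max_results=25, max_chunk_results=10))
print(single)
print(several)
```

Because plan_chunks is a generator, each chunk is produced only when the caller asks for it, which is the same idea as consuming results lazily with iterative=True instead of collecting the full list first.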

Download article PDF or source tarfile

arxiv.download(obj, dirpath='./', slugify=slugify, prefer_source_tarfile=False)

Argument               Type      Default        Required?
obj                    dict      N/A            Yes
dirpath                string    "./"           No
slugify                function  arxiv.slugify  No
prefer_source_tarfile  bool      False          No

  • obj is a result object, one of a list returned by query(). obj must at minimum contain values corresponding to pdf_url and title.

  • dirpath is the relative directory path to which the downloaded PDF will be saved. It defaults to the present working directory.

  • slugify is a function that processes obj into a filename. By default, it prepends the object ID to the object title.

  • If prefer_source_tarfile is True, this function will download the source files for obj (rather than the rendered PDF) in .tar.gz format.
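
The sketch below shows one way the four arguments described above could combine into a destination path. Both example_slugify and target_path are hypothetical names, the record is a made-up placeholder, and taking the ID from the last segment of pdf_url is an assumption; this is not the package's own implementation.

```python
import os
import re


def example_slugify(obj):
    """Build a filename stem of the form '<id>.<title_words>'.

    Illustrative stand-in for a slugify function. Assumes the ID is the
    last path segment of pdf_url.
    """
    paper_id = obj["pdf_url"].split("/")[-1]
    title = "_".join(re.findall(r"\w+", obj["title"]))
    return "{}.{}".format(paper_id, title)


def target_path(obj, dirpath="./", slugify=example_slugify,
                prefer_source_tarfile=False):
    """Return the path a download with these arguments might be saved to."""
    extension = ".tar.gz" if prefer_source_tarfile else ".pdf"
    return os.path.join(dirpath, slugify(obj) + extension)


# Placeholder record: the URL and title are invented for this example.
paper = {"pdf_url": "http://example.org/pdf/1707.08567v1",
         "title": "The Paper Title"}
print(target_path(paper))
print(target_path(paper, dirpath="papers", prefer_source_tarfile=True))
```

Passing a different function as slugify changes only the filename stem; dirpath and the file extension are handled separately.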

import arxiv

# Query for a paper of interest, then download
paper = arxiv.query(id_list=["1707.08567"])[0]
arxiv.download(paper)

# You can skip the query step if you have the paper info!
# (pdf_url must hold the link to the paper's PDF)
paper2 = {"pdf_url": "",
          "title": "The Paper Title"}
arxiv.download(paper2)

# Download the gzipped tar file
arxiv.download(paper, prefer_source_tarfile=True)

# Returns the object id
def custom_slugify(obj):
    return obj.get('id').split('/')[-1]

# Download with a specified slugifier function
arxiv.download(paper, slugify=custom_slugify)


