Python wrapper for the arXiv API: http://arxiv.org/help/api/

Project description

arxiv.py (Python 2.7, Python 3.6)

Python wrapper for the arXiv API.

About arXiv

arXiv is a project by the Cornell University Library that provides open access to 1,000,000+ articles in Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance, and Statistics.

Usage

Installation

$ pip install arxiv

To verify the installation, run the test suite from a copy of the source:

$ python setup.py test

In your Python script, include the line

import arxiv

Query

arxiv.query(query="",
            id_list=[],
            max_results=None,
            start=0,
            sort_by="relevance",
            sort_order="descending",
            prune=True,
            iterative=False,
            max_chunk_results=1000)
Argument           Type             Default
query              string           ""
id_list            list of strings  []
max_results        int              None
start              int              0
sort_by            string           "relevance"
sort_order         string           "descending"
prune              boolean          True
iterative          boolean          False
max_chunk_results  int              1000

  • query: an arXiv query string. The format is documented in the arXiv API documentation (http://arxiv.org/help/api/).

    • Note: multi-field queries must be space-delimited. au:balents_leon AND cat:cond-mat.str-el is valid; au:balents_leon+AND+cat:cond-mat.str-el is not valid.
  • id_list: list of arXiv record IDs (typically of the format "0710.5765v1").

  • max_results: the maximum number of results returned by the query.

  • start: the offset of the first returned object from the arXiv query results.

  • sort_by: the arXiv field by which the result should be sorted.

  • sort_order: the sorting order, i.e. "ascending", "descending" or None.

  • prune: when True, received abstract objects will be simplified.

  • iterative: when True, query() will return an iterator. Otherwise, query() iterates internally and returns the full list of results.

  • max_chunk_results: the maximum number of abstracts to be retrieved by a single internal request to the arXiv API.
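The space-delimited format required for multi-field queries can be produced programmatically. The following is a minimal sketch: build_query is a hypothetical helper, not part of arxiv.py, and it only joins field:value pairs with a boolean operator surrounded by spaces.

```python
def build_query(fields, operator="AND"):
    """Join (prefix, value) pairs into a space-delimited arXiv query string.

    build_query is a hypothetical helper, not part of arxiv.py.
    """
    terms = ["%s:%s" % (prefix, value) for prefix, value in fields]
    # The operator is padded with spaces, never with "+".
    return (" %s " % operator).join(terms)


query_string = build_query([("au", "balents_leon"),
                            ("cat", "cond-mat.str-el")])
print(query_string)
# The string can then be passed on, e.g. arxiv.query(query=query_string)
```

The resulting string matches the valid form shown in the note above (au:balents_leon AND cat:cond-mat.str-el).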

Query examples:

import arxiv

# Keyword queries
arxiv.query(query="quantum", max_results=100)
# Multi-field queries
arxiv.query(query="au:balents_leon AND cat:cond-mat.str-el")
# Get single record by ID
arxiv.query(id_list=["1707.08567"])
# Get multiple records by ID
arxiv.query(id_list=["1707.08567", "0710.5765v1"])

# Get iterator over query results
result = arxiv.query(query="quantum", max_chunk_results=10, iterative=True)
for paper in result():
    print(paper)

For a more detailed description of the interaction between query and id_list, see this section of the arXiv documentation.
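
The iterative example above calls the returned object (result()) to start iterating. When only the first few records are needed, the iteration can be capped with itertools.islice. This is a sketch under assumptions: take is a hypothetical helper, not part of arxiv.py, and fake_results is a stand-in generator so that the snippet runs without network access.

```python
from itertools import islice


def take(make_iterator, limit):
    """Return at most `limit` items from the iterator that make_iterator() yields.

    take is a hypothetical helper, not part of arxiv.py.
    """
    return list(islice(make_iterator(), limit))


def fake_results():
    """Stand-in for the callable returned by arxiv.query(..., iterative=True)."""
    for number in range(1000):
        yield {"title": "Paper %d" % number}


first_three = take(fake_results, 3)
print([paper["title"] for paper in first_three])
```

With the library itself, the call would presumably look like take(arxiv.query(query="quantum", max_chunk_results=10, iterative=True), 3), assuming the returned object is callable as in the example above.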

Download article PDF

arxiv.download(obj, dirpath="./", slugify=arxiv.slugify)
Argument  Type      Default        Required?
obj       dict      N/A            Yes
dirpath   string    "./"           No
slugify   function  arxiv.slugify  No

  • obj is a result object, one of a list returned by query(). obj must at minimum contain values corresponding to pdf_url and title.

  • dirpath is the relative directory path to which the downloaded PDF will be saved. It defaults to the present working directory.

  • slugify is a function that processes obj into a filename. By default, arxiv.download(obj) prepends the object ID to the object title.

import arxiv
# Query for a paper of interest, then download
paper = arxiv.query(id_list=["1707.08567"])[0]
arxiv.download(paper)
# You can skip the query step if you have the paper info!
paper2 = {"pdf_url": "http://arxiv.org/pdf/1707.08567v1",
          "title": "The Paper Title"}
arxiv.download(paper2)

# Returns the object id
def custom_slugify(obj):
    return obj.get('id').split('/')[-1]

# Download with a specified slugifier function
arxiv.download(paper, slugify=custom_slugify)
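
A slugifier can also be based on the title rather than the ID. The sketch below is illustrative only: title_slugify is a hypothetical function, not part of arxiv.py. Like custom_slugify above, it returns a bare name without a file extension.

```python
import re


def title_slugify(obj):
    """Hypothetical slugifier: lowercase the title, join its words with underscores."""
    title = obj.get("title") or "untitled"
    words = re.findall(r"[a-z0-9]+", title.lower())
    # Fall back to a fixed name if the title has no usable characters.
    return "_".join(words) or "untitled"


paper2 = {"pdf_url": "http://arxiv.org/pdf/1707.08567v1",
          "title": "The Paper Title"}
print(title_slugify(paper2))
# arxiv.download(paper2, slugify=title_slugify)  # requires network access
```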

Download files

Source Distribution

arxiv-0.5.1.tar.gz (5.8 kB)

Uploaded Source

Built Distribution

arxiv-0.5.1-py3-none-any.whl (6.7 kB)

Uploaded Python 3

File details

Details for the file arxiv-0.5.1.tar.gz.

File metadata

  • Download URL: arxiv-0.5.1.tar.gz
  • Size: 5.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.20.0 setuptools/40.2.0 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/2.7.15

File hashes

Hashes for arxiv-0.5.1.tar.gz
Algorithm Hash digest
SHA256 5cfb924b60e3ea0ebb3b5d0c32c849df46a2b000036d0bf578c71fba54512233
MD5 814230e022b9e7d1073d1a20eadc105a
BLAKE2b-256 4998fac025dbf34a487936160d069b8f248e06f7c469b07984d7908ac4968008

File details

Details for the file arxiv-0.5.1-py3-none-any.whl.

File metadata

  • Download URL: arxiv-0.5.1-py3-none-any.whl
  • Size: 6.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.20.0 setuptools/40.2.0 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/2.7.15

File hashes

Hashes for arxiv-0.5.1-py3-none-any.whl
Algorithm Hash digest
SHA256 da8d9b402fde28207975c6e3c2e177ff1c8b2077bb7178a1eaf1c5c4bf41038a
MD5 55d41d73ae3f6f72ddce143c59c1f5c8
BLAKE2b-256 a07f174c39aa56d4ef11232d444da1eb7446b60bc1e537b0e1017f4367a010df
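
A downloaded file can be checked against the SHA256 digests listed above using Python's standard hashlib module. This is a minimal sketch: sha256_of and matches_published_hash are hypothetical helper names, and the path passed in is whatever location the file was saved to.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example call, assuming the source distribution was saved locally:
# matches_published_hash(
#     "arxiv-0.5.1.tar.gz",
#     "5cfb924b60e3ea0ebb3b5d0c32c849df46a2b000036d0bf578c71fba54512233")
```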
