Python wrapper for the arXiv API: http://arxiv.org/help/api/
arxiv.py

About arXiv
arXiv is a project by the Cornell University Library that provides open access to 1,000,000+ articles in Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance, and Statistics.
Usage
Installation
$ pip install arxiv
Verify the installation with
$ python setup.py test
In your Python script, include the line
import arxiv
Query
arxiv.query(
    query="",
    id_list=[],
    max_results=None,
    start=0,
    sort_by="relevance",
    sort_order="descending",
    prune=True,
    iterative=False,
    max_chunk_results=1000
)
Argument | Type | Default |
---|---|---|
query | string | "" |
id_list | list of strings | [] |
max_results | int | None |
start | int | 0 |
sort_by | string | "relevance" |
sort_order | string | "descending" |
prune | boolean | True |
iterative | boolean | False |
max_chunk_results | int | 1000 |
- `query`: an arXiv query string. Its format is documented in the arXiv API documentation (http://arxiv.org/help/api/).
    - Note: multi-field queries must be space-delimited. `au:balents_leon AND cat:cond-mat.str-el` is valid; `au:balents_leon+AND+cat:cond-mat.str-el` is not valid.
- `id_list`: list of arXiv record IDs (typically of the format `"0710.5765v1"`).
- `max_results`: the maximum number of results returned by the query. Note: if this is unset and `iterative=False`, the call to `query` can take a long time to resolve.
- `start`: the offset of the first returned object from the arXiv query results.
- `sort_by`: the arXiv field by which the result should be sorted.
- `sort_order`: the sorting order: "ascending", "descending", or None.
- `prune`: when `True`, received abstract objects will be simplified.
- `iterative`: when `True`, `query()` will return an iterator. Otherwise, `query()` iterates internally and returns the full list of results.
- `max_chunk_results`: the maximum number of abstracts to be retrieved by a single internal request to the arXiv API.
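Because multi-field queries must be space-delimited, it can help to assemble the query string in one place rather than by hand. The following is a minimal sketch; `build_query` is a hypothetical helper, not part of this package, and it only covers the simple case of joining `prefix:value` terms with `AND`.

```python
def build_query(**fields):
    """Join prefix:value terms with ' AND ' (spaces, never '+').

    Hypothetical helper for illustration; not provided by the arxiv package.
    """
    return " AND ".join(
        "{}:{}".format(prefix, value) for prefix, value in fields.items()
    )


# Keyword order is preserved, so the terms appear as written.
query_string = build_query(au="balents_leon", cat="cond-mat.str-el")
print(query_string)
```

The resulting string can then be passed as the `query` argument of `arxiv.query()`.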
Query examples:
import arxiv
# Keyword queries
arxiv.query(query="quantum", max_results=100)
# Multi-field queries
arxiv.query(query="au:balents_leon AND cat:cond-mat.str-el")
# Get single record by ID
arxiv.query(id_list=["1707.08567"])
# Get multiple records by ID
arxiv.query(id_list=["1707.08567", "1707.08567"])
# Get an iterator over query results
result = arxiv.query(
    query="quantum",
    max_chunk_results=10,
    max_results=100,
    iterative=True
)
for paper in result():
    print(paper)
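To make the relationship between `start`, `max_results`, and `max_chunk_results` concrete, the sketch below works out which internal requests would be needed to cover a query. It assumes the library pages through results by offset in chunks of at most `max_chunk_results`; `plan_chunks` is a hypothetical function written for this illustration and is not the package's actual code.

```python
def plan_chunks(start, max_results, max_chunk_results):
    """Return (offset, size) pairs covering max_results records from start.

    Illustrative arithmetic only; assumes offset-based paging.
    """
    if max_chunk_results <= 0:
        raise ValueError("max_chunk_results must be positive")
    chunks = []
    offset = start
    remaining = max_results
    while remaining > 0:
        size = min(max_chunk_results, remaining)
        chunks.append((offset, size))
        offset += size
        remaining -= size
    return chunks


# Same numbers as the iterator example: 100 results in chunks of 10.
chunks = plan_chunks(start=0, max_results=100, max_chunk_results=10)
print(len(chunks))   # 10
print(chunks[:2])    # [(0, 10), (10, 10)]
```

Under that assumption, a smaller `max_chunk_results` means more, smaller requests, which is why it pairs naturally with `iterative=True`.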
For a more detailed description of the interaction between the `query` and `id_list` arguments, see the arXiv API documentation.
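As a rough guide, the arXiv API returns the articles matching the query when only a query is given, the listed articles when only an ID list is given, and the listed articles that also match the query when both are given. The sketch below mimics that logic on local stand-in data; `select_records`, the record dictionaries, and the placeholder IDs are all made up for illustration and involve no network access.

```python
def select_records(records, query_match=None, id_list=None):
    """Mimic how a search query and an ID list combine.

    records: dict mapping ID -> record (local stand-in data).
    query_match: predicate on a record, or None when no query is given.
    id_list: list of IDs, or None/empty when no ID list is given.
    """
    if query_match is None and not id_list:
        return []
    # With an ID list, only those records are candidates; otherwise all are.
    ids = id_list if id_list else list(records)
    selected = [records[i] for i in ids if i in records]
    if query_match is not None:
        selected = [r for r in selected if query_match(r)]
    return selected


# Placeholder records, not real arXiv entries.
records = {
    "A": {"id": "A", "title": "quantum walks"},
    "B": {"id": "B", "title": "classical walks"},
    "C": {"id": "C", "title": "quantum codes"},
}


def is_quantum(record):
    return "quantum" in record["title"]


print([r["id"] for r in select_records(records, query_match=is_quantum)])
print([r["id"] for r in select_records(records, id_list=["A", "B"])])
print([r["id"] for r in select_records(records, is_quantum, ["A", "B"])])
```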
Download article PDF or source tarfile
arxiv.arxiv.download(obj, dirpath='./', slugify=slugify, prefer_source_tarfile=False)
Argument | Type | Default | Required? |
---|---|---|---|
obj | dict | N/A | Yes |
dirpath | string | "./" | No |
slugify | function | arxiv.slugify | No |
prefer_source_tarfile | bool | False | No |
- `obj` is a result object, one of a list returned by `query()`. `obj` must at minimum contain values corresponding to `pdf_url` and `title`.
- `dirpath` is the relative directory path to which the downloaded PDF will be saved. It defaults to the present working directory.
- `slugify` is a function that processes `obj` into a filename. By default, `arxiv.download(obj)` prepends the object ID to the object title.
- If `prefer_source_tarfile` is `True`, this function will download the source files for `obj` (rather than the rendered PDF) in .tar.gz format.
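Since `slugify` is just a function from a result object to a filename, a custom one can be written and checked without downloading anything. The following is a minimal sketch: `short_slugify` is a hypothetical name, and the sample object assumes that `id` holds a URL whose last path segment is the arXiv ID, as the `split('/')` idiom suggests. How the library turns the returned string into the final file path is not covered here.

```python
import re


def short_slugify(obj):
    """Build a filename stem from the arXiv ID and a sanitized title.

    Hypothetical example; assumes obj["id"] is a URL ending in the ID.
    """
    paper_id = obj.get("id", "").split("/")[-1]
    # Replace every run of non-alphanumeric characters with one underscore.
    title = re.sub(r"[^A-Za-z0-9]+", "_", obj.get("title", "")).strip("_")
    return "{}.{}".format(paper_id, title) if paper_id else title


# Stand-in result object; the values are illustrative.
sample = {
    "id": "http://arxiv.org/abs/1707.08567v1",
    "title": "The Paper Title",
}
print(short_slugify(sample))  # 1707.08567v1.The_Paper_Title
```

Such a function would be passed as `slugify=short_slugify` in the call to `arxiv.download()`.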
import arxiv
# Query for a paper of interest, then download it.
paper = arxiv.query(id_list=["1707.08567"])[0]
arxiv.download(paper)
# You can skip the query step if you have the paper info.
paper2 = {"pdf_url": "http://arxiv.org/pdf/1707.08567v1",
          "title": "The Paper Title"}
arxiv.download(paper2)
# Use prefer_source_tarfile to download the gzipped tar file.
arxiv.download(paper, prefer_source_tarfile=True)
# Override the default filename format by defining a slugify function.
arxiv.download(paper, slugify=lambda paper: paper.get('id').split('/')[-1])