Python3 client for the arXiv API
arxiv-client
Install the package arxiv_client from PyPI.
This differs from the pre-existing arxiv.py project in that it further abstracts away the arXiv API, so you do not need to learn how to construct query strings. The overall goal is to let users skip reading the API docs entirely.
By comparison, arxiv.py is currently:
- More stable
- Compatible with Python < 3.11
Basic Features
- Simple structured queries
- Comprehensive entity models, with documentation
  - For example, see the Category enum for arXiv's subject taxonomy
- Fully type annotated
Usage
In a nutshell:
import pprint

import arxiv_client as arx

categories = [arx.Category.CS_AI, arx.Category.CS_CL, arx.Category.CS_IR]
client = arx.Client()
articles = client.search(arx.Query(keywords=["llm"], categories=categories, max_results=2))
for article in articles:
    pprint.pprint(article)  # Formatted pretty print is supported
Structured Query Logic
When using the structured Query fields, multiple values within a single field are combined using OR, and multiple fields are combined using AND.
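The rule above can be restated as a plain boolean predicate. The sketch below evaluates it locally against a hypothetical `Article` record; the class, its fields, and `matches` are illustrative only and are not part of arxiv_client. The real filtering happens on arXiv's servers, and the substring test used here for title keywords is a simplification of arXiv's own matching.

```python
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass
class Article:
    """Hypothetical stand-in for an article record; not arxiv_client's model."""

    title: str
    authors: list[str]
    categories: list[str]


def matches(
    article: Article,
    title_keywords: Sequence[str] = (),
    author_names: Sequence[str] = (),
    categories: Sequence[str] = (),
) -> bool:
    """Apply the OR-within-a-field, AND-across-fields rule to one article."""
    checks: list[bool] = []
    if title_keywords:
        # OR: any one title keyword is enough to satisfy this field
        title = article.title.lower()
        checks.append(any(k.lower() in title for k in title_keywords))
    if author_names:
        checks.append(any(name in article.authors for name in author_names))
    if categories:
        checks.append(any(c in article.categories for c in categories))
    # AND: every field that was actually given must be satisfied.
    # Empty fields are skipped, so all([]) makes an empty query match everything.
    return all(checks)


paper = Article("Paged attention for LLM serving", ["A. Author"], ["cs.CL", "cs.AI"])
print(matches(paper, title_keywords=["attention", "window"], categories=["cs.AI", "cs.IR"]))
```

Here "attention" satisfies the title field and cs.AI satisfies the category field, so both AND-ed conditions hold even though "window" and cs.IR do not match.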
Searchable Fields
The Query object accepts the following field filters:
- keywords: terms across all fields
- title_keywords: terms in the article title
- author_names: names in the author list
- categories: arXiv subject categories
- article_ids: arXiv article IDs
- custom_params: custom query string
Example
from arxiv_client import Category, Query
Query(keywords=["llm"], categories=[Category.CS_AI, Category.CS_IR], max_results=5)
# Query(keywords=['llm'],
# title_keywords=[],
# author_names=[],
# categories=[<Category.CS_AI: 'cs.AI'>, <Category.CS_IR: 'cs.IR'>],
# article_ids=[],
# custom_params=None,
# sort_criterion=SortCriterion(sort_by=<SortBy.LAST_UPDATED_DATE: 'lastUpdatedDate'>,
# sort_order=<SortOrder.DESC: 'descending'>),
# start=None,
# max_results=5)
This results in the following query logic:
("llm") in any field AND (cs.AI OR cs.IR) in the categories
See the Query class for more information.
Custom Queries
If the simple structured query logic is insufficient, the Query object also accepts a hand-built query string through the custom_params attribute. You do not need to URL-encode this value.
See arXiv Query Construction for more information on building your own queries.
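Because encoding is handled for you, custom_params can be written exactly as it appears in arXiv's query syntax. The sketch below shows the kind of encoding a client performs before sending a request, using only the standard library. The endpoint is the one documented in the arXiv API user manual; `build_request_url` is a hypothetical helper, and arxiv_client may encode its requests differently internally.

```python
from urllib.parse import urlencode

# Query endpoint documented in the arXiv API user manual.
ARXIV_API_URL = "http://export.arxiv.org/api/query"


def build_request_url(search_query: str, max_results: int = 10) -> str:
    """Hypothetical helper: percent-encode a raw query string into a request URL."""
    # urlencode turns spaces into '+' and escapes ':' and '"' (as %3A and %22)
    query = urlencode({"search_query": search_query, "max_results": max_results})
    return f"{ARXIV_API_URL}?{query}"


print(build_request_url("cat:cs.AI ANDNOT cat:cs.RO"))
# http://export.arxiv.org/api/query?search_query=cat%3Acs.AI+ANDNOT+cat%3Acs.RO&max_results=10
```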
Example
from arxiv_client import Category, Query
custom = f"cat:{Category.CS_AI.value} ANDNOT cat:{Category.CS_RO.value}"
Query(keywords=["paged attention", "attention window"], custom_params=custom)
# Query(keywords=['paged attention', 'attention window'],
# title_keywords=[],
# author_names=[],
# categories=[],
# article_ids=[],
# custom_params='cat:cs.AI ANDNOT cat:cs.RO',
# sort_criterion=SortCriterion(sort_by=<SortBy.LAST_UPDATED_DATE: 'lastUpdatedDate'>,
# sort_order=<SortOrder.DESC: 'descending'>),
# start=None,
# max_results=10)
This results in the following query logic:
("paged attention" OR "attention window") in any field AND (cs.AI AND NOT cs.RO) in the categories
Equivalent query string:
(all:"paged attention" OR all:"attention window") AND (cat:cs.AI ANDNOT cat:cs.RO)
Development
This project uses hatch for project management.
File details
Details for the file arxiv_client-0.2.0.tar.gz.
File metadata
- Download URL: arxiv_client-0.2.0.tar.gz
- Upload date:
- Size: 12.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.0.0 CPython/3.12.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | df7c6446c9793e0d97e2aa9641215080862b94a2ee22ab128aee7890d3eacc04
MD5 | 998e5b1d99f23087c7b8bcb201ca4210
BLAKE2b-256 | 68d939a4df0c7cc22237d2c18e099a3d8f7acd7b07bc726514d255a76a4a823e
File details
Details for the file arxiv_client-0.2.0-py3-none-any.whl.
File metadata
- Download URL: arxiv_client-0.2.0-py3-none-any.whl
- Upload date:
- Size: 14.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.0.0 CPython/3.12.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7f5210bd13aec611ff281a78c68465cfd32e8d16d5b8afa9fa8c4aa925925d62
MD5 | bee6ca65d3667b895ab7da0bb0907b06
BLAKE2b-256 | 448affd15b9b58f95fcbe536689f14fad067e8de2d0b519d4050d8b8560b9a98