Typed Python client for the arXiv API
arxiv-client
Python 3 client for the arXiv API.
This differs from the pre-existing arxiv.py project in that it further abstracts away the arXiv API so you do not need to learn to build query strings on your own.
The overall goal is to enable most users to query arXiv immediately, without needing to reference the API docs.
arxiv.py is currently more stable and is backwards compatible with older versions of Python.
It is also currently recommended for queries that return large numbers of results.
Basic Features
- Simple query building
- Comprehensive entity models, with documentation
  - For example, see the Category enum for arXiv's subject taxonomy
- Fully type annotated
Under Development
- Improved page chunking for large queries
- Support for querying more fields
- Testing and validation
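One way to approach large queries by hand is to request results in fixed-size windows. The sketch below only computes the windows; `page_windows` is a hypothetical helper that is not part of arxiv-client, and feeding each window into the `start` and `max_results` fields of Query is an assumption based on the field names shown in the examples further down.

```python
from typing import Iterator, Tuple


def page_windows(total: int, page_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, max_results) pairs that cover `total` results.

    Hypothetical helper for illustration; not part of arxiv_client.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for start in range(0, total, page_size):
        # The last window is shortened so it does not run past `total`.
        yield start, min(page_size, total - start)


windows = list(page_windows(total=250, page_size=100))
print(windows)  # [(0, 100), (100, 100), (200, 50)]

# Assumption: each pair could then be passed along as
# Query(..., start=start, max_results=size), one request per window.
```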
Usage
In a nutshell:
from arxiv_client import Client, Query, Category
import pprint
categories = [Category.CS_AI, Category.CS_CL, Category.CS_IR]
client = Client()
articles = client.search(Query(keywords=["llm"], categories=categories, max_results=2))
for article in articles:
    pprint.pprint(article)  # Formatted pretty print is supported
Simple Query Logic
When using the provided Query fields, multiple values within a single field are combined using OR, and multiple fields are combined using AND.
Example
Query(keywords=["llm"], categories=[Category.CS_AI, Category.CS_IR], max_results=5)
# Query(keywords=['llm'],
# title_keywords=[],
# author_names=[],
# categories=[<Category.CS_AI: 'cs.AI'>, <Category.CS_IR: 'cs.IR'>],
# article_ids=[],
# custom_params=None,
# sort_criterion=SortCriterion(sort_by=<SortBy.LAST_UPDATED_DATE: 'lastUpdatedDate'>,
# sort_order=<SortOrder.DESC: 'descending'>),
# start=None,
# max_results=5)
Results in the following query logic:
(all:"llm") AND (cat:cs.AI OR cat:cs.IR)
See the Query class for more information.
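The combination rule can be sketched in plain Python. This is an illustration only, not the library's implementation: `build_search_query` and its parameters are hypothetical names, and the `all:` and `cat:` field prefixes are taken from the query logic shown above.

```python
from typing import List, Sequence


def build_search_query(keywords: Sequence[str], categories: Sequence[str]) -> str:
    """Combine values within a field using OR, and fields using AND.

    Hypothetical helper for illustration; not part of arxiv_client.
    """
    groups: List[str] = []
    if keywords:
        # Keywords are quoted so that multi-word phrases stay together.
        groups.append(" OR ".join(f'all:"{kw}"' for kw in keywords))
    if categories:
        groups.append(" OR ".join(f"cat:{cat}" for cat in categories))
    # Each non-empty field becomes one parenthesized group.
    return " AND ".join(f"({group})" for group in groups)


query = build_search_query(["llm"], ["cs.AI", "cs.IR"])
print(query)  # (all:"llm") AND (cat:cs.AI OR cat:cs.IR)
```

Empty fields are skipped, so a query with only keywords produces a single group with no AND.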
Advanced Query Logic
If the provided simple query logic is insufficient, the Query object takes a self-built query string through the custom_params attribute. You do not need to URL encode this value.
See arXiv Query Construction for more information on building your own queries.
Example
custom = f"cat:{Category.CS_AI.value} ANDNOT cat:{Category.CS_RO.value}"
Query(keywords=["paged attention", "attention window"], custom_params=custom)
# Query(keywords=['paged attention', 'attention window'],
# title_keywords=[],
# author_names=[],
# categories=[],
# article_ids=[],
# custom_params='cat:cs.AI ANDNOT cat:cs.RO',
# sort_criterion=SortCriterion(sort_by=<SortBy.LAST_UPDATED_DATE: 'lastUpdatedDate'>,
# sort_order=<SortOrder.DESC: 'descending'>),
# start=None,
# max_results=10)
Results in the following query logic:
(all:"paged attention" OR all:"attention window") AND (cat:cs.AI ANDNOT cat:cs.RO)
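As background on why manual URL encoding is unnecessary, a client can encode the final query string itself just before sending the request. The sketch below assumes the public arXiv API endpoint `http://export.arxiv.org/api/query` with its `search_query` and `max_results` parameters. `build_request_url` is a hypothetical helper; how arxiv-client builds its requests internally is not described here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Public arXiv API endpoint (assumed here; not taken from arxiv_client).
ARXIV_API_URL = "http://export.arxiv.org/api/query"


def build_request_url(search_query: str, max_results: int = 10) -> str:
    """Return a request URL with the raw query string percent-encoded.

    Hypothetical helper for illustration; not part of arxiv_client.
    """
    params = {"search_query": search_query, "max_results": max_results}
    # urlencode escapes spaces, quotes, colons, and parentheses.
    return f"{ARXIV_API_URL}?{urlencode(params)}"


raw = '(all:"paged attention" OR all:"attention window") AND (cat:cs.AI ANDNOT cat:cs.RO)'
url = build_request_url(raw)
print(url)

# Decoding the URL recovers the original, unencoded query string.
decoded = parse_qs(urlsplit(url).query)
print(decoded["search_query"][0] == raw)
```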
Development
Getting Started
This project uses poetry for dependency management. See the poetry documentation for usage.
In a nutshell:
- Install poetry
- Install project dependencies
  poetry install
- Activate compatible virtual environment
  poetry shell
Contributing
One goal of this project is to aid in learning modern Python practices. PRs and comments that improve style or follow best practices more closely are appreciated.