arxiv-client

Python3 client for the arXiv API. Install the arxiv_client package from PyPI.

This differs from the pre-existing arxiv.py project in that it further abstracts away the arXiv API so you do not need to learn to construct query strings. The overall goal is to enable users to skip reading the API docs entirely.

By comparison, arxiv.py is currently:

  • More stable
  • Compatible with Python < 3.11

Basic Features

  • Simple structured queries
  • Comprehensive entity models, with documentation
    • For example, see the Category enum for arXiv's subject taxonomy
  • Fully type annotated

Usage

In a nutshell:

import pprint

import arxiv_client as arx

categories = [arx.Category.CS_AI, arx.Category.CS_CL, arx.Category.CS_IR]
client = arx.Client()
articles = client.search(arx.Query(keywords=["llm"], categories=categories, max_results=2))
for article in articles:
    pprint.pprint(article)  # Formatted pretty print is supported

Structured Query Logic

When using the structured Query fields, multiple values within a single field are combined using OR, and multiple fields are combined using AND.
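The OR-within-a-field, AND-across-fields rule can be sketched in plain Python. This is an illustration only, not the library's implementation: the helper names `or_group` and `build_search_query` are hypothetical, and the `all`, `ti`, `au`, and `cat` prefixes are the field prefixes of the arXiv API query syntax, assumed here to correspond to the Query fields.

```python
from typing import Sequence


def or_group(prefix: str, values: Sequence[str]) -> str:
    """Join the values of one field with OR, quoting multi-word terms."""
    terms = [f'{prefix}:"{v}"' if " " in v else f"{prefix}:{v}" for v in values]
    joined = " OR ".join(terms)
    # Parenthesize only when there is more than one alternative.
    return f"({joined})" if len(terms) > 1 else joined


def build_search_query(
    keywords: Sequence[str] = (),
    title_keywords: Sequence[str] = (),
    author_names: Sequence[str] = (),
    categories: Sequence[str] = (),
) -> str:
    """AND together the OR-groups of every non-empty field."""
    groups = [
        or_group(prefix, values)
        for prefix, values in (
            ("all", keywords),        # hypothetical mapping: keywords -> all:
            ("ti", title_keywords),   # hypothetical mapping: title_keywords -> ti:
            ("au", author_names),     # hypothetical mapping: author_names -> au:
            ("cat", categories),      # hypothetical mapping: categories -> cat:
        )
        if values
    ]
    return " AND ".join(groups)


query = build_search_query(keywords=["llm"], categories=["cs.AI", "cs.IR"])
print(query)  # all:llm AND (cat:cs.AI OR cat:cs.IR)
```

Empty fields are skipped, so a query with only one populated field yields a single OR-group with no AND.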

Searchable Fields

The Query object accepts the following field filters:

  • keywords: terms across all fields
  • title_keywords: terms in the article title
  • author_names: names in the author list
  • categories: arXiv subject categories
  • article_ids: arXiv article IDs
  • custom_params: custom query string

Example

from arxiv_client import Category, Query

Query(keywords=["llm"], categories=[Category.CS_AI, Category.CS_IR], max_results=5)
# Query(keywords=['llm'],
#       title_keywords=[],
#       author_names=[],
#       categories=[<Category.CS_AI: 'cs.AI'>, <Category.CS_IR: 'cs.IR'>],
#       article_ids=[],
#       custom_params=None,
#       sort_criterion=SortCriterion(sort_by=<SortBy.LAST_UPDATED_DATE: 'lastUpdatedDate'>,
#                                    sort_order=<SortOrder.DESC: 'descending'>),
#       start=None,
#       max_results=5)

This results in the following query logic:

("llm") in any field AND (cs.AI OR cs.IR) in the categories

See the Query class for more information.

Custom Queries

If the structured query logic is insufficient, pass a hand-built query string through the Query object's custom_params attribute. It is combined with the other fields using AND. You do not need to URL-encode this value.

See arXiv Query Construction for more information on building your own queries.
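For context, the sketch below shows the kind of URL encoding you are spared. It uses only the standard library and the public arXiv API endpoint with its `search_query`, `start`, and `max_results` parameters. The function name `build_request_url` is hypothetical, and how the client actually assembles its requests is an assumption, not something taken from its source.

```python
from urllib.parse import urlencode

# Public arXiv API query endpoint.
ARXIV_API_URL = "http://export.arxiv.org/api/query"


def build_request_url(search_query: str, start: int = 0, max_results: int = 10) -> str:
    """Percent-encode a raw query string into a full request URL."""
    params = {
        "search_query": search_query,
        "start": start,
        "max_results": max_results,
    }
    # urlencode escapes ':' as %3A and spaces as '+'.
    return f"{ARXIV_API_URL}?{urlencode(params)}"


url = build_request_url("cat:cs.AI ANDNOT cat:cs.RO")
print(url)
```

The raw string stays readable in your code, and the encoding is applied in one place just before the request is sent.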

Example

from arxiv_client import Category, Query

custom = f"cat:{Category.CS_AI.value} ANDNOT cat:{Category.CS_RO.value}"
Query(keywords=["paged attention", "attention window"], custom_params=custom)
# Query(keywords=['paged attention', 'attention window'],
#       title_keywords=[],
#       author_names=[],
#       categories=[],
#       article_ids=[],
#       custom_params='cat:cs.AI ANDNOT cat:cs.RO',
#       sort_criterion=SortCriterion(sort_by=<SortBy.LAST_UPDATED_DATE: 'lastUpdatedDate'>,
#                                    sort_order=<SortOrder.DESC: 'descending'>),
#       start=None,
#       max_results=10)

This results in the following query logic:

("paged attention" OR "attention window") in any field AND (cs.AI AND NOT cs.RO) in the categories

Equivalent query string:

(all:"paged attention" OR all:"attention window") AND (cat:cs.AI ANDNOT cat:cs.RO)
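As a rough illustration of how such a string could be assembled, the sketch below ORs the keywords together and ANDs the result with the custom clause. The function `combine_with_custom` is hypothetical and is not the library's code. It assumes every keyword is quoted and each clause is wrapped in parentheses, which matches the string shown above.

```python
from typing import Optional, Sequence


def combine_with_custom(keywords: Sequence[str], custom_params: Optional[str]) -> str:
    """OR the keywords together, then AND the result with the custom clause."""
    keyword_clause = " OR ".join(f'all:"{kw}"' for kw in keywords)
    # Drop whichever clause is empty or missing.
    clauses = [clause for clause in (keyword_clause, custom_params) if clause]
    # Parenthesize each clause so the custom string cannot change the grouping.
    return " AND ".join(f"({clause})" for clause in clauses)


result = combine_with_custom(
    ["paged attention", "attention window"],
    "cat:cs.AI ANDNOT cat:cs.RO",
)
print(result)
# (all:"paged attention" OR all:"attention window") AND (cat:cs.AI ANDNOT cat:cs.RO)
```

Wrapping the custom clause in its own parentheses keeps operators such as ANDNOT scoped to the custom string rather than to the keyword group.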

Development

This project uses hatch for project management.
