arxiv-client

Python 3 client for the arXiv API. Install the arxiv_client package from PyPI.

Unlike the pre-existing arxiv.py project, this client further abstracts away the arXiv API so that you do not need to learn how to construct query strings. The overall goal is to let users skip reading the API docs entirely.

Basic Features

  • Simple structured queries
  • Comprehensive entity models, with documentation
    • For example, see the Category enum for arXiv's category taxonomy
  • Fully type annotated

Usage

Daily RSS Feed

import arxiv_client as arx


client = arx.Client()
articles = client.rss_by_subject(arx.Subject.COMPUTER_SCIENCE)

Search

import arxiv_client as arx


categories = [arx.Category.CS_AI, arx.Category.CS_CL, arx.Category.CS_IR]
client = arx.Client()
articles = client.search(arx.Query(keywords=["llm"], categories=categories, max_results=10))
for article in articles:
    print(article)

Structured Search Query Logic

When using the structured Query fields, multiple values within a single field are combined using OR, and multiple fields are combined using AND.

Searchable Fields

The Query object accepts the following field filters:

  • keywords: terms across all fields
  • title_keywords: terms in the article title
  • author_names: names in the author list
  • categories: arXiv subject categories
  • abstract_keywords: terms in the article abstract
  • comment_keywords: terms in the author provided comment
  • article_ids: arXiv article IDs
  • custom_params: custom query string

Example

from arxiv_client import Category, Query

Query(keywords=["llm"], categories=[Category.CS_AI, Category.CS_IR], max_results=5)
# Query(
#     keywords=['llm'],
#     title_keywords=[],
#     author_names=[],
#     categories=[<Category.CS_AI: 'cs.AI'>, <Category.CS_IR: 'cs.IR'>],
#     abstract_keywords=[],
#     comment_keywords=[],
#     article_ids=[],
#     custom_params=None,
#     sort_criterion=SortCriterion(sort_by=<SortBy.LAST_UPDATED_DATE: 'lastUpdatedDate'>, sort_order=<SortOrder.DESC: 'descending'>),
#     start=0,
#     max_results=5
# )

This results in the following query logic:

("llm") in any field AND (cs.AI OR cs.IR) in the categories

See the Query class for more information.
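
The combination rule described above can be sketched in a few lines of plain Python. This is an illustration only, not the library's implementation: the function build_search_query and the FIELD_PREFIXES mapping are hypothetical names introduced here, and the choice to quote only multi-word values is an assumption. The prefixes themselves (all, ti, au, cat, abs, co) come from the arXiv API query syntax.

```python
from typing import Mapping, Sequence

# Field prefixes from the arXiv API query syntax. Mapping them onto the
# Query attribute names is an assumption made for this sketch.
FIELD_PREFIXES = {
    "keywords": "all",
    "title_keywords": "ti",
    "author_names": "au",
    "categories": "cat",
    "abstract_keywords": "abs",
    "comment_keywords": "co",
}


def build_search_query(fields: Mapping[str, Sequence[str]]) -> str:
    """Combine values with OR inside a field and with AND across fields."""
    groups = []
    for name, prefix in FIELD_PREFIXES.items():
        values = fields.get(name, [])
        if not values:
            continue  # an empty field does not constrain the search
        # Quote values containing a space so they are treated as one phrase.
        terms = [f'{prefix}:"{v}"' if " " in v else f"{prefix}:{v}" for v in values]
        groups.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(groups)


query = build_search_query({"keywords": ["llm"], "categories": ["cs.AI", "cs.IR"]})
print(query)
# (all:llm) AND (cat:cs.AI OR cat:cs.IR)
```

Each non-empty field becomes one parenthesized OR group, and the groups are joined with AND, which mirrors the logic shown in the example above.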

Custom Search Queries

If the structured query logic is insufficient, the Query object accepts a hand-written query string through its custom_params attribute. The custom string is combined with any other populated fields using AND. You do not need to URL encode this value.

See arXiv Query Construction for more information on building your own queries.

Example

from arxiv_client import Category, Query

custom = f"cat:{Category.CS_AI.value} ANDNOT cat:{Category.CS_RO.value}"
Query(keywords=["paged attention", "attention window"], custom_params=custom)
# Query(
#     keywords=['paged attention', 'attention window'],
#     title_keywords=[],
#     author_names=[],
#     categories=[],
#     abstract_keywords=[],
#     comment_keywords=[],
#     article_ids=[],
#     custom_params='cat:cs.AI ANDNOT cat:cs.RO',
#     sort_criterion=SortCriterion(sort_by=<SortBy.LAST_UPDATED_DATE: 'lastUpdatedDate'>, sort_order=<SortOrder.DESC: 'descending'>),
#     start=0,
#     max_results=10
# )

This results in the following query logic:

("paged attention" OR "attention window") in any field AND (cs.AI AND NOT cs.RO) in the categories

Equivalent query string:

(all:"paged attention" OR all:"attention window") AND (cat:cs.AI ANDNOT cat:cs.RO)
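
For context on the URL encoding that you do not have to do yourself, here is a rough sketch of what encoding this query string for the arXiv API looks like with the standard library. Whether the client builds its requests exactly this way is an assumption. The endpoint and the search_query, start and max_results parameter names are those of the public arXiv API.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Public arXiv API query endpoint.
BASE_URL = "http://export.arxiv.org/api/query"

search_query = (
    '(all:"paged attention" OR all:"attention window") '
    "AND (cat:cs.AI ANDNOT cat:cs.RO)"
)

# urlencode percent-encodes quotes, colons and parentheses, and turns
# spaces into "+", so the query string is safe to place in a URL.
params = {"search_query": search_query, "start": 0, "max_results": 10}
url = f"{BASE_URL}?{urlencode(params)}"
print(url)

# Decoding the URL gives back the original, unencoded query string.
decoded = parse_qs(urlsplit(url).query)
assert decoded["search_query"] == [search_query]
```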

Known Issues

The arXiv search API is unreliable, especially for large queries.

The API will sometimes return incomplete results or return no entries, although the response is valid. See this GitHub issue for discussion on the topic.

If you encounter this problem, the following may help:

  • Reduce the page size; 100 seems to have a relatively high success rate
  • Increase paging retry and delay parameters
  • Break up large queries into smaller queries
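
The suggestions above can be combined into a small paging loop. The sketch below is generic and does not use the client's own retry settings: page_windows, fetch_with_retry and flaky_fetch are hypothetical names, and flaky_fetch only imitates an API that returns an empty but valid response on the first attempt. In real use, fetch_page could wrap a call such as client.search(Query(..., start=start, max_results=max_results)), assuming start and max_results can be set as the Query output shown earlier suggests.

```python
import time
from typing import Callable, Dict, Iterator, List, Tuple


def page_windows(total: int, page_size: int = 100) -> Iterator[Tuple[int, int]]:
    """Yield (start, max_results) pairs that together cover `total` results."""
    for start in range(0, total, page_size):
        yield start, min(page_size, total - start)


def fetch_with_retry(
    fetch_page: Callable[[int, int], List[str]],
    start: int,
    max_results: int,
    retries: int = 3,
    delay_seconds: float = 3.0,
) -> List[str]:
    """Call fetch_page until it returns entries or the retries run out."""
    for attempt in range(retries + 1):
        entries = fetch_page(start, max_results)
        if entries:
            return entries
        if attempt < retries:
            time.sleep(delay_seconds)  # wait before asking again
    return []  # still empty after every attempt


# Hypothetical stand-in for a real request: the first call for each page
# returns nothing, the second call returns the page.
calls: Dict[int, int] = {}


def flaky_fetch(start: int, max_results: int) -> List[str]:
    calls[start] = calls.get(start, 0) + 1
    if calls[start] == 1:
        return []
    return [f"article-{i}" for i in range(start, start + max_results)]


results: List[str] = []
for start, size in page_windows(total=250, page_size=100):
    results.extend(fetch_with_retry(flaky_fetch, start, size, delay_seconds=0.0))
print(len(results))
# 250
```

Note that an empty page can also mean the results are genuinely exhausted, so a loop like this cannot tell the two cases apart without knowing the expected total.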

Retries often help with the issue, but are sometimes insufficient. If you need more reliable access to large query results, consider looking into the arXiv Bulk Data Access options.

Development

This uses hatch for project management.
