arxiv-client

A Python 3 client for the arXiv API. Install the package from PyPI under the name arxiv_client.

This project differs from the pre-existing arxiv.py project by abstracting the arXiv API further, so you do not need to learn how to construct query strings. The overall goal is to let users skip reading the API docs entirely.

By comparison, arxiv.py is currently:

  • More stable
  • Compatible with Python versions below 3.11
  • More performant for large queries

Basic Features

  • Simple query building
  • Comprehensive entity models, with documentation
    • For example, see the Category enum for arXiv's subject taxonomy
  • Fully type annotated
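
To illustrate the idea behind the Category enum, here is a minimal sketch built on the standard library. MiniCategory is a hypothetical stand-in with three members, not the package's actual class. The member names and values are taken from the reprs shown in this README.

```python
from enum import Enum


class MiniCategory(Enum):
    """Hypothetical three-member stand-in for the package's Category enum."""

    CS_AI = "cs.AI"
    CS_CL = "cs.CL"
    CS_IR = "cs.IR"


# Each member carries the identifier string that arXiv uses for the category.
print(MiniCategory.CS_AI.value)

# A member can also be looked up from that identifier.
print(MiniCategory("cs.IR").name)
```

Using enum members instead of raw strings lets a type checker or editor catch a misspelled category before any request is sent.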

Under Development

  • Improved page chunking for large queries
  • Support for querying more fields
  • Testing and validation
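
As a rough illustration of what page chunking means for a large query, the sketch below splits a requested result count into (start, max_results) windows. The helper page_windows is hypothetical and is not part of arxiv_client. It only assumes that a request can be given a start offset and a max_results limit, as the Query fields shown later in this README suggest.

```python
from collections.abc import Iterator


def page_windows(total: int, page_size: int) -> Iterator[tuple[int, int]]:
    """Yield (start, max_results) pairs that together cover `total` results.

    Hypothetical helper for illustration; not part of arxiv_client.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for start in range(0, total, page_size):
        # The last window may be smaller than page_size.
        yield start, min(page_size, total - start)


print(list(page_windows(total=250, page_size=100)))
# [(0, 100), (100, 100), (200, 50)]
```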

Usage

In a nutshell:

import pprint

import arxiv_client as arx

# Restrict the search to three computer-science categories.
categories = [arx.Category.CS_AI, arx.Category.CS_CL, arx.Category.CS_IR]
client = arx.Client()
# Keywords and categories are combined with AND; at most two results are requested.
articles = client.search(arx.Query(keywords=["llm"], categories=categories, max_results=2))
for article in articles:
    pprint.pprint(article)  # Formatted pretty print is supported

Structured Query Logic

When using the structured Query fields, multiple values within a single field are combined using OR, and multiple fields are combined using AND.

Example

from arxiv_client import Category, Query

Query(keywords=["llm"], categories=[Category.CS_AI, Category.CS_IR], max_results=5)
# Query(keywords=['llm'],
#       title_keywords=[],
#       author_names=[],
#       categories=[<Category.CS_AI: 'cs.AI'>, <Category.CS_IR: 'cs.IR'>],
#       article_ids=[],
#       custom_params=None,
#       sort_criterion=SortCriterion(sort_by=<SortBy.LAST_UPDATED_DATE: 'lastUpdatedDate'>,
#                                    sort_order=<SortOrder.DESC: 'descending'>),
#       start=None,
#       max_results=5)

This results in the following query logic:

(all:"llm") AND (cat:cs.AI OR cat:cs.IR)

See the Query class for more information.
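
The OR-within-a-field, AND-across-fields rule can be sketched in a few lines of plain Python. The functions field_clause and build_search_query are hypothetical and only cover keywords and categories. They show the combining rule and are not the package's actual implementation. The all and cat prefixes come from the query logic shown above.

```python
def field_clause(prefix: str, values: list[str], quote: bool = True) -> str:
    """OR together all values of one field and wrap them in parentheses."""
    terms = [f'{prefix}:"{v}"' if quote else f"{prefix}:{v}" for v in values]
    return "(" + " OR ".join(terms) + ")"


def build_search_query(keywords: list[str], categories: list[str]) -> str:
    """AND together the clauses of every non-empty field (illustration only)."""
    clauses = []
    if keywords:
        clauses.append(field_clause("all", keywords))
    if categories:
        # Category identifiers are emitted unquoted, as in the example above.
        clauses.append(field_clause("cat", categories, quote=False))
    return " AND ".join(clauses)


print(build_search_query(["llm"], ["cs.AI", "cs.IR"]))
# (all:"llm") AND (cat:cs.AI OR cat:cs.IR)
```

Empty fields contribute no clause, which is why an unused field such as title_keywords=[] does not appear in the resulting query.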

Custom Queries

If the structured fields are not expressive enough, the Query object accepts a self-built query string through the custom_params attribute. The custom string is combined with the structured fields using AND, and you do not need to URL-encode it.

See arXiv Query Construction for more information on building your own queries.

Example

from arxiv_client import Category, Query

custom = f"cat:{Category.CS_AI.value} ANDNOT cat:{Category.CS_RO.value}"
Query(keywords=["paged attention", "attention window"], custom_params=custom)
# Query(keywords=['paged attention', 'attention window'],
#       title_keywords=[],
#       author_names=[],
#       categories=[],
#       article_ids=[],
#       custom_params='cat:cs.AI ANDNOT cat:cs.RO',
#       sort_criterion=SortCriterion(sort_by=<SortBy.LAST_UPDATED_DATE: 'lastUpdatedDate'>,
#                                    sort_order=<SortOrder.DESC: 'descending'>),
#       start=None,
#       max_results=10)

This results in the following query logic:

(all:"paged attention" OR all:"attention window") AND (cat:cs.AI ANDNOT cat:cs.RO)
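
For context on the URL-encoding step that you do not have to perform yourself, the sketch below encodes the query logic above into a request URL using only the standard library. The endpoint and the parameter names search_query, sortBy, sortOrder, and max_results are those of the public arXiv API. How arxiv_client builds its requests internally is not shown in this README, so treat this as an assumption about the general shape rather than the package's code.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Public arXiv API query endpoint.
BASE_URL = "http://export.arxiv.org/api/query"

search_query = (
    '(all:"paged attention" OR all:"attention window") '
    "AND (cat:cs.AI ANDNOT cat:cs.RO)"
)

params = {
    "search_query": search_query,
    "sortBy": "lastUpdatedDate",
    "sortOrder": "descending",
    "max_results": 10,
}

# urlencode escapes spaces, quotes, colons, and parentheses.
url = f"{BASE_URL}?{urlencode(params)}"
print(url)

# Decoding the URL gives back the original, unencoded query string.
decoded = parse_qs(urlsplit(url).query)["search_query"][0]
print(decoded == search_query)
```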

Development

This project uses hatch for project management.
