Typed Python client for the arXiv API

arxiv-client

Python 3 client for the arXiv API.

This differs from the existing arxiv.py project in that it further abstracts away the arXiv API, so you do not need to learn how to build query strings yourself.

The overall goal is to enable most users to query arXiv immediately, without needing to reference the API docs.

arxiv.py is currently more stable and is backwards compatible with older versions of Python. It is also currently recommended for queries that return large numbers of results.

Basic Features

  • Simple query building
  • Comprehensive entity models, with documentation
    • For example, see the Category enum for arXiv's subject taxonomy
  • Fully type annotated

Under Development

  • Improved page chunking for large queries
  • Support for querying more fields
  • Testing and validation

Usage

In a nutshell:

from arxiv_client import Client, Query, Category
import pprint


categories = [Category.CS_AI, Category.CS_CL, Category.CS_IR]
client = Client()
articles = client.search(Query(keywords=["llm"], categories=categories, max_results=2))
for article in articles:
    pprint.pprint(article)  # Formatted pretty print is supported

Simple Query Logic

When using the provided Query fields, multiple values within a single field are combined using OR, and multiple fields are combined using AND.

Example

Query(keywords=["llm"], categories=[Category.CS_AI, Category.CS_IR], max_results=5)
# Query(keywords=['llm'],
#       title_keywords=[],
#       author_names=[],
#       categories=[<Category.CS_AI: 'cs.AI'>, <Category.CS_IR: 'cs.IR'>],
#       article_ids=[],
#       custom_params=None,
#       sort_criterion=SortCriterion(sort_by=<SortBy.LAST_UPDATED_DATE: 'lastUpdatedDate'>,
#                                    sort_order=<SortOrder.DESC: 'descending'>),
#       start=None,
#       max_results=5)

This results in the following query logic:

(all:"llm") AND (cat:cs.AI OR cat:cs.IR)

See the Query class for more information.
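
The combination rule above can be sketched in plain Python. This is only an illustration of the OR-within-field, AND-across-fields logic, not the library's actual implementation; the helper names `or_group` and `build_search_query` are hypothetical and are not part of arxiv-client.

```python
from typing import Sequence


def or_group(prefix: str, values: Sequence[str], quote: bool = False) -> str:
    """Join all values of a single field with OR and wrap them in parentheses."""
    terms = [f'{prefix}:"{v}"' if quote else f"{prefix}:{v}" for v in values]
    return "(" + " OR ".join(terms) + ")"


def build_search_query(
    keywords: Sequence[str] = (), categories: Sequence[str] = ()
) -> str:
    """Combine the non-empty field groups with AND (hypothetical helper)."""
    groups = []
    if keywords:
        groups.append(or_group("all", keywords, quote=True))
    if categories:
        groups.append(or_group("cat", categories))
    return " AND ".join(groups)


print(build_search_query(keywords=["llm"], categories=["cs.AI", "cs.IR"]))
# (all:"llm") AND (cat:cs.AI OR cat:cs.IR)
```

Empty fields are skipped entirely, so a query with only keywords produces a single OR group with no trailing AND.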

Advanced Query Logic

If the simple query logic is insufficient, you can pass a self-built query string through the custom_params attribute of Query. It is combined with any other fields using AND, and you do not need to URL encode it.

See arXiv Query Construction for more information on building your own queries.

Example

custom = f"cat:{Category.CS_AI.value} ANDNOT cat:{Category.CS_RO.value}"
Query(keywords=["paged attention", "attention window"], custom_params=custom)
# Query(keywords=['paged attention', 'attention window'],
#       title_keywords=[],
#       author_names=[],
#       categories=[],
#       article_ids=[],
#       custom_params='cat:cs.AI ANDNOT cat:cs.RO',
#       sort_criterion=SortCriterion(sort_by=<SortBy.LAST_UPDATED_DATE: 'lastUpdatedDate'>,
#                                    sort_order=<SortOrder.DESC: 'descending'>),
#       start=None,
#       max_results=10)

This results in the following query logic:

(all:"paged attention" OR all:"attention window") AND (cat:cs.AI ANDNOT cat:cs.RO)
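
For context, here is a minimal sketch of the URL encoding that you would otherwise do by hand when calling the arXiv API directly. It uses the public endpoint and its documented `search_query`, `start`, and `max_results` parameters; the function `build_request_url` is hypothetical, and how arxiv-client encodes requests internally is an assumption, not something taken from its source.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Public arXiv API endpoint.
ARXIV_API_URL = "http://export.arxiv.org/api/query"


def build_request_url(search_query: str, start: int = 0, max_results: int = 10) -> str:
    """URL-encode a raw query string into a full request URL (hypothetical helper)."""
    params = {"search_query": search_query, "start": start, "max_results": max_results}
    return f"{ARXIV_API_URL}?{urlencode(params)}"


# Mirror the example above: the keyword group AND the custom query string.
simple = '(all:"paged attention" OR all:"attention window")'
custom = "cat:cs.AI ANDNOT cat:cs.RO"
combined = f"{simple} AND ({custom})"

url = build_request_url(combined)
print(url)

# Decoding the URL recovers the original, unencoded query string.
decoded = parse_qs(urlsplit(url).query)["search_query"][0]
print(decoded)
```

Spaces, quotes, colons, and parentheses are all escaped by `urlencode`, which is why the raw string can be written in readable form.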

Development

Getting Started

This project uses Poetry for dependency management. See the Poetry documentation for usage.

In a nutshell:

  1. Install poetry

  2. Install project dependencies

    poetry install
    
  3. Activate a compatible virtual environment

    poetry shell
    

Contributing

One goal of this project was to aid in learning modern Python practices. PRs and comments that improve style or best practices are appreciated.

