Arxiv Query Fluent

A Python library providing a fluent, intuitive, and chainable interface for easily building complex arXiv API queries, simplifying data retrieval and integration into your workflow.
The project provides a convenient Python interface to build queries, display results, and download PDFs.
This README provides examples to get you started and explains the query syntax, sorting, grouping, and pagination features.

Usage

Installation

$ pip install arxiv_query_fluent

In your Python script, import the classes you need:

from arxiv_query_fluent import (
    Query,
    Field,
    Category,
    Opt,
    DateRange,
    SortCriterion,
    SortOrder
)

Basic Query Example: Search Papers by Author

The following example demonstrates how to search for all papers written by the author "Stas Tiomkin".

query = Query().add(Field.author, "Stas Tiomkin")

Then, run the query and inspect the results:

result = query.get()
result.desc()

The output might look like:

Page Entries: 1-26 | Total Entries : 26 | Pages: 1 / 1

This output indicates:

  • Page Entries: 1-26: The current page contains results 1 to 26.
  • Total Entries: 26: A total of 26 papers were found.
  • Pages: 1 / 1: All results are on one page.

To view detailed information for the first 3 papers (with abstracts truncated to 100 characters):

result.show(top_n=3, abstract_shown=100)

Output:

Entries: 1-3/(26) | Pages: 1 / 1
───────────────────────────────────────────
Entry: #1
Title: Acoustic Wave Manipulation Through Sparse Robotic Actuation | arXiv Identifier: 2502.08784v2
Authors: Tristan Shah, Noam Smilovich, Feruza Amirkulova, Samer Gerges, Stas Tiomkin
Published Date: 2025-02-12 20:54:46+00:00
PDF Link: http://arxiv.org/pdf/2502.08784v2
Abstract:
Recent advancements in robotics, control, and machine learning have
facilitated progress in the chal...
───────────────────────────────────────────
Entry: #2
Title: Average-Reward Reinforcement Learning with Entropy Regularization | arXiv Identifier: 2501.09080v1
Authors: Jacob Adamczyk, Volodymyr Makarenko, Stas Tiomkin, Rahul V. Kulkarni
Published Date: 2025-01-15 19:00:46+00:00
PDF Link: http://arxiv.org/pdf/2501.09080v1
Abstract:
The average-reward formulation of reinforcement learning (RL) has drawn
increased interest in recent...
───────────────────────────────────────────
Entry: #3
Title: EVAL: EigenVector-based Average-reward Learning | arXiv Identifier: 2501.09770v1
Authors: Jacob Adamczyk, Volodymyr Makarenko, Stas Tiomkin, Rahul V. Kulkarni
Published Date: 2025-01-15 19:00:45+00:00
PDF Link: http://arxiv.org/pdf/2501.09770v1
Abstract:
In reinforcement learning, two objective functions have been developed
extensively in the literature...
───────────────────────────────────────────

To download a paper, pass its arXiv identifier to download_pdf(). The call below fetches the second entry shown above:

result.download_pdf(identifier="2501.09080v1", dirpath="./")

Parameter Explanation:

  • identifier: The arXiv identifier of the paper (e.g., "2501.09080v1").
  • dirpath: The directory path where the PDF file will be saved.
  • filename (Optional[str]): If not provided, the default filename will be <identifier>.pdf (e.g., "2501.09080v1.pdf").

Output:

'./2501.09080v1.pdf'
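The returned path is consistent with joining dirpath and the default filename. Below is a minimal sketch of that naming rule. The helper resolve_pdf_path and the custom filename "entropy_rl.pdf" are hypothetical and exist only for illustration; the library's actual implementation may differ.

```python
import os
from typing import Optional


def resolve_pdf_path(identifier: str, dirpath: str, filename: Optional[str] = None) -> str:
    """Hypothetical helper: build the target path for a downloaded PDF.

    Mirrors the documented default: when no filename is given,
    fall back to "<identifier>.pdf".
    """
    name = filename if filename else f"{identifier}.pdf"
    return os.path.join(dirpath, name)


# Default filename derived from the identifier.
print(resolve_pdf_path("2501.09080v1", "./"))  # ./2501.09080v1.pdf

# Explicit filename overrides the default.
print(resolve_pdf_path("2501.09080v1", "./", "entropy_rl.pdf"))  # ./entropy_rl.pdf
```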

Advanced Query: Combining Author and Subject Filters

To search for papers by "Stas Tiomkin" that also belong to the AI category:

result = (
    Query(max_entries_per_pager=10)
    .add(Field.author, "Stas Tiomkin")
    .add(Field.category, Category.CS_AI, Opt.And)
    .get()
)
result.list()

Explanation:

  • Field.category: Filters results by a paper's subject category. It corresponds to the 'cat' prefix in the arXiv API. For a complete list of fields, see Appendix A.
  • Category.CS_AI: Represents the Computer Science - Artificial Intelligence category (cs.AI) and restricts the query to papers in that category. For other values that can be used with Field.category, refer to the arXiv category taxonomy, on which the Category Enum in this project is based.

Output:

[2025-02-12] [Acoustic Wave Manipulation Through Sparse Robotic Actuation] [Tristan Shah,Noam Smilovich,Feruza Amirkulova,Samer Gerges,Stas Tiomkin]
[2025-01-15] [Average-Reward Reinforcement Learning with Entropy Regularization] [Jacob Adamczyk,Volodymyr Makarenko,Stas Tiomkin,Rahul V. Kulkarni]
[2025-01-15] [EVAL: EigenVector-based Average-reward Learning] [Jacob Adamczyk,Volodymyr Makarenko,Stas Tiomkin,Rahul V. Kulkarni]
[2025-01-02] [Bootstrapped Reward Shaping] [Jacob Adamczyk,Volodymyr Makarenko,Stas Tiomkin,Rahul V. Kulkarni]
[2024-11-20] [SuPLE: Robot Learning with Lyapunov Rewards] [Phu Nguyen,Daniel Polani,Stas Tiomkin]
[2024-06-26] [Boosting Soft Q-Learning by Bounding] [Jacob Adamczyk,Volodymyr Makarenko,Stas Tiomkin,Rahul V. Kulkarni]
[2024-06-20] [Learning telic-controllable state representations] [Nadav Amir,Stas Tiomkin,Angela Langdon]
[2023-11-27] [Taming Waves: A Physically-Interpretable Machine Learning Framework for Realizable Control of Wave Dynamics] [Tristan Shah,Feruza Amirkulova,Stas Tiomkin]
[2023-11-11] [Controllability-Constrained Deep Network Models for Enhanced Control of Dynamical Systems] [Suruchi Sharma,Volodymyr Makarenko,Gautam Kumar,Stas Tiomkin]
[2023-11-06] [Multi-Resolution Diffusion for Privacy-Sensitive Recommender Systems] [Derek Lilienthal,Paul Mello,Magdalini Eirinaki,Stas Tiomkin]

This example uses the list() method to display each paper's publication date, title, and authors.
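The one-line layout shown in the output can be reproduced with a small formatter. This is only a sketch of the visible format, with a hypothetical function name format_entry; it is not taken from the library's source.

```python
from datetime import date
from typing import Sequence


def format_entry(published: date, title: str, authors: Sequence[str]) -> str:
    """Hypothetical formatter: [date] [title] [comma-separated authors]."""
    return f"[{published.isoformat()}] [{title}] [{','.join(authors)}]"


line = format_entry(
    date(2024, 11, 20),
    "SuPLE: Robot Learning with Lyapunov Rewards",
    ["Phu Nguyen", "Daniel Polani", "Stas Tiomkin"],
)
print(line)
# [2024-11-20] [SuPLE: Robot Learning with Lyapunov Rewards] [Phu Nguyen,Daniel Polani,Stas Tiomkin]
```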

The query syntax follows the structure:

Query().add(Field, Value).add(Field, Value, BooleanOperator)...get()

Where:

  • Field is a member of the Field Enum (see Appendix A).
  • Value is a string, a Category member, or a DateRange object.
  • BooleanOperator is an Opt member: Opt.And, Opt.Or, or Opt.And_Not.

For more details on the fields supported by the arXiv API, please refer to the arXiv API documentation.
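To make the chaining pattern concrete, here is a minimal sketch of how chained add() calls could be turned into an arXiv API search expression. MiniQuery is a hypothetical stand-in written for this illustration, not the library's Query class, and it takes raw API prefixes (au, cat) instead of Field members. Defaulting to AND when no operator is given is an assumption of the sketch.

```python
from typing import List, Optional, Tuple

# Boolean operators as they appear in an arXiv API search expression.
AND, OR, ANDNOT = "AND", "OR", "ANDNOT"


class MiniQuery:
    """Hypothetical fluent builder; not the library's Query class."""

    def __init__(self) -> None:
        self._terms: List[Tuple[Optional[str], str]] = []

    def add(self, prefix: str, value: str, operator: Optional[str] = None) -> "MiniQuery":
        # Quote multi-word values so they are treated as a single phrase.
        term = f'{prefix}:"{value}"' if " " in value else f"{prefix}:{value}"
        self._terms.append((operator, term))
        return self  # returning self is what makes the calls chainable

    def build(self) -> str:
        parts: List[str] = []
        for index, (operator, term) in enumerate(self._terms):
            if index > 0:
                # Assumption of this sketch: fall back to AND if no operator was given.
                parts.append(operator or AND)
            parts.append(term)
        return " ".join(parts)


expression = MiniQuery().add("au", "Stas Tiomkin").add("cat", "cs.AI", AND).build()
print(expression)  # au:"Stas Tiomkin" AND cat:cs.AI
```

Returning self from add() is the design choice that lets each call be appended to the previous one.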


Sorting and Ordering Query Results

You can sort your query results by different criteria. For example, the following query finds AI papers that contain the keyword "DeepSeek" and were submitted between 2024-07-01 and 2025-02-28, sorted by submission date in ascending order:

deepseek = (
    Query(sortBy=SortCriterion.SubmittedDate, sortOrder=SortOrder.Ascending)
    .add(Field.category, Category.CS_AI)
    .add(Field.all, "DeepSeek", Opt.And)
    .add(Field.submitted_date, DateRange("20240701", "20250228"), Opt.And)
    .get(1)
)
deepseek.show(3, abstract_shown=0)

Output:

Entries: 1-3/(50) | Pages: 1 / 3
───────────────────────────────────────────
Entry: #1
Title: Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models | arXiv Identifier: 2407.01906v2
Authors: Zihan Wang, Deli Chen, Damai Dai, Runxin Xu, Zhuoshu Li, Y. Wu
Published Date: 2024-07-02 03:11:13+00:00
PDF Link: http://arxiv.org/pdf/2407.01906v2
───────────────────────────────────────────
Entry: #2
Title: Let the Code LLM Edit Itself When You Edit the Code | arXiv Identifier: 2407.03157v2
Authors: Zhenyu He, Jun Zhang, Shengjie Luo, Jingjing Xu, Zhi Zhang, Di He
Published Date: 2024-07-03 14:34:03+00:00
PDF Link: http://arxiv.org/pdf/2407.03157v2
───────────────────────────────────────────
Entry: #3
Title: DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning | arXiv Identifier: 2407.04078v3
Authors: Chengpeng Li, Guanting Dong, Mingfeng Xue, Ru Peng, Xiang Wang, Dayiheng Liu
Published Date: 2024-07-04 17:39:16+00:00
PDF Link: http://arxiv.org/pdf/2407.04078v3
───────────────────────────────────────────

Alternatively, you can sort the same query by relevance in descending order:

(
    Query(sortBy=SortCriterion.Relevance, sortOrder=SortOrder.Descending)
    .add(Field.category, Category.CS_AI)
    .add(Field.all, "DeepSeek", Opt.And)
    .add(Field.submitted_date, DateRange("20240701", "20250228"), Opt.And)
    .get(1)
    .show(3, 0)
)

Output:

Entries: 1-3/(50) | Pages: 1 / 3
───────────────────────────────────────────
Entry: #1
Title: DocPuzzle: A Process-Aware Benchmark for Evaluating Realistic Long-Context Reasoning Capabilities | arXiv Identifier: 2502.17807v1
Authors: Tianyi Zhuang, Chuqiao Kuang, Xiaoguang Li, Yihua Teng, Jihao Wu, Yasheng Wang, Lifeng Shang
Published Date: 2025-02-25 03:29:53+00:00
PDF Link: http://arxiv.org/pdf/2502.17807v1
───────────────────────────────────────────
Entry: #2
Title: A Comparison of DeepSeek and Other LLMs | arXiv Identifier: 2502.03688v2
Authors: Tianchen Gao, Jiashun Jin, Zheng Tracy Ke, Gabriel Moryoussef
Published Date: 2025-02-06 00:38:25+00:00
PDF Link: http://arxiv.org/pdf/2502.03688v2
───────────────────────────────────────────
Entry: #3
Title: DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding | arXiv Identifier: 2412.10302v1
Authors: Zhiyu Wu, Xiaokang Chen, Zizheng Pan, Xingchao Liu, Wen Liu, Damai Dai, Huazuo Gao, Yiyang Ma, Chengyue Wu, Bingxuan Wang, Zhenda Xie, Yu Wu, Kai Hu, Jiawei Wang, Yaofeng Sun, Yukun Li, Yishi Piao, Kang Guan, Aixin Liu, Xin Xie, Yuxiang You, Kai Dong, Xingkai Yu, Haowei Zhang, Liang Zhao, Yisong Wang, Chong Ruan
Published Date: 2024-12-13 17:37:48+00:00
PDF Link: http://arxiv.org/pdf/2412.10302v1
───────────────────────────────────────────
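At the API level, the two sorting examples above correspond to the sortBy and sortOrder request parameters of the arXiv API. The sketch below assembles such a request URL with the standard library. The helper build_request_url is hypothetical, and how arxiv_query_fluent builds its requests internally is not shown in this README.

```python
from urllib.parse import parse_qs, urlencode, urlparse

ARXIV_API_URL = "http://export.arxiv.org/api/query"


def build_request_url(search_query: str, sort_by: str, sort_order: str,
                      start: int = 0, max_results: int = 10) -> str:
    """Assemble an arXiv API request URL (illustrative helper, not library code)."""
    params = {
        "search_query": search_query,
        "sortBy": sort_by,        # "relevance", "lastUpdatedDate", or "submittedDate"
        "sortOrder": sort_order,  # "ascending" or "descending"
        "start": start,
        "max_results": max_results,
    }
    return f"{ARXIV_API_URL}?{urlencode(params)}"


url = build_request_url("cat:cs.AI AND all:DeepSeek", "submittedDate", "ascending")
print(url)

# Round-trip check: the encoded parameters decode back to the inputs.
decoded = parse_qs(urlparse(url).query)
print(decoded["sortBy"], decoded["sortOrder"])
```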

Advanced Grouping Queries Combining Multiple Criteria

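You can combine multiple queries using grouping and boolean operators. For example, the query below finds papers co-authored by "Stas Tiomkin" and "Daniel Polani", excluding any whose title contains "Dynamic" or that were submitted in 2024: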

group_1 = Query().add(Field.author, "Stas Tiomkin").add(Field.author, "Daniel Polani", Opt.And)
group_2 = Query().add(Field.title, "Dynamic", Opt.Or).add(Field.submitted_date, DateRange("20240101", "20241231"), Opt.Or)
query = Query().add_group(group_1).add_group(group_2, Opt.And_Not)
result = query.get()
result.desc()

Output:

Page Entries: 1-1 | Total Entries : 1 | Pages: 1 / 1

To display the details of the grouped query result:

result.show()

Output:

Entries: 1-1/(1) | Pages: 1 / 1
───────────────────────────────────────────
Entry: #1
Title: AvE: Assistance via Empowerment | arXiv Identifier: 2006.14796v5
Authors: Yuqing Du, Stas Tiomkin, Emre Kiciman, Daniel Polani, Pieter Abbeel, Anca Dragan
Published Date: 2020-06-26 04:40:11+00:00
PDF Link: http://arxiv.org/pdf/2006.14796v5
Abstract:
One difficulty in using artificial agents for human-assistive applications
lies in the challenge of accurately assisting with a person's goal(s). Existing
methods tend to rely on inferring the human's...
───────────────────────────────────────────
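Grouping matters because it fixes how the boolean operators bind. The sketch below shows one way the two groups above could be rendered as a parenthesized arXiv API expression. The helper combine_groups is hypothetical, and padding the date bounds with 0000 and 2359 is an assumption of this sketch, not confirmed behavior of DateRange.

```python
from typing import List, Optional, Tuple


def combine_groups(groups: List[Tuple[Optional[str], str]]) -> str:
    """Join sub-expressions, parenthesizing each so its own operators bind first.

    Each item is (operator, expression); the operator of the first item is ignored.
    """
    parts: List[str] = []
    for index, (operator, expression) in enumerate(groups):
        if index > 0:
            parts.append(operator or "AND")
        parts.append(f"({expression})")
    return " ".join(parts)


group_1 = 'au:"Stas Tiomkin" AND au:"Daniel Polani"'
# Date range in the arXiv API's submittedDate syntax; the 0000/2359 padding is assumed.
group_2 = "ti:Dynamic OR submittedDate:[202401010000 TO 202412312359]"

combined = combine_groups([(None, group_1), ("ANDNOT", group_2)])
print(combined)
```

Without the parentheses, the OR inside the second group would no longer be scoped to the exclusion, and the expression would be ambiguous.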

Efficient Pagination with Python Generators

The following example demonstrates how to iterate through paginated results using a Python generator:

q = Query(max_entries_per_pager=10).add(Field.title, "transformer").add(Field.submitted_date, DateRange("20230101", "20230110"), Opt.And)
for page in q.paginated_results():
    page.desc()

Output:

Page Entries: 1-10 | Total Entries : 65 | Pages: 1 / 7
Page Entries: 11-20 | Total Entries : 65 | Pages: 2 / 7
Page Entries: 21-30 | Total Entries : 65 | Pages: 3 / 7
Page Entries: 31-40 | Total Entries : 65 | Pages: 4 / 7
Page Entries: 41-50 | Total Entries : 65 | Pages: 5 / 7
Page Entries: 51-60 | Total Entries : 65 | Pages: 6 / 7
Page Entries: 61-65 | Total Entries : 65 | Pages: 7 / 7
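The page boundaries in this output follow from simple arithmetic on the total entry count and the page size. The generator below is a minimal sketch of that arithmetic; page_windows is a hypothetical name, and the library's paginated_results() additionally fetches each page from the API, which this sketch does not do.

```python
from typing import Iterator, Tuple


def page_windows(total: int, per_page: int) -> Iterator[Tuple[int, int, int, int]]:
    """Lazily yield (first_entry, last_entry, page_number, page_count) per page."""
    page_count = -(-total // per_page)  # ceiling division
    for page_number in range(1, page_count + 1):
        first = (page_number - 1) * per_page + 1
        last = min(page_number * per_page, total)  # the last page may be shorter
        yield first, last, page_number, page_count


TOTAL = 65
for first, last, page, pages in page_windows(total=TOTAL, per_page=10):
    print(f"Page Entries: {first}-{last} | Total Entries : {TOTAL} | Pages: {page} / {pages}")
```

Because the function is a generator, each window is computed only when the loop asks for it, which is the same property that lets a paginated query avoid requesting pages that are never consumed.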

This project is designed with simplicity and flexibility in mind, so that engineers can easily build and execute arXiv queries, whether or not they are familiar with the arXiv API. Contributions, bug reports, and suggestions for improvement are very welcome!

For more details on query syntax and the supported fields, please refer to the official arXiv API documentation.

Happy querying!


Appendix A: Field Enum Reference

Field Enum            arXiv API Prefix  Explanation
Field.abstract        abs               Abstract
Field.author          au                Author
Field.all             all               All fields
Field.category        cat               Subject Category
Field.comment         co                Comment
Field.id              id                Id (use id_list instead)
Field.journal_ref     jr                Journal Reference
Field.title           ti                Title
Field.rn              rn                Report Number
Field.submitted_date  submittedDate     Submitted Date of the Paper

Appendix B: Field.category Values

For values that can be used with Field.category, please refer to the arXiv category taxonomy. Note that in this project, Field.category corresponds to the Category Enum class, which is built based on the current arXiv Category Taxonomy.
