"Automate your ArXiv paper search, retrieval, and summarization process."


Description

arxiv_retriever is a lightweight command-line tool designed to automate the retrieval of computer science papers from ArXiv. The retrieval can be done using specified ArXiv computer science archive categories, full or partial titles of papers, if available, or links to the papers. Paper retrieval can be refined by author.

NOTE: My tests indicate that when searching for a really long title, using the partial title and then refining by author yields better results, as opposed to searching with the full title or even searching with the full title and refining by author. However, the tests are not exhaustive.

This tool is built with Python. It uses the Typer library for the command-line interface and the standard library's ElementTree package to parse XML responses from the arXiv API. It can be useful for researchers, engineers, or students who want to quickly retrieve an ArXiv paper or keep abreast of the latest research in their field without leaving their terminal/workstation.
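As a rough illustration of the ElementTree step, the sketch below parses an Atom feed of the kind the arXiv API returns. It is not arxiv_retriever's actual code: the function name `parse_feed`, the dictionary keys, and the sample feed contents are all made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace of the Atom format, which the arXiv API uses for its responses.
ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# A trimmed, made-up feed used only for illustration (not real paper data).
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00000v1</id>
    <published>2024-07-01T00:00:00Z</published>
    <title>An Example
      Title</title>
    <summary>An example abstract.</summary>
    <author><name>Ada Example</name></author>
    <author><name>Bob Sample</name></author>
    <link href="http://arxiv.org/abs/0000.00000v1" rel="alternate" type="text/html"/>
    <link title="pdf" href="http://arxiv.org/pdf/0000.00000v1" rel="related" type="application/pdf"/>
  </entry>
</feed>
"""


def parse_feed(xml_text):
    """Hypothetical helper: turn an Atom feed into a list of paper dicts."""
    root = ET.fromstring(xml_text)
    papers = []
    for entry in root.findall("atom:entry", ATOM):
        links = entry.findall("atom:link", ATOM)
        # The PDF link is the <link> element whose title attribute is "pdf".
        pdf_url = next(
            (link.get("href") for link in links if link.get("title") == "pdf"),
            None,
        )
        raw_title = entry.findtext("atom:title", default="", namespaces=ATOM)
        papers.append(
            {
                # Titles can wrap across lines, so collapse the whitespace.
                "title": " ".join(raw_title.split()),
                "authors": [
                    name.text for name in entry.findall("atom:author/atom:name", ATOM)
                ],
                "published": entry.findtext("atom:published", namespaces=ATOM),
                "abstract_url": entry.findtext("atom:id", namespaces=ATOM),
                "pdf_url": pdf_url,
            }
        )
    return papers


papers = parse_feed(SAMPLE_FEED)
print(papers[0]["title"], papers[0]["authors"])
```

The namespace mapping matters: without it, `findall("entry")` finds nothing, because every element in the feed lives in the Atom namespace.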

Although my current focus while building arxiv_retriever is the computer science archive, it can easily be used with categories from other areas on arXiv, e.g., math.CO.

Features

Fetch the most recent papers from ArXiv by specified categories
Search for papers on ArXiv using full or partial title
Refine fetch and search by author for more precise results
View paper details including title, authors, abstract, publication date, and links to paper's abstract and pdf pages
Download papers after they are retrieved using fetch or search, or directly using download
Easy-to-use command-line interface built with Typer
Configurable number of results to fetch
Built using only the standard library and tried and tested packages.

Environment Setup

You can optionally set an environment variable holding your OpenAI API key before using the program. The key is used to authenticate with OpenAI for the paper summarization feature. If you do not want your papers summarized, you do not need to set it. Specify your choice when asked by the CLI. Answering 'y' without the key set will lead to an error.

Optional Environment Variable

Variable Name: OPENAI_API_KEY

Setting the Environment Variable

On Unix-like systems (Linux, macOS)

In your terminal, run:

export OPENAI_API_KEY=<key>

To ensure this works across all shell instances, add the above line to your shell configuration file (e.g., ~/.bashrc, ~/.zshrc, or ~/.profile).

On Windows

Open the Start menu and search for "Environment Variables".
Click on the "Edit system environment variables" option.
In the System Properties window, click on the "Environment Variables" button.
Under "User variables", click "New".
Set the variable name as OPENAI_API_KEY and the value as your API key.

Verifying the Environment Variable

To verify the environment variable is set correctly:

On Unix-like systems:
```
  echo $OPENAI_API_KEY
```
On Windows (command prompt):
```
echo %OPENAI_API_KEY%
```

NOTE: Keep your API key confidential and do not share it publicly.
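The same check can be done from Python, which behaves identically on Unix-like systems and Windows. This is a minimal sketch: the function name `summarization_available` is hypothetical and is not part of arxiv_retriever.

```python
import os


def summarization_available(environ=os.environ):
    """Return True when a non-empty OPENAI_API_KEY is present.

    `environ` defaults to the real process environment; passing a plain
    dict makes the check easy to try out without touching your shell.
    """
    return bool(environ.get("OPENAI_API_KEY", "").strip())


if summarization_available():
    print("OPENAI_API_KEY is set; you can answer 'y' when asked about summaries.")
else:
    print("OPENAI_API_KEY is not set; answer 'n' when asked about summaries.")
```

The script reports only whether the key is present and never prints its value, which keeps the key out of terminal scrollback.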

Installation

Install from PyPI (Recommended):

pip install --upgrade arxiv-retriever

Install from Source Distribution

If you need a specific version or want to install from a source distribution:

Download the source distribution (.tar.gz file) from PyPI or the GitHub releases page.
Install using pip:
```
pip install arxiv_retriever-x.y.z.tar.gz
```
Replace x.y.z with the version number.

This method can be useful if you need a specific version or are in an environment without direct access to PyPI.

Install for Development and Testing

To install the latest development version from source:

Ensure you have Poetry installed. If not, install it by following the instructions at https://python-poetry.org/docs/#installation.

Clone the repository:

git clone https://github.com/MimicTester1307/arxiv_retriever.git
cd arxiv_retriever

Install the project and its dependencies:
```
poetry install
```
(Optional) To activate the virtual environment created by Poetry:
```
poetry shell
```
(Optional) Run tests to ensure everything is set up correctly:
```
poetry run pytest
```
Build the project:
```
poetry build
```

Install the wheel file using pip, replacing x.y.z with the version that was built:

pip install dist/arxiv_retriever-x.y.z-py3-none-any.whl

Usage

After installation, use the package via the axiv command:

To view available commands:

axiv --help

To view arguments and options for available commands:

axiv <command> --help

Sample Usage

To retrieve the most recent computer science papers by categories, use the fetch command followed by the categories and options:

axiv fetch <categories> [--limit]

Outputs limit papers sorted by submittedDate in descending order

To filter results by author(s):

  axiv fetch <categories> [--limit] [--authors]

Outputs limit papers sorted by submittedDate in descending order, filtered by authors

To retrieve papers matching a specified title, use the search command followed by the title and options:

axiv search <title> [--limit]

Outputs limit papers sorted by relevance in descending order

To filter results by author(s):

  axiv search <title> [--limit] [--authors]

Outputs limit papers sorted by relevance in descending order, filtered by authors
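To make the sort orders above concrete, here is a sketch of how such requests could be expressed against the public arXiv API. The query parameters (`search_query`, `sortBy`, `sortOrder`, `max_results`) and the field prefixes (`cat:`, `ti:`, `au:`) come from the arXiv API. The function `build_query_url` and the way it combines categories and authors are assumptions made for this example, not a description of how axiv builds its requests.

```python
from urllib.parse import urlencode

API_URL = "http://export.arxiv.org/api/query"


def build_query_url(categories=(), title=None, authors=(), limit=10):
    """Hypothetical helper: build an arXiv API URL for a fetch or a search."""
    clauses = []
    if categories:
        # Assumption: a paper in any of the given categories should match.
        clauses.append("(" + " OR ".join(f"cat:{c}" for c in categories) + ")")
    if title:
        clauses.append(f'ti:"{title}"')
    if authors:
        # Assumption: every listed author must appear on the paper.
        clauses.append("(" + " AND ".join(f'au:"{a}"' for a in authors) + ")")
    if not clauses:
        raise ValueError("provide at least one category, title, or author")

    params = {
        "search_query": " AND ".join(clauses),
        # Title searches are ranked by relevance, category fetches by date.
        "sortBy": "relevance" if title else "submittedDate",
        "sortOrder": "descending",
        "max_results": limit,
    }
    return f"{API_URL}?{urlencode(params)}"


print(build_query_url(categories=["cs.AI", "cs.GL"], limit=5))
print(build_query_url(title="Attention is all you need", authors=["Ashish"], limit=5))
```

`urlencode` percent-encodes the colons, quotes, and parentheses and turns spaces into `+`, so the resulting URL can be passed to any HTTP client as-is.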

Downloading your research papers

There are multiple ways to download your research paper using axiv:

1. Use axiv download <link> [--download_dir] to download the paper directly from the link.
2. Confirm that you want to download the papers retrieved by fetch or search when asked by the CLI.

With option 1, the file is named using the URL's basename, e.g. 2407.09298v1.pdf.

With option 2, the file is named using the title retrieved from the XML data when parsing.

NOTE: If the file name exists, it is overwritten.
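The two naming schemes can be sketched as follows. Both function names are hypothetical, and the title sanitization rule in particular is an assumption: the description above does not say how axiv cleans up titles before using them as file names.

```python
import posixpath
import re
from urllib.parse import urlparse


def filename_from_link(link):
    """Name the file after the last path segment of the link."""
    base = posixpath.basename(urlparse(link).path.rstrip("/"))
    # Avoid doubling the extension when the link already ends in .pdf.
    return base if base.endswith(".pdf") else base + ".pdf"


def filename_from_title(title):
    """Assumed sanitizer: collapse whitespace, replace unsafe characters."""
    collapsed = " ".join(title.split())
    # Keep letters, digits, underscores, hyphens, and spaces; replace the rest.
    return re.sub(r"[^\w\- ]", "_", collapsed) + ".pdf"


print(filename_from_link("https://arxiv.org/pdf/2407.09298v1"))
print(filename_from_title("Attention Is All You Need"))
```

Because an existing file with the same name is overwritten, checking `pathlib.Path(name).exists()` before downloading is a simple way to protect a copy you have annotated.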

Examples

Fetch the latest 5 papers in the cs.AI and cs.GL categories:

axiv fetch cs.AI cs.GL --limit 5

Search for papers matching the title "Attention is all you need", refined by author "Ashish":

axiv search "Attention is all you need" --limit 5 --authors "Ashish"

Download papers using links:

download using link to abstract:

    axiv download https://arxiv.org/abs/2407.20214v1

download using link to pdf:

axiv download https://arxiv.org/pdf/2407.20214v1
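Both link forms point at the same paper, since arXiv serves the abstract page under /abs/ and the PDF under /pdf/ for the same identifier. One way a downloader could accept either form is to normalize the link first. The helper below is a sketch with a hypothetical name; how axiv handles abstract links internally is not documented here.

```python
from urllib.parse import urlparse, urlunparse


def to_pdf_url(link):
    """Rewrite an arXiv abstract link to its PDF link; leave other links alone."""
    parts = urlparse(link)
    path = parts.path
    if path.startswith("/abs/"):
        path = "/pdf/" + path[len("/abs/"):]
    return urlunparse(parts._replace(path=path))


print(to_pdf_url("https://arxiv.org/abs/2407.20214v1"))
print(to_pdf_url("https://arxiv.org/pdf/2407.20214v1"))
```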

Note on Package and Command Names

Package Name: The package is named arxiv_retriever. This is the name you use when installing via pip or referring to the project.
Command Name: After installation, you interact with the tool using the axiv command in your terminal.

This distinction allows for a more concise command while maintaining a descriptive package name.

Contributing

Contributions are welcome! Please fork the repository and submit a pull request for any features, bug fixes, or enhancements.

Note on Testing

Currently, 10 out of 11 tests pass, and even that required a bit of magic. Refactoring the tests for asynchrony was an interesting challenge. Discussions and contributions regarding the asynchronous implementation are particularly welcome.

You can contact me via email or leave a comment on the Notion project tracker.

License

This project is licensed under the MIT license. See the LICENSE file for more details.

Acknowledgements

Typer for the command-line interface
ElementTree for XML parsing
arXiv API for providing access to paper metadata via a well-designed API
Trio and HTTPX for the asynchronous features

Release history

1.3.1 (Aug 7, 2024)
1.3.0 (Aug 7, 2024)
1.2.3 (Jul 31, 2024), this version
1.2.2 (Jul 31, 2024)
1.2.1 (Jul 31, 2024)
1.2.0 (Jul 31, 2024)
1.0.1.post1 (Jul 24, 2024)
1.0.1 (Jul 24, 2024)
1.0.0 (Jul 24, 2024)

Download files

Source Distribution

arxiv_retriever-1.2.3.tar.gz (12.2 kB), uploaded Jul 31, 2024

Built Distribution

arxiv_retriever-1.2.3-py3-none-any.whl (11.6 kB, Python 3), uploaded Jul 31, 2024

Hashes for arxiv_retriever-1.2.3.tar.gz
Algorithm	Hash digest
SHA256	`1da2786d2bd1b0660e8b8e8f9300b52171f2f1d532c8cf44e8d6d27b50692deb`
MD5	`ea901a5ec9aef937f235b64210b7ab17`
BLAKE2b-256	`95a267148574ca2f03ea26dddd61c4229bb0623c11d42660774f0411604fa44b`

Hashes for arxiv_retriever-1.2.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b2dac99edbeafa9b76739ff48d2503847c2b0c65dc4d080559951423df45ac5b`
MD5	`aad1a989dc2d525a003a11e37f1059db`
BLAKE2b-256	`333753d5c7131dbd6e15c7d8bef473d07414167b353a1c79079bb6ff8b3e0fc0`