Automate arXiv paper tracking with LLM-powered metadata extraction and Google Sheets sync.

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3


arXivFlow

arXivFlow is a Python-based automation tool designed to fetch research paper metadata from arXiv, extract keywords and contact information using local LLMs (Ollama), and synchronize the results with Google Sheets.

Project Overview

Purpose: Automates the tracking and processing of new research papers. It fetches metadata for specified arXiv categories, uses Ollama (Llama 3.2) to summarize papers and extract keywords and contact information from their PDFs, and uploads the compiled data to a Google Sheet.
Main Technologies:
- Python 3.13: Core language.
- arxiv: Library for querying the arXiv API.
- Ollama (Llama 3.2): Local LLM for intelligent extraction.
- PyMuPDF: PDF text extraction for contact information retrieval.
- pandas: Data manipulation and export to CSV, Excel, JSON, and SQLite.
- gspread: Google Sheets API interaction.
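
As a rough illustration of the arXiv step, the sketch below uses the `arxiv` library to build a category query and keep only papers published on or after a start date. It is an assumption about how such a query can be written, not arXivFlow's actual implementation, and the helper names `build_category_query`, `normalize_start_date`, `is_recent` and `fetch_recent` are hypothetical. One detail worth noting: `arxiv` returns timezone-aware `published` timestamps, so a naive start date such as `datetime.datetime.now() - datetime.timedelta(days=3)` has to be made timezone-aware before the two can be compared.

```python
import datetime
from typing import Iterable, Iterator


def build_category_query(categories: Iterable[str]) -> str:
    """Combine arXiv categories into one query, e.g. 'cat:cs.AI OR cat:cs.LG'."""
    return " OR ".join(f"cat:{c}" for c in categories)


def normalize_start_date(start: datetime.datetime) -> datetime.datetime:
    """Return an aware UTC datetime; naive values are treated as local time."""
    # astimezone() on a naive datetime assumes the system's local time zone.
    return start.astimezone(datetime.timezone.utc)


def is_recent(published: datetime.datetime, start: datetime.datetime) -> bool:
    """True if a paper's (aware) publication time is at or after start."""
    return published >= normalize_start_date(start)


def fetch_recent(
    categories: Iterable[str], start: datetime.datetime, max_results: int = 50
) -> Iterator:
    """Yield arxiv.Result objects published at or after start (needs network)."""
    import arxiv  # imported lazily so the pure helpers above work without it

    search = arxiv.Search(
        query=build_category_query(categories),
        max_results=max_results,
        sort_by=arxiv.SortCriterion.SubmittedDate,
        sort_order=arxiv.SortOrder.Descending,
    )
    for result in arxiv.Client().results(search):
        # Skip anything older than the requested window.
        if is_recent(result.published, start):
            yield result
```

Normalizing the start date once, inside `is_recent`, lets callers pass either naive or aware datetimes without hitting a `TypeError` on comparison.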

Architecture & Key Files

The project follows a modular structure located in src/arxivflow/.

Core Modules

src/arxivflow/arxivflow.py: Contains the arXivFlow class, which orchestrates the entire workflow:
- Querying arXiv for specific categories and date ranges.
- Downloading PDFs to the pdfs/ directory.
- Processing results and extracting information.
- Saving data to CSV, JSON, Excel, SQLite, or Google Sheets.
src/arxivflow/ollama_functions.py: Contains the OllamaFunctions class for interacting with the local Ollama API to extract keywords and contact details.
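
The keyword-extraction idea can be sketched against Ollama's local HTTP API (`POST /api/generate` on the default `http://localhost:11434`), which returns the generated text in a `response` field when `stream` is false. This is not the `OllamaFunctions` implementation: the prompt wording and the helpers `build_keyword_prompt`, `parse_keywords` and `extract_keywords` are hypothetical, and the input here is a piece of paper text such as an abstract.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_keyword_prompt(text: str, n: int = 5) -> str:
    """Hypothetical prompt asking the model for a comma-separated keyword list."""
    return (
        f"Extract {n} keywords from the following text. "
        "Reply with a comma-separated list only.\n\n" + text
    )


def parse_keywords(reply: str) -> list[str]:
    """Split a comma-separated reply; drop blanks and case-insensitive duplicates."""
    keywords: list[str] = []
    seen: set[str] = set()
    for part in reply.split(","):
        kw = part.strip().strip(".").strip()
        if kw and kw.lower() not in seen:
            seen.add(kw.lower())
            keywords.append(kw)
    return keywords


def extract_keywords(text: str, model: str = "llama3.2") -> list[str]:
    """Ask a running local Ollama server for keywords."""
    payload = json.dumps(
        {"model": model, "prompt": build_keyword_prompt(text), "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as resp:
        body = json.load(resp)
    # With stream=False the whole answer arrives as one JSON object.
    return parse_keywords(body["response"])
```

Keeping the parsing separate from the HTTP call makes it easy to tolerate the small formatting quirks (trailing periods, repeated terms) that LLM replies tend to have.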

Configuration & Data

user_input.json: Configures the target Google Sheet ID, CSV filename, and credentials path.
credentials.json: (User-provided) Google Service Account credentials.
requirements.txt: Project dependencies.
pdfs/: Local directory where downloaded research papers are stored.
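
Reading `user_input.json` can be as simple as the sketch below, which loads the file and fails early if a setting is missing. The key names `sheet_id`, `csv_filename` and `credentials_file` are hypothetical stand-ins for the Sheet ID, CSV filename and credentials path mentioned above; check the project's own file for the real keys.

```python
import json

# Hypothetical key names; the project's user_input.json may use different ones.
REQUIRED_KEYS = ("sheet_id", "csv_filename", "credentials_file")


def load_user_input(path: str = "user_input.json") -> dict:
    """Load the settings file and raise KeyError if an expected key is missing."""
    with open(path, encoding="utf-8") as fh:
        config = json.load(fh)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"{path} is missing keys: {', '.join(missing)}")
    return config
```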

Building and Running

Prerequisites

Python 3.13+: Ensure Python is installed.
Ollama: Install Ollama and pull the required model:
```
ollama pull llama3.2
```
Google Cloud Setup:
- Enable Google Sheets and Google Drive APIs.
- Create a Service Account and save the JSON key as credentials.json.
- Share the target Google Sheet with the Service Account email.
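
Once the Service Account has access, uploading a pandas DataFrame with `gspread` could look like the sketch below, which follows gspread's documented pandas recipe of passing a header row plus data rows to `Worksheet.update`. It is an assumption about the approach, not arXivFlow's `save_to_google_sheet`; `to_sheet_rows` and `upload_dataframe` are hypothetical names. Sheets values must be JSON-serializable, so NaN/None become empty cells and dates become ISO strings.

```python
import datetime
import math


def to_sheet_rows(header, rows):
    """Build a 2-D list for Sheets: header first, NaN/None as '', dates as ISO text."""
    def clean(value):
        if value is None or (isinstance(value, float) and math.isnan(value)):
            return ""
        if isinstance(value, datetime.date):  # also covers datetime and pandas Timestamp
            return value.isoformat()
        return value

    return [list(header)] + [[clean(v) for v in row] for row in rows]


def upload_dataframe(df, sheet_id: str, credentials_file: str = "credentials.json") -> None:
    """Overwrite the first worksheet of a shared Google Sheet with a DataFrame."""
    import gspread  # imported lazily; needs network access and valid credentials

    client = gspread.service_account(filename=credentials_file)
    worksheet = client.open_by_key(sheet_id).sheet1
    worksheet.clear()
    worksheet.update(to_sheet_rows(df.columns, df.values.tolist()))
```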

Setup

Create and activate a virtual environment:

```
python -m venv .
source bin/activate  # On Windows: Scripts\activate
```

Install dependencies:
```
pip install -r requirements.txt
```

Usage

The arXivFlow class can be used as follows:

```
import datetime

from arxivflow import arXivFlow

# Initialize with categories and optional Ollama model
flow = arXivFlow(
    categories=["cs.AI", "cs.LG"],
    ollama_model="llama3.2",
    max_results=50,
    start_date=datetime.datetime.now() - datetime.timedelta(days=3)
)

# Optional: Set a custom path for PDF downloads
flow.set_pdfs_path("my_papers")

# Fetch data and optionally download PDFs for contact extraction
df = flow.get_arxiv_data(download_pdfs=True)

# Save to multiple formats
flow.save_to_csv("results.csv")
flow.save_to_json("results.json")
flow.save_to_excel("results.xlsx")
flow.save_to_sqlite("results.db")

# Sync with Google Sheets
flow.save_to_google_sheet(
    sheet_id="YOUR_SHEET_ID",
    credentials_file="credentials.json"
)
```
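
When PDFs are downloaded for contact extraction, a cheap first pass is to pull text from the first page with PyMuPDF and look for e-mail addresses before (or instead of) asking the LLM. The sketch below assumes contacts appear on the first page and uses a deliberately simple regex; `first_page_text` and `find_emails` are hypothetical helpers, not part of arXivFlow.

```python
import re

# A deliberately simple e-mail pattern; real-world addresses can be messier.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(text: str) -> list[str]:
    """Return unique e-mail addresses in order of first appearance."""
    return list(dict.fromkeys(EMAIL_RE.findall(text)))


def first_page_text(pdf_path: str, pages: int = 1) -> str:
    """Extract plain text from the first page(s) of a PDF."""
    try:
        import pymupdf  # module name in recent PyMuPDF releases
    except ImportError:
        import fitz as pymupdf  # legacy module name in older releases

    with pymupdf.open(pdf_path) as doc:
        return "\n".join(doc[i].get_text() for i in range(min(pages, len(doc))))
```

Usage would be `find_emails(first_page_text("pdfs/some_paper.pdf"))`, where the path is illustrative.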

Development Conventions

Modular Logic: All core functionality resides in src/arxivflow/.
Local AI: Keyword and contact extraction are performed locally using Ollama to ensure privacy and eliminate API costs. The tool automatically handles model verification and pulling.
Data Persistence: Supports multiple export formats (CSV, JSON, Excel, SQLite) for flexibility.
Type Hinting: The codebase uses Python type hints for better maintainability and clarity.
Configurable PDF Handling: PDFs can be optionally downloaded and stored in custom directories.
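
The automatic model check can be approximated with two documented Ollama endpoints: `GET /api/tags` lists installed models, and `POST /api/pull` downloads one. arXivFlow may do this differently (for example through the `ollama` Python client); `has_model` and `ensure_model` below are hypothetical helpers. Ollama reports untagged models as `name:latest`, so the check treats `llama3.2` and `llama3.2:latest` as the same model.

```python
import json
import urllib.request

OLLAMA_HOST = "http://localhost:11434"  # Ollama's default address


def has_model(tags: dict, model: str) -> bool:
    """Check an /api/tags response for a model, treating 'x' as 'x:latest'."""
    wanted = model if ":" in model else f"{model}:latest"
    names = {m.get("name", "") for m in tags.get("models", [])}
    return wanted in names or model in names


def ensure_model(model: str = "llama3.2") -> None:
    """Pull the model through the local Ollama server if it is not installed."""
    with urllib.request.urlopen(f"{OLLAMA_HOST}/api/tags", timeout=10) as resp:
        tags = json.load(resp)
    if has_model(tags, model):
        return
    payload = json.dumps({"model": model, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        f"{OLLAMA_HOST}/api/pull",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    # A non-streaming pull blocks until the download has finished.
    with urllib.request.urlopen(request, timeout=3600) as resp:
        json.load(resp)
```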

Release history

0.1.1

May 1, 2026

This version

0.1.0

May 1, 2026

Download files


Source Distribution

arxivflow-0.1.0.tar.gz (13.3 kB)

Uploaded May 1, 2026 Source

Built Distribution


arxivflow-0.1.0-py3-none-any.whl (9.8 kB)

Uploaded May 1, 2026 Python 3

File details

Details for the file arxivflow-0.1.0.tar.gz.

File metadata

Download URL: arxivflow-0.1.0.tar.gz
Upload date: May 1, 2026
Size: 13.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for arxivflow-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`83d57ff38552eeb95dc6935cbb598b0d59969eca495fa58596377d5bb546f27a`
MD5	`d0d8d056e5078a91a6e61d74d62fadb1`
BLAKE2b-256	`b5d409b3c889a6134cb05336d128b6c2776901af6ed7866da319ef9cdb3a4eba`


Provenance

The following attestation bundles were made for arxivflow-0.1.0.tar.gz:

Publisher: python-publish.yml on zjzhao1002/arXivFlow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: arxivflow-0.1.0.tar.gz
- Subject digest: 83d57ff38552eeb95dc6935cbb598b0d59969eca495fa58596377d5bb546f27a
- Sigstore transparency entry: 1417091796
- Sigstore integration time: May 1, 2026
Source repository:
- Permalink: zjzhao1002/arXivFlow@d672f1d218ab3778b9e43c34c9c2876c6720a1b2
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/zjzhao1002
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@d672f1d218ab3778b9e43c34c9c2876c6720a1b2
- Trigger Event: release

File details

Details for the file arxivflow-0.1.0-py3-none-any.whl.

File metadata

Download URL: arxivflow-0.1.0-py3-none-any.whl
Upload date: May 1, 2026
Size: 9.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for arxivflow-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`89cf5b0ad3e631a9a9d03ccb03fc5cf05a7e95e877fcb20c38cf7a7de5de6717`
MD5	`4f5fe96c52956e04ac28981952070463`
BLAKE2b-256	`0dcf8668c1b9b35f5227cfb13f4b391536d10a927cf16405e42423ec90c7c361`


Provenance

The following attestation bundles were made for arxivflow-0.1.0-py3-none-any.whl:

Publisher: python-publish.yml on zjzhao1002/arXivFlow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: arxivflow-0.1.0-py3-none-any.whl
- Subject digest: 89cf5b0ad3e631a9a9d03ccb03fc5cf05a7e95e877fcb20c38cf7a7de5de6717
- Sigstore transparency entry: 1417091798
- Sigstore integration time: May 1, 2026
Source repository:
- Permalink: zjzhao1002/arXivFlow@d672f1d218ab3778b9e43c34c9c2876c6720a1b2
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/zjzhao1002
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@d672f1d218ab3778b9e43c34c9c2876c6720a1b2
- Trigger Event: release
