Automate arXiv paper tracking with LLM-powered metadata extraction and Google Sheets sync.

arXivFlow 🚀

License: MIT | Python 3.13+ | Ollama | arXiv

arXivFlow is a Python automation tool for discovering and tracking research papers. It fetches metadata from arXiv, analyzes papers locally with Ollama (Llama 3.2), and syncs the results to Google Sheets and local databases.


✨ Features

  • Automated Retrieval: Fetch the latest papers from specific arXiv categories (e.g., cs.AI, cs.LG, hep-ph) within any date range.
  • Local AI Analysis: Uses Ollama (Llama 3.2) to extract keywords and contact information (emails and affiliations) directly from PDF text. Analysis runs on your machine, so there are no cloud API costs and paper text is not sent to a third-party LLM service.
  • Intelligent PDF Handling: Automatically downloads PDFs and extracts text for deep analysis. Supports custom storage paths.
  • Multi-Format Export: Save your research data to CSV, JSON, Excel, or SQLite for flexible offline analysis.
  • Google Sheets Sync: Seamlessly push compiled research data to a shared Google Sheet for team collaboration.
  • Type-Safe & Modular: Clean, documented Python code with full type hinting and a class-based architecture.
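
To make the "Automated Retrieval" feature concrete, here is a minimal sketch of how a category and date-range query can be expressed against the public arXiv API. It is not arXivFlow's actual implementation: the helper name build_arxiv_query and its parameters are assumptions made for illustration. It only builds the request URL and makes no network call.

```python
from datetime import datetime
from urllib.parse import parse_qs, urlencode, urlparse

ARXIV_API = "http://export.arxiv.org/api/query"


def build_arxiv_query(categories, start, end, max_results=20):
    """Build an arXiv API URL for papers in `categories` submitted in [start, end].

    Hypothetical helper for illustration; not part of arXivFlow's public API.
    """
    # Categories are OR-ed together, e.g. (cat:cs.AI OR cat:hep-ph).
    cats = " OR ".join(f"cat:{c}" for c in categories)
    # The arXiv API takes submittedDate bounds in YYYYMMDDHHMM form (GMT).
    window = f"submittedDate:[{start:%Y%m%d%H%M} TO {end:%Y%m%d%H%M}]"
    params = {
        "search_query": f"({cats}) AND {window}",
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


url = build_arxiv_query(
    ["cs.AI", "hep-ph"], datetime(2024, 1, 1), datetime(2024, 1, 8)
)
# Decode the query string again to show the search expression that is sent.
print(parse_qs(urlparse(url).query)["search_query"][0])
```

The response to such a request is an Atom feed, which a tool like arXivFlow would then parse into rows of metadata.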

🛠️ Prerequisites

  1. Python 3.13+: Check your interpreter version with python --version.
  2. Ollama: Install Ollama and download the required model:
    ollama pull llama3.2
    
  3. Google Cloud Credentials:
    • Enable the Google Sheets and Google Drive APIs.
    • Create a Service Account and download the JSON key as credentials.json.
    • Share the target sheet with the service account's email address and grant it Editor access.
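
Before the first sync it can help to confirm that credentials.json really is a service account key and to look up the address the sheet must be shared with. The sketch below is one possible check, not something arXivFlow ships: the function name service_account_email is hypothetical, and the demo writes a dummy key file rather than reading a real one.

```python
import json
import tempfile
from pathlib import Path

# Fields present in every Google service account JSON key.
REQUIRED_KEYS = {"type", "client_email", "private_key"}


def service_account_email(path):
    """Return the service account email from a JSON key file, or raise ValueError."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"credentials file is missing keys: {sorted(missing)}")
    if data["type"] != "service_account":
        raise ValueError(f"expected a service account key, got type={data['type']!r}")
    return data["client_email"]


# Demo with a dummy key file; real keys are downloaded from the Google Cloud console.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "credentials.json"
    demo.write_text(
        json.dumps(
            {
                "type": "service_account",
                "client_email": "bot@example-project.iam.gserviceaccount.com",
                "private_key": "-----DUMMY-----",
            }
        ),
        encoding="utf-8",
    )
    email = service_account_email(demo)

print(f"Share the sheet with: {email}")
```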

🚀 Installation

From PyPI (Recommended)

pip install arxivflow

From Source (For Development)

  1. Clone the repository:

    git clone https://github.com/zjzhao1002/arXivFlow.git
    cd arXivFlow
    
  2. Set up a virtual environment:

    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    
  3. Install dependencies:

    pip install -e .
    

📖 Usage

Quick Start

from arxivflow import arXivFlow
import datetime

# 1. Initialize the flow
flow = arXivFlow(
    categories=["cs.AI", "cs.CV"], 
    ollama_model="llama3.2",
    max_results=20,
    start_date=datetime.datetime.now() - datetime.timedelta(days=7)
)

# 2. Fetch data & Extract info (Keywords/Contacts)
df = flow.get_arxiv_data(download_pdfs=True)

# 3. Save to your preferred formats
flow.save_to_csv("my_research.csv")
flow.save_to_sqlite("research.db")

# 4. Sync with Google Sheets
flow.save_to_google_sheet(
    sheet_id="YOUR_SHEET_ID", 
    credentials_file="credentials.json"
)
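
For readers curious what step 3 involves, the following stdlib-only sketch writes the same records to CSV, JSON, and SQLite. It is an illustration under assumptions, not arXivFlow's code: the record fields (arxiv_id, title, keywords), the dummy values, and the helper names are all hypothetical, and the real library may use different columns.

```python
import csv
import io
import json
import sqlite3

# Hypothetical records; the real column names in arXivFlow may differ.
papers = [
    {"arxiv_id": "0000.00001", "title": "Example A", "keywords": "llm, agents"},
    {"arxiv_id": "0000.00002", "title": "Example B", "keywords": "vision"},
]


def to_csv_text(rows):
    """Serialize a list of dicts to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_sqlite(rows, conn, table="papers"):
    """Upsert rows into SQLite, keyed on arxiv_id so re-runs stay idempotent."""
    cols = list(rows[0])
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} "
        f"({', '.join(cols)}, PRIMARY KEY (arxiv_id))"
    )
    placeholders = ", ".join(f":{c}" for c in cols)
    conn.executemany(
        f"INSERT OR REPLACE INTO {table} VALUES ({placeholders})", rows
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # pass a path such as "research.db" to persist
to_sqlite(papers, conn)
to_sqlite(papers, conn)  # a second run replaces rows instead of duplicating them
row_count = conn.execute("SELECT COUNT(*) FROM papers").fetchone()[0]

csv_text = to_csv_text(papers)
json_text = json.dumps(papers, indent=2)
print(row_count, "rows stored")
```

Keying the table on the arXiv identifier is a design choice for this sketch: it lets a daily job re-fetch overlapping date ranges without producing duplicate rows.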

🏗️ Architecture

The project follows a modular structure for easy extension:

  • src/arxivflow/arxivflow.py: The main orchestrator class (arXivFlow).
  • src/arxivflow/ollama_functions.py: Local LLM interface using the Ollama API.
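
As a rough picture of what a local LLM interface like ollama_functions.py has to do, the sketch below builds a request for Ollama's /api/generate endpoint and parses the reply. The prompt wording, the function names (build_payload, parse_reply, analyze), and the output schema are assumptions for illustration and are not taken from the arXivFlow source. The demo parses a canned reply, so it runs without an Ollama server.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_payload(text, model="llama3.2", max_chars=4000):
    """Build a non-streaming JSON-mode request; the text is truncated to max_chars."""
    prompt = (
        "Extract up to five keywords and any author emails from this paper text. "
        'Reply with JSON of the form {"keywords": [...], "emails": [...]}.\n\n'
        + text[:max_chars]
    )
    return {"model": model, "prompt": prompt, "stream": False, "format": "json"}


def parse_reply(body):
    """Parse an /api/generate reply; the model's text is in the `response` field."""
    data = json.loads(json.loads(body)["response"])
    return {
        "keywords": list(data.get("keywords", [])),
        "emails": list(data.get("emails", [])),
    }


def analyze(text):
    """Send text to a locally running Ollama server (not called in this demo)."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return parse_reply(resp.read())


# Canned reply in the shape Ollama returns, so the demo needs no server.
canned = json.dumps(
    {"response": json.dumps({"keywords": ["qcd"], "emails": ["a@b.org"]})}
)
result = parse_reply(canned)
print(result)
```

Truncating the input keeps the prompt within the model's context window; a production version would also need to handle replies that are not valid JSON.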

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Contributing

Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Download files

Source Distribution

arxivflow-0.1.1.tar.gz (14.0 kB view details)

Uploaded Source

Built Distribution

arxivflow-0.1.1-py3-none-any.whl (11.6 kB view details)

Uploaded Python 3

File details

Details for the file arxivflow-0.1.1.tar.gz.

File metadata

  • Download URL: arxivflow-0.1.1.tar.gz
  • Size: 14.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for arxivflow-0.1.1.tar.gz
  • SHA256: 96bae5a4aeb05430edf50d846927392ae15e408be2f52e65448d164c03f3d00c
  • MD5: 1db4cec268c385a416d015f3c0e3d657
  • BLAKE2b-256: d08b4edfa2452f9db8a535e9c842fc8859da6905a022c7cadd23e18b10eefc2f

Provenance

The following attestation bundles were made for arxivflow-0.1.1.tar.gz:

Publisher: python-publish.yml on zjzhao1002/arXivFlow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file arxivflow-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: arxivflow-0.1.1-py3-none-any.whl
  • Size: 11.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for arxivflow-0.1.1-py3-none-any.whl
  • SHA256: 99a72afb81b8be0cd045b10186bfdb43fd9f699416713de5a3db39e241d43836
  • MD5: 27d903f70fc8cdd92b03fb5013f86e21
  • BLAKE2b-256: b4a7f2b7fe5b092f9d34c4099da81b2d12d6d78b40d25808c0df9e674fc4dd31

Provenance

The following attestation bundles were made for arxivflow-0.1.1-py3-none-any.whl:

Publisher: python-publish.yml on zjzhao1002/arXivFlow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
