Automatically collect papers from ArXiv and organize them in your Zotero library with AI-powered summarization

ArXiv-Zotero Connector

Automatically download, organize, and summarize arXiv papers directly into your Zotero library with AI-powered insights.

🚀 Features

Smart Search: Search arXiv by keywords, authors, categories, or date ranges
Auto-Download: Automatically download paper PDFs and attach them to Zotero entries
AI Summarization: Generate concise summaries using Google's Gemini AI (optional)
Metadata Extraction: Preserve complete paper metadata including authors, abstract, and publication details
Collection Support: Organize papers into specific Zotero collections
Flexible Filtering: Filter by journal papers, conference proceedings, or preprints
Batch Processing: Process multiple papers efficiently with progress tracking

📋 Requirements

Python 3.7 or higher
Zotero account with API access
Internet connection for downloading papers
Google AI API key (optional, for summarization features)

🔧 Installation

Install from PyPI

pip install arxiv-zotero-connector

Install from GitHub

pip install git+https://github.com/StepanKropachev/arxiv-zotero-connector.git

Development Installation

git clone https://github.com/StepanKropachev/arxiv-zotero-connector.git
cd arxiv-zotero-connector
pip install -e .

⚙️ Configuration

1. Get Zotero Credentials

Library ID: Open Zotero Settings and copy the number shown as your user ID for API calls
API Key:
- Go to Zotero Settings → New Private Key
- Grant all permissions and save the key
Collection Key (optional):
- Open your Zotero web library
- Navigate to desired collection
- Copy the key from the URL: .../collections/XXXXXXXX
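
If you prefer not to copy the key by hand, the following minimal sketch pulls it out of a pasted URL. It assumes the key is the alphanumeric token that follows /collections/, as in the pattern above; `extract_collection_key` is a hypothetical helper and not part of the package.

```python
import re
from typing import Optional

# Assumption: the collection key is the alphanumeric token right after
# "/collections/" in the web library URL.
_COLLECTION_RE = re.compile(r"/collections/([A-Za-z0-9]+)")


def extract_collection_key(url: str) -> Optional[str]:
    """Return the collection key found in a Zotero web library URL, or None."""
    match = _COLLECTION_RE.search(url)
    return match.group(1) if match else None
```

The helper returns None when the URL does not point at a collection, which is a cheap way to catch a wrong paste before it ends up in your configuration.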

2. Create Configuration File

Create a .env file in your working directory:

ZOTERO_LIBRARY_ID=your_library_id
ZOTERO_API_KEY=your_api_key
COLLECTION_KEY=your_collection_key  # Optional
GOOGLE_API_KEY=your_gemini_api_key  # Optional, for AI summaries
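
Before running a large collection it can help to check that the two required values are present. The sketch below is a standard-library-only illustration of that check. How the package itself loads the .env file is not documented here, so `parse_env` and `missing_required` are hypothetical helpers and not the tool's own loader.

```python
from typing import Dict, List

# The two values the configuration above does not mark as optional.
REQUIRED_KEYS = ("ZOTERO_LIBRARY_ID", "ZOTERO_API_KEY")


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and full-line comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as "  # Optional".
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def missing_required(values: Dict[str, str]) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

A typical use would be `missing_required(parse_env(open(".env").read()))`, which returns an empty list when both required values are set.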

📖 Usage

Command Line Interface

Basic search:

arxiv-zotero --keywords "machine learning" --max-results 10

Advanced search with filters:

arxiv-zotero \
  --keywords "transformer" "attention" \
  --categories cs.AI cs.LG \
  --start-date 2023-01-01 \
  --max-results 20

Search by author:

arxiv-zotero --author "Yoshua Bengio" --start-date 2023-06-01
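
For readers curious how such filters can map onto the arXiv API, here is a rough sketch of a query builder. It uses the field prefixes (all:, cat:, au:) and the submittedDate range syntax from the arXiv API query language. How arxiv-zotero actually combines its options is not documented here, so joining keywords with OR is an assumption, and `build_search_query` and `build_url` are hypothetical names.

```python
from datetime import date
from typing import Optional, Sequence
from urllib.parse import urlencode

ARXIV_API_URL = "http://export.arxiv.org/api/query"


def build_search_query(
    keywords: Sequence[str] = (),
    categories: Sequence[str] = (),
    author: Optional[str] = None,
    start_date: Optional[date] = None,
    end_date: Optional[date] = None,
) -> str:
    """Combine the filters into a single arXiv search_query string."""
    clauses = []
    if keywords:
        # Assumption: keywords are alternatives (OR); the tool may AND them.
        clauses.append("(" + " OR ".join(f'all:"{k}"' for k in keywords) + ")")
    if categories:
        clauses.append("(" + " OR ".join(f"cat:{c}" for c in categories) + ")")
    if author:
        clauses.append(f'au:"{author}"')
    if start_date or end_date:
        # Open-ended ranges fall back to illustrative bounds (1991 and today).
        lower = (start_date or date(1991, 1, 1)).strftime("%Y%m%d0000")
        upper = (end_date or date.today()).strftime("%Y%m%d2359")
        clauses.append(f"submittedDate:[{lower} TO {upper}]")
    return " AND ".join(clauses)


def build_url(search_query: str, max_results: int = 10) -> str:
    """Return a full, URL-encoded request address for the query."""
    params = {"search_query": search_query, "max_results": max_results}
    return ARXIV_API_URL + "?" + urlencode(params)
```

Grouping each filter in parentheses keeps the OR lists from leaking into the surrounding AND clauses.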

Configuration File

Create search_config.yaml:

keywords:
  - "reinforcement learning"
  - "deep learning"
categories:
  - "cs.AI"
  - "cs.LG"
max_results: 50
start_date: "2023-01-01"
content_type: "journal"  # journal, conference, or preprint

# AI Summarization settings (optional)
summarizer:
  enabled: true
  prompt: "Summarize this paper in 3 key points"
  max_length: 300

Run with config:

arxiv-zotero --config search_config.yaml
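
A typo in the config file is easier to catch before a long run starts. The sketch below checks a dictionary shaped like the YAML above, for example the result of loading it with a YAML parser. `validate_config` is a hypothetical helper: the rules only mirror what this README states, and the tool's own validation may differ.

```python
from datetime import date, datetime
from typing import Any, Dict, List

# Values listed in the comment next to content_type above.
ALLOWED_CONTENT_TYPES = ("journal", "conference", "preprint")


def validate_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the config looks fine."""
    problems: List[str] = []
    for field in ("keywords", "categories"):
        value = config.get(field, [])
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            problems.append(f"{field} must be a list of strings")
    max_results = config.get("max_results", 10)
    if isinstance(max_results, bool) or not isinstance(max_results, int) or max_results <= 0:
        problems.append("max_results must be a positive integer")
    start = config.get("start_date")
    # An unquoted YAML date is already parsed into a date object.
    if start is not None and not isinstance(start, date):
        try:
            datetime.strptime(str(start), "%Y-%m-%d")
        except ValueError:
            problems.append("start_date must use the YYYY-MM-DD format")
    content_type = config.get("content_type")
    if content_type is not None and content_type not in ALLOWED_CONTENT_TYPES:
        problems.append("content_type must be journal, conference, or preprint")
    return problems
```

Returning every problem at once, instead of raising on the first one, lets you fix the file in a single pass.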

Python API

import asyncio
from datetime import datetime

from arxiv_zotero import ArxivZoteroCollector, ArxivSearchParams

async def main():
    # Initialize collector
    collector = ArxivZoteroCollector(
        zotero_library_id="your_library_id",
        zotero_api_key="your_api_key",
        collection_key="optional_collection_key"
    )
    
    # Configure search
    search_params = ArxivSearchParams(
        keywords=["quantum computing", "quantum algorithms"],
        categories=["quant-ph", "cs.CC"],
        max_results=10,
        start_date=datetime(2023, 1, 1)
    )
    
    # Run collection
    successful, failed = await collector.run_collection_async(
        search_params=search_params,
        download_pdfs=True
    )
    
    print(f"Processed {successful} papers successfully, {failed} failed")

asyncio.run(main())

🎯 Examples

Literature Review

arxiv-zotero \
  --keywords "neural architecture search" "AutoML" \
  --categories cs.LG \
  --content-type journal \
  --start-date 2022-01-01 \
  --max-results 100

Conference Papers

arxiv-zotero \
  --keywords "ICLR" "NeurIPS" \
  --content-type conference \
  --start-date 2023-01-01

Papers Without PDFs

arxiv-zotero --keywords "quantum" --no-pdf --max-results 50

🤖 AI Summarization

Enable AI-powered paper summaries by adding your Google AI API key to the .env file as GOOGLE_API_KEY, then pass the summarizer options:

arxiv-zotero \
  --keywords "large language models" \
  --summarizer-enabled \
  --summarizer-prompt "Explain this paper's contribution in simple terms" \
  --summary-length 500

📚 ArXiv Categories

Common categories include:

cs.AI: Artificial Intelligence
cs.LG: Machine Learning
cs.CL: Computation and Language
cs.CV: Computer Vision
stat.ML: Machine Learning (Statistics)
math.OC: Optimization and Control
quant-ph: Quantum Physics

Full list: arXiv Category Taxonomy

🛠️ Advanced Features

Custom Metadata Fields

The tool preserves:

Title, authors, abstract
Publication date and journal references
ArXiv ID and categories
DOI (when available)
Comments and version info
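
To give a feel for where these fields come from, the sketch below reads them out of a single entry of the Atom feed that the arXiv API returns. The sample entry is made up for illustration, and `parse_entry` is a hypothetical helper, not the package's own extraction code.

```python
import re
import xml.etree.ElementTree as ET
from typing import Any, Dict, Optional

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}

# Made-up entry in the shape of an arXiv API result.
SAMPLE_ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:arxiv="http://arxiv.org/schemas/atom">
  <id>http://arxiv.org/abs/0000.00000v2</id>
  <title>An Illustrative
    Title</title>
  <summary>Made-up abstract text.</summary>
  <published>2023-01-15T12:00:00Z</published>
  <author><name>Ada Example</name></author>
  <author><name>Bob Sample</name></author>
  <arxiv:comment>12 pages</arxiv:comment>
  <arxiv:primary_category term="cs.LG"/>
  <category term="cs.LG"/>
  <category term="stat.ML"/>
</entry>"""


def _text(entry: ET.Element, path: str) -> Optional[str]:
    """Return whitespace-normalized text of the first match, or None."""
    value = entry.findtext(path, default=None, namespaces=NS)
    return " ".join(value.split()) if value is not None else None


def parse_entry(xml_text: str) -> Dict[str, Any]:
    """Extract the metadata fields listed above from one Atom entry."""
    entry = ET.fromstring(xml_text)
    raw_id = (_text(entry, "atom:id") or "").split("/abs/", 1)[-1]
    # Split an optional trailing version suffix such as "v2" from the ID.
    match = re.match(r"(.+?)(?:v(\d+))?$", raw_id)
    arxiv_id, version = (match.group(1), match.group(2)) if match else (None, None)
    primary = entry.find("arxiv:primary_category", NS)
    return {
        "title": _text(entry, "atom:title"),
        "authors": [a.findtext("atom:name", namespaces=NS)
                    for a in entry.findall("atom:author", NS)],
        "abstract": _text(entry, "atom:summary"),
        "published": _text(entry, "atom:published"),
        "journal_ref": _text(entry, "arxiv:journal_ref"),
        "arxiv_id": arxiv_id,
        "version": version,
        "categories": [c.get("term") for c in entry.findall("atom:category", NS)],
        "primary_category": primary.get("term") if primary is not None else None,
        "doi": _text(entry, "arxiv:doi"),
        "comment": _text(entry, "arxiv:comment"),
    }
```

Optional fields such as the DOI and journal reference come back as None when the entry does not carry them, matching the "when available" note above.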

Rate Limiting

The tool respects arXiv's rate limits automatically. For large batch operations, consider using:

arxiv-zotero --keywords "your search" --rate-limit 5
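
The unit of the --rate-limit value is not stated here. As an illustration of the general idea only, the sketch below enforces a minimum interval in seconds between consecutive requests. `MinIntervalThrottle` is a hypothetical class and not the tool's internal limiter.

```python
import time
from typing import Callable, Optional


class MinIntervalThrottle:
    """Keep consecutive calls to wait() at least `interval` seconds apart."""

    def __init__(
        self,
        interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request

    def wait(self) -> float:
        """Sleep if the last call was too recent; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay
```

Calling `throttle.wait()` before each request is enough to space them out. The clock and sleep functions are injectable so the behavior can be checked without real waiting.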

Error Handling

Failed downloads are logged so they can be retried, and one failure does not affect the rest of the batch:

Check arxiv_zotero.log for details
Papers are processed independently
Partial failures don't stop the entire batch
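
One common way to get this behavior with asyncio is to gather all tasks with return_exceptions=True, so that an error in one paper is recorded instead of cancelling the others. The sketch below shows that pattern under stated assumptions: `process_all` and the worker it takes are hypothetical, and the package's own implementation may differ.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, Tuple


async def process_all(
    items: Iterable[str],
    worker: Callable[[str], Awaitable[object]],
) -> Tuple[int, List[Tuple[str, str]]]:
    """Run worker on every item; return (success count, list of failures)."""
    items = list(items)
    # return_exceptions=True turns raised errors into results, so one
    # failing item does not abort the rest of the batch.
    results = await asyncio.gather(
        *(worker(item) for item in items), return_exceptions=True
    )
    failures = [
        (item, str(result))
        for item, result in zip(items, results)
        if isinstance(result, Exception)
    ]
    return len(items) - len(failures), failures
```

The returned failure list pairs each item with its error message, which is the information you would need to retry only the papers that failed.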

🐛 Troubleshooting

Common Issues

"Collection not found": Verify your collection key or remove it to use the main library
"API key invalid": Check your Zotero API key has proper permissions
Import errors: Ensure all dependencies are installed: pip install -r requirements.txt
PDF download fails: Check your internet connection and disk space

Debug Mode

For detailed logging:

import logging
logging.basicConfig()  # attach a default handler if none is configured yet
logging.getLogger('arxiv_zotero').setLevel(logging.DEBUG)

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Fork the repository
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

arXiv API for providing access to paper metadata
Zotero for the excellent reference management platform
Google Gemini for AI summarization capabilities

📬 Support

📧 Email: your-email@example.com
🐛 Issues: GitHub Issues
💬 Discussions: GitHub Discussions

📈 Changelog

Version 0.1.0 (2025-06-17)

Initial release
Core functionality for searching and collecting arXiv papers
Zotero integration with metadata preservation
AI-powered summarization support
Command-line interface and Python API

Made with ❤️ by Stepan Kropachev

Project details

This version

0.1.0

Jun 17, 2025

Download files

Source Distribution

arxiv_zotero_connector-0.1.0.tar.gz (28.3 kB)

Uploaded Jun 17, 2025 Source

Built Distribution

arxiv_zotero_connector-0.1.0-py3-none-any.whl (25.1 kB)

Uploaded Jun 17, 2025 Python 3

File details

Details for the file arxiv_zotero_connector-0.1.0.tar.gz.

File metadata

Download URL: arxiv_zotero_connector-0.1.0.tar.gz
Upload date: Jun 17, 2025
Size: 28.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.3

File hashes

Hashes for arxiv_zotero_connector-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`4b972a7e180a8f15dd728485fe04f6bf152b9cc25b74205b8c70de5e69535bec`
MD5	`634824162443cdd976554804458fdde1`
BLAKE2b-256	`828f9e1bf2d9e79b30b55849a9af0ff89f21786a73cd9d2e4e8e9459924a98c0`

File details

Details for the file arxiv_zotero_connector-0.1.0-py3-none-any.whl.

File metadata

Download URL: arxiv_zotero_connector-0.1.0-py3-none-any.whl
Upload date: Jun 17, 2025
Size: 25.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.3

File hashes

Hashes for arxiv_zotero_connector-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`620a4a6fc7b43d6da24eccf18e14c1075dcbffccc5a7991969307f5a84c564d9`
MD5	`33554f4ceb1f63871e515ea6037239f9`
BLAKE2b-256	`b306272ff53763933792dc0094138eb89604748638b976c04b84f062587ec437`
