Automatically collect papers from ArXiv and organize them in your Zotero library with AI-powered summarization

ArXiv-Zotero Connector

Automatically download, organize, and summarize arXiv papers directly into your Zotero library with AI-powered insights.

🚀 Features

  • Smart Search: Search arXiv by keywords, authors, categories, or date ranges
  • Auto-Download: Automatically download paper PDFs and attach them to Zotero entries
  • AI Summarization: Generate concise summaries using Google's Gemini AI (optional)
  • Metadata Extraction: Preserve complete paper metadata including authors, abstract, and publication details
  • Collection Support: Organize papers into specific Zotero collections
  • Flexible Filtering: Filter by journal papers, conference proceedings, or preprints
  • Batch Processing: Process multiple papers efficiently with progress tracking

📋 Requirements

  • Python 3.7 or higher
  • Zotero account with API access
  • Internet connection for downloading papers
  • Google AI API key (optional, for summarization features)

🔧 Installation

Install from PyPI

pip install arxiv-zotero-connector

Install from GitHub

pip install git+https://github.com/StepanKropachev/arxiv-zotero-connector.git

Development Installation

git clone https://github.com/StepanKropachev/arxiv-zotero-connector.git
cd arxiv-zotero-connector
pip install -e .

⚙️ Configuration

1. Get Zotero Credentials

  1. Library ID: Open Zotero Settings and copy the number shown as "Your user ID for API calls"
  2. API Key:
    • Go to Zotero Settings → New Private Key
    • Grant all permissions and save the key
  3. Collection Key (optional):
    • Open your Zotero web library
    • Navigate to desired collection
    • Copy the key from the URL: .../collections/XXXXXXXX

2. Create Configuration File

Create a .env file in your working directory:

ZOTERO_LIBRARY_ID=your_library_id
ZOTERO_API_KEY=your_api_key
COLLECTION_KEY=your_collection_key  # Optional
GOOGLE_API_KEY=your_gemini_api_key  # Optional, for AI summaries
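Before a long run it can help to check that the file has the two required values. The sketch below is a standalone, standard-library check; `parse_env` and `missing_required` are hypothetical names, and the connector's own loading logic may differ.

```python
from typing import Dict, List

# Variable names come from the .env example above; the helpers are hypothetical.
REQUIRED = ("ZOTERO_LIBRARY_ID", "ZOTERO_API_KEY")


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and '#' comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        # Drop inline comments. Values that themselves contain '#' are not
        # handled by this simplified parser.
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_required(values: Dict[str, str]) -> List[str]:
    """Return the required variable names that are absent or empty."""
    return [name for name in REQUIRED if not values.get(name)]


if __name__ == "__main__":
    sample = "ZOTERO_LIBRARY_ID=123456\nCOLLECTION_KEY=ABCD1234  # Optional\n"
    print(missing_required(parse_env(sample)))
```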

📖 Usage

Command Line Interface

Basic search:

arxiv-zotero --keywords "machine learning" --max-results 10

Advanced search with filters:

arxiv-zotero \
  --keywords "transformer" "attention" \
  --categories cs.AI cs.LG \
  --start-date 2023-01-01 \
  --max-results 20

Search by author:

arxiv-zotero --author "Yoshua Bengio" --start-date 2023-06-01
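For readers curious how these options could translate into an arXiv API query, here is a rough sketch. It uses the public arXiv API field prefixes (`all:`, `cat:`, `au:`). `build_search_query` is a hypothetical function, and combining keywords with OR is an assumption; the connector may build its query differently.

```python
from typing import Sequence


def build_search_query(
    keywords: Sequence[str] = (),
    categories: Sequence[str] = (),
    author: str = "",
) -> str:
    """Combine search options into one arXiv API search_query string.

    Assumption: keywords are OR-ed together, categories are OR-ed together,
    and the resulting groups are AND-ed.
    """
    parts = []
    if keywords:
        joined = " OR ".join('all:"{}"'.format(k) for k in keywords)
        parts.append("({})".format(joined))
    if categories:
        joined = " OR ".join("cat:{}".format(c) for c in categories)
        parts.append("({})".format(joined))
    if author:
        parts.append('au:"{}"'.format(author))
    return " AND ".join(parts)


if __name__ == "__main__":
    print(build_search_query(["transformer", "attention"], ["cs.AI", "cs.LG"]))
```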

Configuration File

Create search_config.yaml:

keywords:
  - "reinforcement learning"
  - "deep learning"
categories:
  - "cs.AI"
  - "cs.LG"
max_results: 50
start_date: "2023-01-01"
content_type: "journal"  # journal, conference, or preprint

# AI Summarization settings (optional)
summarizer:
  enabled: true
  prompt: "Summarize this paper in 3 key points"
  max_length: 300

Run with config:

arxiv-zotero --config search_config.yaml
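The search keys in the YAML file appear to mirror the command-line flags used elsewhere in this README. The sketch below makes that correspondence explicit by turning an already-loaded config dictionary into an equivalent argument list. The one-to-one mapping is an assumption, `config_to_argv` is a hypothetical helper, and the `summarizer` block is left out because its flag names do not line up as directly.

```python
import shlex
from typing import Any, Dict, List

# Assumed mapping from YAML keys to CLI flags (both sets of names appear in
# this README, but the tool itself does not document the correspondence).
FLAG_FOR_KEY = {
    "keywords": "--keywords",
    "categories": "--categories",
    "max_results": "--max-results",
    "start_date": "--start-date",
    "content_type": "--content-type",
}


def config_to_argv(config: Dict[str, Any]) -> List[str]:
    """Build an argument list equivalent to the given search config."""
    argv = ["arxiv-zotero"]
    for key, flag in FLAG_FOR_KEY.items():
        if key not in config:
            continue
        value = config[key]
        values = value if isinstance(value, list) else [value]
        argv.append(flag)
        argv.extend(str(v) for v in values)
    return argv


if __name__ == "__main__":
    config = {
        "keywords": ["reinforcement learning"],
        "categories": ["cs.AI"],
        "max_results": 50,
    }
    # shlex.quote keeps multi-word keywords intact when pasted into a shell.
    print(" ".join(shlex.quote(arg) for arg in config_to_argv(config)))
```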

Python API

import asyncio
from datetime import datetime  # needed for start_date below

from arxiv_zotero import ArxivZoteroCollector, ArxivSearchParams

async def main():
    # Initialize collector
    collector = ArxivZoteroCollector(
        zotero_library_id="your_library_id",
        zotero_api_key="your_api_key",
        collection_key="optional_collection_key"
    )
    
    # Configure search
    search_params = ArxivSearchParams(
        keywords=["quantum computing", "quantum algorithms"],
        categories=["quant-ph", "cs.CC"],
        max_results=10,
        start_date=datetime(2023, 1, 1)
    )
    
    # Run collection
    successful, failed = await collector.run_collection_async(
        search_params=search_params,
        download_pdfs=True
    )
    
    print(f"Processed {successful} papers successfully, {failed} failed")

asyncio.run(main())
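To avoid hard-coding credentials in scripts, the constructor arguments can be read from the same environment variables used in the `.env` file. This is a minimal sketch: `collector_kwargs` is a hypothetical helper, and passing `None` for `collection_key` when no collection is set is an assumption about the constructor.

```python
import os
from typing import Dict, Mapping, Optional


def collector_kwargs(env: Mapping[str, str] = os.environ) -> Dict[str, Optional[str]]:
    """Map environment variables to the keyword arguments shown above."""
    try:
        return {
            "zotero_library_id": env["ZOTERO_LIBRARY_ID"],
            "zotero_api_key": env["ZOTERO_API_KEY"],
            # Assumption: None means "use the main library".
            "collection_key": env.get("COLLECTION_KEY") or None,
        }
    except KeyError as exc:
        raise RuntimeError(
            "Missing environment variable: {}".format(exc.args[0])
        ) from None


# Intended use (requires the package and real credentials):
#   collector = ArxivZoteroCollector(**collector_kwargs())
```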

🎯 Examples

Literature Review

arxiv-zotero \
  --keywords "neural architecture search" "AutoML" \
  --categories cs.LG \
  --content-type journal \
  --start-date 2022-01-01 \
  --max-results 100

Conference Papers

arxiv-zotero \
  --keywords "ICLR" "NeurIPS" \
  --content-type conference \
  --start-date 2023-01-01

Papers Without PDFs

arxiv-zotero --keywords "quantum" --no-pdf --max-results 50

🤖 AI Summarization

Enable AI-powered paper summaries by setting GOOGLE_API_KEY in your .env file (see Configuration) and passing the summarizer flags:

arxiv-zotero \
  --keywords "large language models" \
  --summarizer-enabled \
  --summarizer-prompt "Explain this paper's contribution in simple terms" \
  --summary-length 500

📚 ArXiv Categories

Common categories include:

  • cs.AI: Artificial Intelligence
  • cs.LG: Machine Learning
  • cs.CL: Computation and Language
  • cs.CV: Computer Vision
  • stat.ML: Machine Learning (Statistics)
  • math.OC: Optimization and Control
  • quant-ph: Quantum Physics

Full list: arXiv Category Taxonomy

🛠️ Advanced Features

Custom Metadata Fields

The tool preserves:

  • Title, authors, abstract
  • Publication date and journal references
  • ArXiv ID and categories
  • DOI (when available)
  • Comments and version info
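As an illustration of how such metadata might be laid out for Zotero, the sketch below maps a plain dictionary onto Zotero item fields. The input keys (`arxiv_id`, `published`, and so on), the `preprint` item type, and the use of the `extra` field for the arXiv ID and categories are all assumptions made for this example; the connector's actual mapping may differ.

```python
from typing import Any, Dict


def to_zotero_item(paper: Dict[str, Any]) -> Dict[str, Any]:
    """Map a paper dictionary (hypothetical shape) to Zotero item fields."""
    arxiv_id = paper["arxiv_id"]
    item = {
        "itemType": "preprint",  # assumed item type
        "title": paper["title"],
        # Single-field creator names avoid guessing first/last name splits.
        "creators": [
            {"creatorType": "author", "name": name}
            for name in paper.get("authors", [])
        ],
        "abstractNote": paper.get("abstract", ""),
        "date": paper.get("published", ""),
        "url": "https://arxiv.org/abs/{}".format(arxiv_id),
        "extra": "arXiv:{} [{}]".format(
            arxiv_id, ", ".join(paper.get("categories", []))
        ),
    }
    if paper.get("doi"):  # DOI is only set when available
        item["DOI"] = paper["doi"]
    return item
```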

Rate Limiting

The tool respects arXiv's rate limits automatically. For large batch operations, consider setting an explicit limit with --rate-limit:

arxiv-zotero --keywords "your search" --rate-limit 5
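The exact semantics of `--rate-limit` are not spelled out here, so the following is only a sketch of one common throttling approach: enforcing a minimum interval between requests. `MinIntervalLimiter` is a hypothetical class, not the connector's implementation, and the interval value is illustrative.

```python
import time
from typing import Callable, Optional


class MinIntervalLimiter:
    """Ensure at least `interval` seconds pass between successive calls."""

    def __init__(
        self,
        interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time the last call was released

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.interval - now)
            if delay:
                self._sleep(delay)
        self._last = now + delay
        return delay


# Intended use: call limiter.wait() before each request.
#   limiter = MinIntervalLimiter(3.0)  # illustrative interval in seconds
```

The clock and sleep functions are injectable so the limiter can be tested without real delays.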

Error Handling

Failed downloads are logged and can be retried:

  • Check arxiv_zotero.log for details
  • Papers are processed independently
  • Partial failures don't stop the entire batch
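The pattern described above, where each paper is processed on its own and failures are collected rather than raised, can be sketched with `asyncio.gather`. `process_all` and `fake_download` are hypothetical stand-ins used only to show the idea; they are not the connector's code.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, Tuple


async def process_all(
    items: Iterable[str],
    worker: Callable[[str], Awaitable[None]],
) -> Tuple[int, List[str]]:
    """Run worker on every item; return (success count, failed items)."""
    items = list(items)
    # return_exceptions=True stops one failure from cancelling the others.
    results = await asyncio.gather(
        *(worker(item) for item in items), return_exceptions=True
    )
    failed = [
        item for item, res in zip(items, results)
        if isinstance(res, BaseException)
    ]
    return len(items) - len(failed), failed


async def fake_download(paper_id: str) -> None:
    """Stand-in worker: fails for IDs ending in 'bad'."""
    if paper_id.endswith("bad"):
        raise IOError("download failed for {}".format(paper_id))


if __name__ == "__main__":
    ok, failed = asyncio.run(process_all(["a", "b-bad", "c"], fake_download))
    print(ok, failed)  # the failed list can be fed into a retry pass
```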

🐛 Troubleshooting

Common Issues

  1. "Collection not found": Verify your collection key or remove it to use the main library
  2. "API key invalid": Check your Zotero API key has proper permissions
  3. Import errors: Ensure all dependencies are installed: pip install -r requirements.txt
  4. PDF download fails: Check your internet connection and disk space

Debug Mode

For detailed logging:

import logging

logging.basicConfig()  # attach a console handler if none is configured yet
logging.getLogger('arxiv_zotero').setLevel(logging.DEBUG)

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • arXiv API for providing access to paper metadata
  • Zotero for the excellent reference management platform
  • Google Gemini for AI summarization capabilities

📈 Changelog

Version 0.1.0 (2024-06-17)

  • Initial release
  • Core functionality for searching and collecting arXiv papers
  • Zotero integration with metadata preservation
  • AI-powered summarization support
  • Command-line interface and Python API

Made with ❤️ by Stepan Kropachev
