Automatically collect papers from arXiv and organize them in your Zotero library with AI-powered summarization
ArXiv-Zotero Connector
Automatically download, organize, and summarize arXiv papers directly into your Zotero library with AI-powered insights.
🚀 Features
- Smart Search: Search arXiv by keywords, authors, categories, or date ranges
- Auto-Download: Automatically download paper PDFs and attach them to Zotero entries
- AI Summarization: Generate concise summaries using Google's Gemini AI (optional)
- Metadata Extraction: Preserve complete paper metadata including authors, abstract, and publication details
- Collection Support: Organize papers into specific Zotero collections
- Flexible Filtering: Filter by journal papers, conference proceedings, or preprints
- Batch Processing: Process multiple papers efficiently with progress tracking
📋 Requirements
- Python 3.7 or higher
- Zotero account with API access
- Internet connection for downloading papers
- Google AI API key (optional, for summarization features)
🔧 Installation
Install from PyPI
pip install arxiv-zotero-connector
Install from GitHub
pip install git+https://github.com/StepanKropachev/arxiv-zotero-connector.git
Development Installation
git clone https://github.com/StepanKropachev/arxiv-zotero-connector.git
cd arxiv-zotero-connector
pip install -e .
⚙️ Configuration
1. Get Zotero Credentials
- Library ID: In your Zotero account settings (zotero.org/settings/keys), copy the number shown as "Your userID for use in API calls"
- API Key:
  - Go to Zotero Settings → New Private Key
  - Grant all permissions and save the key
- Collection Key (optional):
  - Open your Zotero web library
  - Navigate to the desired collection
  - Copy the key from the URL: .../collections/XXXXXXXX
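If you copy the whole browser URL rather than just the key, a small helper can pull the key out. This is a minimal sketch: `extract_collection_key` is a hypothetical helper, not part of arxiv-zotero-connector, and it assumes the key is the 8-character block of uppercase letters and digits that follows `/collections/`.

```python
import re
from typing import Optional

# Hypothetical helper (not part of arxiv-zotero-connector).
# Assumes a collection key is 8 uppercase letters/digits right after "/collections/".
_COLLECTION_RE = re.compile(r"/collections/([A-Z0-9]{8})(?:[/?#]|$)")


def extract_collection_key(url: str) -> Optional[str]:
    """Return the collection key found in a Zotero web-library URL, or None."""
    match = _COLLECTION_RE.search(url)
    return match.group(1) if match else None


print(extract_collection_key("https://www.zotero.org/someuser/collections/ABCD2345"))
# prints ABCD2345
```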
2. Create Configuration File
Create a .env file in your working directory:
ZOTERO_LIBRARY_ID=your_library_id
ZOTERO_API_KEY=your_api_key
COLLECTION_KEY=your_collection_key # Optional
GOOGLE_API_KEY=your_gemini_api_key # Optional, for AI summaries
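To confirm the file contains what the connector needs before a long run, you can parse it yourself. This is a stdlib-only sketch; the connector may load `.env` differently (for example with python-dotenv), and `parse_env_text` and `missing_required` are hypothetical helpers. It treats a space followed by `#` as the start of an inline comment, matching the example above.

```python
from pathlib import Path
from typing import Dict, List

# Keys this README marks as required; COLLECTION_KEY and GOOGLE_API_KEY are optional
REQUIRED = ("ZOTERO_LIBRARY_ID", "ZOTERO_API_KEY")


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and full-line comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing " # comment", as in "COLLECTION_KEY=... # Optional"
        value = value.split(" #", 1)[0].strip().strip("\"'")
        values[key.strip()] = value
    return values


def missing_required(values: Dict[str, str]) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED if not values.get(key)]


env_path = Path(".env")
if env_path.exists():
    env = parse_env_text(env_path.read_text(encoding="utf-8"))
    print("Missing required keys:", missing_required(env) or "none")
```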
📖 Usage
Command Line Interface
Basic search:
arxiv-zotero --keywords "machine learning" --max-results 10
Advanced search with filters:
arxiv-zotero \
--keywords "transformer" "attention" \
--categories cs.AI cs.LG \
--start-date 2023-01-01 \
--max-results 20
Search by author:
arxiv-zotero --author "Yoshua Bengio" --start-date 2023-06-01
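Searches like these are ultimately answered by the public arXiv API (`http://export.arxiv.org/api/query`), whose `search_query` parameter uses field prefixes such as `all:`, `au:` and `cat:` joined with `AND`/`OR`. The sketch below shows one plausible way the CLI flags could map onto such a query. `build_search_query` and `build_query_url` are hypothetical, whether the connector ORs or ANDs multiple keywords is an assumption, and date filtering is left out.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"


def build_search_query(
    keywords: Sequence[str] = (),
    categories: Sequence[str] = (),
    author: Optional[str] = None,
) -> str:
    """Combine the filters into an arXiv `search_query` string (hypothetical mapping)."""
    clauses = []
    if keywords:
        # Assumption: a paper may match any keyword (OR); each keyword is an exact phrase
        clauses.append("(" + " OR ".join(f'all:"{k}"' for k in keywords) + ")")
    if categories:
        clauses.append("(" + " OR ".join(f"cat:{c}" for c in categories) + ")")
    if author:
        clauses.append(f'au:"{author}"')
    if not clauses:
        raise ValueError("provide at least one keyword, category or author")
    return " AND ".join(clauses)


def build_query_url(
    keywords: Sequence[str] = (),
    categories: Sequence[str] = (),
    author: Optional[str] = None,
    max_results: int = 10,
) -> str:
    """Build the full API URL, newest submissions first."""
    params = {
        "search_query": build_search_query(keywords, categories, author),
        "start": 0,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urlencode(params)}"


print(build_search_query(["transformer", "attention"], ["cs.AI", "cs.LG"]))
# prints (all:"transformer" OR all:"attention") AND (cat:cs.AI OR cat:cs.LG)
```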
Configuration File
Create search_config.yaml:
keywords:
  - "reinforcement learning"
  - "deep learning"
categories:
  - "cs.AI"
  - "cs.LG"
max_results: 50
start_date: "2023-01-01"
content_type: "journal"  # journal, conference, or preprint
# AI Summarization settings (optional)
summarizer:
  enabled: true
  prompt: "Summarize this paper in 3 key points"
  max_length: 300
Run with config:
arxiv-zotero --config search_config.yaml
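Once PyYAML's `yaml.safe_load` has turned the file into a dictionary, it can pay to validate it before starting a long run. The sketch below is hypothetical (the connector's own validation may differ): `SearchConfig` and `parse_config` are illustrative names, and it assumes the keys shown above, that `content_type` must be one of the three listed values, and that `start_date` is an ISO `YYYY-MM-DD` date.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Any, Dict, List, Optional

CONTENT_TYPES = {"journal", "conference", "preprint"}


@dataclass
class SearchConfig:
    """Hypothetical typed view of search_config.yaml."""
    keywords: List[str] = field(default_factory=list)
    categories: List[str] = field(default_factory=list)
    max_results: int = 10
    start_date: Optional[date] = None
    content_type: Optional[str] = None
    summarizer: Dict[str, Any] = field(default_factory=dict)


def parse_config(raw: Dict[str, Any]) -> SearchConfig:
    """Validate a dict such as the one yaml.safe_load returns for the file above."""
    start = raw.get("start_date")
    if isinstance(start, str):
        # A quoted "2023-01-01" arrives as a string; an unquoted one is already a date
        start = date.fromisoformat(start)
    content_type = raw.get("content_type")
    if content_type is not None and content_type not in CONTENT_TYPES:
        raise ValueError(f"content_type must be one of {sorted(CONTENT_TYPES)}")
    max_results = int(raw.get("max_results", 10))
    if max_results <= 0:
        raise ValueError("max_results must be positive")
    return SearchConfig(
        keywords=list(raw.get("keywords", [])),
        categories=list(raw.get("categories", [])),
        max_results=max_results,
        start_date=start,
        content_type=content_type,
        summarizer=dict(raw.get("summarizer") or {}),
    )


cfg = parse_config({"keywords": ["deep learning"], "start_date": "2023-01-01",
                    "content_type": "journal", "summarizer": {"enabled": True}})
print(cfg.start_date, cfg.summarizer["enabled"])  # prints 2023-01-01 True
```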
Python API
import asyncio
from datetime import datetime

from arxiv_zotero import ArxivZoteroCollector, ArxivSearchParams


async def main():
    # Initialize the collector with your Zotero credentials
    collector = ArxivZoteroCollector(
        zotero_library_id="your_library_id",
        zotero_api_key="your_api_key",
        collection_key="optional_collection_key"
    )
    # Configure the search
    search_params = ArxivSearchParams(
        keywords=["quantum computing", "quantum algorithms"],
        categories=["quant-ph", "cs.CC"],
        max_results=10,
        start_date=datetime(2023, 1, 1)
    )
    # Run the collection; returns the number of successful and failed papers
    successful, failed = await collector.run_collection_async(
        search_params=search_params,
        download_pdfs=True
    )
    print(f"Processed {successful} papers successfully, {failed} failed")


asyncio.run(main())
🎯 Examples
Literature Review
arxiv-zotero \
--keywords "neural architecture search" "AutoML" \
--categories cs.LG \
--content-type journal \
--start-date 2022-01-01 \
--max-results 100
Conference Papers
arxiv-zotero \
--keywords "ICLR" "NeurIPS" \
--content-type conference \
--start-date 2023-01-01
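arXiv metadata has no explicit publication-type field, so a filter like `--content-type` has to infer it from fields such as `journal_ref` and `comments`. The following is only a guess at how such a heuristic could work, not the connector's actual rule: `classify_content_type` is hypothetical, and the list of conference hints is an assumption.

```python
import re
from typing import Optional

# Hypothetical heuristic; the connector's real classification rule may differ
_CONFERENCE_HINTS = re.compile(
    r"\b(?:conference|proceedings|workshop|symposium|accepted (?:at|to))\b|\bproc\.",
    re.IGNORECASE,
)


def classify_content_type(journal_ref: Optional[str], comments: Optional[str]) -> str:
    """Guess 'journal', 'conference' or 'preprint' from arXiv metadata fields."""
    text = " ".join(part for part in (journal_ref, comments) if part)
    if text and _CONFERENCE_HINTS.search(text):
        return "conference"
    if journal_ref:
        # A journal reference without conference wording is treated as a journal paper
        return "journal"
    return "preprint"
```

A keyword heuristic like this will misclassify some papers, so treat the filter as a convenience rather than a guarantee.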
Papers Without PDFs
arxiv-zotero --keywords "quantum" --no-pdf --max-results 50
🤖 AI Summarization
Enable AI-powered paper summaries by setting GOOGLE_API_KEY in your .env file and passing the summarizer options:
arxiv-zotero \
--keywords "large language models" \
--summarizer-enabled \
--summarizer-prompt "Explain this paper's contribution in simple terms" \
--summary-length 500
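The exact request the connector sends to Gemini is not documented here. As a rough sketch of the idea, the custom prompt can be combined with a paper's title and abstract, and the model's answer clipped to the configured length. `build_summary_prompt` and `clip_summary` are hypothetical helpers, and treating `--summary-length` as a character limit is an assumption (it could equally be words or tokens).

```python
def build_summary_prompt(instruction: str, title: str, abstract: str) -> str:
    """Combine the user's prompt with the paper text (hypothetical format)."""
    return f"{instruction}\n\nTitle: {title}\n\nAbstract:\n{abstract.strip()}"


def clip_summary(summary: str, max_length: int) -> str:
    """Trim a summary to at most max_length characters, cutting at a word boundary."""
    if max_length < 1:
        raise ValueError("max_length must be at least 1")
    summary = summary.strip()
    if len(summary) <= max_length:
        return summary
    cut = summary[: max_length - 1]  # leave room for the ellipsis character
    if " " in cut:
        cut = cut.rsplit(" ", 1)[0]  # avoid ending mid-word
    return cut.rstrip(" ,;:") + "…"


print(clip_summary("one two three four", 10))  # prints one two…
```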
📚 arXiv Categories
Common categories include:
- cs.AI: Artificial Intelligence
- cs.LG: Machine Learning
- cs.CL: Computation and Language
- cs.CV: Computer Vision
- stat.ML: Machine Learning (Statistics)
- math.OC: Optimization and Control
- quant-ph: Quantum Physics
Full list: arXiv Category Taxonomy
🛠️ Advanced Features
Custom Metadata Fields
The tool preserves:
- Title, authors, abstract
- Publication date and journal references
- arXiv ID and categories
- DOI (when available)
- Comments and version info
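For reference, a Zotero Web API item for an arXiv paper could be assembled roughly as below. This is a hypothetical mapping rather than the connector's own code: it assumes Zotero's `preprint` item type with its `repository` and `archiveID` fields, a naive first/last name split, categories stored as tags, and comments stored in `extra`. Check the Web API item template for `preprint` before relying on the field names.

```python
from typing import Any, Dict, List, Optional


def split_name(full_name: str) -> Dict[str, str]:
    """Naively split 'Given Family' into Zotero creator fields."""
    parts = full_name.strip().rsplit(" ", 1)
    if len(parts) == 1:
        # Single-field creator for mononyms
        return {"creatorType": "author", "name": parts[0]}
    return {"creatorType": "author", "firstName": parts[0], "lastName": parts[1]}


def to_zotero_item(
    arxiv_id: str,
    title: str,
    authors: List[str],
    abstract: str,
    published: str,
    categories: List[str],
    doi: Optional[str] = None,
    comments: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a Zotero 'preprint' item dict from arXiv metadata (hypothetical mapping)."""
    item: Dict[str, Any] = {
        "itemType": "preprint",
        "title": title,
        "creators": [split_name(a) for a in authors],
        "abstractNote": abstract,
        "date": published,
        "repository": "arXiv",
        "archiveID": f"arXiv:{arxiv_id}",
        "url": f"https://arxiv.org/abs/{arxiv_id}",
        "tags": [{"tag": c} for c in categories],
    }
    # Optional fields are only set when arXiv provides them
    if doi:
        item["DOI"] = doi
    if comments:
        item["extra"] = f"Comments: {comments}"
    return item
```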
Rate Limiting
The tool respects arXiv's rate limits automatically. For large batch operations, consider using:
arxiv-zotero --keywords "your search" --rate-limit 5
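arXiv's API terms ask clients to make no more than one request every three seconds. If you script your own API calls alongside the tool, a minimum-interval limiter keeps them spaced out. This is a sketch with a hypothetical `MinIntervalLimiter` class and a simulated request loop; it is not the connector's implementation of `--rate-limit`.

```python
import asyncio
import time
from typing import Iterable, List, Optional


class MinIntervalLimiter:
    """Let at most one caller through every `interval` seconds (hypothetical helper)."""

    def __init__(self, interval: float) -> None:
        self.interval = interval
        self._last = float("-inf")
        self._lock: Optional[asyncio.Lock] = None

    async def wait(self) -> None:
        # Create the lock lazily so it belongs to the running event loop
        if self._lock is None:
            self._lock = asyncio.Lock()
        # The lock serializes callers, so the spacing holds even with concurrent tasks
        async with self._lock:
            delay = self._last + self.interval - time.monotonic()
            if delay > 0:
                await asyncio.sleep(delay)
            self._last = time.monotonic()


async def timed_requests(ids: Iterable[str], limiter: MinIntervalLimiter) -> List[float]:
    """Record when each (simulated) request would be sent."""
    sent_at = []
    for _arxiv_id in ids:
        await limiter.wait()
        # A real client would issue the HTTP request for _arxiv_id here
        sent_at.append(time.monotonic())
    return sent_at
```

In a real script you would create a single `MinIntervalLimiter(3.0)` inside your async entry point and `await limiter.wait()` before every API call.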
Error Handling
Failed downloads are logged and can be retried:
- Check arxiv_zotero.log for details
- Papers are processed independently
- Partial failures don't stop the entire batch
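The "process independently, count failures" behaviour described above maps naturally onto `asyncio.gather(..., return_exceptions=True)`. The sketch below illustrates the pattern with a hypothetical `process_all` function and a stand-in `demo_process` coroutine; it is not the connector's internal code, though it returns a `(successful, failed)` pair like the one `run_collection_async` reports.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Iterable, Tuple

logger = logging.getLogger("arxiv_zotero")


async def process_all(
    paper_ids: Iterable[str],
    process_paper: Callable[[str], Awaitable[None]],
) -> Tuple[int, int]:
    """Run process_paper for every id; one failure never cancels the others."""
    ids = list(paper_ids)
    results = await asyncio.gather(
        *(process_paper(pid) for pid in ids), return_exceptions=True
    )
    failed = 0
    for pid, result in zip(ids, results):
        if isinstance(result, BaseException):
            failed += 1
            # Log the id so the paper can be retried later
            logger.error("Failed to process %s: %s", pid, result)
    return len(ids) - failed, failed


async def demo_process(pid: str) -> None:
    """Stand-in for a download/upload step; fails for one id to show the pattern."""
    await asyncio.sleep(0)
    if pid == "bad":
        raise RuntimeError("PDF download failed")


print(asyncio.run(process_all(["2301.00001", "bad", "2301.00002"], demo_process)))
# prints (2, 1)
```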
🐛 Troubleshooting
Common Issues
- "Collection not found": Verify your collection key or remove it to use the main library
- "API key invalid": Check that your Zotero API key has the proper permissions
- Import errors: Ensure all dependencies are installed:
  pip install -r requirements.txt
- PDF download fails: Check your internet connection and disk space
Debug Mode
For detailed logging:
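import logging
# Ensure a console handler exists; Python's fallback handler only shows WARNING and above
logging.basicConfig(format="%(asctime)s %(name)s %(levelname)s: %(message)s")
logging.getLogger('arxiv_zotero').setLevel(logging.DEBUG)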
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- arXiv API for providing access to paper metadata
- Zotero for the excellent reference management platform
- Google Gemini for AI summarization capabilities
📬 Support
- 📧 Email: your-email@example.com
- 🐛 Issues: GitHub Issues
- 💬 Discussions: GitHub Discussions
📈 Changelog
Version 0.1.0 (2024-06-17)
- Initial release
- Core functionality for searching and collecting arXiv papers
- Zotero integration with metadata preservation
- AI-powered summarization support
- Command-line interface and Python API
Made with ❤️ by Stepan Kropachev
Download files
File details
Details for the file arxiv_zotero_connector-0.1.0.tar.gz.
File metadata
- Download URL: arxiv_zotero_connector-0.1.0.tar.gz
- Upload date:
- Size: 28.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4b972a7e180a8f15dd728485fe04f6bf152b9cc25b74205b8c70de5e69535bec |
| MD5 | 634824162443cdd976554804458fdde1 |
| BLAKE2b-256 | 828f9e1bf2d9e79b30b55849a9af0ff89f21786a73cd9d2e4e8e9459924a98c0 |
File details
Details for the file arxiv_zotero_connector-0.1.0-py3-none-any.whl.
File metadata
- Download URL: arxiv_zotero_connector-0.1.0-py3-none-any.whl
- Upload date:
- Size: 25.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 620a4a6fc7b43d6da24eccf18e14c1075dcbffccc5a7991969307f5a84c564d9 |
| MD5 | 33554f4ceb1f63871e515ea6037239f9 |
| BLAKE2b-256 | b306272ff53763933792dc0094138eb89604748638b976c04b84f062587ec437 |