Scrapy extension for database ingestion with job/spider tracking
Scrapy Item Ingest
Scrapy Item Ingest is a Scrapy extension that writes scraped items, requests, and logs to PostgreSQL as the crawl runs, and tracks them by job and spider. It is intended for production use and gives you one place to store and monitor your crawl data.
Documentation
Full documentation is available at: https://scrapy-item-ingest.readthedocs.io/en/latest/
Key Features
- 🔄 Real-time Data Ingestion: Store items, requests, and logs as they're processed
- 📊 Request Tracking: Track request response times, fingerprints, and parent-child relationships
- 🔍 Comprehensive Logging: Capture spider events, errors, and custom messages
- 🏗️ Flexible Schema: Support for both auto-creation and existing table modes
- ⚙️ Modular Design: Use individual components or the complete pipeline
- 🛡️ Production Ready: Handles both development and production scenarios
- 📝 JSONB Storage: Store complex item data as JSONB for flexible querying
- 🐳 Docker Support: Complete containerization with Docker and Kubernetes
- 📈 Performance Optimized: Connection pooling and batch processing
- 🔧 Easy Configuration: Environment-based configuration with validation
- 📊 Monitoring Ready: Built-in metrics and health checks
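To make the JSONB storage idea concrete, here is a minimal sketch of how a scraped item could be serialized and paired with a parameterized INSERT statement. It is not this library's implementation: the table name `job_items`, its columns, and the function `build_item_insert` are hypothetical names chosen for illustration.

```python
import json
from datetime import datetime, timezone

# Hypothetical table and column names, used for illustration only.
# They are not taken from scrapy-item-ingest's actual schema.
INSERT_ITEM_SQL = (
    "INSERT INTO job_items (job_id, spider, item, created_at) "
    "VALUES (%s, %s, %s::jsonb, %s)"
)


def build_item_insert(item, job_id, spider_name, now=None):
    """Return (sql, params) for storing one scraped item as JSONB."""
    created_at = now or datetime.now(timezone.utc)
    # default=str keeps values such as datetimes or Decimals serializable.
    payload = json.dumps(dict(item), default=str, sort_keys=True)
    return INSERT_ITEM_SQL, (job_id, spider_name, payload, created_at)


sql, params = build_item_insert(
    {"title": "Example", "price": 9.5}, job_id=1, spider_name="demo"
)
print(params[2])  # {"price": 9.5, "title": "Example"}
```

The `%s` placeholders follow the parameter style used by PostgreSQL drivers such as psycopg2, so the returned pair could be passed to `cursor.execute(sql, params)`. Keeping the whole item in one JSONB column means new item fields do not require a schema change.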
Installation
pip install scrapy-item-ingest
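After installing, you may want to confirm which version is present in the active environment. The sketch below uses only the standard library; the helper name `installed_version` is an illustrative choice, and the distribution name is the one used in the `pip install` command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="scrapy-item-ingest"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the installed version, or None if the package is absent.
print(installed_version())
```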
Development
Setting up for Development
git clone https://github.com/fawadss1/scrapy_item_ingest.git
cd scrapy_item_ingest
pip install -e ".[dev]"
License
This project is licensed under the MIT License - see the LICENSE file for details.
Support
For support and questions:
- Email: fawadstar6@gmail.com
- Documentation: https://scrapy-item-ingest.readthedocs.io/
- Issues: Report bugs and feature requests on the GitHub issue tracker: https://github.com/fawadss1/scrapy_item_ingest/issues
Changelog
v0.1.0 (Current)
- Initial release
- Core pipeline functionality for items, requests, and logs
- PostgreSQL database integration with JSONB storage
- Comprehensive documentation and examples
- Production deployment guides
- Docker and Kubernetes support
Download files
File details
Details for the file scrapy_item_ingest-0.1.0.tar.gz.
File metadata
- Download URL: scrapy_item_ingest-0.1.0.tar.gz
- Upload date:
- Size: 13.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 33949f480ef624ad3dd7213bd75d0611adc51449420b3ffa92b786a0e6b61327 |
| MD5 | cdd9ec56d076e286bc28b662111cbacf |
| BLAKE2b-256 | 0c4edfc6fb9c5de8895db9eb4a9121fa044d15b8b82be63fb6becaa379a6b8a1 |
File details
Details for the file scrapy_item_ingest-0.1.0-py3-none-any.whl.
File metadata
- Download URL: scrapy_item_ingest-0.1.0-py3-none-any.whl
- Upload date:
- Size: 16.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 015a19148562c3ed914cbc89db58347e9fd422994d822dbe949e9924f8461740 |
| MD5 | b07feaf9505e73e8c05963fe35dd0286 |
| BLAKE2b-256 | 26abe9e8d8e483867cafc98c787f17d9ca0ad5977241bbdbae9d4241fc63ef5a |
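The published digests can be used to check a downloaded file before installing it. The following is a minimal sketch using Python's standard library; the helper names `sha256_of` and `matches_published_hash` are illustrative, and the local file path in the usage comment is an assumption about where you saved the download.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


# Usage, assuming the sdist was saved in the current directory:
# matches_published_hash(
#     "scrapy_item_ingest-0.1.0.tar.gz",
#     "33949f480ef624ad3dd7213bd75d0611adc51449420b3ffa92b786a0e6b61327",
# )
```

pip can also enforce this check itself when a requirements file pins the digest with `--hash=sha256:...` and the install is run with `--require-hashes`.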