Scrapy Item Ingest
Scrapy extension for database ingestion with job/spider tracking
A tiny, straightforward add-on for Scrapy that saves your items, requests, and logs to PostgreSQL. No boilerplate, no ceremony.
Install
pip install scrapy-item-ingest
Minimal setup (settings.py)
ITEM_PIPELINES = {
'scrapy_item_ingest.DbInsertPipeline': 300,
}
EXTENSIONS = {
'scrapy_item_ingest.LoggingExtension': 500,
}
# Pick ONE of the two database config styles:
DB_URL = "postgresql://user:password@localhost:5432/database"
# Or use discrete fields (avoids URL encoding):
# DB_HOST = "localhost"
# DB_PORT = 5432
# DB_USER = "user"
# DB_PASSWORD = "password"
# DB_NAME = "database"
# Optional
CREATE_TABLES = True # auto-create tables on first run (default True)
JOB_ID = 1 # or omit; spider name will be used
Run your spider:
scrapy crawl your_spider
Troubleshooting
- Password has special characters like @ or $? In a URL, percent-encode them: @ -> %40, $ -> %24.
  Example: postgresql://user:PAK%40swat1%24@localhost:5432/db
- Or use the discrete fields (no encoding needed).
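If you'd rather not encode by hand, Python's standard library can do it. A quick sketch, reusing the example password above:

```python
from urllib.parse import quote_plus

# Percent-encode a password so it is safe inside a connection URL.
password = "PAK@swat1$"          # example password containing @ and $
encoded = quote_plus(password)   # "@" -> "%40", "$" -> "%24"
db_url = f"postgresql://user:{encoded}@localhost:5432/db"
print(db_url)  # postgresql://user:PAK%40swat1%24@localhost:5432/db
```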
Useful settings (optional)
- LOG_DB_LEVEL (default: DEBUG): minimum level stored in the DB
- LOG_DB_CAPTURE_LEVEL: capture level for Scrapy loggers routed to the DB (does not affect the console)
- LOG_DB_LOGGERS: allowed logger prefixes (defaults always include [spider.name, 'scrapy'])
- LOG_DB_EXCLUDE_LOGGERS (default: ['scrapy.core.scraper'])
- LOG_DB_EXCLUDE_PATTERNS (default: ['Scraped from <'])
- CREATE_TABLES (default: True): create job_items, job_requests, job_logs on startup
- ITEMS_TABLE, REQUESTS_TABLE, LOGS_TABLE: override table names
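Putting a few of these together, a settings.py that quiets DB logging and renames the tables might look like this. The values are illustrative only; the setting names come from the list above.

```python
# settings.py (illustrative values, not defaults)
LOG_DB_LEVEL = "INFO"                     # store INFO and above in the DB
LOG_DB_LOGGERS = ["myproject", "scrapy"]  # only these logger prefixes reach the DB
ITEMS_TABLE = "my_job_items"              # override the default table names
REQUESTS_TABLE = "my_job_requests"
LOGS_TABLE = "my_job_logs"
```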
Links
- Docs: https://scrapy-item-ingest.readthedocs.io/
- Changelog: docs/development/changelog.rst
- Issues: https://github.com/fawadss1/scrapy_item_ingest/issues
License
MIT License. See LICENSE.
Download files
Source Distribution
- scrapy_item_ingest-0.2.4.tar.gz (15.4 kB)

Built Distribution
- scrapy_item_ingest-0.2.4-py3-none-any.whl (19.0 kB)
File details
Details for the file scrapy_item_ingest-0.2.4.tar.gz.
File metadata
- Download URL: scrapy_item_ingest-0.2.4.tar.gz
- Upload date:
- Size: 15.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ffaba06f7a513ef04a99c89bdeaefa38c70eecd3191d8a7e2c0fe23e328057b2 |
| MD5 | c5c8c5583a511b02125cacebf2d70c45 |
| BLAKE2b-256 | f0d9a681108542c38f7b5f81b3d167a7eae109fe761d5e7add9b0d520259eb9b |
File details
Details for the file scrapy_item_ingest-0.2.4-py3-none-any.whl.
File metadata
- Download URL: scrapy_item_ingest-0.2.4-py3-none-any.whl
- Upload date:
- Size: 19.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fdc5ea25205777b2ad4aee0fb348837499daa4e0edcc9a6e3c2fd7599db1fa00 |
| MD5 | a3c40e95430439853aa59e69f2066f1d |
| BLAKE2b-256 | 57ad4350ea4afc5d65cd5c8c8aba30fd762f849783f52a38f6d077e1176cda27 |