Page Object pattern for Scrapy
Project description
scrapy-poet is the web-poet Page Object pattern implementation for Scrapy. It lets you write spiders in which the extraction logic is separated from the crawling logic. With scrapy-poet, a single spider can support many sites with different layouts.
Read the documentation for more information.
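To make the separation of extraction and crawling logic concrete, here is a minimal standard-library sketch of the Page Object idea. It is an illustration only and does not use the scrapy-poet or web-poet API: the class names, the `PAGE_OBJECTS` mapping, the `parse` function, and the example domains are all hypothetical. See the documentation for the real interfaces.

```python
import re
from urllib.parse import urlparse


class BookPage:
    """Hypothetical page object: owns the extraction logic for one layout."""

    # Layout-specific pattern; this site keeps the title in an <h1> tag.
    title_pattern = r"<h1>(.*?)</h1>"

    def __init__(self, url: str, html: str):
        self.url = url
        self.html = html

    def to_item(self) -> dict:
        match = re.search(self.title_pattern, self.html, re.S)
        title = match.group(1).strip() if match else None
        return {"url": self.url, "title": title}


class OtherShopBookPage(BookPage):
    """Same item shape, different layout: the title lives in <title>."""

    title_pattern = r"<title>(.*?)</title>"


# Crawling side (hypothetical): it only decides which page object handles
# a URL and never touches selectors or patterns itself.
PAGE_OBJECTS = {
    "books.example": BookPage,
    "other.example": OtherShopBookPage,
}


def parse(url: str, html: str) -> dict:
    page_cls = PAGE_OBJECTS[urlparse(url).netloc]
    return page_cls(url, html).to_item()
```

In this sketch, supporting a new site layout means adding one page object class and one mapping entry; the crawling function stays unchanged.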
License is BSD 3-clause.
Documentation: https://scrapy-poet.readthedocs.io
Source code: https://github.com/scrapinghub/scrapy-poet
Issue tracker: https://github.com/scrapinghub/scrapy-poet/issues
Quick Start
Installation
pip install scrapy-poet
Requires Python 3.8+ and Scrapy >= 2.6.0.
Usage in a Scrapy Project
Add the following inside Scrapy’s settings.py file:
DOWNLOADER_MIDDLEWARES = {
    "scrapy_poet.InjectionMiddleware": 543,
    "scrapy.downloadermiddlewares.stats.DownloaderStats": None,
    "scrapy_poet.DownloaderStatsMiddleware": 850,
}
SPIDER_MIDDLEWARES = {
    "scrapy_poet.RetryMiddleware": 275,
}
REQUEST_FINGERPRINTER_CLASS = "scrapy_poet.ScrapyPoetRequestFingerprinter"
Developing
Set up your local Python environment via:
pip install -r requirements-dev.txt
pre-commit install
Now every time you perform a git commit, these tools will run against the staged files:
black
isort
flake8
You can also invoke pre-commit run --all-files or tox -e linters directly to run them without performing a commit.
Built Distribution
Hashes for scrapy_poet-0.23.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | ff6a1f62a25cf2b7545778e75b7dada8b8e2b4895607a2aefc835d8111fcc680
MD5 | c2f9d1d34a0a86f45e9e566a92bd75b4
BLAKE2b-256 | 3b17a6e9bbdf367e4d865dcf54377bd00a5eff5555ec87f4472485e11aa4dd1f
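A downloaded wheel can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of` and `matches_published` are hypothetical, and the location of the downloaded file is up to you.

```python
import hashlib
import hmac
from pathlib import Path

# SHA256 digest copied from the table above for
# scrapy_poet-0.23.0-py3-none-any.whl.
EXPECTED_SHA256 = "ff6a1f62a25cf2b7545778e75b7dada8b8e2b4895607a2aefc835d8111fcc680"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, expected_hex: str = EXPECTED_SHA256) -> bool:
    """Compare a file's digest with a published hex digest (case-insensitive)."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

Calling `matches_published(Path("scrapy_poet-0.23.0-py3-none-any.whl"))` on the downloaded file should return True if the file is intact; the path shown is only an assumed download location.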