Page Object pattern for Scrapy

scrapy-poet is the web-poet Page Object pattern implementation for Scrapy. scrapy-poet lets you write spiders in which the extraction logic is kept separate from the crawling logic. With scrapy-poet, a single spider can support many sites with different layouts.
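
The separation described above can be illustrated with a small, library-free sketch. It does not use the scrapy-poet or web-poet APIs; the class names (ShopAPage, ShopBPage), the PAGE_OBJECTS registry, the first_text helper, and the example domains are all hypothetical and exist only to show the idea: each page object knows how to extract data from one layout, while a single crawling callback stays the same for every site.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlparse


def first_text(html: str, tag: str) -> str:
    """Toy helper: text of the first <tag> element, or "" if there is none."""
    match = re.search(rf"<{tag}[^>]*>(.*?)</{tag}>", html, re.S)
    return match.group(1).strip() if match else ""


@dataclass
class ShopAPage:
    """Extraction logic for a hypothetical site that puts the name in <h1>."""
    url: str
    html: str

    def to_item(self) -> dict:
        return {"url": self.url, "name": first_text(self.html, "h1")}


@dataclass
class ShopBPage:
    """Extraction logic for a hypothetical site that puts the name in <title>."""
    url: str
    html: str

    def to_item(self) -> dict:
        return {"url": self.url, "name": first_text(self.html, "title")}


# Hypothetical registry mapping a domain to the page object for its layout.
PAGE_OBJECTS = {
    "shop-a.example": ShopAPage,
    "shop-b.example": ShopBPage,
}


def parse(url: str, html: str) -> dict:
    """Crawling-side callback: contains no site-specific extraction code."""
    page_cls = PAGE_OBJECTS[urlparse(url).netloc]
    return page_cls(url=url, html=html).to_item()


if __name__ == "__main__":
    print(parse("https://shop-a.example/p/1", "<html><h1>Blue Mug</h1></html>"))
    print(parse("https://shop-b.example/item?id=2", "<title> Red Mug </title>"))
```

Supporting a new layout in this sketch means adding one page object class and one registry entry; the parse callback does not change.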

Read the documentation for more information.

License is BSD 3-clause.

Quick Start

Installation

pip install scrapy-poet

Requires Python 3.8+ and Scrapy >= 2.6.0.

Usage in a Scrapy Project

Add the following inside Scrapy’s settings.py file:

DOWNLOADER_MIDDLEWARES = {
    "scrapy_poet.InjectionMiddleware": 543,
    "scrapy.downloadermiddlewares.stats.DownloaderStats": None,
    "scrapy_poet.DownloaderStatsMiddleware": 850,
}
SPIDER_MIDDLEWARES = {
    "scrapy_poet.RetryMiddleware": 275,
}
REQUEST_FINGERPRINTER_CLASS = "scrapy_poet.ScrapyPoetRequestFingerprinter"
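
As a rough illustration of what the DOWNLOADER_MIDDLEWARES entries above do, the sketch below mimics how Scrapy combines a project's dict with its built-in defaults: a value of None disables a middleware, and the remaining ones are sorted by their order number. This is not Scrapy's actual code. BASE is an abbreviated subset of Scrapy's default downloader middlewares, and enabled_in_order is a hypothetical helper written for this example.

```python
# Abbreviated subset of Scrapy's built-in downloader middleware defaults.
BASE = {
    "scrapy.downloadermiddlewares.retry.RetryMiddleware": 550,
    "scrapy.downloadermiddlewares.redirect.RedirectMiddleware": 600,
    "scrapy.downloadermiddlewares.stats.DownloaderStats": 850,
}

# The project-level setting from settings.py shown above.
PROJECT = {
    "scrapy_poet.InjectionMiddleware": 543,
    "scrapy.downloadermiddlewares.stats.DownloaderStats": None,
    "scrapy_poet.DownloaderStatsMiddleware": 850,
}


def enabled_in_order(base: dict, project: dict) -> list:
    """Hypothetical helper: merge both dicts, drop None entries, sort by order."""
    merged = {**base, **project}  # project entries override the defaults
    enabled = {path: order for path, order in merged.items() if order is not None}
    return sorted(enabled, key=enabled.get)


if __name__ == "__main__":
    for path in enabled_in_order(BASE, PROJECT):
        print(path)
```

In this sketch the built-in DownloaderStats entry disappears from the result and scrapy_poet.DownloaderStatsMiddleware takes its place at order 850, while scrapy_poet.InjectionMiddleware at 543 is ordered just before the built-in RetryMiddleware at 550.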

Developing

Set up your local Python environment via:

  1. pip install -r requirements-dev.txt

  2. pre-commit install

Now, every time you perform a git commit, these tools will run against the staged files:

  • black

  • isort

  • flake8

You can also directly invoke pre-commit run --all-files or tox -e linters to run them without performing a commit.
