Page Object pattern for Scrapy

scrapy-poet is the web-poet Page Object pattern implementation for Scrapy. It lets you write spiders in which the extraction logic is separated from the crawling logic. With scrapy-poet, a single spider can support many sites with different layouts.

Requires Python 3.10+ and Scrapy >= 2.6.0.

Read the documentation for more information.

License is BSD 3-clause.

Quick Start

Installation

pip install scrapy-poet

Usage in a Scrapy Project

Add the following inside Scrapy’s settings.py file:

  • Scrapy ≥ 2.10:

    ADDONS = {
        "scrapy_poet.Addon": 300,
    }
  • Scrapy < 2.10:

    DOWNLOADER_MIDDLEWARES = {
        "scrapy_poet.InjectionMiddleware": 543,
        "scrapy.downloadermiddlewares.stats.DownloaderStats": None,
        "scrapy_poet.DownloaderStatsMiddleware": 850,
    }
    REQUEST_FINGERPRINTER_CLASS = "scrapy_poet.ScrapyPoetRequestFingerprinter"
    SPIDER_MIDDLEWARES = {
        "scrapy_poet.RetryMiddleware": 275,
    }
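
If one code base has to run on both sides of the Scrapy 2.10 boundary, the two configurations above can be selected from the installed version. The following is a minimal sketch, not something scrapy-poet provides: scrapy_poet_settings is a hypothetical helper, and it only returns the values listed above.

```python
def scrapy_poet_settings(scrapy_version: tuple) -> dict:
    """Return the scrapy-poet settings for a Scrapy version tuple.

    Hypothetical helper for illustration; not part of scrapy-poet.
    Only the first two components (major, minor) are compared.
    """
    if tuple(scrapy_version[:2]) >= (2, 10):
        # Scrapy >= 2.10: the add-on configures everything.
        return {"ADDONS": {"scrapy_poet.Addon": 300}}

    # Scrapy < 2.10: configure the components by hand.
    return {
        "DOWNLOADER_MIDDLEWARES": {
            "scrapy_poet.InjectionMiddleware": 543,
            "scrapy.downloadermiddlewares.stats.DownloaderStats": None,
            "scrapy_poet.DownloaderStatsMiddleware": 850,
        },
        "REQUEST_FINGERPRINTER_CLASS": "scrapy_poet.ScrapyPoetRequestFingerprinter",
        "SPIDER_MIDDLEWARES": {
            "scrapy_poet.RetryMiddleware": 275,
        },
    }


new_style = scrapy_poet_settings((2, 11, 0))
old_style = scrapy_poet_settings((2, 6, 0))
print(sorted(new_style))
print(sorted(old_style))
```

In settings.py, one possible way to apply the result is globals().update(scrapy_poet_settings(scrapy.version_info)). Treat that as an assumption to verify in your own project, and merge by hand if the project already defines DOWNLOADER_MIDDLEWARES or SPIDER_MIDDLEWARES.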

Developing

Set up your local Python environment via:

  1. pip install -r requirements-dev.txt

  2. pre-commit install

Now every time you perform a git commit, these tools will run against the staged files:

  • black

  • isort

  • flake8

You can also directly invoke pre-commit run --all-files or tox -e linters to run them without performing a commit.
