Page Object pattern for Scrapy

Project description

scrapy-poet is the web-poet Page Object pattern implementation for Scrapy. scrapy-poet lets you write spiders in which the extraction logic is separated from the crawling logic. With scrapy-poet, it is possible to write a single spider that supports many sites with different layouts.
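
To make the idea of separating extraction from crawling concrete, here is a minimal, library-free sketch of the Page Object pattern. It is not scrapy-poet's API: the class names, the `PAGE_OBJECTS` registry, the `parse` function, and the example domains are all hypothetical, and the real library relies on web-poet page objects and its own injection mechanism, which this sketch does not reproduce.

```python
import re
from urllib.parse import urlparse


class BookPageSiteA:
    """Hypothetical page object for a layout that puts the title in <h1>."""

    def __init__(self, html: str):
        self.html = html

    def to_item(self) -> dict:
        match = re.search(r"<h1>(.*?)</h1>", self.html, re.S)
        return {"title": match.group(1).strip() if match else None}


class BookPageSiteB:
    """Hypothetical page object for a layout that uses <span class="name">."""

    def __init__(self, html: str):
        self.html = html

    def to_item(self) -> dict:
        match = re.search(r'<span class="name">(.*?)</span>', self.html, re.S)
        return {"title": match.group(1).strip() if match else None}


# Hypothetical registry: which page object understands which domain.
PAGE_OBJECTS = {
    "site-a.example": BookPageSiteA,
    "site-b.example": BookPageSiteB,
}


def parse(url: str, html: str) -> dict:
    """Crawling-side code: it picks a page object and never touches selectors."""
    page_cls = PAGE_OBJECTS[urlparse(url).netloc]
    return page_cls(html).to_item()


item_a = parse("https://site-a.example/book/1", "<h1> Dune </h1>")
item_b = parse("https://site-b.example/b?id=1", '<span class="name">Dune</span>')
print(item_a, item_b)
```

Supporting another layout in this sketch means adding one page object class and one registry entry; `parse` stays unchanged, which is the property the pattern is meant to provide.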

Requires Python 3.9+ and Scrapy >= 2.6.0.

Read the documentation for more information.

License is BSD 3-clause.

Quick Start

Installation

pip install scrapy-poet

Usage in a Scrapy Project

Add the following inside Scrapy’s settings.py file:

  • Scrapy ≥ 2.10:

    ADDONS = {
        "scrapy_poet.Addon": 300,
    }
  • Scrapy < 2.10:

    DOWNLOADER_MIDDLEWARES = {
        "scrapy_poet.InjectionMiddleware": 543,
        "scrapy.downloadermiddlewares.stats.DownloaderStats": None,
        "scrapy_poet.DownloaderStatsMiddleware": 850,
    }
    REQUEST_FINGERPRINTER_CLASS = "scrapy_poet.ScrapyPoetRequestFingerprinter"
    SPIDER_MIDDLEWARES = {
        "scrapy_poet.RetryMiddleware": 275,
    }
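
If a project has to run on both sides of the 2.10 boundary, the two variants above could be selected programmatically. The following is only a sketch: `scrapy_poet_settings` is a hypothetical helper, not part of scrapy-poet or Scrapy, and it compares only the major and minor components of a version string that the caller passes in.

```python
import re


def scrapy_poet_settings(scrapy_version: str) -> dict:
    """Return the settings listed above for the given Scrapy version string.

    Hypothetical helper: only the leading "major.minor" part is inspected.
    """
    match = re.match(r"(\d+)\.(\d+)", scrapy_version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {scrapy_version!r}")
    major_minor = (int(match.group(1)), int(match.group(2)))

    if major_minor >= (2, 10):
        # Scrapy >= 2.10: the add-on is the only entry required.
        return {"ADDONS": {"scrapy_poet.Addon": 300}}

    # Scrapy < 2.10: configure the components individually.
    return {
        "DOWNLOADER_MIDDLEWARES": {
            "scrapy_poet.InjectionMiddleware": 543,
            "scrapy.downloadermiddlewares.stats.DownloaderStats": None,
            "scrapy_poet.DownloaderStatsMiddleware": 850,
        },
        "REQUEST_FINGERPRINTER_CLASS": "scrapy_poet.ScrapyPoetRequestFingerprinter",
        "SPIDER_MIDDLEWARES": {
            "scrapy_poet.RetryMiddleware": 275,
        },
    }


print(sorted(scrapy_poet_settings("2.9.0")))
```

In a real settings.py the returned dictionary would still need to be merged with any middleware entries the project already defines; the sketch leaves that step out.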

Developing

Set up your local Python environment via:

  1. pip install -r requirements-dev.txt

  2. pre-commit install

Now every time you perform a git commit, these tools will run against the staged files:

  • black

  • isort

  • flake8

You can also invoke pre-commit run --all-files or tox -e linters directly to run them without performing a commit.

Download files

Source Distribution

scrapy_poet-0.26.0.tar.gz (68.4 kB)

Uploaded Source

Built Distribution

scrapy_poet-0.26.0-py3-none-any.whl (31.9 kB)

Uploaded Python 3

File details

Details for the file scrapy_poet-0.26.0.tar.gz.

File metadata

  • Download URL: scrapy_poet-0.26.0.tar.gz
  • Size: 68.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.1

File hashes

Hashes for scrapy_poet-0.26.0.tar.gz
Algorithm Hash digest
SHA256 fd26312a76019697466f4adde0020c1f2a613b439f7795638762fbcad0a905c9
MD5 344b72212fa9c89c5d6f03b6afc0533d
BLAKE2b-256 ba0276884c11941aaf60025fce0844fd5120bf6d8d0eaace6244b66cc0a47ba1
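
A downloaded archive can be checked against the SHA256 digest listed above. The sketch below uses only Python's standard hashlib module; the helper names `sha256_of` and `matches_published_hash` are hypothetical, and the file path passed to them is whatever location you saved the archive to.

```python
import hashlib

# SHA256 digest published above for scrapy_poet-0.26.0.tar.gz.
EXPECTED_SHA256 = "fd26312a76019697466f4adde0020c1f2a613b439f7795638762fbcad0a905c9"


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash("scrapy_poet-0.26.0.tar.gz")` would be expected to return True for an unmodified copy of the source distribution saved under that name in the current directory.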

File details

Details for the file scrapy_poet-0.26.0-py3-none-any.whl.

File metadata

  • Download URL: scrapy_poet-0.26.0-py3-none-any.whl
  • Size: 31.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.1

File hashes

Hashes for scrapy_poet-0.26.0-py3-none-any.whl
Algorithm Hash digest
SHA256 9e5879711e527f8a983c95e60c9844a9a89df6a9a12d5c35bcd30ec2f57a98a4
MD5 6f8693262743a5d9bf7a136efd783443
BLAKE2b-256 248f28e3914962a2a27c33c2e250f47ecc4d546ed5cc4924a286fd551b98fc20
