Page Object pattern for Scrapy

scrapy-poet is the web-poet Page Object pattern implementation for Scrapy. It lets you write spiders in which the extraction logic is separated from the crawling logic. With scrapy-poet, a single spider can support many sites with different layouts.

Read the documentation for more information.

License is BSD 3-clause.

Quick Start

Installation

pip install scrapy-poet

Requires Python 3.8+ and Scrapy >= 2.6.0.

Usage in a Scrapy Project

Add the following inside Scrapy’s settings.py file:

DOWNLOADER_MIDDLEWARES = {
    "scrapy_poet.InjectionMiddleware": 543,
    "scrapy.downloadermiddlewares.stats.DownloaderStats": None,
    "scrapy_poet.DownloaderStatsMiddleware": 850,
}
SPIDER_MIDDLEWARES = {
    "scrapy_poet.RetryMiddleware": 275,
}
REQUEST_FINGERPRINTER_CLASS = "scrapy_poet.ScrapyPoetRequestFingerprinter"

Developing

Set up your local Python environment via:

  1. pip install -r requirements-dev.txt

  2. pre-commit install

Now every time you perform a git commit, these tools will run against the staged files:

  • black

  • isort

  • flake8

You can also invoke pre-commit run --all-files or tox -e linters directly to run them without performing a commit.
