Page Object pattern for Scrapy

scrapy-poet implements the Page Object pattern for Scrapy.

The license is BSD 3-clause.

Installation

pip install scrapy-poet

scrapy-poet requires Python >= 3.6 and Scrapy 2.1.0+.

Usage

First, enable the middleware in your settings.py:

DOWNLOADER_MIDDLEWARES = {
    'scrapy_poet.InjectionMiddleware': 543,
}

After that, you can write spiders that use the Page Object pattern to separate extraction code from the spider:

import scrapy
from web_poet.pages import WebPage


class BookPage(WebPage):
    def to_item(self):
        return {
            'url': self.url,
            'name': self.css("title::text").get(),
        }


class BooksSpider(scrapy.Spider):
    name = 'books'
    start_urls = ['http://books.toscrape.com/']

    def parse(self, response):
        for url in response.css('.image_container a::attr(href)').getall():
            yield response.follow(url, self.parse_book)

    def parse_book(self, response, book_page: BookPage):
        yield book_page.to_item()
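Callbacks like parse_book above, which only forward the page object's item, are common enough that scrapy-poet ships a callback_for shortcut (mentioned in the changelog below). A simplified, dependency-free sketch of the idea, not the actual implementation:

```python
# Simplified sketch of the callback_for idea: given a page object class,
# build a spider callback that just yields the page object's item.
def callback_for(page_cls):
    def parse(self, response, page):
        # In scrapy-poet, `page` is an instance of `page_cls`, built from
        # the response and injected by InjectionMiddleware.
        yield page.to_item()
    return parse


class BookPage:
    """Stand-in page object with a hard-coded item, for illustration."""
    def to_item(self):
        return {"name": "A Light in the Attic"}


parse_book = callback_for(BookPage)
items = list(parse_book(None, None, BookPage()))
assert items == [{"name": "A Light in the Attic"}]
```

With the real helper, assigning ``parse_book = callback_for(BookPage)`` inside the spider class replaces the hand-written parse_book method above.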

TODO: document motivation, the rest of the features, provide more usage examples, explain shortcuts, etc. For now, please check spiders in “example” folder: https://github.com/scrapinghub/scrapy-poet/tree/master/example/example/spiders

Contributing

Use tox to run tests with different Python versions:

tox

The command above also runs type checks; we use mypy.

Changes

0.0.2 (2020-04-28)

The repository has been renamed to scrapy-poet and split in two:

  • web-poet (https://github.com/scrapinghub/web-poet) contains definitions and code useful for writing Page Objects for web data extraction - it is not tied to Scrapy;

  • scrapy-poet (this package) provides Scrapy integration for such Page Objects.

The API of the library changed in a backwards-incompatible way; see the README and examples.

New features:

  • The DummyResponse annotation allows skipping the download of a Scrapy Response.

  • callback_for works with Scrapy disk queues when it is used to create a spider method (but not in its inline form).

  • Page objects may require page objects as dependencies; dependencies are resolved recursively and built as needed.

  • InjectionMiddleware supports async def and asyncio providers.
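The recursive dependency resolution mentioned above can be illustrated with a minimal sketch (hypothetical classes, not scrapy-poet's actual injection code): page objects declare their dependencies as annotated constructor arguments, and the builder walks the annotations recursively, constructing each dependency before the object that needs it.

```python
import inspect

# Hypothetical stand-ins for illustration; not scrapy-poet's real classes.
class Response:
    def __init__(self, url):
        self.url = url

class BreadcrumbsPage:
    def __init__(self, response: Response):
        self.response = response

class BookPage:
    # A page object may itself require another page object.
    def __init__(self, response: Response, breadcrumbs: BreadcrumbsPage):
        self.response = response
        self.breadcrumbs = breadcrumbs

def build(cls, response):
    """Resolve constructor annotations recursively and build `cls`."""
    if cls is Response:
        return response
    kwargs = {
        name: build(param.annotation, response)
        for name, param in inspect.signature(cls).parameters.items()
    }
    return cls(**kwargs)

page = build(BookPage, Response("http://books.toscrape.com/"))
assert page.breadcrumbs.response.url == "http://books.toscrape.com/"
```

Here BookPage needs a BreadcrumbsPage, which in turn needs the Response, so the builder constructs the chain bottom-up, much like the "resolved recursively and built as needed" behavior described above.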

0.0.1 (2019-08-28)

Initial release.

