Project description

Scrapy Inline Requests


A decorator for writing coroutine-like spider callbacks.

Quickstart

The spider below shows a simple use case of scraping a page and following a few links:

from inline_requests import inline_requests
from scrapy import Spider, Request

class MySpider(Spider):
    name = 'myspider'
    start_urls = ['http://httpbin.org/html']

    @inline_requests
    def parse(self, response):
        urls = [response.url]
        for i in range(10):
            next_url = response.urljoin('?page=%d' % i)
            try:
                next_resp = yield Request(next_url)
                urls.append(next_resp.url)
            except Exception:
                self.logger.info("Failed request %s", i, exc_info=True)

        yield {'urls': urls}

See the examples/ directory for a more complex spider.

Known Issues

  • Middlewares can drop or ignore non-200 status responses, which prevents the callback from resuming execution. This can be worked around with the handle_httpstatus_all flag; see the HttpError middleware documentation.

  • High concurrency and large responses can cause higher memory usage.

  • This decorator assumes your method has the signature (self, response).

  • Wrapped requests may not be serializable by persistent backends.

  • Unless you know what you are doing, the decorated method must be a spider method and must return a generator instance.

History

0.3.1 (2016-07-04)

  • Added a deprecation warning for decorating non-spider functions.

  • Warn if the callback returns requests with callback or errback set. This reverts compatibility with requests that have callbacks.

0.3.0 (2016-06-24)

  • ~~Backward incompatible change: Added more restrictions to the request object (no callback/errback).~~

  • Cleanup callback/errback attributes before sending back the request to the generator. This fixes an edge case when using request.replace().

  • Simplified example spider.

0.2.0 (2016-06-23)

  • Python 3 support.

0.1.2 (2016-05-22)

  • Scrapy API and documentation updates.

0.1.1 (2013-02-03)

  • Minor tweaks and fixes.

0.1.0 (2012-02-03)

  • First release on PyPI.

Project details


Download files

Download the file for your platform.

Source Distribution

scrapy-inline-requests-0.3.1.tar.gz (19.1 kB)


Built Distribution

scrapy_inline_requests-0.3.1-py2.py3-none-any.whl (8.2 kB)


File details

Details for the file scrapy-inline-requests-0.3.1.tar.gz.

File metadata

File hashes

Hashes for scrapy-inline-requests-0.3.1.tar.gz

  SHA256:      06e884dee63d8293180ed622a3a8c00125248144f94213ac81277ebd84224a4d
  MD5:         f68704db4c6b16244f5a1cd3730eb8d5
  BLAKE2b-256: e55e47c1266b9be69f23249e808c97649052bf7e2aeed30e5874fad9762d6a4b


File details

Details for the file scrapy_inline_requests-0.3.1-py2.py3-none-any.whl.

File metadata

File hashes

Hashes for scrapy_inline_requests-0.3.1-py2.py3-none-any.whl

  SHA256:      d5b5443e37aba5c3d0acf739f3b02354f24e705256a713c5069b0dbafd685f2e
  MD5:         e08db935ce4e4cfecec426b2c480d427
  BLAKE2b-256: 49a7f5093677d9cdff3d6fffb1e2324e66c5c90719cde432027c8902004ed4cb

