Scrapy decorator for inline requests

Project description

This module provides a decorator that allows you to write coroutine-like spider callbacks.

The code is experimental: it may not work in all cases and can be hard to debug.

Example:

from inline_requests import inline_requests
from scrapy import Request
from scrapy.spiders import CrawlSpider


class MySpider(CrawlSpider):

    ...

    @inline_requests
    def parse_item(self, response):
        item = self.build_item(response)

        # scrape more information
        response = yield Request(response.url + '?info')
        item['info'] = self.extract_info(response)

        # scrape pictures
        response = yield Request(response.url + '?pictures')
        item['pictures'] = self.extract_pictures(response)

        # a request that might fail (DNS error, network timeout, HTTP 404/500, etc.)
        try:
            response = yield Request(response.url + '?protected')
        except Exception as e:
            self.logger.error(e)
        else:
            item['protected'] = self.extract_protected_info(response)

        # finally yield the item
        yield item
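The core idea can be illustrated without Scrapy at all: the decorator wraps a generator callback and "trampolines" it, sending each fetched response back into the generator at the point where it yielded a request. The sketch below is a simplified, framework-free illustration of that pattern; the names `fetch` and `drive` are illustrative, not part of the library's API, and the real decorator schedules actual Scrapy requests instead of calling a fetch function directly.

```python
def fetch(url):
    # Stand-in for Scrapy's downloader; returns a fake response dict.
    return {"url": url, "body": "content of " + url}

def drive(generator_callback, first_response):
    # Trampoline: resume the generator with each "response",
    # collecting anything that is not a request as a scraped item.
    gen = generator_callback(first_response)
    results = []
    try:
        yielded = next(gen)
        while True:
            if isinstance(yielded, str):            # treat strings as "requests"
                yielded = gen.send(fetch(yielded))  # resume with the response
            else:                                   # anything else is an item
                results.append(yielded)
                yielded = next(gen)
    except StopIteration:
        pass
    return results

def parse_item(response):
    item = {"url": response["url"]}
    info = yield response["url"] + "?info"  # suspend until the "response" arrives
    item["info"] = info["body"]
    yield item

items = drive(parse_item, fetch("http://example.com"))
```

`parse_item` reads top-to-bottom like synchronous code, yet each `yield` hands control back to the driver until the follow-up response is available, which is exactly the ergonomics the decorator provides for spider callbacks.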

Example Project

The example directory includes an example spider for StackOverflow.com:

cd example
scrapy crawl stackoverflow

Requirements

  • Python 2.7+, 3.4+

  • Scrapy 1.0+

Known Issues

  • Middlewares can drop or ignore non-200 responses, which prevents the callback from resuming execution. This can be overcome by setting the handle_httpstatus_all flag in the request meta. See the httperror middleware documentation.

  • High concurrency and large responses can cause higher memory usage.
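To see why the first issue matters, the filtering behavior of Scrapy's httperror middleware can be mimicked in a few lines. The function below is only a simplified stand-in for the real middleware, but the meta keys it checks (`handle_httpstatus_all` and `handle_httpstatus_list`) are the actual per-request opt-outs.

```python
def reaches_callback(response_status, request_meta):
    """Simplified model of httperror filtering: return True if a response
    with this status would be passed through to the spider callback."""
    if 200 <= response_status < 300:
        return True
    # per-request opt-outs honored by the real middleware
    if request_meta.get("handle_httpstatus_all"):
        return True
    if response_status in request_meta.get("handle_httpstatus_list", []):
        return True
    return False

# Without the flag, a 404 is filtered out and the inline callback stalls
# at its yield; with the flag, the try/except in the example can handle it.
assert not reaches_callback(404, {})
assert reaches_callback(404, {"handle_httpstatus_all": True})
```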

