# Scrapy Inline Requests

A decorator for writing coroutine-like spider callbacks.
- Free software: MIT license
- Documentation: https://scrapy-inline-requests.readthedocs.org
- Python versions: 2.7, 3.4+
## Quickstart
The spider below shows a simple use case of scraping a page and following a few links:
```python
from inline_requests import inline_requests
from scrapy import Spider, Request

class MySpider(Spider):
    name = 'myspider'
    start_urls = ['http://httpbin.org/html']

    @inline_requests
    def parse(self, response):
        urls = [response.url]
        for i in range(10):
            next_url = response.urljoin('?page=%d' % i)
            try:
                next_resp = yield Request(next_url)
                urls.append(next_resp.url)
            except Exception:
                self.logger.info("Failed request %s", i, exc_info=True)
        yield {'urls': urls}
```
See the examples/ directory for a more complex spider.
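Conceptually, the decorator drives the wrapped generator by sending each downloaded response back in at the `yield` expression. The toy sketch below is plain Python with no Scrapy; `drive`, `fetch`, `callback`, and the string-based "requests" are all hypothetical stand-ins, not the library's actual implementation. It only illustrates the send/yield round trip:

```python
def drive(generator):
    """Hypothetical stand-in for the decorator's driver loop:
    feed 'responses' back into the generator at each yield."""
    def fetch(request):
        # Stand-in for Scrapy's downloader.
        return 'response for %s' % request

    items = []
    try:
        yielded = next(generator)  # run up to the first yield
        while True:
            if isinstance(yielded, str) and yielded.startswith('GET '):
                # A "request" was yielded: "download" it and send the
                # "response" back in, resuming the generator.
                yielded = generator.send(fetch(yielded))
            else:
                # Anything else is treated as a scraped item.
                items.append(yielded)
                yielded = next(generator)
    except StopIteration:
        pass
    return items

def callback():
    # Mirrors the spider callback above: yield a request, get a response.
    resp = yield 'GET /page'
    yield {'seen': resp}

print(drive(callback()))  # [{'seen': 'response for GET /page'}]
```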
## Known Issues
- Middlewares can drop or ignore non-200 status responses, which stops the callback from resuming. This can be overcome by setting the `handle_httpstatus_all` flag in the request's `meta`. See the HttpError middleware documentation.
- High concurrency and large responses can cause higher memory usage.
- The decorator assumes your method has the signature `(self, response)`.
- Wrapped requests may not be serializable by persistent backends.
- Unless you know what you are doing, the decorated method must be a spider method and must return a generator instance.
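As a sketch of the workaround for the first issue above: `handle_httpstatus_all` is a `Request.meta` key recognized by Scrapy's HttpError middleware. The helper below is purely hypothetical (not part of this library) and just shows where the flag goes:

```python
def allow_all_statuses(meta=None):
    """Hypothetical helper: return a Request.meta dict with the
    'handle_httpstatus_all' flag set, so non-200 responses reach
    the callback instead of being dropped by the HttpError middleware."""
    meta = dict(meta or {})
    meta['handle_httpstatus_all'] = True
    return meta

# Inside the decorated callback you would then write something like:
#     next_resp = yield Request(next_url, meta=allow_all_statuses())
print(allow_all_statuses())  # {'handle_httpstatus_all': True}
```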
## History
### 0.3.1 (2016-07-04)

- Added a deprecation warning about decorating non-spider functions.
- Warn if the callback returns requests with `callback` or `errback` set. This reverts compatibility with requests that have callbacks.
### 0.3.0 (2016-06-24)

- ~~Backward incompatible change: Added more restrictions to the request object (no callback/errback).~~
- Clean up `callback`/`errback` attributes before sending the request back to the generator. This fixes an edge case when using `request.replace()`.
- Simplified example spider.
### 0.2.0 (2016-06-23)

- Python 3 support.

### 0.1.2 (2016-05-22)

- Scrapy API and documentation updates.

### 0.1.1 (2013-02-03)

- Minor tweaks and fixes.

### 0.1.0 (2012-02-03)

- First release on PyPI.
## Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
- Source Distribution
- Built Distribution
## File details
Details for the file `scrapy-inline-requests-0.3.1.tar.gz`.
### File metadata
- Download URL: scrapy-inline-requests-0.3.1.tar.gz
- Upload date:
- Size: 19.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
### File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 06e884dee63d8293180ed622a3a8c00125248144f94213ac81277ebd84224a4d |
| MD5 | f68704db4c6b16244f5a1cd3730eb8d5 |
| BLAKE2b-256 | e55e47c1266b9be69f23249e808c97649052bf7e2aeed30e5874fad9762d6a4b |
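To check a downloaded archive against the digests above, you can hash the file locally. A minimal sketch using the standard library (`sha256_of_file` is a hypothetical helper; the filename in the comment is the one published above):

```python
import hashlib

def sha256_of_file(path, chunk_size=8192):
    """Compute the SHA256 hex digest of a file, reading in chunks
    so large archives don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

# Example: compare against the SHA256 digest published above, e.g.
# sha256_of_file('scrapy-inline-requests-0.3.1.tar.gz') == '06e884de...'
```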
## File details
Details for the file `scrapy_inline_requests-0.3.1-py2.py3-none-any.whl`.
### File metadata
- Download URL: scrapy_inline_requests-0.3.1-py2.py3-none-any.whl
- Upload date:
- Size: 8.2 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
### File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d5b5443e37aba5c3d0acf739f3b02354f24e705256a713c5069b0dbafd685f2e |
| MD5 | e08db935ce4e4cfecec426b2c480d427 |
| BLAKE2b-256 | 49a7f5093677d9cdff3d6fffb1e2324e66c5c90719cde432027c8902004ed4cb |