Scrapy Inline Requests
A decorator for writing coroutine-like spider callbacks.
Requires Scrapy>=1.0 and supports Python 2.7+ and 3.4+.
Free software: MIT license
Documentation: https://scrapy-inline-requests.readthedocs.org.
Usage
The spider below shows a simple use case of scraping a page and following a few links:
    from scrapy import Spider, Request
    from inline_requests import inline_requests


    class MySpider(Spider):
        name = 'myspider'
        start_urls = ['http://httpbin.org/html']

        @inline_requests
        def parse(self, response):
            urls = [response.url]
            for i in range(10):
                next_resp = yield Request(response.urljoin('?page=%d' % i))
                urls.append(next_resp.url)
            yield {'urls': urls}
See the examples/ directory for a more complex spider.
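To make the "coroutine-like" behaviour concrete, here is a minimal sketch of how a generator callback can be driven: each yielded request is answered by sending a response back into the generator. This is an illustration of the general generator protocol only, not the library's actual implementation. FakeRequest, FakeResponse, fake_download and drive are hypothetical names, and the download is simulated synchronously so that the sketch runs without Scrapy.

```python
from collections import namedtuple

# Hypothetical stand-ins for Scrapy's Request and Response, used only to keep
# this sketch runnable without Scrapy installed.
FakeRequest = namedtuple('FakeRequest', ['url'])
FakeResponse = namedtuple('FakeResponse', ['url'])


def fake_download(request):
    """Pretend to fetch a request; a real crawler does this asynchronously."""
    return FakeResponse(url=request.url)


def drive(generator):
    """Run a generator callback, answering each yielded request with a response.

    Yielded FakeRequest objects are "downloaded" and the result is sent back
    into the generator; any other yielded value is collected as output.
    """
    items = []
    try:
        value = next(generator)
        while True:
            if isinstance(value, FakeRequest):
                # Resume the callback with the response as the yield's value.
                value = generator.send(fake_download(value))
            else:
                items.append(value)
                value = next(generator)
    except StopIteration:
        pass
    return items


def parse(response):
    # Same shape as the spider callback above, minus the Scrapy classes.
    urls = [response.url]
    for i in range(3):
        next_resp = yield FakeRequest('%s?page=%d' % (response.url, i))
        urls.append(next_resp.url)
    yield {'urls': urls}


items = drive(parse(FakeResponse('http://httpbin.org/html')))
```

The key point is that the value of each yield expression inside parse is whatever the driver passes to send(), which is why the callback reads like sequential code.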
Known Issues
Middlewares can drop or ignore responses with a non-200 status, in which case the callback never resumes its execution. To avoid this, set the handle_httpstatus_all key in the request's meta. See the httperror middleware documentation.
High concurrency and large responses can cause higher memory usage.
This decorator assumes the decorated method has the signature (self, response).
The decorated method must return a generator instance.
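The last two assumptions can be checked before a callback is decorated. The sketch below uses only the standard inspect module (Python 3). check_inline_callback is a hypothetical helper and is not part of scrapy-inline-requests. It is also stricter than the stated requirement: it only accepts generator functions, whereas a plain function that returns a generator instance would also satisfy the documented rule.

```python
import inspect


def check_inline_callback(func):
    """Return a list of problems found with a would-be decorated callback.

    Hypothetical helper, not part of scrapy-inline-requests: it only mirrors
    the two documented assumptions about the decorated method.
    """
    problems = []
    params = list(inspect.signature(func).parameters)
    if params != ['self', 'response']:
        problems.append(
            'expected signature (self, response), got (%s)' % ', '.join(params))
    if not inspect.isgeneratorfunction(func):
        # Stricter than required: only functions containing a yield pass.
        problems.append('callback is not a generator function')
    return problems


class Example(object):
    def good(self, response):
        yield {'url': response}

    def bad(self, resp):
        return {'url': resp}


good_problems = check_inline_callback(Example.good)
bad_problems = check_inline_callback(Example.bad)
```

Here good_problems is empty, while bad_problems reports both the unexpected parameter name and the missing yield.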
History
0.3.0 (2016-06-24)
Backward incompatible change: Added more restrictions to the request object (no callback/errback).
Cleanup callback/errback attributes before sending back the request to the generator.
Simplified example spider.
0.2.0 (2016-06-23)
Python 3 support.
0.1.2 (2016-05-22)
Scrapy API and documentation updates.
0.1.1 (2013-02-03)
Minor tweaks and fixes.
0.1.0 (2012-02-03)
First release on PyPI.
Hashes for scrapy-inline-requests-0.3.0.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 77ea3e973224368e39180e55c729d16e53b12b9960ae813cccc275919a03c5a4
MD5 | fd9e5db8d0a45f8f81cf51f077e69419
BLAKE2b-256 | cb309f568511b446f8bba790ac2f8a3a643f924da544d2e37fa3e63975580050
Hashes for scrapy_inline_requests-0.3.0-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 3d74e5d89697b205dcb7958df3ed5567302c4da6508d9f65f1cd62d613d9f341
MD5 | 7807004e623f6e42b8a9adec7f0526d8
BLAKE2b-256 | 8eba35fc41a996b2c5ec6e87575c7369b8798dd30577bb8e21f6b601b103ae9e