Crochet-based blocking API for Scrapy.
ScrapyDo
This module provides function helpers to run Scrapy in a blocking fashion. See the scrapydo-overview.ipynb notebook for a quick overview of this module.
Installation
Using pip:
pip install scrapydo
Using conda:
conda install -c https://conda.anaconda.org/rolando scrapydo
Usage
Call scrapydo.setup() once, before using any other function, to initialize the reactor.
Example:
import scrapydo
scrapydo.setup()
# Fetch a single URL.
response = scrapydo.fetch("http://example.com")
# do stuff with response ...
# Crawl a URL with the given callback.
def callback(response):
    yield {
        'title': response.css('title').extract(),
        'url': response.url,
    }
items = scrapydo.crawl('http://example.com', callback)
# do stuff with items ...
# Run an existing spider class (MySpider is a placeholder for your own spider class).
items = scrapydo.run_spider(MySpider)
# do stuff with items ...
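As a rough sketch of the "do stuff with items" step: the callback in the example stores the result of `extract()`, which is a list of strings, under `'title'`. The helper below collapses that list to a single value. It assumes the items are dicts shaped like the ones the callback yields; `summarize_items` and `sample_items` are hypothetical names, and the sample data stands in for a real crawl result so the snippet runs offline.

```python
def summarize_items(items):
    """Collapse each item's list-valued 'title' into a single string (or None)."""
    summary = []
    for item in items:
        titles = item.get('title') or []
        summary.append({
            'url': item.get('url'),
            'title': titles[0] if titles else None,
        })
    return summary


# Hypothetical stand-in for the items returned by scrapydo.crawl(...).
sample_items = [
    {'title': ['<title>Example Domain</title>'], 'url': 'http://example.com'},
    {'title': [], 'url': 'http://example.com/empty'},
]

for row in summarize_items(sample_items):
    print(row['url'], row['title'])
```

With real data, `sample_items` would be replaced by the value returned from `scrapydo.crawl('http://example.com', callback)`.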
Available Functions
- scrapydo.setup()
Initialize reactor.
- scrapydo.fetch(url, spider_cls=DefaultSpider, capture_items=True, return_crawler=False, settings=None, timeout=DEFAULT_TIMEOUT)
Fetches a URL and returns the response.
- scrapydo.crawl(url, callback, spider_cls=DefaultSpider, capture_items=True, return_crawler=False, settings=None, timeout=DEFAULT_TIMEOUT)
Crawls a URL with the given callback and returns the scraped items.
- scrapydo.run_spider(spider_cls, capture_items=True, return_crawler=False, settings=None, timeout=DEFAULT_TIMEOUT)
Runs a spider and returns the scraped items.
- scrapydo.highlight(code, lexer='html', formatter='html', output_wrapper=None)
Highlights the given code using pygments. This function is suitable for use in an IPython notebook.
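The `settings` and `timeout` keyword arguments listed above can be passed explicitly. The following is a minimal sketch rather than a definitive recipe. It assumes that `settings` accepts a plain dict of Scrapy settings and that `timeout` is expressed in seconds; neither is stated above. `fetch_status`, `StubResponse`, and `stub_fetch` are hypothetical names. The fetch function is injectable so the helper can be tried offline with a stub.

```python
def fetch_status(url, fetch=None, timeout=30):
    """Fetch a URL in a blocking call and return (status, url)."""
    if fetch is None:
        # Imported lazily; scrapydo.setup() must already have been called.
        import scrapydo
        fetch = scrapydo.fetch
    # Assumption: settings takes a dict of Scrapy settings.
    settings = {'USER_AGENT': 'example-bot/0.1'}
    response = fetch(url, settings=settings, timeout=timeout)
    return response.status, response.url


class StubResponse:
    """Offline stand-in exposing the two attributes the helper reads."""
    status = 200
    url = 'http://example.com'


def stub_fetch(url, settings=None, timeout=None):
    """Offline replacement for scrapydo.fetch, used only in this demo."""
    return StubResponse()


print(fetch_status('http://example.com', fetch=stub_fetch))
```

Against a live site, call `scrapydo.setup()` first and omit the `fetch` argument so the helper falls back to `scrapydo.fetch`.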
Source Distribution
File details
Details for the file scrapydo-0.1.0.tar.gz.
File metadata
- Download URL: scrapydo-0.1.0.tar.gz
- Upload date:
- Size: 4.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | f2b84a32d56bdb260b227ab304ef2add2847f00f9c4e0f470d6f4f96d0d30345
MD5 | 246a3ad9fdf91723c34e2009241c14d1
BLAKE2b-256 | b6dc7746d8b1677db3e025ee08b54e88247b90ef2564840a6d922284595e4720