ScrapyDo
Crochet-based blocking API for Scrapy.
This module provides helper functions for running Scrapy in a blocking fashion. See the scrapydo-overview.ipynb notebook for a quick overview of this module.
Installation
Using pip:
pip install scrapydo
Usage
The function scrapydo.setup must be called once to initialize the reactor.
Example:
import logging

import scrapydo
from scrapy import Request

# Initialize the reactor; call this once before any other scrapydo function.
scrapydo.setup()

scrapydo.default_settings.update({
    'LOG_LEVEL': 'DEBUG',
    'CLOSESPIDER_PAGECOUNT': 10,
})

# Enable logging display
logging.basicConfig(level=logging.DEBUG)

# Fetch a single URL.
response = scrapydo.fetch("http://example.com")

# Crawl a URL with the given callback.
def parse_page(response):
    yield {
        'title': response.css('title').extract(),
        'url': response.url,
    }
    for href in response.css('a::attr(href)'):
        # href is a selector; extract the attribute string before joining.
        url = response.urljoin(href.extract())
        yield Request(url, callback=parse_page)

items = scrapydo.crawl('http://example.com', parse_page)

# Run an existing spider class (MySpider is assumed to be defined elsewhere).
spider_args = {'foo': 'bar'}
items = scrapydo.run_spider(MySpider, **spider_args)
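The link-following step in the callback above can be tried offline. The following is a minimal sketch using only the Python standard library, not Scrapy itself: `LinkCollector`, `absolute_links`, and `SAMPLE_HTML` are hypothetical names introduced for illustration, and `urllib.parse.urljoin` stands in for `response.urljoin`, which resolves an href against the page URL in roughly the same way.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the raw href values of <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def absolute_links(page_url, html):
    """Return the page's links resolved against page_url."""
    collector = LinkCollector()
    collector.feed(html)
    return [urljoin(page_url, href) for href in collector.hrefs]


# Made-up page content with a root-relative, a relative, and an absolute link.
SAMPLE_HTML = (
    '<a href="/about">About</a> '
    '<a href="page2.html">Next</a> '
    '<a href="https://other.example/x">External</a>'
)

links = absolute_links("http://example.com/docs/index.html", SAMPLE_HTML)
print(links)
```

Relative hrefs are resolved against the page URL, while absolute ones pass through unchanged, which is why the callback joins each extracted href before building a new request.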
Available Functions
- scrapydo.setup()
Initializes the reactor.
- scrapydo.fetch(url, spider_cls=DefaultSpider, capture_items=True, return_crawler=False, settings=None, timeout=DEFAULT_TIMEOUT)
Fetches a URL and returns the response.
- scrapydo.crawl(url, callback, spider_cls=DefaultSpider, capture_items=True, return_crawler=False, settings=None, timeout=DEFAULT_TIMEOUT)
Crawls a URL with the given callback and returns the scraped items.
- scrapydo.run_spider(spider_cls, capture_items=True, return_crawler=False, settings=None, timeout=DEFAULT_TIMEOUT, **kwargs)
Runs a spider and returns the scraped items.
- highlight(code, lexer='html', formatter='html', output_wrapper=None)
Highlights the given code using Pygments. This function is suitable for use in an IPython notebook.
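The general pattern behind a one-time `setup()` and a per-call `timeout` can be sketched with the standard library alone. This is not ScrapyDo's or Crochet's implementation: Crochet runs the Twisted reactor in a background thread, whereas the sketch below swaps in an asyncio event loop as a stand-in. The names `run_blocking` and `fake_fetch` are hypothetical, and the behavior on timeout shown here is an assumption of the sketch, not a statement about what ScrapyDo does.

```python
import asyncio
import concurrent.futures
import threading

_loop = None
_lock = threading.Lock()


def setup():
    """Start the background event loop once; later calls are no-ops."""
    global _loop
    with _lock:
        if _loop is None:
            _loop = asyncio.new_event_loop()
            threading.Thread(target=_loop.run_forever, daemon=True).start()


def run_blocking(coro, timeout=5.0):
    """Run coro on the background loop and block until it finishes."""
    if _loop is None:
        raise RuntimeError("call setup() first")
    future = asyncio.run_coroutine_threadsafe(coro, _loop)
    try:
        return future.result(timeout)
    except concurrent.futures.TimeoutError:
        # Stop the pending work instead of leaving it running.
        future.cancel()
        raise


async def fake_fetch(url, delay):
    """Stand-in for an asynchronous download (hypothetical)."""
    await asyncio.sleep(delay)
    return "response for " + url


setup()
setup()  # A second call does nothing.

result = run_blocking(fake_fetch("http://example.com", 0.01))

try:
    run_blocking(fake_fetch("http://example.com", 1.0), timeout=0.05)
    timed_out = False
except concurrent.futures.TimeoutError:
    timed_out = True

print(result, timed_out)
```

The caller sees ordinary blocking functions, while the asynchronous machinery stays on its own thread. This is why such an API suits scripts and notebooks, where starting and stopping an event loop by hand is inconvenient.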