ScrapyCloud HubStorage frontier backend for Frontera
# HCF (HubStorage Crawl Frontier) Backend for Frontera
When used with Scrapy, pair this backend with the Scrapy scheduler provided by [scrapy-frontera](https://github.com/scrapinghub/scrapy-frontera). The Scrapy scheduler bundled with [Frontera](https://github.com/scrapinghub/frontera) is not supported. scrapy-frontera is a Scrapy scheduler that lets Scrapy projects use Frontera backends such as this one.
For specific usage instructions, see the module and class docstrings in [backend.py](https://github.com/scrapinghub/hcf-backend/blob/master/hcf_backend/backend.py). Usage examples can be found in the [scrapy-frontera README](https://github.com/scrapinghub/scrapy-frontera/blob/master/README.rst).
A complete tutorial on using hcf-backend with ScrapyCloud workflows is available at [shub-workflow Tutorial: Managing Hubstorage Crawl Frontiers](https://github.com/scrapinghub/shub-workflow/wiki/Managing-Hubstorage-Crawl-Frontiers). shub-workflow is a framework for defining workflows of spiders and scripts running on ScrapyCloud. The tutorial is strongly recommended reading because it documents how the different tools integrate, and they are most useful when combined.
The package also provides a command line tool for handling and manipulating hubstorage frontiers: [hcfpal.py](https://github.com/scrapinghub/hcf-backend/blob/master/hcf_backend/utils/hcfpal.py). It supports dumping, counting, deleting, moving, listing, and more. See the command line help for usage.
Another provided tool is [crawlmanager.py](https://github.com/scrapinghub/hcf-backend/blob/master/hcf_backend/utils/crawlmanager.py), which facilitates the scheduling of consumer spider jobs. Usage examples are also available in the shub-workflow tutorial mentioned above.
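As a rough orientation before reading those docstrings, the wiring might look like the sketch below. The scheduler and middleware paths and the `HCF_*` setting names are written from memory of the scrapy-frontera README and are assumptions here, not a verified reference; the project ID, frontier name, and slot values are placeholders. Check every name against the backend.py docstrings before relying on it.

```python
# Scrapy side (settings.py): hand request scheduling over to scrapy-frontera.
# These dotted paths are assumed from the scrapy-frontera README.
SCHEDULER = "scrapy_frontera.scheduler.FronteraScheduler"

DOWNLOADER_MIDDLEWARES = {
    "scrapy_frontera.middlewares.SchedulerDownloaderMiddleware": 0,
}
SPIDER_MIDDLEWARES = {
    "scrapy_frontera.middlewares.SchedulerSpiderMiddleware": 0,
}

# Frontera side: select the HCF backend and describe the frontier.
# Setting names are assumptions; all values are placeholders.
FRONTERA_SETTINGS_SKETCH = {
    "BACKEND": "hcf_backend.HCFBackend",
    "HCF_PROJECT_ID": "12345",              # placeholder ScrapyCloud project ID
    "HCF_PRODUCER_FRONTIER": "myfrontier",  # frontier that new links are written to
    "HCF_PRODUCER_NUMBER_OF_SLOTS": 8,      # how many slots links are spread over
    "HCF_CONSUMER_FRONTIER": "myfrontier",  # frontier that requests are read from
    "HCF_CONSUMER_SLOT": "0",               # slot this consumer job reads
    "HCF_CONSUMER_MAX_BATCHES": 50,         # stop after this many batches
}
```

How the Frontera settings are handed to the scheduler is covered in the scrapy-frontera README; the dictionary above only illustrates the kind of values involved.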
# Installation

    pip install hcf-backend