A tool set for crawler projects.
Welcome to crawlib Documentation
crawlib provides building blocks for crawler projects. It simplifies:
URL encoding.
HTML parsing.
error handling.
downloading HTML pages and files.
request caching.
duplicate filtering.
breadth-first crawl strategy.
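As a rough illustration of the URL encoding item above, here is a minimal sketch that uses only the Python standard library. It does not use crawlib's own API; the helper name `build_url` and the `example.com` address are hypothetical, chosen only for this example.

```python
from urllib.parse import urlencode, urljoin


def build_url(base, path, params):
    """Join a base URL with a relative path and append an encoded query string.

    Hypothetical helper for illustration; not part of crawlib.
    """
    url = urljoin(base, path)
    # Sort the parameters so the same inputs always give the same URL,
    # which helps when URLs are used as cache or duplicate-filter keys.
    query = urlencode(sorted(params.items()))
    return f"{url}?{query}" if query else url


print(build_url("https://example.com/", "search", {"state": "New York", "page": 2}))
# prints: https://example.com/search?page=2&state=New+York
```

Producing one canonical URL per request is what makes a request cache or duplicate filter reliable, since both typically key on the URL string.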
In addition, it is a web crawling framework for breadth-first crawling.
For example, suppose the target data is organized in a tree structure, such as State -> City -> Zipcode -> Street -> Address. crawlib is designed for exactly this kind of site.
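To make the idea concrete, here is a minimal sketch of a breadth-first crawl with a duplicate filter. It uses only the standard library, not crawlib's API. The `SITE` dictionary is a made-up stand-in for real pages, and `fetch_children` is a hypothetical placeholder for downloading a page and parsing its child links.

```python
from collections import deque

# Hypothetical in-memory stand-in for a site: each page maps to its child pages.
SITE = {
    "state-list": ["MD", "VA"],
    "MD": ["Baltimore", "Rockville"],
    "VA": ["Arlington"],
    "Baltimore": [],
    "Rockville": [],
    "Arlington": [],
}


def fetch_children(node):
    """Placeholder for 'download the page and parse its child links'."""
    return SITE.get(node, [])


def crawl_breadth_first(root):
    """Visit the tree level by level, skipping nodes that were already seen."""
    seen = {root}          # duplicate filter
    queue = deque([root])  # FIFO queue gives breadth-first order
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in fetch_children(node):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return order


print(crawl_breadth_first("state-list"))
# prints: ['state-list', 'MD', 'VA', 'Baltimore', 'Rockville', 'Arlington']
```

All states are visited before any city. In a real crawl this means each level of the hierarchy is finished before the next, much larger level starts.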
Here is an Example Project for scraping data from https://crawlib.readthedocs.io/_static/state-list.html.
Install
crawlib is released on PyPI, so all you need is:
$ pip install crawlib
To upgrade to the latest version:
$ pip install --upgrade crawlib