A tool set for crawler projects.
Project description
Welcome to crawlib Documentation
crawlib provides building blocks for crawler projects that simplify:
- URL encoding.
- HTML parsing.
- error handling.
- downloading HTML and files.
- request caching.
- duplicate filtering.
- a breadth-first crawl strategy.
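To make two of these building blocks concrete, here is a minimal sketch of URL encoding and duplicate filtering written with the Python 3 standard library only. It does not use or reproduce crawlib's own API; `build_url`, `normalize`, and `DuplicateFilter` are hypothetical names chosen for illustration, and `https://example.com` is a placeholder host.

```python
from urllib.parse import quote, urlencode, urlsplit, urlunsplit


def build_url(base, path, params):
    """Percent-encode a path segment and a dict of query parameters."""
    return f"{base}/{quote(path)}?{urlencode(params)}"


def normalize(url):
    """Lower-case the scheme and host and drop the fragment,
    so that equivalent URLs compare equal."""
    parts = urlsplit(url)
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, parts.query, "")
    )


class DuplicateFilter:
    """Remember normalized URLs and report whether a URL is new."""

    def __init__(self):
        self._seen = set()

    def is_new(self, url):
        key = normalize(url)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


url = build_url("https://example.com", "new york", {"q": "main st"})
print(url)  # https://example.com/new%20york?q=main+st

dup_filter = DuplicateFilter()
print(dup_filter.is_new("https://Example.com/a#top"))  # True
print(dup_filter.is_new("https://example.com/a"))      # False, same page
```

A set of normalized URLs is the simplest possible filter; a real crawler would likely persist it so that a restarted crawl does not revisit pages.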
In addition, it is a web crawling framework for breadth-first crawling.
For example, suppose the target data is organized in a tree structure such as State -> City -> Zipcode -> Street -> Address. crawlib is designed for exactly this kind of structure.
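The breadth-first idea can be sketched in a few lines of plain Python. This is an illustration of the general technique, not crawlib's implementation: the `SITE` dictionary is a hypothetical in-memory stand-in for real pages, and `get_children` stands in for "download a page and parse its child links".

```python
from collections import deque

# Hypothetical site: each page maps to the child pages it links to.
SITE = {
    "state-list": ["MD", "VA"],
    "MD": ["Baltimore", "Rockville"],
    "VA": ["Arlington"],
    "Baltimore": ["21201"],
    "Rockville": ["20850"],
    "Arlington": ["22201"],
}


def get_children(page):
    """Stand-in for downloading a page and extracting its child links."""
    return SITE.get(page, [])


def breadth_first_crawl(root):
    """Visit pages level by level, skipping any page already seen."""
    visited = [root]
    seen = {root}
    queue = deque([root])
    while queue:
        page = queue.popleft()  # FIFO order gives breadth-first traversal
        for child in get_children(page):
            if child not in seen:
                seen.add(child)
                visited.append(child)
                queue.append(child)
    return visited


order = breadth_first_crawl("state-list")
print(order)
# ['state-list', 'MD', 'VA', 'Baltimore', 'Rockville', 'Arlington',
#  '21201', '20850', '22201']
```

All states are visited before any city, and all cities before any zipcode, which is what makes this strategy a natural fit for data organized as State -> City -> Zipcode.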
Here is an Example Project for scraping data from https://crawlib.readthedocs.io/_static/state-list.html.
Install
crawlib is released on PyPI, so all you need is:
$ pip install crawlib
To upgrade to the latest version:
$ pip install --upgrade crawlib
File details
Details for the file crawlib-0.0.26.tar.gz.
File metadata
- Download URL: crawlib-0.0.26.tar.gz
- Upload date:
- Size: 92.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/40.0.0 requests-toolbelt/0.8.0 tqdm/4.24.0 CPython/2.7.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | baf7c4335b161801c665372a18f0ea260c103b2ad9054ac2e3e23a355a32d940
MD5 | 48a5056c71bc01ae8f195a5cc402eeb4
BLAKE2b-256 | 8bbdd2b651f386e59a1f031b0e580a30df4934737a9c0179144ff40705db65a8
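A published digest is only useful if it is compared against the file actually downloaded. The following is a minimal sketch of that check using Python's standard `hashlib` module; the helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the local file path in the usage comment assumes the archive was saved to the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage, assuming the source archive sits in the current directory:
# matches_published_hash(
#     "crawlib-0.0.26.tar.gz",
#     "baf7c4335b161801c665372a18f0ea260c103b2ad9054ac2e3e23a355a32d940",
# )
```

Reading in chunks keeps memory use constant regardless of file size. pip can also enforce digests itself through its hash-checking mode when a requirements file pins them.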
File details
Details for the file crawlib-0.0.26-py2-none-any.whl.
File metadata
- Download URL: crawlib-0.0.26-py2-none-any.whl
- Upload date:
- Size: 204.2 kB
- Tags: Python 2
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/40.0.0 requests-toolbelt/0.8.0 tqdm/4.24.0 CPython/2.7.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | 30d176719376284f1cac2a57b658ffc6c48f678b05d85a8184c045dca09048b4
MD5 | 5991571af82ed409d223b32ae9d6b784
BLAKE2b-256 | b23450b3d70179f091230ff8861006ba0ebf3caad48f2ed45ad9e19022a9ef83