Tool set for crawler projects.

Project description

Welcome to crawlib Documentation

crawlib provides building blocks for crawler projects. It simplifies:

  1. URL encoding.

  2. HTML parsing.

  3. Error handling.

  4. Downloading HTML pages and files.

  5. Request caching.

  6. Duplicate filtering.

  7. Breadth-first crawl strategy.

In addition, it is a web crawling framework for breadth-first crawling.

For example, suppose the target data is organized in a tree structure, such as State -> City -> Zipcode -> Street -> Address. crawlib is designed for exactly this kind of crawl.
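
To make the level-by-level idea concrete, here is a minimal sketch of a breadth-first crawl with a duplicate filter. It uses only the Python 3 standard library and does not use crawlib's API; `FAKE_SITE`, `fetch_children`, and `breadth_first_crawl` are hypothetical names invented for this illustration, and the in-memory dictionary stands in for real downloading and HTML parsing.

```python
from collections import deque

# Hypothetical in-memory "site": each page URL maps to the child URLs it links to.
# A real crawler would download and parse each page instead.
FAKE_SITE = {
    "/state-list": ["/state/CA", "/state/NY"],
    "/state/CA": ["/city/LA", "/city/SF"],
    "/state/NY": ["/city/NYC"],
    "/city/LA": [],
    "/city/SF": [],
    "/city/NYC": ["/state/NY"],  # back link, to exercise the duplicate filter
}


def fetch_children(url):
    """Stand-in for 'download the page, parse the HTML, extract child links'."""
    return FAKE_SITE.get(url, [])


def breadth_first_crawl(root):
    """Visit pages level by level, skipping URLs that were already seen."""
    seen = {root}          # duplicate filter
    queue = deque([root])  # FIFO queue gives breadth-first order
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        for child in fetch_children(url):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return order


visited = breadth_first_crawl("/state-list")
print(visited)
```

All state pages are visited before any city page, and the back link from `/city/NYC` to `/state/NY` is ignored because that URL has already been seen.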

Here is an Example Project for scraping data from https://crawlib.readthedocs.io/_static/state-list.html.

Install

crawlib is released on PyPI, so all you need is:

$ pip install crawlib

To upgrade to the latest version:

$ pip install --upgrade crawlib

Download files

Source Distribution

crawlib-0.0.26.tar.gz (92.7 kB)

Uploaded Source

Built Distribution

crawlib-0.0.26-py2-none-any.whl (204.2 kB)

Uploaded Python 2

File details

Details for the file crawlib-0.0.26.tar.gz.

File metadata

  • Download URL: crawlib-0.0.26.tar.gz
  • Size: 92.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/40.0.0 requests-toolbelt/0.8.0 tqdm/4.24.0 CPython/2.7.13

File hashes

Hashes for crawlib-0.0.26.tar.gz
  Algorithm    Hash digest
  SHA256       baf7c4335b161801c665372a18f0ea260c103b2ad9054ac2e3e23a355a32d940
  MD5          48a5056c71bc01ae8f195a5cc402eeb4
  BLAKE2b-256  8bbdd2b651f386e59a1f031b0e580a30df4934737a9c0179144ff40705db65a8
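
A downloaded file can be checked against a published digest before installing it. The following is a minimal sketch, assuming Python 3 and a file already saved locally; `sha256_of_file` and `matches_published_hash` are hypothetical helper names, not part of crawlib or pip.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes (end of file)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 with a published digest, ignoring case and padding."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Example call (assumes the archive was downloaded to the current directory):
# matches_published_hash(
#     "crawlib-0.0.26.tar.gz",
#     "baf7c4335b161801c665372a18f0ea260c103b2ad9054ac2e3e23a355a32d940",
# )
```

Reading in chunks keeps memory use flat regardless of file size. SHA256 is used here rather than MD5 because MD5 is not collision resistant.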

File details

Details for the file crawlib-0.0.26-py2-none-any.whl.

File metadata

  • Download URL: crawlib-0.0.26-py2-none-any.whl
  • Size: 204.2 kB
  • Tags: Python 2
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/40.0.0 requests-toolbelt/0.8.0 tqdm/4.24.0 CPython/2.7.13

File hashes

Hashes for crawlib-0.0.26-py2-none-any.whl
  Algorithm    Hash digest
  SHA256       30d176719376284f1cac2a57b658ffc6c48f678b05d85a8184c045dca09048b4
  MD5          5991571af82ed409d223b32ae9d6b784
  BLAKE2b-256  b23450b3d70179f091230ff8861006ba0ebf3caad48f2ed45ad9e19022a9ef83
