
An all-in-one Web Crawler, Web Parser and Web Scraping library!

Project description

## Webb - A Complete Web Scraper and Crawler Library

An all-in-one Python library to scrape, parse and crawl web pages.

### Gist

This is a lightweight, flexible Python library. It can crawl, download, index, parse, scrape and analyze web pages, either as one systematic pipeline or one function at a time. It can also clean and normalize web pages, store web data, extract server-side information and import/export relevant components from the web. Other features include downloading images from a web page, downloading Google images and spidering Wikipedia articles.
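To give a feel for the kind of parsing step described above, here is a minimal sketch that pulls links out of an HTML string using only the standard library. It is not Webb's API: `LinkCollector` and `extract_links` are hypothetical names invented for this illustration, and the sketch targets Python 3. Webb's actual function names are listed in the official documentation.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper (not part of Webb): collects link targets from <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page itself.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return absolute URLs for every <a href=...> found in the HTML text."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

A crawler would typically feed each returned URL back into its download queue; that loop is left out here to keep the sketch short.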

### Usage and Instructions

For usage and instructions, please visit the [Official Documentation](https://github.com/hardikvasa/webb/blob/master/docs/Documentation.md)

For issues and discussion visit the [Issue Tracker](https://github.com/hardikvasa/webb/issues)

For sample code and examples, please visit [Example Code](https://github.com/hardikvasa/webb/tree/master/examples)

### Compatibility

This library is compatible with both Python 2 (2.x) and Python 3 (3.x). It is a download-import-and-run program that needs few or no changes from the user.

### Dependencies

There are no dependencies for this project. Hurray! It relies entirely on the standard ‘built-in’ library and needs no external packages or installations. Just download and run!

### Status

This is a stand-alone Python script that is ready to run but still under development. Many more features will be added shortly.

### Disclaimer

The crawler function lets you download and crawl a large number of web pages. Please do not download or crawl any pages of a domain without first reading that domain's ‘robots.txt’ file.

Violating the rules in robots.txt is inappropriate and strongly discouraged. It may lead the domain to block your crawler and blacklist it. It is also inappropriate to crawl pages at a high rate, as this can put a heavy load on the server receiving the requests.
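One way to follow both pieces of advice is to check each URL against the site's robots.txt rules and pause between requests. The sketch below uses Python 3's standard `urllib.robotparser` module. It is an illustration only and not part of Webb: `build_robot_rules`, `crawl_politely`, the `example-crawler` user agent and the one-second default delay are all assumptions made for this example.

```python
import time
from urllib.robotparser import RobotFileParser


def build_robot_rules(robots_txt_text):
    """Parse the text of a robots.txt file into a rule set (hypothetical helper)."""
    rules = RobotFileParser()
    rules.parse(robots_txt_text.splitlines())
    return rules


def crawl_politely(urls, rules, fetch, user_agent="example-crawler", default_delay=1.0):
    """Fetch only the URLs robots.txt allows, pausing after each request.

    `fetch` is any callable that takes a URL and returns its content.
    """
    delay = rules.crawl_delay(user_agent)  # None when no Crawl-delay is given
    if delay is None:
        delay = default_delay
    results = {}
    for url in urls:
        if not rules.can_fetch(user_agent, url):
            continue  # disallowed by robots.txt, so skip it
        results[url] = fetch(url)
        time.sleep(delay)  # keep the request rate low
    return results
```

Against a live site, `RobotFileParser.set_url(...)` followed by `read()` downloads the rules directly, so the robots.txt text does not have to be supplied by hand.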


Download files


Source Distribution

webb-0.9.2.5.zip (14.4 kB)

Uploaded Source

File details

Details for the file webb-0.9.2.5.zip.

File metadata

  • Download URL: webb-0.9.2.5.zip
  • Upload date:
  • Size: 14.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for webb-0.9.2.5.zip

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 5b4b64abc1ba0843618ff6cb7b16777cfb307743abba50483f876e49243bf780 |
| MD5 | d4708bbf318c86ca70a46f09a878d306 |
| BLAKE2b-256 | 6d1b207dc4f94fe3ad68122dd67f107c9954bc12888663bda2041c29b9164176 |
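The SHA256 digest listed above can be used to confirm that a downloaded copy of the archive is intact. The following is a minimal sketch using Python's standard `hashlib` module; the function names `sha256_of_file` and `matches_expected` are hypothetical, and the file path is whatever location you saved the archive to.

```python
import hashlib

# SHA256 digest published for webb-0.9.2.5.zip (copied from the listing above).
EXPECTED_SHA256 = "5b4b64abc1ba0843618ff6cb7b16777cfb307743abba50483f876e49243bf780"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected.lower()
```

Calling `matches_expected("webb-0.9.2.5.zip")` on the downloaded archive should return `True` if the file matches the published digest.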

