Web Scraping Framework

## IOWeb Framework

![pytest status](https://github.com/lorien/ioweb/workflows/pytest/badge.svg) ![pytype status](https://github.com/lorien/ioweb/workflows/pytype/badge.svg)

A Python framework for building web crawlers.

Good things:

  • designed to run a large number of network threads (100 or 500, for
    example) on a single CPU core
  • ability to collect items into chunks and then process each chunk in one
    operation (for example, a MongoDB bulk write)
  • asynchronous network operations are powered by gevent
  • network requests are handled with urllib3
  • HTML is parsed with lxml
  • ability to run CSS/XPath queries against the DOM tree of a downloaded HTML
    document
  • ability to extract certificate details
  • ability to resolve a particular domain to a custom IP address
  • stat module to count events
  • logging of statistics to InfluxDB
  • retrying on network errors
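
Since the project has no documentation, the chunking idea from the list above can be sketched in plain Python. This is only an illustration of the general technique, not ioweb's actual API: the `ChunkBuffer` class, its `size` and `flush` parameters, and the `written` list are all hypothetical names made up for this example.

```python
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class ChunkBuffer(Generic[T]):
    """Collect items and pass them to a callback in fixed-size chunks.

    Hypothetical helper for illustration only; not part of ioweb.
    """

    def __init__(self, size: int, flush: Callable[[List[T]], None]) -> None:
        if size < 1:
            raise ValueError("size must be at least 1")
        self.size = size
        self.flush_callback = flush
        self.items: List[T] = []

    def add(self, item: T) -> None:
        # Buffer the item and emit a chunk as soon as the buffer is full.
        self.items.append(item)
        if len(self.items) >= self.size:
            self.flush()

    def flush(self) -> None:
        # Emit whatever is buffered; does nothing when the buffer is empty.
        if self.items:
            chunk, self.items = self.items, []
            self.flush_callback(chunk)


# Stand-in for a bulk write: each chunk is recorded as one "write".
written: List[List[int]] = []
buffer = ChunkBuffer(size=3, flush=written.append)
for number in range(7):
    buffer.add(number)
buffer.flush()  # push the final, partially filled chunk
print(written)  # [[0, 1, 2], [3, 4, 5], [6]]
```

In a real crawler the callback would presumably perform one bulk operation per chunk (with MongoDB, something like pymongo's `insert_many`), so that many scraped items cost a single round trip to the database.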

Bad things:

  • not fully covered by tests
  • no documentation

## Feedback

Download files

Files for ioweb, version 0.0.24

| Filename | Size | File type | Python version |
|---|---|---|---|
| ioweb-0.0.24.tar.gz | 26.7 kB | Source | None |
