
cabu is a simple REST microservice to scrape content from anywhere.




Cabu is a simple microservice framework for crawling websites remotely. It’s built on Flask and Selenium, and it includes a virtual display wrapper and a few crawling methods.

See the full documentation for details.


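from flask import jsonify
from cabu import Cabu  # assumed import path for the Flask-based Cabu app class
app = Cabu(__name__)

@app.route('/')
def gizmodo_last_articles():
    app.webdriver.get('https://gizmodo.com')  # load the page before querying it
    links = app.webdriver.find_elements_by_css_selector('h1.headline>a')
    return jsonify({'articles': [link.get_attribute('href') for link in links]})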
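The route above needs a live browser. To check the selector logic offline, here is a minimal stdlib-only sketch that approximates `h1.headline>a` on a static HTML string. It is not part of cabu, and it handles only what this selector needs: class membership and the direct-child `>` combinator.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
             'input', 'link', 'meta', 'source', 'track', 'wbr'}


class HeadlineLinkParser(HTMLParser):
    """Collect href values of <a> elements that are direct children of <h1 class="headline">."""

    def __init__(self):
        super().__init__()
        self.stack = []  # (tag, classes) for every currently open element
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get('class') or '').split()
        if tag == 'a' and self.stack:
            parent_tag, parent_classes = self.stack[-1]
            # '>' means the immediate parent must be the matching <h1>.
            if parent_tag == 'h1' and 'headline' in parent_classes and attrs.get('href'):
                self.links.append(attrs['href'])
        if tag not in VOID_TAGS:
            self.stack.append((tag, classes))

    def handle_endtag(self, tag):
        # Pop back to the nearest open tag with this name (tolerates unclosed children).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def extract_headline_links(html):
    parser = HeadlineLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links
```

For example, `<h1 class="headline"><a href="/a">A</a></h1>` yields `['/a']`. A link nested inside a `<span>` within the headline is skipped, as the CSS child combinator requires.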


$ pip install cabu


  • Selenium configuration out of the box
  • Flask wrapping
  • Built-in crawling methods
  • AWS S3 export
  • FTP / FTPS
  • Cookie persistence
  • Link extractor
  • Proxy configuration
  • Optional headless mode (can be turned off for local debugging)
  • Pre-configured distributed Docker environment
  • Database handler
  • Compatible with most Flask extensions (Flask-Admin, Flask-Mail, Flask-OAuth, …)
  • Twelve-Factor App compliance

Planned (likely to come soon):

  • CouchDB support
  • Couchbase support
  • Mobile drivers
  • SFTP
  • HtmlUnit web driver
  • Remote webdriver wrapper
  • Parallelization
  • Neural Network plugins


All tests run against Docker services rather than mocks. Mock-based alternatives will be added soon ;)

$ pip install -r requirements-dev.txt
$ py.test cabu/tests
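Mock-based tests are still on the roadmap. The following minimal sketch shows what one might look like, using only the standard library's `unittest.mock` in place of a real browser. `headline_links` is a hypothetical helper that factors the selector logic out of the example route; it is not part of cabu.

```python
import unittest
from unittest import mock


def headline_links(driver):
    """Return the href of every element matched by 'h1.headline>a'.

    `driver` is any object exposing Selenium 3's find_elements_by_css_selector.
    """
    elements = driver.find_elements_by_css_selector('h1.headline>a')
    return [element.get_attribute('href') for element in elements]


class HeadlineLinksTest(unittest.TestCase):
    def test_collects_hrefs_without_a_browser(self):
        # Stand-in page elements whose get_attribute('href') returns fixed URLs.
        first, second = mock.Mock(), mock.Mock()
        first.get_attribute.return_value = 'https://example.com/1'
        second.get_attribute.return_value = 'https://example.com/2'

        # Stand-in webdriver returning those elements for the selector.
        driver = mock.Mock()
        driver.find_elements_by_css_selector.return_value = [first, second]

        self.assertEqual(headline_links(driver),
                         ['https://example.com/1', 'https://example.com/2'])
        driver.find_elements_by_css_selector.assert_called_once_with('h1.headline>a')
        first.get_attribute.assert_called_once_with('href')
```

py.test collects `unittest.TestCase` subclasses from `test_*.py` files, so a test like this could sit next to the Docker-backed ones. Note that `find_elements_by_css_selector` is the Selenium 3 API used in the example above. Selenium 4 replaced it with `find_elements(By.CSS_SELECTOR, ...)`, so a mock for a newer driver would stub that method instead.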


Please see the Contribute page.


Download files


Filename                           Size     File type  Python version
cabu-0.0.2-py2.py3-none-any.whl    8.1 kB   Wheel      2.7
cabu-0.0.2.tar.gz                  6.1 kB   Source     None
