cabu is a simple REST microservice to scrape content from anywhere.

Project Description

Cabu

Cabu is a simple microservice framework for crawling websites remotely. It's built on Flask and Selenium, and includes a virtual display wrapper and a few crawling methods.

Full documentation here

Usage

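from flask import jsonify

# `app` is assumed to be the Cabu application instance created elsewhere;
# its Selenium driver is available as app.webdriver.
@app.route('/gizmodo_last_articles_links')
def gizmodo_last_articles():
    # Load the page, then collect the href of every headline link.
    app.webdriver.get('http://www.gizmodo.com')
    headlines = app.webdriver.find_elements_by_css_selector('h1.headline>a')
    articles_links = [link.get_attribute('href') for link in headlines]

    return jsonify({'articles': articles_links})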
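
To see what the selector in the route above picks up without starting a browser, here is a minimal standard-library sketch that applies the same rule (`h1.headline>a`, meaning `<a>` tags that are direct children of an `<h1>` with class `headline`) to a static HTML snippet. It is not part of Cabu's API: `HeadlineLinkParser`, `extract_headline_links` and the sample markup are hypothetical names made up for this illustration.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not be tracked as parents.
VOID_TAGS = {'br', 'hr', 'img', 'input', 'link', 'meta'}


class HeadlineLinkParser(HTMLParser):
    """Collect hrefs of <a> tags that are direct children of <h1 class="headline">."""

    def __init__(self):
        super().__init__()
        self._stack = []  # currently open tags, as (tag, classes) tuples
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'a' and self._stack:
            parent_tag, parent_classes = self._stack[-1]
            if parent_tag == 'h1' and 'headline' in parent_classes and attrs.get('href'):
                self.links.append(attrs['href'])
        if tag not in VOID_TAGS:
            self._stack.append((tag, (attrs.get('class') or '').split()))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed tags in between.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                break


def extract_headline_links(html):
    parser = HeadlineLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical markup, for illustration only.
SAMPLE_HTML = """
<h1 class="headline"><a href="http://example.com/a">First</a></h1>
<h1 class="headline big"><a href="http://example.com/b">Second</a></h1>
<h2 class="headline"><a href="http://example.com/c">Wrong heading level</a></h2>
<h1 class="headline"><span><a href="http://example.com/d">Not a direct child</a></span></h1>
"""

print(extract_headline_links(SAMPLE_HTML))
# ['http://example.com/a', 'http://example.com/b']
```

The route in the usage example would wrap a list like this in `jsonify({'articles': ...})`. A real browser is still needed for pages that build their headlines with JavaScript, which is the case Selenium covers.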

Installing

$ pip install cabu

Features

  • Selenium configuration out of the box
  • Flask wrapping
  • Crawling methods included
  • AWS S3 Export
  • FTP / FTPS
  • Cookies persistence
  • Link extractor
  • Proxy configuration
  • Headless mode can be switched off for local debugging
  • Docker pre-configured distributed environment
  • Database handler
  • Compatible with most Flask extensions (Flask-Admin, Flask-Mail, Flask-OAuth, …)
  • Twelve-Factor App compliance
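
Several of the features above (proxy configuration, optional headless mode) are settings that a twelve-factor application would normally read from environment variables. The sketch below shows that pattern with the standard library only. It is an assumption about how such settings could be loaded, not Cabu's actual configuration code, and the variable names (`CRAWLER_HEADLESS`, `CRAWLER_PROXY`, `CRAWLER_PAGE_TIMEOUT`) are hypothetical.

```python
import os


def env_bool(name, default, environ=os.environ):
    """Interpret an environment variable as a boolean flag."""
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ('1', 'true', 'yes', 'on')


def load_settings(environ=os.environ):
    """Build a settings dict from the environment, with safe defaults."""
    # All variable names below are hypothetical, chosen for this example only.
    return {
        'HEADLESS': env_bool('CRAWLER_HEADLESS', True, environ),
        'PROXY': environ.get('CRAWLER_PROXY'),  # e.g. "host:port", or None
        'PAGE_TIMEOUT': int(environ.get('CRAWLER_PAGE_TIMEOUT', '30')),
    }


# Nothing set: the browser stays headless and no proxy is used.
print(load_settings({}))

# Local debugging: turn headless mode off to watch the browser work.
print(load_settings({'CRAWLER_HEADLESS': 'false'}))
```

Passing the environment in as an argument keeps the loader easy to test, and the same code runs unchanged on a laptop and inside a Docker container because only the environment differs.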

(Likely to come soon)

  • CouchDB support
  • Couchbase support
  • Mobile drivers
  • SFTP
  • HtmlUnit web driver
  • Remote webdriver wrapper
  • Parallelization
  • Neural Network plugins

Testing

All tests run against Docker services instead of mocks. Mock-based alternatives will be added soon ;)

$ pip install -r requirements-dev.txt
$ py.test cabu/tests

Contributing

Please see the Contribute page.

Release History

  • 0.0.2 (this version)
  • 0.0.1

Download Files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

File Name                         Size     Version   File Type   Upload Date
cabu-0.0.2-py2.py3-none-any.whl   8.1 kB   2.7       Wheel       Feb 16, 2016
cabu-0.0.2.tar.gz                 6.1 kB   -         Source      Feb 16, 2016
