
cabu is a simple REST microservice for scraping content from anywhere.


Cabu


Cabu is a simple microservice framework for crawling websites remotely. It is built on Flask and Selenium and includes a virtual display wrapper and a few crawling methods.

Full documentation here

Usage

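from flask import jsonify
from selenium.webdriver.common.by import By

# `app` is assumed to be a Cabu app exposing a Selenium WebDriver as `app.webdriver`.
@app.route('/gizmodo_last_articles_links')
def gizmodo_last_articles():
    # Load the page, then collect the href of every headline link.
    app.webdriver.get('http://www.gizmodo.com')
    links = app.webdriver.find_elements(By.CSS_SELECTOR, 'h1.headline > a')
    articles_links = [link.get_attribute('href') for link in links]

    return jsonify({'articles': articles_links})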
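To check what the headline selector (`h1.headline > a`) picks up without launching a browser, the extraction step can be approximated offline with the standard library's `html.parser` (Python 3). This is a minimal sketch and not part of cabu: `HeadlineLinkParser` and `extract_headline_links` are hypothetical helpers, and they only handle an `<a>` that is a direct child of an `<h1>` whose class list contains `headline`.

```python
import json
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not be pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class HeadlineLinkParser(HTMLParser):
    """Hypothetical helper: collects the href of every `h1.headline > a` match."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open elements as (tag, is_headline_h1) pairs
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # `>` means direct child: the innermost open element must be an h1.headline.
        if tag == "a" and self.stack and self.stack[-1] == ("h1", True):
            if attrs.get("href") is not None:
                self.links.append(attrs["href"])
        if tag in VOID_ELEMENTS:
            return
        classes = (attrs.get("class") or "").split()
        self.stack.append((tag, tag == "h1" and "headline" in classes))

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def extract_headline_links(html):
    parser = HeadlineLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


page = """
<h1 class="headline"><a href="https://example.com/a">A</a></h1>
<h1 class="headline big"><br><a href="https://example.com/b">B</a></h1>
<h1><a href="https://example.com/skip">plain h1</a></h1>
<h1 class="headline"><span><a href="https://example.com/nested">nested</a></span></h1>
"""
print(json.dumps({"articles": extract_headline_links(page)}))
# prints {"articles": ["https://example.com/a", "https://example.com/b"]}
```

Real pages are messier than this sample, so treat it as a quick offline check of the selector logic; inside cabu the match is still done by the browser through Selenium.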

Installing

$ pip install cabu

Features

  • Selenium configuration out of the box

  • Flask wrapping

  • Crawling methods included

  • AWS S3 export

  • FTP / FTPS

  • Cookie persistence

  • Link extractor

  • Proxy configuration

  • Optional headless mode (turn it off for local debugging)

  • Preconfigured distributed Docker environment

  • Database handler

  • Compatible with most Flask extensions (Flask-Admin, Flask-Mail, Flask-OAuth, …)

  • Twelve-Factor App compliance
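The 12-factor compliance listed above mostly means keeping configuration in the environment rather than in code (factor III, "Config"). A minimal sketch of what that could look like for the driver, headless, proxy and S3 features, assuming hypothetical variable names (`DRIVER`, `HEADLESS`, `PROXY_URL`, `S3_BUCKET`). These names are illustrative only and are not taken from cabu's documentation.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class CrawlerConfig:
    # Hypothetical settings; cabu's real configuration keys may differ.
    driver: str
    headless: bool
    proxy_url: Optional[str]
    s3_bucket: Optional[str]


def _as_bool(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_config(env: Mapping[str, str] = os.environ) -> CrawlerConfig:
    """Build crawler settings from environment variables (Twelve-Factor, factor III)."""
    return CrawlerConfig(
        driver=env.get("DRIVER", "Firefox"),
        # Headless by default; set HEADLESS=0 to watch the browser while debugging locally.
        headless=_as_bool(env.get("HEADLESS", "1")),
        # Empty strings are treated as "not configured".
        proxy_url=env.get("PROXY_URL") or None,
        s3_bucket=env.get("S3_BUCKET") or None,
    )


print(load_config({"HEADLESS": "false", "PROXY_URL": "http://127.0.0.1:3128"}))
```

Passing the mapping explicitly keeps the loader easy to test, while the default still reads the real process environment, so the same image can be reconfigured per deployment without code changes.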

(Likely to come soon)

  • CouchDB support

  • Couchbase support

  • Mobile drivers

  • SFTP

  • HtmlUnit web driver

  • Remote webdriver wrapper

  • Parallelization

  • Neural Network plugins

Testing

All tests run against real Docker services instead of mocks. Mock-based alternatives will be added soon ;)

$ pip install -r requirements-dev.txt
$ py.test cabu/tests
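As a rough illustration of the planned mock-based style (not code from cabu's test suite), the scraping logic from the Usage example can be pulled into a plain function and exercised against a `unittest.mock.Mock` that stands in for the Selenium WebDriver, so no browser or Docker service is needed. `extract_article_links` is a hypothetical name. The string `"css selector"` is the value of Selenium's `By.CSS_SELECTOR`, written out so the sketch runs without Selenium installed.

```python
from unittest import mock

# Value of selenium.webdriver.common.by.By.CSS_SELECTOR, inlined so this runs without Selenium.
CSS_SELECTOR = "css selector"


def extract_article_links(driver, url="http://www.gizmodo.com"):
    """Hypothetical helper: the body of the Usage route, minus Flask."""
    driver.get(url)
    elements = driver.find_elements(CSS_SELECTOR, "h1.headline > a")
    return [element.get_attribute("href") for element in elements]


def test_extract_article_links_with_mock_driver():
    # Fake link elements: get_attribute("href") returns a fixed URL.
    urls = ["https://example.com/1", "https://example.com/2"]
    elements = []
    for url in urls:
        element = mock.Mock()
        element.get_attribute.return_value = url
        elements.append(element)

    # Fake WebDriver: find_elements returns the fake links, get() does nothing.
    driver = mock.Mock()
    driver.find_elements.return_value = elements

    assert extract_article_links(driver) == urls
    driver.get.assert_called_once_with("http://www.gizmodo.com")
    driver.find_elements.assert_called_once_with(CSS_SELECTOR, "h1.headline > a")
    for element in elements:
        element.get_attribute.assert_called_once_with("href")
```

Saved in a file named like `test_links.py`, the test function is collected by `py.test` in the same way as the existing suite.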

Contributing

Please see the Contribute page.



Download files


Source Distribution

cabu-0.0.2.tar.gz (6.1 kB)

Built Distribution

cabu-0.0.2-py2.py3-none-any.whl (8.1 kB), for Python 2 and Python 3

File details

Details for the file cabu-0.0.2.tar.gz.

File metadata

  • Download URL: cabu-0.0.2.tar.gz
  • Size: 6.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for cabu-0.0.2.tar.gz
Algorithm Hash digest
SHA256 56cfb267fa81fe8abb0be1c21f64f839b6ae2c85256943458e722316164db519
MD5 f524930414d53f1902fba04f963b9bf8
BLAKE2b-256 53111a7f1fadf48c3713badee34d2a3a646d7436da37a109669e2a0ff4d15daf


File details

Details for the file cabu-0.0.2-py2.py3-none-any.whl.

File hashes

Hashes for cabu-0.0.2-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 ffbcaa80afcf7f4eb7c930e7682889853668dd2b34fd89b30d9cfe15bed6f41a
MD5 8bb0d61deef3948e5c688afbc35c2b81
BLAKE2b-256 a2601b6942169220ee8cac4983870e4ab550d31a50732137e0edf045446bd5ee
