Cabu is a simple REST microservice for scraping content from anywhere.
Cabu
Cabu is a simple microservice framework for crawling websites remotely. It is built on Flask and Selenium, and includes a virtual display wrapper and a few crawling methods.
Usage
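from flask import jsonify

# app is the Cabu application object, created elsewhere.
@app.route('/gizmodo_last_articles_links')
def gizmodo_last_articles():
    # Load the page in the Selenium-driven browser.
    app.webdriver.get('http://www.gizmodo.com')
    # Collect the href of each headline link.
    articles_links = [i.get_attribute('href') for i in app.webdriver.find_elements_by_css_selector('h1.headline>a')]
    return jsonify({'articles': articles_links})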
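Once a route like the one above is being served, any HTTP client can trigger the crawl and read the result. Here is a minimal client sketch using only the standard library. It assumes the service listens on http://localhost:5000 (Flask's default development address, which the project does not state) and returns the JSON shape shown above. `parse_articles` and `fetch_articles` are hypothetical helper names, not part of Cabu.

```python
import json
from urllib.request import urlopen


def parse_articles(body):
    """Return the list of links from a JSON body shaped like {"articles": [...]}."""
    payload = json.loads(body)
    # Fall back to an empty list if the key is missing.
    return payload.get("articles", [])


def fetch_articles(base_url="http://localhost:5000"):
    """Call the route defined above and return the extracted links.

    base_url is an assumption: adjust it to wherever the service is deployed.
    """
    url = base_url + "/gizmodo_last_articles_links"
    # Browser-driven crawls can be slow, so allow a generous timeout.
    with urlopen(url, timeout=60) as response:
        return parse_articles(response.read().decode("utf-8"))


if __name__ == "__main__":
    for link in fetch_articles():
        print(link)
```

Keeping the parsing separate from the network call makes the response handling easy to check without a running service.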
Installing
$ pip install cabu
Features
Selenium configuration out of the box
Flask wrapping
Crawling methods included
AWS S3 Export
FTP / FTPS
Cookies persistence
Link extractor
Proxy configuration
Headless optional for local debug
Docker pre-configured distributed environment
Database handler
Compatible with most Flask extensions (Flask-Admin, Flask-Mail, Flask-OAuth, …)
12 Factors compliance
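To give an idea of what a feature such as the link extractor involves, here is a minimal standard-library sketch. It is not Cabu's implementation, and `LinkCollector` and `extract_links` are hypothetical names. In a route, the HTML would presumably come from the Selenium driver's `page_source` attribute.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip anchors without an href or with an empty one.
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return absolute URLs for all links found in the given HTML string."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

Resolving each href with `urljoin` turns relative links into absolute ones, so the result can be fed straight back into a crawler.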
(Likely to come soon)
CouchDB support
Couchbase support
Mobile drivers
SFTP
HtmlUnit web driver
Remote webdriver wrapper
Parallelization
Neural Network plugins
Testing
All tests run against Docker services instead of mocks. Mock-based alternatives will be added soon ;)
$ pip install -r requirements-dev.txt
$ py.test cabu/tests
Contributing
Please see the Contribute page.
Copyright
Cabu is an open source project by Théotime Lévèque.
Download files
Source Distribution
Built Distribution
File details
Details for the file cabu-0.0.2.tar.gz.
File metadata
- Download URL: cabu-0.0.2.tar.gz
- Upload date:
- Size: 6.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 56cfb267fa81fe8abb0be1c21f64f839b6ae2c85256943458e722316164db519
MD5 | f524930414d53f1902fba04f963b9bf8
BLAKE2b-256 | 53111a7f1fadf48c3713badee34d2a3a646d7436da37a109669e2a0ff4d15daf
File details
Details for the file cabu-0.0.2-py2.py3-none-any.whl.
File metadata
- Download URL: cabu-0.0.2-py2.py3-none-any.whl
- Upload date:
- Size: 8.1 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | ffbcaa80afcf7f4eb7c930e7682889853668dd2b34fd89b30d9cfe15bed6f41a
MD5 | 8bb0d61deef3948e5c688afbc35c2b81
BLAKE2b-256 | a2601b6942169220ee8cac4983870e4ab550d31a50732137e0edf045446bd5ee
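A downloaded file can be checked against the SHA256 digests listed above. The following is a minimal sketch using the standard `hashlib` module. The function names are hypothetical, and the local file path is an assumption about where the download was saved.

```python
import hashlib


def sha256_hex(chunks):
    """Return the hex SHA256 digest of an iterable of byte chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def file_chunks(path, chunk_size=65536):
    """Yield the contents of a file in fixed-size binary chunks."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk


def verify(path, expected_hex):
    """Return True if the file's SHA256 digest matches the expected value."""
    return sha256_hex(file_chunks(path)) == expected_hex.strip().lower()
```

For example, `verify("cabu-0.0.2.tar.gz", "56cfb267fa81fe8abb0be1c21f64f839b6ae2c85256943458e722316164db519")` should return `True` for an intact copy of the source distribution saved in the current directory.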