[![Documentation Status](https://readthedocs.org/projects/bert-etl/badge/?version=latest)](https://bert-etl.readthedocs.io/en/latest/?badge=latest)

# Bert

A microframework for simple ETL solutions.

## Architecture

At its core, bert-etl uses DynamoDB Streams to communicate between Lambda functions. bert-etl.yaml controls how the initial Lambda function is called: by periodic events, SNS topics, or (planned) S3 bucket events. Passing an event to bert-etl is straightforward from zappa or from a generic AWS Lambda function you’ve hooked up to API Gateway.

At this time, there are no plans to attach API Gateway to bert-etl.yaml because there is already great software (like zappa) that does this.
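To make the stream-based hand-off concrete, here is a minimal sketch of a downstream Lambda handler reading records from a DynamoDB Streams event. It is not bert-etl's own code: the `handler` and `decode_attribute` names and the sample item fields are assumptions made for illustration, and it assumes the stream is configured so that records include `NewImage`. Only the event layout (`Records`, `eventName`, `dynamodb.NewImage`, typed attributes such as `{"S": ...}`) follows the documented AWS format.

```python
from typing import Any, Dict, List


def decode_attribute(attr: Dict[str, Any]) -> Any:
    """Convert one DynamoDB-typed attribute (e.g. {"S": "x"}) to a Python value."""
    (type_tag, value), = attr.items()
    if type_tag == "S":
        return value
    if type_tag == "N":
        # DynamoDB serializes numbers as strings.
        return float(value) if "." in value else int(value)
    if type_tag == "BOOL":
        return value
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [decode_attribute(item) for item in value]
    if type_tag == "M":
        return {key: decode_attribute(item) for key, item in value.items()}
    raise ValueError(f"unsupported DynamoDB type tag: {type_tag}")


def handler(event: Dict[str, Any], context: Any = None) -> List[Dict[str, Any]]:
    """Hypothetical downstream Lambda: collect items newly inserted upstream."""
    items = []
    for record in event.get("Records", []):
        if record.get("eventName") != "INSERT":
            continue  # ignore MODIFY and REMOVE records
        new_image = record["dynamodb"]["NewImage"]
        items.append({key: decode_attribute(attr) for key, attr in new_image.items()})
    return items


# Hand-written sample event; the field names inside NewImage are made up.
sample_event = {
    "Records": [
        {
            "eventName": "INSERT",
            "dynamodb": {"NewImage": {"filename": {"S": "a.wav"}, "size": {"N": "42"}}},
        },
        {"eventName": "REMOVE", "dynamodb": {"Keys": {"filename": {"S": "b.wav"}}}},
    ]
}
print(handler(sample_event))
```

In this sketch each function only sees the items written by the previous one, which is the general shape of the communication described above.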

## Warning: aws-lambda deploy target still considered beta

bert-etl ships with a deploy target for aws-lambda. This feature isn’t well documented yet, and quite a bit of work remains to be done before it functions consistently. Be aware that aws-lambda is a product run and controlled by AWS. If you incur charges using bert-etl with aws-lambda, we are not responsible. bert-etl is offered under the MIT license, which includes a use-at-your-own-risk clause.

## Begin with

Let’s begin with an example that loads data from a file server and then loads it into numpy arrays.

```
$ virtualenv -p $(which python3) env
$ source env/bin/activate
$ pip install bert-etl
$ pip install librosa # for demo project
$ docker run -p 6379:6379 -d redis # bert-etl runs on redis to share data across CPUs
$ bert-runner.py -n demo
$ PYTHONPATH='.' bert-runner.py -m demo -j sync_sounds -f
```

## Release Notes

### 0.3.0

  • Added Error Management. When an error occurs, bert-runner logs the error and re-runs the job. If the same error happens often enough, the job is aborted.
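The log, re-run, and abort behaviour described above can be sketched roughly as follows. This is an illustration only, not bert-runner's implementation: `run_with_retries`, `JobAborted`, the `max_same_error` threshold, and the rule that two errors are "the same" when their type and message match are all assumptions.

```python
import logging
from collections import Counter
from typing import Callable

logger = logging.getLogger("retry-sketch")


class JobAborted(RuntimeError):
    """Raised when the same error has repeated too many times."""


def run_with_retries(job: Callable[[], object], max_same_error: int = 3) -> object:
    """Re-run `job` after each failure; abort once one error repeats `max_same_error` times."""
    seen = Counter()
    while True:
        try:
            return job()
        except Exception as err:
            # Assumption: errors count as "the same" when type and message match.
            signature = (type(err).__name__, str(err))
            seen[signature] += 1
            logger.error("job failed (%d/%d): %r", seen[signature], max_same_error, err)
            if seen[signature] >= max_same_error:
                raise JobAborted(f"aborting after repeated error: {err!r}") from err


attempts = {"count": 0}


def flaky_job() -> str:
    """Hypothetical job that fails twice before succeeding."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("file-server unreachable")
    return "ok"


print(run_with_retries(flaky_job))
```

Here the job succeeds on its third attempt, one failure short of the abort threshold.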

### 0.2.1

  • Added Release Notes

### 0.2.0

  • Added Redis Service auto run. Using docker, redis will be pulled and started in the background.

  • Added Redis Service channels. Sometimes you’ll want to run two etl-jobs on the same machine.

## Fund Bounty Target Upgrades

Bert provides a boilerplate framework that lets you write concurrent ETL code using Python’s multiprocessing module. One function starts the process, piping data into a Redis backend, where it is then consumed by the next function. The queues are named for their role within the scope of the function: the Work (start) queue and the Done (end) queue. Please consider contributing to Bert Bounty Targets to improve this documentation.

https://www.patreon.com/jbcurtin
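The Work/Done queue hand-off between functions can be pictured with a small sketch. It deliberately uses in-memory `collections.deque` objects in a single process in place of Redis and the multiprocessing module, so it shows only the data flow, not concurrency. The names `run_stage`, `load_samples`, and `square` are hypothetical and are not part of Bert's API.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Item = Dict[str, float]


def run_stage(stage: Callable[[Item], List[Item]], work_queue: Deque[Item]) -> Deque[Item]:
    """Drain `work_queue` through `stage` and return the filled done queue."""
    done_queue: Deque[Item] = deque()
    while work_queue:
        done_queue.extend(stage(work_queue.popleft()))
    return done_queue


def load_samples(item: Item) -> List[Item]:
    # First function: starts the process by emitting one item per sample.
    return [{"sample": float(n)} for n in range(int(item["count"]))]


def square(item: Item) -> List[Item]:
    # Next function: consumes what the previous function produced.
    return [{"sample": item["sample"] ** 2}]


queue: Deque[Item] = deque([{"count": 4}])
for stage in (load_samples, square):
    # The done queue of one function becomes the work queue of the next.
    queue = run_stage(stage, queue)

print([item["sample"] for item in queue])
```

Swapping the deques for a shared backend such as Redis is what would let several worker processes drain the same work queue at once.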

## Roadmap

  • Create configuration file, bert-etl.yaml

  • Support conda venv

  • Support pyenv venv

  • Support DynamoDB flush

  • Support multiple invocations per AWS account

  • Support undeploy AWS Lambda

  • Support Bottle functions in AWS Lambda

## Tutorial Roadmap

  • Introduce Bert API

  • Explain bert.binding

  • Explain comm_binder

  • Explain work_queue

  • Explain done_queue

  • Explain ologger

  • Explain DEBUG and how turning it off allows for x-concurrent processes

  • Show an example of how to load timeseries data, calculate the mean, and display the result

  • Expand the example to show how to scale the application implicitly

  • Show how to run locally using Redis

  • Show how to run locally without Redis, using DynamoDB instead

  • Show how to run remotely using AWS Lambda and DynamoDB

  • Talk about DynamoDB and eventual consistency
