# Bert

A microframework for simple ETL solutions.
At its core, bert-etl uses DynamoDB Streams to communicate between lambda functions. bert-etl.yaml controls how the initial lambda function is invoked: by periodic events, SNS topics, or S3 bucket events (planned). Passing an event to bert-etl is straightforward from Zappa or from a generic AWS Lambda function you've hooked up to API Gateway.
At this time, there are no plans to attach API Gateway support to bert-etl.yaml, because existing software (like Zappa) already does this well.
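As a rough illustration of that integration path (not bert-etl's documented API), a generic Lambda handler behind API Gateway could forward its event to a pipeline's first function with boto3. The function name below is a placeholder:

```python
# A generic API Gateway handler that forwards its event to the first
# function of a pipeline. The FunctionName is a placeholder; this is a
# plain boto3 pattern, not bert-etl's documented integration.
import json

import boto3

lambda_client = boto3.client('lambda')

def handler(event, context):
    # Fire-and-forget: invoke the pipeline's initial lambda asynchronously.
    lambda_client.invoke(
        FunctionName='demo-initial-function',  # placeholder name
        InvocationType='Event',
        Payload=json.dumps(event).encode('utf-8'),
    )
    return {'statusCode': 202, 'body': json.dumps({'queued': True})}
```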
## Warning: the aws-lambda deploy target is still considered beta
bert-etl ships with a deploy target for AWS Lambda. This feature isn't well documented yet, and quite a bit of work remains before it functions consistently. Be aware that AWS Lambda is a product run and operated by AWS; if you incur charges while using bert-etl on AWS Lambda, the maintainers are not responsible. bert-etl is offered under the MIT license, which includes a "use at your own risk" clause.
## Getting Started
Let's begin with an example that loads data from a file server and then loads it into numpy arrays:
```bash
$ virtualenv -p $(which python3) env
$ source env/bin/activate
$ pip install bert-etl
$ pip install librosa  # for demo project
$ docker run -p 6379:6379 -d redis  # bert-etl runs on redis to share data across CPUs
$ bert-runner.py -n demo
$ PYTHONPATH='.' bert-runner.py -m demo -j sync_sounds -f
```
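For context, the generated demo works with sound files, which is why the quickstart installs librosa. A job along these lines might load audio into numpy arrays. The sketch below is hypothetical; it is not the code that `bert-runner.py -n demo` generates:

```python
# A hypothetical sketch of loading a sound file into a numpy array with
# librosa. The path and function name are placeholders, not part of the
# generated demo project.
import librosa
import numpy as np

def load_waveform(path: str) -> np.ndarray:
    # librosa.load returns (waveform, sample_rate); sr=None keeps the
    # file's native sampling rate instead of resampling to 22050 Hz.
    waveform, sample_rate = librosa.load(path, sr=None)
    return waveform

if __name__ == '__main__':
    samples = load_waveform('example.wav')
    print(samples.shape, samples.dtype)  # e.g. (441000,) float32
```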
## Release Notes
- Added Error Management. When an error occurs, bert-runner will log the error and re-run the job; if the same error recurs too many times, the job is aborted.
- Added Release Notes.
- Added Redis Service auto-run. Using Docker, Redis will be pulled and started in the background.
- Added Redis Service channels; sometimes you'll want to run two ETL jobs on the same machine.
## Fund Bounty Target Upgrades
Bert provides a boilerplate framework that lets you write concurrent ETL code using Python's multiprocessing module. One function starts the process, piping data into a Redis backend to be consumed by the next function. The queues are named for the scope of the function: a work queue (start) and a done queue (end); a minimal sketch of this pattern follows the list below. Please consider contributing to Bert Bounty Targets to improve this documentation:
- Create a configuration file, bert-etl.yaml
- Support conda venv
- Support pyenv venv
- Support dynamodb flush
- Support multiple invocations per AWS account
- Support undeploy AWS Lambda
- Support Bottle functions in AWS Lambda
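As referenced above, here is a minimal sketch of the work/done queue pattern, assuming a local Redis instance (e.g. `docker run -p 6379:6379 -d redis`). The queue names and functions are illustrative only, not the bert-etl API:

```python
# A minimal sketch of the work/done queue pattern described above,
# using Python's multiprocessing module and redis-py. Illustrative
# only; queue names and functions are not part of bert-etl.
import json
import multiprocessing

import redis

WORK_QUEUE = 'demo:work'  # hypothetical queue names
DONE_QUEUE = 'demo:done'

def producer() -> None:
    # First function: starts the process, piping data into Redis.
    conn = redis.Redis(host='localhost', port=6379)
    for value in range(10):
        conn.rpush(WORK_QUEUE, json.dumps({'value': value}))
    conn.rpush(WORK_QUEUE, json.dumps(None))  # sentinel: no more work

def consumer() -> None:
    # Next function: consumes the work queue, writes to the done queue.
    conn = redis.Redis(host='localhost', port=6379)
    while True:
        _key, raw = conn.blpop(WORK_QUEUE)  # blocks until an item arrives
        payload = json.loads(raw)
        if payload is None:
            break
        payload['value'] *= 2  # stand-in for a real transformation
        conn.rpush(DONE_QUEUE, json.dumps(payload))

if __name__ == '__main__':
    processes = [multiprocessing.Process(target=fn) for fn in (producer, consumer)]
    for process in processes:
        process.start()
    for process in processes:
        process.join()
```

In a multi-step pipeline of this shape, one function's done queue can serve as the next function's work queue, so each stage can be scaled independently.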
## Tutorial Roadmap
- Introduce Bert API
- Explain bert.binding
- Explain comm_binder
- Explain work_queue
- Explain done_queue
- Explain ologger
- Explain DEBUG and how turning it off allows for running multiple processes concurrently
- Show an example of loading timeseries data, calculating the mean, and displaying the final output
- Expand the example to show how to scale the application implicitly
- Show how to run locally using Redis
- Show how to run locally without Redis, using DynamoDB instead
- Show how to run remotely using AWS Lambda and DynamoDB
- Talk about DynamoDB and eventual consistency