A Data Engineering opinionated framework
Rony - Data Engineering made simple
An opinionated Data Engineering framework
Developed with ❤️ by A3Data
What is Rony
Rony is an open source framework that helps Data Engineers set up more organized code and build, test, and deploy data pipelines faster.
Why Rony?
Rony is Hermione's best friend (or so...). This was a perfect choice for naming the second framework released by A3Data, this one focusing on Data Engineering.
Over many years of helping companies build their data analytics projects and cloud infrastructure, we acquired a knowledge base that led to a collection of code snippets and automation procedures that speed things up when it comes to developing data structures and data pipelines.
Some choices we made
Rony builds on a few decisions that make sense for the majority of projects conducted by A3Data.
You are free to change these decisions as you wish (flexibility is the whole point of the framework).
Installing
Dependencies
- Python (>=3.6)
Install
pip install -U rony
How do I use Rony?
After installing Rony, you can check that the installation works by running:
rony info
You should see a cute logo. Then,
- Create a new project:
rony new project_rony
- Rony already creates a virtual environment for the project. Windows users can activate it with
<project_name>_env\Scripts\activate
Linux and MacOS users can do
source <project_name>_env/bin/activate
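The two activation commands above differ only by operating system. As a minimal sketch (the helper name `activate_command` is hypothetical and not part of Rony), the choice can be expressed in Python like this:

```python
import platform


def activate_command(project_name, system=None):
    """Return the shell command that activates <project_name>_env.

    `system` defaults to platform.system(), which reports "Windows",
    "Linux", or "Darwin" (macOS). This helper is illustrative only.
    """
    system = system or platform.system()
    env_dir = f"{project_name}_env"
    if system == "Windows":
        # Windows virtual environments keep their scripts under Scripts\
        return f"{env_dir}\\Scripts\\activate"
    # Linux and macOS virtual environments use bin/activate
    return f"source {env_dir}/bin/activate"


print(activate_command("project_rony"))
```

The printed command still has to be run in your own shell; a Python process cannot activate an environment for its parent shell.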
- After activating, you should install some libraries. There are a few suggestions in the “requirements.txt” file:
pip install -r requirements.txt
- Rony also has some handy CLI commands to build and run Docker images locally. You can do
cd etl
rony build <image_name>:<tag>
to build an image and run it with
rony run <image_name>:<tag>
In this particular implementation, run.py has simple ETL code that accepts a parameter to filter the data based on the Sex column. To use it, you can do
docker run <image_name>:<tag> -s female
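To give an idea of how a script can accept such a parameter, here is a minimal sketch. It is not the run.py that Rony generates: the sample data, the function names, and the `--sex` long option are assumptions made for illustration. Only the `-s` flag and the Sex column come from the description above.

```python
import argparse
import csv
import io

# Hypothetical sample data; the real project reads its own dataset.
SAMPLE_CSV = """Name,Sex,Age
Alice,female,29
Bob,male,35
Carol,female,41
"""


def filter_by_sex(rows, sex):
    """Keep only the rows whose Sex column equals `sex` (None keeps all rows)."""
    if sex is None:
        return list(rows)
    return [row for row in rows if row["Sex"] == sex]


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Toy ETL step")
    # -s mirrors the flag in the docker run command; --sex is an assumed long form.
    parser.add_argument("-s", "--sex", default=None,
                        help="value of the Sex column to keep")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    rows = csv.DictReader(io.StringIO(SAMPLE_CSV))
    selected = filter_by_sex(rows, args.sex)
    for row in selected:
        print(row["Name"], row["Sex"], row["Age"])
    return selected


if __name__ == "__main__":
    main()
```

Saved as a script and run with `-s female`, this sketch prints only the rows for Alice and Carol. Run without arguments, it prints every row.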
Implementation suggestions
When you start a new rony project, you will find
- an infrastructure folder with terraform code creating on AWS:
  - an S3 bucket
  - a Lambda function
  - a CloudWatch log group
  - an ECR repository
  - an AWS Glue Crawler
  - IAM roles and policies for Lambda and Glue
- an etl folder with:
  - a Dockerfile and a run.py example of ETL code
  - a lambda_function.py with a "Hello World" example
- a tests folder with unit testing on the Lambda function
- a .github/workflow folder with a GitHub Actions CI/CD pipeline suggestion. This pipeline
  - Tests the Lambda function
  - Builds and runs the Docker image
  - Sets AWS credentials
  - Makes a terraform plan (but does not actually deploy anything)
- a dags folder with some Airflow example code.
You also have a scripts folder with a bash file that builds a lambda deploy package.
Feel free to adjust and adapt everything according to your needs.
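As an illustration of how a "Hello World" Lambda function and its unit test can fit together, here is a minimal sketch. It is not the code Rony generates: the handler name `lambda_handler`, the response shape, and the pytest-style test function are assumptions based on common AWS Lambda conventions.

```python
import json


def lambda_handler(event, context):
    """Minimal Lambda-style handler: ignores its inputs and returns a greeting."""
    return {
        "statusCode": 200,
        "body": json.dumps({"message": "Hello World"}),
    }


def test_lambda_handler_returns_hello_world():
    # The handler does not use the event or context, so placeholders suffice.
    response = lambda_handler({}, None)
    assert response["statusCode"] == 200
    assert json.loads(response["body"]) == {"message": "Hello World"}
```

A test runner such as pytest (an assumed choice here) would collect the `test_` function automatically. In a real project the handler and the test would normally live in separate files, for example under etl and tests.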
Contributing
Make a pull request with your implementation.
For suggestions, contact us: rony@a3data.com.br
Licence