
CLI tool for scaffolding analytics engineering data stacks.


RAE - Rapid Analytics Engineering

Perfectly imperfect, for she was a Wildflower

RAE is the first opinionated framework purpose-built for the Analytics Engineering community, inspired by backend web frameworks like Django, Flask, and NestJS, but with a Data Engineering twist!

RAE:

  • Empowers teams, students, and individual engineers to rapidly scaffold a modern analytics engineering stack with nothing more than a few responses to CLI prompts. From zero to fully containerized infrastructure in minutes.
    • Users can also opt to scaffold just one or two tools instead of an entire stack.
  • Abstracts away the infrastructure, container, and server knowledge required to set up most of these tools.

All so you can focus on what matters most: modeling, orchestrating, and delivering data.


What RAE Does

Scaffold Tool Docker Configurations

Spin up a project with plug-and-play support for essential data tools:

  • Data Storage:
    • PostgreSQL
    • MySQL
  • Data Modeling:
    • dbt
    • SQL Mesh
  • Orchestration:
    • Airflow
    • Dagster

Auto-Generate settings.py

A clean, extensible settings file inspired by Django's, making it easy to pass environment-specific values (ports, credentials, container names, etc.) to every component of your stack.
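
For illustration only, a settings module in this style might group per-tool values into sections matching the `data_storage`, `data_modeling`, and `data_orchestration` prompts (the exact keys below are assumptions, not RAE's actual generated output):

```python
# Illustrative sketch of a settings.py in this style -- RAE's real output may differ.

data_storage = {
    "engine": "postgresql",
    "host": "postgres",      # container name on the shared Docker network
    "port": 5432,
}

data_modeling = {
    "tool": "dbt",
    "profiles_dir": "/root/.dbt",   # hypothetical path
}

data_orchestration = {
    "tool": "airflow",
    "webserver_port": 8080,
}
```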

Auto-Generate docker-compose.yml

  • Connects all services via a shared Docker network so the containers can reach one another by service name.
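
As a rough sketch of the idea (hand-written for illustration; the service and network names are assumptions, not RAE's literal output):

```yaml
# Illustrative docker-compose.yml sketch -- not RAE's literal output.
services:
  postgres:
    image: postgres:16
    networks:
      - rae-network
  dbt:
    build: ./dbt
    depends_on:
      - postgres
    networks:
      - rae-network

networks:
  rae-network:   # the shared network that links every service
    driver: bridge
```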

Frameworks aren't just for web and mobile engineers anymore. RAE gives Analytics Engineers the tools to build, connect and orchestrate their data stack with ease.

Build like a developer. Deploy like an engineer. Let RAE compose your analytics stack.


Who Is RAE For?

  • Analytics Engineers who want to quickly scaffold their required infrastructure.
  • Data Engineers who need to tie various tools together.
  • Data Scientists who have a need for a data tool stack.
  • Individual developers and anyone learning to use analytics/data engineering tools.
  • Teams that want standardization and clarity across their data stack.

How to use RAE

System Dependencies

| Tool | Required Version | Notes |
| --- | --- | --- |
| Python | 3.8+ | Required for the RAE CLI tool |
| Docker | Latest | Docker Desktop (macOS/Windows) or Docker Engine (Linux) |
| Shell | bash / zsh / PowerShell | Used to run CLI and Docker commands |
| Web Browser | Any modern browser | Google Chrome recommended for container-based UIs (e.g. Airflow) |

CLI Setup Steps

1. Create a Virtual Environment

| OS | Command |
| --- | --- |
| macOS/Linux | `python3 -m venv local-env` |
| Windows | `py -m venv local-env` |

2. Activate the Virtual Environment

| OS | Command |
| --- | --- |
| macOS/Linux (bash/zsh) | `source local-env/bin/activate` |
| Windows (CMD) | `local-env\Scripts\activate.bat` |
| Windows (PowerShell) | `local-env\Scripts\Activate.ps1` |

3. Install RAE CLI:

pip install rae-cli
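
To confirm the install succeeded, you can inspect the installed package with a standard pip command (not RAE-specific):

pip show rae-cli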

4. Initialize your project:

rae init

This will take you through a series of prompts and then generate the project_config.json and settings.py files.

After this command completes you will be left with the following project structure:

├── rae
│   └── src
│       ├── airflow
│       │   └── airflow-init.sh
│       ├── dbt
│       │   ├── analyses
│       │   ├── macros
│       │   ├── models
│       │   ├── seeds
│       │   ├── snapshots
│       │   ├── tests
│       │   ├── dbt-init.sh
│       │   ├── dbt.sh
│       │   ├── dbt_project.yml
│       │   └── Dockerfile
│       ├── docker-compose.yml
│       ├── postgres
│       │   └── postgres-init.sh
│       └── settings
│           ├── project_config.json
│           └── settings.py

The above is just an example; it assumes you selected Postgres for data storage, dbt for data modeling, and Airflow for orchestration, with Postgres also serving as the Airflow metastore.

5. Open your settings file - {project_name}/src/settings/settings.py

- You need to populate this file with your specific credentials
    - `data_storage` (PostgreSQL or MySQL)
    - `data_modeling` (dbt or SQL Mesh)
    - `data_orchestration` (Airflow or Dagster)
- If you do not do this, the project will still be usable, but its containers will be built with default values and will NOT be production-ready or secure.

You are responsible for ensuring your project is secure, set up properly, and ready for deployment!
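
One way to populate those values without hard-coding secrets is to read them from environment variables. A minimal sketch, assuming key names that your generated settings.py may not literally use:

```python
# settings.py excerpt (sketch -- match the keys RAE actually generated for you).
import os

data_storage = {
    "host": "postgres",   # service name on the shared Docker network
    "port": int(os.environ.get("POSTGRES_PORT", "5432")),
    "user": os.environ.get("POSTGRES_USER", "rae"),
    "password": os.environ["POSTGRES_PASSWORD"],  # fail fast if the secret is missing
    "database": os.environ.get("POSTGRES_DB", "analytics"),
}
```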

6. Generate your docker compose file:

  1. cd into your project directory:
cd {project_name}
  2. Generate your compose file:
rae generate-compose-file

  Alternatively, generate the compose file without changing directories:
rae generate-compose-file --project-name {project_name}

7. Run your project's Docker containers:

Docker must be installed AND running on your host machine, or this command will fail. Make sure Docker Desktop (or Docker Engine on Linux) is installed and actively running before continuing!

cd {project_name}/src

Then simply start the containers:

docker-compose up -d

This starts the Docker containers for each service and links them via a Docker network, allowing the containers to communicate with one another while each tool still runs in isolation.
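
To sanity-check the result, standard Docker commands (not RAE-specific) will list the running services and the shared network:

```
docker-compose ps      # each scaffolded service should show as "Up"
docker network ls      # the project's shared network should appear here
```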


Current State of the Project

Future Implementations

1. Add secondary test coverage to the project:
  - src/cli.py
  - src/data_modeling/dbt_modeling
  - src/data_modeling/sql_mesh_modeling
  - src/data_orchestration/airflow_orchestration.py
  - src/data_orchestration/dagster_orchestration.py
  - src/data_storage/mysql_storage.py
  - src/data_storage/postgresql_storage.py
  - src/generators/docker_compose_generator.py
  
2. Continue iterating on test coverage
  - src/managers/data_modeling_manager.py
  - src/managers/data_orchestration_manager.py
  - src/managers/data_storage_manager.py
  - src/managers/settings_manager.py
  - src/utility/base_manager.py
  - src/utility/base_tool.py
  - src/utility/dockerfile_writer.py
  - src/utility/indented_dumper.py
  - src/utility/shell_script_writer.py
  - src/utility/supported_tools.py
  - src/main.py

3. Add support for additional data storage tools:
  - Snowflake
  - DuckDB?
  - SQL Server?
  - Databricks
    - AWS S3
    - Google Cloud Storage
    - Azure Blob Storage

4. Add support to allow users to scaffold single applications or custom combinations of tool stacks
  - Scenarios:
    - user only needs a data modeling tool
    - user only needs a data modeling tool and a data storage tool
    - user only needs an orchestration tool
    - etc
  - Intent:
    - To allow greater flexibility and provide a wider use-case for the CLI



Download files

Download the file for your platform.

Source Distribution

rae_cli-0.2.10.tar.gz (25.0 kB)

Built Distribution

rae_cli-0.2.10-py3-none-any.whl (36.7 kB)

File details

Details for the file rae_cli-0.2.10.tar.gz.

File metadata

  • Download URL: rae_cli-0.2.10.tar.gz
  • Upload date:
  • Size: 25.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.12

File hashes

Hashes for rae_cli-0.2.10.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | c6bd5c5bae6fdb859a1ea0f7304287bce8db15637948d826de53e346a3ab8266 |
| MD5 | 8efcf58ef9e1d09e945df26d10775a29 |
| BLAKE2b-256 | 7dbc16b39e71892f9605f554fe4bdde6218c7e505f06495f01c333b867e3c3b6 |


File details

Details for the file rae_cli-0.2.10-py3-none-any.whl.

File metadata

  • Download URL: rae_cli-0.2.10-py3-none-any.whl
  • Upload date:
  • Size: 36.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.12

File hashes

Hashes for rae_cli-0.2.10-py3-none-any.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | ca3e20a0f6d1fa7f90c30137fa745ea7258485fb43a4a1fc6de77fc87a2b63b4 |
| MD5 | c0ffd64bbd37eff9fb3b72e3d1878fa6 |
| BLAKE2b-256 | c773361dae7608c1852e48cfc897860489f9628d0187cf8a81646bc05da6cad5 |

