
CLI tool for scaffolding analytics engineering data stacks.


RAE - Rapid Analytics Engineering

Perfectly imperfect, for she was a Wildflower

RAE is an opinionated framework purpose-built for the Analytics Engineering community — inspired by backend web frameworks such as Django, Flask, and NestJS, but with a Data Engineering twist!

RAE:

  • Empowers teams, solo devs, students, and individual engineers to rapidly scaffold a modern analytics engineering stack with nothing more than a few responses to CLI prompts. From zero to fully containerized infrastructure in minutes.
    • Users can also opt to scaffold only one or two tools rather than an entire stack.
  • Abstracts away the infrastructure, container, and server knowledge required to set up most tools.

All so you can focus on what matters most: modeling, orchestrating, and delivering data.


What RAE Does

Scaffold Tool Docker Configurations

Spin up a project with plug-and-play support for essential data tools:

  • Data Storage:
    • PostgreSQL
    • MySQL
  • Data Modeling:
    • dbt
    • SQLMesh
  • Orchestration:
    • Airflow
    • Dagster

Auto-Generate settings.py

A clean and extensible settings file inspired by Django — making it easy to pass environment-specific values (ports, credentials, container names, etc.) to every component of your stack.
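As a rough sketch of what such a Django-style settings module can look like — the key names and values below are illustrative assumptions, not the actual fields RAE generates, which depend on the tools you select:

```python
# settings.py - hypothetical sketch of a Django-style settings layout.
# Key and value names are illustrative assumptions, not the actual
# fields RAE generates; those depend on the tools you pick.

DATA_STORAGE = {
    "tool": "postgresql",
    "host": "rae-postgres",   # container name on the shared Docker network
    "port": 5432,
    "user": "analytics",
    "password": "change-me",  # replace before any real deployment
    "database": "warehouse",
}

DATA_MODELING = {
    "tool": "dbt",
    "project_dir": "/usr/app/dbt",
}

DATA_ORCHESTRATION = {
    "tool": "airflow",
    "webserver_port": 8080,
}
```

Because the file is plain Python, every component of the stack can import the same values instead of duplicating them per tool.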

Auto-Generate docker-compose.yml

  • Connect all services via a shared Docker network.
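As an illustrative sketch of that shared-network wiring (service names, images, and versions here are assumptions, not the literal file RAE generates):

```yaml
# Illustrative sketch only: the real docker-compose.yml that RAE
# generates depends on the tools chosen during `rae init`.
services:
  postgres:
    image: postgres:16
    networks:
      - rae-network

  airflow:
    image: apache/airflow:2.9.2
    depends_on:
      - postgres
    networks:
      - rae-network

networks:
  rae-network:
    driver: bridge
```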

Frameworks aren't just for web and mobile engineers anymore. RAE gives Analytics Engineers the tools to build, connect and orchestrate their data stack with ease.

Build like a developer. Deploy like an engineer. Let RAE compose your analytics stack.


Who Is RAE For?

  • Analytics Engineers who want to quickly scaffold their required infrastructure.
  • Data Engineers who need to tie various tools together.
  • Data Scientists who need a data tool stack.
  • Individual developers and anyone learning to use analytics/data engineering tools.
  • Teams that want standardization and clarity across their data stack.

How to use RAE

System Dependencies

| Tool | Required Version | Notes |
| --- | --- | --- |
| Python | 3.8+ | Required for the RAE CLI tool |
| Docker | Latest | Docker Desktop (macOS/Windows) or Docker Engine (Linux) |
| Shell | bash / zsh / PowerShell | Used to run CLI and Docker commands |
| Web Browser | Any modern browser | Google Chrome recommended for container-based UIs (e.g. Airflow) |
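A quick way to sanity-check these prerequisites from the shell (plain `python3`/`docker` version checks, nothing RAE-specific):

```shell
# Quick prerequisite check before installing the RAE CLI.
python3 --version   # should report 3.8 or newer
if command -v docker >/dev/null 2>&1; then
  docker --version
else
  echo "Docker not found: install Docker Desktop (macOS/Windows) or Docker Engine (Linux)"
fi
```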

CLI Setup Steps

1. Create a Virtual Environment

| OS | Command |
| --- | --- |
| macOS/Linux | `python3 -m venv local-env` |
| Windows | `py -m venv local-env` |

2. Activate the Virtual Environment

| OS | Command |
| --- | --- |
| macOS/Linux (bash/zsh) | `source local-env/bin/activate` |
| Windows (CMD) | `local-env\Scripts\activate.bat` |
| Windows (PowerShell) | `local-env\Scripts\Activate.ps1` |

3. Install RAE CLI:

```shell
pip install rae-cli
```

4. Initialize your project:

```shell
rae init
```

This will take you through a series of prompts and then generate the project_config.json and settings.py files.

After this command completes you will be left with the following project structure:

```text
rae
└── src
    ├── airflow
    │   └── airflow-init.sh
    ├── dbt
    │   ├── analyses
    │   ├── macros
    │   ├── models
    │   ├── seeds
    │   ├── snapshots
    │   ├── tests
    │   ├── dbt-init.sh
    │   ├── dbt.sh
    │   ├── dbt_project.yml
    │   └── Dockerfile
    ├── docker-compose.yml
    ├── postgres
    │   └── postgres-init.sh
    └── settings
        ├── project_config.json
        └── settings.py
```

The above is just an example; it assumes you selected Postgres for data storage, dbt for data modeling, and Airflow for orchestration, with Postgres as the Airflow metastore.

5. Open your settings file: {project_name}/src/settings/settings.py

- Populate this file with your specific credentials for:
    - `data_storage` (PostgreSQL or MySQL)
    - `data_modeling` (dbt or SQLMesh)
    - `data_orchestration` (Airflow or Dagster)
- If you skip this step, the project will still be usable, but its containers will be built with default values and will NOT be production-ready or secure.

You are responsible for ensuring your project is secure, set up properly, and ready for deployment!
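One common way to keep real credentials out of the file itself is to read them from environment variables inside settings.py. This is a sketch of that pattern, assuming hypothetical `RAE_*` variable names that RAE itself does not define:

```python
import os

# Hypothetical pattern for keeping secrets out of version control.
# The RAE_* variable names are illustrative, not ones RAE defines.
DATA_STORAGE = {
    "tool": "postgresql",
    "host": os.environ.get("RAE_DB_HOST", "localhost"),
    "port": int(os.environ.get("RAE_DB_PORT", "5432")),
    "user": os.environ.get("RAE_DB_USER", "analytics"),
    "password": os.environ.get("RAE_DB_PASSWORD", ""),  # no insecure default
    "database": os.environ.get("RAE_DB_NAME", "warehouse"),
}
```

You can then export the variables in your shell (or a gitignored `.env` file) before building the containers.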

6. Generate your docker compose file:

  1. cd into your project directory:

```shell
cd {project_name}
```

  2. Generate your compose file:

```shell
rae generate-compose-file
```

Alternatively, generate the compose file without changing directories:

```shell
rae generate-compose-file --project-name {project_name}
```

7. Run your project's Docker containers:

Docker must be installed AND running on your host machine, or this command will fail. Make sure Docker Desktop (or Docker Engine on Linux) is installed and actively running before continuing!

```shell
cd {project_name}/src
```

Then simply start the containers:

```shell
docker-compose up -d
```

This starts the Docker container for each service and links them via a shared Docker network, allowing the containers to communicate with one another while each tool still operates in isolation.


Current State of Project

Future Implementations

1. Add secondary test coverage to the project:
  - src/cli.py
  - src/data_modeling/dbt_modeling
  - src/data_modeling/sql_mesh_modeling
  - src/data_orchestration/airflow_orchestration.py
  - src/data_orchestration/dagster_orchestration.py
  - src/data_storage/mysql_storage.py
  - src/data_storage/postgresql_storage.py
  - src/generators/docker_compose_generator.py
  
2. Continue iterating on test coverage
  - src/managers/data_modeling_manager.py
  - src/managers/data_orchestration_manager.py
  - src/managers/data_storage_manager.py
  - src/managers/settings_manager.py
  - src/utility/base_manager.py
  - src/utility/base_tool.py
  - src/utility/dockerfile_writer.py
  - src/utility/indented_dumper.py
  - src/utility/shell_script_writer.py
  - src/utility/supported_tools.py
  - src/main.py

3. Add support for additional data storage tools:
  - Snowflake
  - DuckDB?
  - SQL Server?
  - Databricks
    - AWS S3
    - Google Cloud Storage
    - Azure Blob Storage

4. Add support to allow users to scaffold single applications or custom combinations of tool stacks
  - Scenarios:
    - user only needs a data modeling tool
    - user only needs a data modeling tool and a data storage tool
    - user only needs an orchestration tool
    - etc
  - Intent:
    - To allow greater flexibility and provide a wider use-case for the CLI

