
DWE CLI - Data Warehouse Ecosystem Orchestrator


dwe-core

The DWE CLI (dwe) is the orchestration brain of the Data Warehouse Ecosystem. It takes a blank or existing client Git repository and injects a fully working Adapter — infrastructure, application config, CI/CD pipelines, and local dev commands — in a single command.

How it works

dwe create-service test_adapter --git-repo https://github.com/client/repo --envs dev --envs prod

Internally this does:

1. Clone        GitPython clones the client repo to a temp directory
2. Hydrate      Copier renders the adapter template into the clone
3. State        CLI writes dwe-state.json
4. CI/CD        CLI renders per-environment GitHub Actions / GitLab CI files
5. Branch       An initial-commit branch is created and the generated files are committed to it
6. Env branches dev, prod branches are created from initial-commit
7. Push         All branches are pushed to the remote
8. Secrets      CLI uploads secrets to the repository settings via the GitHub/GitLab API

The result is a client repo that already has working infrastructure code, a justfile with just up / just deploy-prod, and CI/CD that deploys to the right environment when you push to its branch.
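Steps 5 to 7 are ordinary git operations. As a rough sketch, here they are with the plain git CLI via subprocess. dwe-core itself uses GitPython, so this is a stand-in rather than its code; the commit identity and message are placeholders.

```python
import subprocess
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    """Run `git -C <repo> <args>` and return stdout; raises CalledProcessError on failure."""
    return subprocess.run(
        ["git", "-C", str(repo), *args],
        check=True, capture_output=True, text=True,
    ).stdout


def branch_and_push(repo: Path, envs: list[str], remote: str = "origin") -> None:
    """Steps 5-7: commit generated files to initial-commit, fork env branches, push all."""
    # Step 5: put everything written by steps 2-4 on a fresh initial-commit branch.
    git(repo, "checkout", "-b", "initial-commit")
    git(repo, "add", "-A")
    # Placeholder identity so the sketch also works where git has no user config.
    git(repo, "-c", "user.name=dwe", "-c", "user.email=dwe@example.invalid",
        "commit", "-m", "chore: hydrate adapter")
    # Step 6: every environment branch starts from the same commit.
    for env in envs:
        git(repo, "branch", env, "initial-commit")  # fails if the branch already exists
    # Step 7: push initial-commit and all env branches in one call.
    git(repo, "push", remote, "initial-commit", *envs)
```

In the real flow this would run inside the temporary clone from step 1, after the template, state file and CI files have been written.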


Installation

pip install poetry        # if not already installed
poetry install            # from dwe-core source (creates venv, installs deps)
# or once published:
pip install dwe-core

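Verify (from a source checkout, prefix each command with poetry run or activate the Poetry virtualenv first):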

dwe --help
dwe list-adapters

Commands

dwe create-service

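dwe create-service <adapter_name> --git-repo <url> [options]

Options:
  --envs <name>            Environment to set up (branch + workflow); repeatable (default: development, main)
  --secrets <json>         Secrets to upload, as a JSON object, e.g. '{"AWS_KEY":"abc"}'
  --tag <version>          Adapter git tag, e.g. v1.2.0
  --token <api-token>      GitHub/GitLab API token (or set GITHUB_TOKEN / GITLAB_TOKEN)
  --aws-region <region>    AWS region, e.g. eu-west-1
  --instance-type <type>   EC2 instance type, e.g. t3.small
  --clone-dir <path>       Where to clone the client repo (default: temp dir)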

Example — full run:

export GITHUB_TOKEN=ghp_xxxx

dwe create-service test_adapter \
  --git-repo https://github.com/acme/data-platform \
  --envs development \
  --envs staging \
  --envs main \
  --secrets '{"PULUMI_ACCESS_TOKEN":"pul-xxx","AWS_ACCESS_KEY_ID":"AKI...","AWS_SECRET_ACCESS_KEY":"..."}' \
  --tag v1.0.0 \
  --aws-region eu-west-1 \
  --instance-type t3.small

After this runs, the data-platform repo has:

.github/workflows/
  deploy-development.yaml
  deploy-staging.yaml
  deploy-main.yaml
blueprint/
  html/index.html
  instance-setup.sh
docker-compose.yml
docker-compose.prod.yml
.env.example
justfile
infrastructure/
  __main__.py          <- project_name, instance_type already substituted
  Pulumi.yaml
  requirements.txt
dwe-state.json
.copier-answers.yml    <- Copier's internal state (enables future updates)

dwe update-service

dwe update-service <adapter_name> <local_path> [--tag <version>]

Example:

dwe update-service test_adapter ./data-platform --tag v1.2.0

Internally:

  1. Reads dwe-state.json and validates the adapter name matches
  2. Creates a branch dwe-update-20260322-1.2.0
  3. Runs copier.run_update(), a smart merge that preserves your customisations
  4. Updates dwe-state.json with the new version

Review the diff on the branch, then merge into your environment branches to trigger deployments.
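Steps 1, 2 and 4 are plain data handling around dwe-state.json. A hypothetical sketch (not dwe-core's code); the branch-name format dwe-update-<YYYYMMDD>-<version without the leading v> is inferred from the single example above:

```python
from datetime import date


def check_adapter(state: dict, adapter_name: str) -> None:
    """Step 1: refuse to update a repo hydrated from a different adapter."""
    recorded = state["adapter"]["name"]
    if recorded != adapter_name:
        raise ValueError(f"repo uses adapter {recorded!r}, not {adapter_name!r}")


def update_branch_name(tag: str, today: date) -> str:
    """Step 2: e.g. ('v1.2.0', 2026-03-22) -> 'dwe-update-20260322-1.2.0'."""
    version = tag.removeprefix("v")
    return f"dwe-update-{today:%Y%m%d}-{version}"


def record_update(state: dict, tag: str, today: date) -> dict:
    """Step 4: return a copy of the state with the new adapter version and date."""
    adapter = {**state["adapter"], "version": tag, "last_update": today.isoformat()}
    return {**state, "adapter": adapter}
```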

dwe list-adapters

dwe list-adapters

Shows all adapters registered in adapters.json.


Adapter Registry (adapters.json)

{
  "test_adapter": {
    "path": "/absolute/path/to/dwe_test_adapter",
    "type": "local",
    "description": "Test adapter: AWS EC2 instance via Pulumi"
  },
  "superset_adapter": {
    "url": "https://github.com/hipposys/dwe-superset-adapter",
    "type": "git",
    "description": "Apache Superset on ECS"
  }
}
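Each entry's type decides whether the template source is a local path or a git URL, both of which Copier accepts as a template location. A minimal lookup sketch with hypothetical helper names:

```python
import json
from pathlib import Path


def load_registry(path: Path) -> dict:
    """Read adapters.json into a plain dict."""
    return json.loads(path.read_text(encoding="utf-8"))


def resolve_adapter_source(registry: dict, name: str) -> str:
    """Return the template location for `name`: a local path or a git URL."""
    if name not in registry:
        known = ", ".join(sorted(registry)) or "none"
        raise ValueError(f"unknown adapter {name!r} (registered: {known})")
    entry = registry[name]
    if entry["type"] == "local":
        return entry["path"]
    if entry["type"] == "git":
        return entry["url"]
    raise ValueError(f"adapter {name!r} has unsupported type {entry['type']!r}")
```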

How to Define a New Adapter

An adapter is a real, runnable project that also serves as a Copier template. The guiding principle:

The adapter must work locally as-is. A developer should be able to git clone the adapter, run just up, and have a working service — without running the DWE CLI at all.

Step 1: Create the adapter repository

mkdir my_adapter && cd my_adapter
git init

Step 2: Build a working application first

Build your service as a real project before adding any template variables. For example, if you're building a Superset adapter:

# Make it work locally first
docker compose up    # verify it runs

Only once everything works locally do you introduce {{ variables }}.

Step 3: Directory structure

my_adapter/
├── copier.yml                  # Copier config + question definitions
│
├── docker-compose.yml          # Real, runnable. Uses ${ENV_VAR:-default} for runtime values.
├── docker-compose.prod.yml     # Production overrides (restart policy, logging)
├── .env.example                # Template for secrets — committed; .env is git-ignored
├── .gitignore
│
├── justfile                    # Dev commands (just up, just deploy-prod, just infra-up)
│
├── blueprint/                  # Application-level config files
│   ├── html/                   # or nginx.conf, superset_config.py, etc.
│   └── instance-setup.sh       # EC2 user-data bootstrap script
│
├── infrastructure/             # Pulumi IaC — only files here use .jinja
│   ├── __main__.py.jinja       # <- .jinja because it embeds {{ project_name }}
│   ├── Pulumi.yaml.jinja       # <- .jinja because it embeds {{ project_name }}
│   └── requirements.txt
│
└── ci-templates/               # Jinja2 templates rendered by the CLI (not Copier)
    └── deploy.yaml             # Uses {@ ENV_NAME @}, {@ AWS_REGION @}

Step 4: Write copier.yml

copier.yml controls how Copier processes the adapter. Key settings:

_templates_suffix: .jinja    # ONLY files ending in .jinja are treated as templates
                              # Everything else is copied verbatim

_exclude:
  - copier.yml               # Don't copy Copier's own config
  - ci-templates             # CLI handles this separately
  - README.md                # Adapter's README is not for client repos
  - .git
  - .env                     # Never copy actual secrets
  - __pycache__
  - "*.pyc"

_skip_if_exists:
  - .env.example             # Preserve user customisations on updates

# Questions (answered non-interactively by the dwe CLI):
project_name:
  type: str
  help: "Client project name (used for cloud resource naming)"

adapter_name:
  type: str
  default: "my_adapter"
  when: false    # always set programmatically

adapter_version:
  type: str
  default: "v1.0.0"
  when: false    # always set programmatically

environments:
  type: yaml
  default: "[development, main]"

aws_region:
  type: str
  default: "us-east-1"

instance_type:
  type: str
  default: "t3.small"    # illustrative default; dwe passes --instance-type

Step 5: Decide what needs Jinja2

Apply this rule: if the value changes per client, use {{ variable }}. If it changes per deployment environment, use a .env variable.

File                         Approach                              Reason
docker-compose.yml           .env interpolation (${VAR:-default})  Works locally without any substitution; runtime config
infrastructure/__main__.py   Jinja2 (.jinja extension)             Cloud resource names must be unique per client at provision time
infrastructure/Pulumi.yaml   Jinja2 (.jinja extension)             Pulumi project name must be unique per client
justfile                     Verbatim copy (no .jinja)             Commands are identical across clients
blueprint/instance-setup.sh  Verbatim copy                         Generic bootstrap, no client-specific values
.env.example                 Verbatim copy                         Users fill in real values after cloning
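The ${VAR:-default} form is what lets docker-compose.yml run untouched: Compose substitutes the environment value, or the default when the variable is unset or empty. A simplified Python model of just that rule (not Compose's parser; APP_PORT is an illustrative variable name):

```python
import re

# Matches ${VAR} and ${VAR:-default}. Other Compose forms (${VAR-default},
# ${VAR:?error}, $$ escapes) are deliberately not handled in this sketch.
_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def interpolate(text: str, env: dict[str, str]) -> str:
    """Substitute variables the way the ':-' rule describes."""
    def substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        value = env.get(name, "")
        if value == "" and default is not None:
            return default  # ':-' falls back when the variable is unset OR empty
        return value        # plain ${VAR} that is unset becomes an empty string
    return _PATTERN.sub(substitute, text)
```

With no .env at all, a line such as "${APP_PORT:-8080}:80" still resolves to "8080:80", which is why no template step is needed for this file.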

Jinja2 syntax in .jinja files:

# infrastructure/__main__.py.jinja
instance = aws.ec2.Instance(
    "{{ project_name }}-instance",          # <- substituted by Copier
    instance_type="{{ instance_type }}",
    ...
)

After dwe create-service this becomes:

instance = aws.ec2.Instance(
    "acme-data-platform-instance",
    instance_type="t3.small",
    ...
)

Step 6: Write ci-templates/deploy.yaml

This is a Jinja2 file rendered by the dwe CLI (not by Copier) to generate one workflow file per environment. The CLI uses {@ @} as variable delimiters (not {{ }}), so GitHub Actions ${{ secrets.X }} syntax passes through untouched — no escaping needed.

name: Deploy to {@ ENV_NAME @}

on:
  push:
    branches:
      - {@ ENV_NAME @}
  pull_request:
    branches:
      - {@ ENV_NAME @}

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: {@ ENV_NAME @}
    steps:
      - uses: actions/checkout@v4
      - name: Deploy
        run: just deploy-prod
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}    # passes through unchanged
          AWS_REGION: {@ AWS_REGION @}                           # substituted by dwe CLI

Available variables: {@ ENV_NAME @}, {@ AWS_REGION @}.
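The delimiter swap is a standard Jinja2 Environment option. A minimal sketch of rendering one workflow per environment, assuming the template text has already been read from ci-templates/deploy.yaml (the function names are hypothetical):

```python
from pathlib import Path

from jinja2 import Environment, StrictUndefined


def make_ci_env() -> Environment:
    # Only the *variable* delimiters change; {% %} and {# #} keep their Jinja2
    # defaults, so GitHub Actions ${{ ... }} is ordinary text to this environment.
    return Environment(
        variable_start_string="{@",
        variable_end_string="@}",
        undefined=StrictUndefined,  # fail loudly on a misspelled variable
        keep_trailing_newline=True,
    )


def write_workflows(template_text: str, out_dir: Path, envs: list[str], aws_region: str) -> list[Path]:
    """Render deploy-<env>.yaml for every environment into out_dir."""
    template = make_ci_env().from_string(template_text)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for env_name in envs:
        path = out_dir / f"deploy-{env_name}.yaml"
        path.write_text(template.render(ENV_NAME=env_name, AWS_REGION=aws_region))
        written.append(path)
    return written
```

StrictUndefined is a design choice here: a typo such as {@ ENV_NAM @} raises during rendering instead of silently producing an empty branch name.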

Step 7: Register the adapter

Add an entry to dwe-core/adapters.json:

Local (development):

{
  "my_adapter": {
    "path": "/absolute/path/to/my_adapter",
    "type": "local",
    "description": "My adapter description"
  }
}

Remote Git (production):

{
  "my_adapter": {
    "url": "https://github.com/your-org/my-adapter",
    "type": "git",
    "description": "My adapter description"
  }
}

Step 8: Test the adapter

Test locally first (without the DWE CLI):

cd my_adapter
cp .env.example .env
just up                    # docker compose up — must work here

Test Copier rendering in isolation:

pip install copier
copier copy /path/to/my_adapter /tmp/test-output \
  --data project_name=testproject \
  --data aws_region=us-east-1 \
  --defaults --overwrite --trust

# Inspect the output
ls /tmp/test-output
cat /tmp/test-output/infrastructure/Pulumi.yaml    # should have project_name substituted
cat /tmp/test-output/docker-compose.yml            # should be identical to source
cd /tmp/test-output && docker compose up           # should still work

Test via dwe CLI:

dwe create-service my_adapter \
  --git-repo https://github.com/test-org/empty-repo \
  --envs development \
  --envs main

Adapter Versioning and Updates

Tag your adapter repository with semantic version tags. The DWE CLI and Copier use these tags for update-service:

cd my_adapter
git add -A && git commit -m "feat: add postgres service"
git tag v1.1.0
git push origin v1.1.0

When a client wants to update:

dwe update-service my_adapter ./client-repo --tag v1.1.0

Copier reads the source URL from .copier-answers.yml in the client repo, checks out v1.1.0, and runs a 3-way merge. Files the user has customised are preserved where possible; conflicts surface as standard git merge conflicts.
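A 3-way merge compares three versions of each file: the old template render (base), the client's current file (ours) and the new template render (theirs). Copier and git apply this hunk by hunk; the whole-file sketch below only illustrates the decision rule and is not Copier's implementation:

```python
def merge_file(base: str, ours: str, theirs: str) -> tuple[str, bool]:
    """Return (merged_content, has_conflict) for one file."""
    if ours == base:
        # Client never touched the file: take the new template output.
        return theirs, False
    if theirs == base:
        # Template did not change this file: keep the client's customisation.
        return ours, False
    if ours == theirs:
        # Both sides made the identical change.
        return ours, False
    # Both changed it differently: keep the client's version and flag it for review.
    return ours, True
```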

What gets updated:

  • infrastructure/ — Pulumi code (Jinja2 re-rendered with new template)
  • blueprint/ — Application config files
  • justfile — Dev commands

What is NOT updated (protected):

  • .env.example — skipped if it already exists (_skip_if_exists in copier.yml)
  • .copier-answers.yml — managed by Copier internally

State Files

dwe-state.json (DWE-managed)

Written by the dwe CLI after copier.run_copy(). Tracks DWE-specific metadata:

{
  "dwe_version": "1.0.0",
  "adapter": {
    "name": "test_adapter",
    "version": "v1.0.0",
    "last_update": "2026-03-22"
  },
  "environments": ["development", "main"],
  "infrastructure": "pulumi"
}

.copier-answers.yml (Copier-managed)

Written by Copier. Tracks the template source, version, and question answers. Do not edit manually. This is what enables copier.run_update() to know where the template came from.

# Changes here will be overwritten by copier
_commit: v1.0.0
_src_path: /path/to/my_adapter
project_name: acme-data-platform
aws_region: eu-west-1
instance_type: t3.small

Both files coexist. dwe-state.json is for DWE tooling; .copier-answers.yml is for Copier's update machinery.


Developer Workflow After create-service

Once the client repo is hydrated, the full developer loop is:

1. Local development (laptop):

git clone https://github.com/client/data-platform
cd data-platform
cp .env.example .env      # fill in local values (no real AWS keys needed)
just up                   # docker compose up — app is running at localhost:8080

2. Provision cloud infrastructure (once):

# Fill in real AWS keys in .env
just install-infra         # pip install pulumi pulumi-aws (Python SDKs; the Pulumi CLI is installed separately)
just infra-preview         # see what Pulumi will create
just infra-up              # provision the EC2 instance

3. Deploy to EC2 (SSH into the instance, then):

git clone https://github.com/client/data-platform /srv/app
cd /srv/app
cp .env.example .env       # fill in production values
just deploy-prod           # docker compose -f ... up -d

4. CI/CD (automatic after push):

Pushing to development or main triggers the corresponding GitHub Actions workflow. See the CI/CD Workflow Design section below for the full two-path logic.


CI/CD Workflow Design

The generated CI/CD workflow (.github/workflows/deploy-{env}.yaml) implements two-path logic inspired by the Superset production setup. The key insight: infrastructure changes and application changes require completely different responses.

The Two Paths

Push to branch
       │
       ▼
  Detect changes
  (dorny/paths-filter)
       │
       ├─── infrastructure/** changed?
       │         │
       │         ├─ Pull Request → pulumi preview  (validate, no apply)
       │         └─ Push        → pulumi up --yes  (apply infra changes)
       │
       └─── docker-compose / blueprint changed?
                 AND infrastructure NOT changed?
                         │
                         └─ Push → SSM: git pull + just deploy-prod
                                   (redeploy app on the live EC2 instance)

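Why skip the app deploy when infrastructure also changed? The pulumi up step may re-provision the EC2 instance itself, and a newly created instance already pulls the latest code via its user-data script. Running the app deploy on top of that would be redundant and potentially racy. Note that EC2 user-data runs only on an instance's first boot, so an in-place update that does not replace the instance will not re-pull the code.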

Job Summary

Job             Trigger                                     What it does
pulumi-preview  PR, infrastructure/** changed               Runs pulumi preview (shows what would change, no side effects)
pulumi-apply    Push, infrastructure/** changed             Runs pulumi up --yes (applies infra changes)
deploy-app      Push, app files changed, infra NOT changed  AWS SSM command: git pull && just deploy-prod on the live EC2 instance
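The rules in the diagram and table can be modelled as a small pure function, which is handy for reasoning about edge cases such as a PR that only touches app files. In the workflow itself they are job-level if: conditions on the paths-filter outputs; this Python version is an illustration, not the generated YAML:

```python
def plan_jobs(event: str, infra_changed: bool, app_changed: bool) -> list[str]:
    """Return which jobs run for a GitHub event ("push" or "pull_request")."""
    jobs: list[str] = []
    if infra_changed:
        # Infrastructure path: validate on PRs, apply on pushes.
        if event == "pull_request":
            jobs.append("pulumi-preview")
        elif event == "push":
            jobs.append("pulumi-apply")
    elif app_changed and event == "push":
        # App path: only when infrastructure is untouched, and never on PRs.
        jobs.append("deploy-app")
    return jobs
```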

Required Secrets

Set these via dwe create-service --secrets '{...}' or manually in GitHub repository settings:

Secret                    Description
AWS_ACCESS_KEY_ID         AWS credentials for Pulumi and SSM
AWS_SECRET_ACCESS_KEY     AWS credentials
PULUMI_ACCESS_TOKEN       Pulumi Cloud token
PULUMI_CONFIG_PASSPHRASE  Pulumi stack encryption passphrase
PULUMI_STACK              Pulumi stack reference, e.g. myorg/myproject/development
EC2_INSTANCE_ID           Instance ID from pulumi stack output instance_id, e.g. i-0abc1234

SSM Prerequisites

The deploy-app job uses AWS Systems Manager (SSM) instead of SSH — no port 22, no SSH key stored as a secret.

To enable SSM on the EC2 instance:

1. IAM instance profile: attach a role with these permissions to the EC2 instance:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ssm:UpdateInstanceInformation",
        "ssmmessages:CreateControlChannel",
        "ssmmessages:OpenControlChannel",
        "ec2messages:GetMessages",
        "ec2messages:SendReply"
      ],
      "Resource": "*"
    }
  ]
}

Or simply attach the AWS managed policy AmazonSSMManagedInstanceCore.

2. SSM agent — Amazon Linux 2023 ships with it pre-installed. The blueprint/instance-setup.sh bootstrap script ensures it's running:

systemctl enable amazon-ssm-agent
systemctl start amazon-ssm-agent

3. Store the instance ID — after running just infra-up, get the instance ID and store it as a secret:

cd infrastructure && pulumi stack output instance_id
# → i-0abc1234567890def
# Add this to GitHub repository secrets as EC2_INSTANCE_ID
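With these prerequisites in place, the deploy-app step amounts to sending a shell command and polling for its result, as Scenario 1 below describes. A sketch of that loop in Python, assuming a boto3 SSM client instead of the AWS CLI: send_command and get_command_invocation are boto3 SSM client methods, while the function name and the poll limits are illustrative. The client is passed in so the loop can be exercised with a stub.

```python
import time
from typing import Any

# get_command_invocation statuses after which polling can stop.
TERMINAL = {"Success", "Cancelled", "TimedOut", "Failed"}


def run_on_instance(ssm: Any, instance_id: str, commands: list[str],
                    poll_seconds: float = 10.0, max_polls: int = 60) -> str:
    """Run shell commands on one instance via SSM Run Command and return stdout."""
    response = ssm.send_command(
        InstanceIds=[instance_id],
        DocumentName="AWS-RunShellScript",  # AWS-managed document for shell commands
        Parameters={"commands": commands},
    )
    command_id = response["Command"]["CommandId"]
    for _ in range(max_polls):
        # Sleep first: an invocation queried immediately after sending may not exist yet.
        time.sleep(poll_seconds)
        result = ssm.get_command_invocation(CommandId=command_id, InstanceId=instance_id)
        status = result["Status"]
        if status in TERMINAL:
            if status != "Success":
                raise RuntimeError(f"SSM command {command_id} finished as {status}: "
                                   f"{result.get('StandardErrorContent', '')}")
            return result.get("StandardOutputContent", "")
    raise TimeoutError(f"SSM command {command_id} still running after {max_polls} polls")
```

A call might look like run_on_instance(boto3.client("ssm"), instance_id, ["cd /srv/app && git pull && just deploy-prod"]), reusing the /srv/app checkout from the manual deployment steps.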

Example: What Happens on a Typical Push

Scenario 1 — you edited blueprint/html/index.html:

Push to development branch
  ↓
detect-changes: infrastructure=false, app=true
  ↓
deploy-app runs:
  aws ssm send-command "git pull && just deploy-prod"
  polls every 10s until success
  prints stdout from EC2 instance
  ↓
New HTML is live ~30 seconds after push

Scenario 2 — you changed infrastructure/__main__.py.jinja (e.g. bigger instance type):

Push to development branch
  ↓
detect-changes: infrastructure=true, app=false
  ↓
pulumi-apply runs:
  pulumi up --yes
  Pulumi modifies the EC2 instance type in-place (or replaces it)
  ↓
Infrastructure updated. If the instance was replaced, the new instance pulls the latest code via user-data.

Scenario 3 — you opened a PR with Pulumi changes:

Pull Request to development
  ↓
detect-changes: infrastructure=true
  ↓
pulumi-preview runs:
  pulumi preview
  Output shown in CI logs — no changes applied
  ↓
Reviewer can see exactly what Pulumi will do before merging.

Adapting for Other Platforms

The same two-path logic works for GitLab CI. The Superset project's .gitlab-ci.yml uses these rules:

# Skip deploy if terraform changed
- if: $CI_COMMIT_BRANCH == "main"
  changes:
    - terraform_scalling/**/*
  when: never
# Only deploy if docker/compose changed
- if: $CI_COMMIT_BRANCH == "main"
  changes:
    - docker/**/*
    - docker-compose.yml

For your adapter's GitLab template, mirror this pattern with pulumi instead of terraform and infrastructure/** instead of terraform_scalling/**.


Adding a New Environment Later

Environments are set up at create-service time. To add one later:

# Create the branch
git checkout initial-commit
git checkout -b staging
git push origin staging

# Generate the workflow file
cp .github/workflows/deploy-development.yaml .github/workflows/deploy-staging.yaml
# Edit deploy-staging.yaml: change all occurrences of "development" to "staging"
git add .github/workflows/deploy-staging.yaml
git commit -m "chore: add staging environment"
git push
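The copy-and-rename step can also be scripted. A small sketch with a hypothetical helper, applying the same replace-every-occurrence rule as the manual instruction:

```python
from pathlib import Path


def add_environment_workflow(workflows_dir: Path, source_env: str, new_env: str) -> Path:
    """Copy deploy-<source_env>.yaml to deploy-<new_env>.yaml, renaming the environment."""
    source = workflows_dir / f"deploy-{source_env}.yaml"
    target = workflows_dir / f"deploy-{new_env}.yaml"
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    # Every occurrence is replaced, exactly like the manual edit, so review the
    # diff for matches that are not actually the environment name.
    target.write_text(source.read_text().replace(source_env, new_env))
    return target
```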

Releasing to PyPI

Two workflows handle the full release lifecycle:

bump version in pyproject.toml → merge to main
         │
         ▼
  tag-version.yml          triggers on: push to main, pyproject.toml changed
  reads Poetry version      creates git tag vX.Y.Z automatically
         │
         ▼
  (go to GitHub → Releases → Draft a new release → publish it)
         │
         ▼
  pypi-publish.yml          triggers on: release published
  poetry build + publish    pushes to PyPI via PYPI_TOKEN

One-time setup

Add PYPI_TOKEN to the repository secrets (Settings → Secrets and variables → Actions):

  1. Go to https://pypi.org/manage/account/token/ and create an API token scoped to dwe-core
  2. In GitHub: Settings → Secrets and variables → Actions → New repository secret
    • Name: PYPI_TOKEN
    • Value: the token from PyPI (starts with pypi-)

Release flow

Step 1 — bump the version and merge to main:

poetry version patch        # 1.0.0 → 1.0.1
poetry version minor        # 1.0.0 → 1.1.0
poetry version major        # 1.0.0 → 2.0.0
poetry version prerelease   # 1.0.0 → 1.0.1a0
poetry version 1.2.0        # set explicit version

git add pyproject.toml
git commit -m "chore: bump version to $(poetry version -s)"
git push origin main

tag-version.yml fires on the push, reads the version from pyproject.toml, and pushes tag vX.Y.Z. No manual tagging needed, and it only runs on main.

Step 2 — publish the GitHub Release:

Go to github.com/<org>/dwe-core/releases, click Draft a new release, select the tag just created, and click Publish release.

pypi-publish.yml fires on the publish event: runs poetry install, poetry build, then poetry publish -u __token__ -p $PYPI_TOKEN.


Technical Stack

Concern             Library
CLI framework       Typer
Template engine     Copier
Git operations      GitPython
GitHub secrets      PyGithub
GitLab variables    python-gitlab
Runtime templating  Jinja2 (for CI templates)
Infrastructure      Pulumi
Task runner         Just



Download files


Source Distribution

dwe_core-0.1.0a1.tar.gz (20.0 kB)

Uploaded Source

Built Distribution


dwe_core-0.1.0a1-py3-none-any.whl (15.6 kB)

Uploaded Python 3

File details

Details for the file dwe_core-0.1.0a1.tar.gz.

File metadata

  • Download URL: dwe_core-0.1.0a1.tar.gz
  • Size: 20.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.3.4 CPython/3.11.15 Linux/6.17.0-1010-azure

File hashes

Hashes for dwe_core-0.1.0a1.tar.gz
Algorithm Hash digest
SHA256 59e8aed52c926ab69bb3ea0830325e4f1069d43055ff53c67ba0b5e04f31cbcc
MD5 262aa76a4194462fa6660723eee31cd4
BLAKE2b-256 a7404e655e70484195ee7d46f01e96ed049d780dd8b90782aca7ba872565c3a1


File details

Details for the file dwe_core-0.1.0a1-py3-none-any.whl.

File metadata

  • Download URL: dwe_core-0.1.0a1-py3-none-any.whl
  • Size: 15.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.3.4 CPython/3.11.15 Linux/6.17.0-1010-azure

File hashes

Hashes for dwe_core-0.1.0a1-py3-none-any.whl
Algorithm Hash digest
SHA256 88f6eb01704393f29a5e4d3ecaf718b74c5a536e8957c204461240d0fa11f5c6
MD5 3592bc3c80d5f6e33c48867676e3beca
BLAKE2b-256 1374d117280e63cd8274724a43be4a29a44d415080705b7a8c357733da80a97a

