ACC-02: Databricks pipeline project scaffolding and build CLI

ACC CLI: Databricks Project Generator

A command-line tool to scaffold production-ready Databricks pipeline projects from templates. Built with Typer and Cookiecutter.

Features

  • 🚀 Scaffold a Databricks ETL pipeline project in seconds
  • 📁 Generates a fully structured Python package with src/ layout
  • ☁️ Supports multiple storage backends: UC, DBFS, S3, ADLS
  • 🔧 Includes Makefile, pyproject.toml, configs, Databricks job JSON, and tests out of the box
  • 📦 Installable via pip, so it works as a global CLI tool

Project Structure

python-deployment-package/
├── .gitignore
├── README.md
├── pyproject.toml           # Package metadata & entry point
├── requirements.txt         # Runtime dependencies
├── dist/                    # Built distributions (wheel + sdist)
├── docs/
│   ├── cli-spec.md
│   ├── project-structure.md
│   └── tech-decisions.md
├── src/
│   └── acc_cli/             # Main CLI package
│       ├── __init__.py
│       ├── cli.py           # Typer app & entry point
│       ├── config.py        # Templates & storage config
│       ├── commands/
│       │   ├── __init__.py
│       │   └── init.py      # `acc init` command
│       └── templates/
│           └── etl/         # ETL pipeline cookiecutter template
│               ├── cookiecutter.json
│               └── {{cookiecutter.project_slug}}/
│                   ├── Makefile
│                   ├── pyproject.toml
│                   ├── README.md
│                   ├── requirements.txt
│                   ├── requirements-dev.txt
│                   ├── .gitignore
│                   ├── configs/
│                   │   ├── dev.yaml
│                   │   └── prod.yaml
│                   ├── jobs/
│                   │   └── databricks_job.json
│                   ├── src/{{cookiecutter.project_slug}}/
│                   │   ├── main.py
│                   │   ├── pipelines/
│                   │   ├── tasks/
│                   │   └── utils/
│                   └── tests/
└── tests/
    ├── __init__.py
    └── test_cli.py
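
The README does not show what the template's jobs/databricks_job.json contains. As a rough illustration only, the sketch below builds a job definition of the kind a wheel-based pipeline typically uses, assuming the Databricks Jobs API `python_wheel_task` task type. The task key, entry point name, and wheel path are hypothetical, and cluster settings are omitted.

```python
import json


def wheel_job(project_slug: str, wheel_path: str, entry_point: str = "main") -> dict:
    """Build a minimal job definition that runs an entry point from a wheel.

    Every value here is illustrative; the real template may differ.
    """
    return {
        "name": project_slug,
        "tasks": [
            {
                "task_key": "run_pipeline",  # hypothetical task name
                "python_wheel_task": {
                    "package_name": project_slug,
                    "entry_point": entry_point,  # hypothetical entry point
                },
                # The wheel is attached to the task as a library.
                "libraries": [{"whl": wheel_path}],
            }
        ],
    }


job = wheel_job(
    "my_sales_pipeline",
    "dbfs:/FileStore/wheels/my_sales_pipeline-0.1.0-py3-none-any.whl",
)
print(json.dumps(job, indent=2))
```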

Installation

pip install acc-cli

Note (macOS): If acc is not found after install, add Python's bin directory to your PATH:

echo 'export PATH="/Library/Frameworks/Python.framework/Versions/3.12/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc
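
The path above is specific to one Python installation. If yours differs, the directory where pip places console scripts can be looked up from Python itself. This is a minimal standard-library sketch; it assumes you run it with the same interpreter that installed acc, and it does not cover `pip install --user`, which uses a different scripts directory.

```python
import shutil
import sysconfig
from typing import Optional


def scripts_dir() -> str:
    """Return the directory where this interpreter installs console scripts."""
    return sysconfig.get_path("scripts")


def find_acc() -> Optional[str]:
    """Return the full path of the `acc` executable if it is on PATH, else None."""
    return shutil.which("acc")


print("Scripts directory:", scripts_dir())
print("acc on PATH:", find_acc() or "not found")
```

If `acc` is reported as not found, adding the printed scripts directory to PATH is the likely fix.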

Usage

Start the scaffolder (default command)

acc

Explicit init subcommand

acc init

Help

acc --help
acc init --help

Interactive prompts

When you run acc init, you will be prompted for:

| Prompt | Description | Default |
|---|---|---|
| Project name | Name of your new project | (none) |
| Version | Initial version | 0.1.0 |
| Author | Your name | (none) |
| Template | Project template to use | etl |
| Storage | Cloud storage backend | uc |
| Databricks workspace URL | Your workspace URL | (none) |
| Output directory | Where to create the project | current directory |

Example session

ACC Project Scaffolder

Available templates:
  etl        - Extract-Transform-Load pipeline
  ml         - Machine learning pipeline
  utility    - Utility / helper library

Project name: my-sales-pipeline
Version [0.1.0]:
Author: Jane Doe
Template [etl / ml / utility] [etl]:
Storage [uc / dbfs / s3 / adls] [uc]: s3
Databricks workspace URL: https://adb-xxxx.azuredatabricks.net
Output directory [...]: /Users/jane/projects

Project created successfully!

Location:   /Users/jane/projects/my_sales_pipeline
Storage:    s3://bucket/wheels
Wheel path: s3://bucket/wheels/my-sales-pipeline-0.1.0-py3-none-any.whl

Next steps:
  cd /Users/jane/projects/my_sales_pipeline
  make install && make test
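
In the session above, the project name my-sales-pipeline becomes the directory my_sales_pipeline. The exact rule the CLI applies is not documented here; the sketch below shows one plausible normalization (lowercase, non-alphanumeric runs replaced by underscores). The function name `to_slug` is hypothetical.

```python
import re


def to_slug(project_name: str) -> str:
    """Lowercase the name and collapse runs of non-alphanumerics into underscores.

    Assumed behavior for illustration; the real CLI may normalize differently.
    """
    slug = re.sub(r"[^0-9a-zA-Z]+", "_", project_name.strip().lower())
    return slug.strip("_")


print(to_slug("my-sales-pipeline"))  # my_sales_pipeline
```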

Available Templates

| Key | Description | Status |
|---|---|---|
| etl | Extract-Transform-Load pipeline | ✅ Available |
| ml | Machine learning pipeline | 🚧 Coming soon |
| utility | Utility / helper library | 🚧 Coming soon |

Storage Options

| Key | Resolved path |
|---|---|
| uc | dbfs:/Volumes/catalog/schema/volume |
| dbfs | dbfs:/FileStore/wheels |
| s3 | s3://bucket/wheels |
| adls | abfss://container@storageaccount.dfs.core.windows.net/wheels |
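
To make the mapping concrete, the sketch below resolves a storage key to its base path and derives a wheel location in the same shape as the "Wheel path" line of the example session. The dictionary and the `wheel_path` function are hypothetical stand-ins, not the contents of config.py, and the placeholder paths are copied from the table unchanged.

```python
# Hypothetical mapping that mirrors the table above.
STORAGE_PATHS = {
    "uc": "dbfs:/Volumes/catalog/schema/volume",
    "dbfs": "dbfs:/FileStore/wheels",
    "s3": "s3://bucket/wheels",
    "adls": "abfss://container@storageaccount.dfs.core.windows.net/wheels",
}


def wheel_path(storage: str, project_name: str, version: str) -> str:
    """Join the resolved storage path with a pure-Python wheel file name."""
    try:
        base = STORAGE_PATHS[storage]
    except KeyError:
        raise ValueError(f"Unknown storage key: {storage!r}") from None
    return f"{base}/{project_name}-{version}-py3-none-any.whl"


print(wheel_path("s3", "my-sales-pipeline", "0.1.0"))
# s3://bucket/wheels/my-sales-pipeline-0.1.0-py3-none-any.whl
```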

Development Setup

1. Clone the repository

git clone https://github.com/your-org/python-deployment-package.git
cd python-deployment-package

2. Create a virtual environment

python3 -m venv .venv
source .venv/bin/activate

3. Install in editable mode

pip install -e .

4. Verify the CLI works

acc --help

5. Run tests

pytest tests/ -v

Publishing a New Version

⚠️ PyPI does not allow re-uploading the same version. Always bump the version before building.

Step 1: Bump the version in pyproject.toml

[project]
version = "0.1.2"   # ← change this

Step 2: Clean previous builds

rm -rf dist/ build/ src/acc_cli.egg-info

Step 3: Build the distribution

python3 -m build

This creates two files inside dist/:

  • acc_cli-<version>-py3-none-any.whl: binary wheel (fast install)
  • acc_cli-<version>.tar.gz: source distribution
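
Note that the files are named acc_cli even though the package is published as acc-cli: the distribution name is normalized for file names. The sketch below predicts the expected file names so they can be checked after a build. It assumes a pure-Python wheel (`py3-none-any`) and a build backend that normalizes the sdist name the same way as the wheel name; `dist_filenames` is a hypothetical helper.

```python
import re
from typing import Tuple


def dist_filenames(name: str, version: str) -> Tuple[str, str]:
    """Return the expected (wheel, sdist) file names for a pure-Python package."""
    # Runs of '-', '_' and '.' in the project name become a single underscore.
    normalized = re.sub(r"[-_.]+", "_", name).lower()
    wheel = f"{normalized}-{version}-py3-none-any.whl"
    sdist = f"{normalized}-{version}.tar.gz"
    return wheel, sdist


print(dist_filenames("acc-cli", "0.1.2"))
# ('acc_cli-0.1.2-py3-none-any.whl', 'acc_cli-0.1.2.tar.gz')
```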

Step 4: Upload to PyPI

python3 -m twine upload dist/*

Twine will use credentials from ~/.pypirc. If that file doesn't exist, it will prompt for your API token.

Setting up ~/.pypirc (one-time)

[distutils]
index-servers = pypi

[pypi]
username = __token__
password = pypi-YOUR_API_TOKEN_HERE

Get your API token at https://pypi.org/manage/account/token/. The username must always be the literal string __token__.
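
A quick sanity check of the file before uploading can catch the most common mistakes. The following is a minimal sketch using the standard-library `configparser`; the `check_pypirc` helper is hypothetical and only checks the two rules stated above (the `__token__` username and a token that starts with `pypi-`). It runs against a sample string rather than your real file.

```python
import configparser
from typing import List

SAMPLE = """\
[distutils]
index-servers = pypi

[pypi]
username = __token__
password = pypi-YOUR_API_TOKEN_HERE
"""


def check_pypirc(text: str) -> List[str]:
    """Return a list of problems found in .pypirc content (empty if none)."""
    # Interpolation is disabled so '%' characters in a token are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if not parser.has_section("pypi"):
        return ["missing [pypi] section"]
    problems = []
    if parser.get("pypi", "username", fallback="") != "__token__":
        problems.append("username must be __token__")
    if not parser.get("pypi", "password", fallback="").startswith("pypi-"):
        problems.append("password should be an API token starting with 'pypi-'")
    return problems


print(check_pypirc(SAMPLE))  # []
```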

Step 5: Verify on PyPI

https://pypi.org/project/acc-cli/

Versioning Convention

This project follows Semantic Versioning (MAJOR.MINOR.PATCH):

| Change type | Example |
|---|---|
| Bug fix / small restructure | 0.1.1 → 0.1.2 |
| New feature / new template | 0.1.2 → 0.2.0 |
| Breaking change | 0.2.0 → 1.0.0 |
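
The three rows above correspond to bumping the PATCH, MINOR, and MAJOR components, with lower components reset to zero. A minimal sketch of that rule follows; it assumes plain `MAJOR.MINOR.PATCH` strings without pre-release or build suffixes, and `bump` is a hypothetical helper, not part of the CLI.

```python
def bump(version: str, part: str) -> str:
    """Return the next version after bumping 'major', 'minor', or 'patch'."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown version part: {part!r}")


print(bump("0.1.1", "patch"))  # 0.1.2
print(bump("0.1.2", "minor"))  # 0.2.0
print(bump("0.2.0", "major"))  # 1.0.0
```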

Dependencies

| Package | Purpose |
|---|---|
| typer>=0.12 | CLI framework |
| rich>=13 | Terminal formatting |
| cookiecutter>=2.6 | Project scaffolding from templates |

License

Proprietary. All rights reserved.
