The Command Line Interface for the msh atomic data platform.
msh: The Atomic Data Engine
Stop gluing Python scripts to SQL files. Define Ingestion, Transformation, and Lineage in a single, version-controlled asset.
The Problem: Fragmented Data Stacks
In the modern data stack, your pipeline is fragmented:
- Ingestion happens in one tool (Airbyte, Fivetran), often defined in UI or JSON.
- Transformation happens in another (dbt, SQL), defined in `.sql` files.
- Orchestration happens in a third (Airflow, Dagster), defined in Python.
This separation creates friction. Adding a new column requires touching three different systems. Debugging a failure requires tracing lineage across boundaries.
The Solution: The Atomic Asset
msh unifies these steps into a single Atomic Asset. An .msh file defines everything about a data product: where it comes from, how it changes, and where it goes.
Example: models/orders.msh
Option 1: Direct Credentials (Fully Atomic)
name: orders
ingest:
  type: sql_database
  credentials: "postgresql://user:pass@prod-db.com/sales"
  table: "public.orders"
transform: |
  SELECT
    id as order_id,
    customer_id,
    total_amount,
    created_at
  FROM {{ source }}
  WHERE status = 'completed'
Option 2: Source References (DRY for Large Projects)
Define sources once in msh.yaml:
# msh.yaml
sources:
  - name: prod_db
    type: sql_database
    credentials: "${DB_PROD_CREDENTIALS}"  # Environment variable
    schema: public
    tables:
      - name: orders
        description: Customer orders table
      - name: customers
        description: Customer master data
  - name: jsonplaceholder
    type: rest_api
    endpoint: "https://jsonplaceholder.typicode.com"
    resources:
      - name: users
      - name: posts
Then reference in .msh files:
# models/staging/stg_orders.msh
name: stg_orders
ingest:
  source: prod_db
  table: orders
transform: |
  SELECT * FROM {{ source }}

# models/staging/stg_users.msh
name: stg_users
ingest:
  source: jsonplaceholder
  resource: users
transform: |
  SELECT id, name, email FROM {{ source }}
Benefits:
- ✅ DRY: Define credentials once, reference everywhere
- ✅ Environment Variables: Use `${VAR_NAME}` for sensitive credentials (see the export example after this list)
- ✅ Backward Compatible: Direct credentials still work
- ✅ dbt-style: Familiar pattern for dbt users
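
For example, the `${DB_PROD_CREDENTIALS}` reference in `msh.yaml` above can be satisfied by exporting the variable before a run (the connection string value here is illustrative):

export DB_PROD_CREDENTIALS="postgresql://user:pass@prod-db.com/sales"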
Key Capabilities
⚡ Smart Ingest
Save API costs and storage. msh parses your SQL transformation before running ingestion. It detects exactly which columns you are selecting (`id`, `userId`, `title`) and instructs the ingestion engine to fetch only those fields from the API or database.
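
A minimal sketch of the idea, assuming a direct `rest_api` ingest block (the `type`, `endpoint`, and `resource` keys mirror the source definitions shown earlier; the file itself is illustrative, not from the docs):

# models/posts.msh (illustrative)
name: posts
ingest:
  type: rest_api
  endpoint: "https://jsonplaceholder.typicode.com"
  resource: posts
transform: |
  -- Only id, userId, title are referenced, so only these fields are fetched
  SELECT id, userId, title FROM {{ source }}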
🔵/🟢 Blue/Green Deployment
Zero downtime swaps. Every run creates a new version of your tables (e.g., raw_orders_a1b2, model_orders_a1b2). The live view is only swapped (CREATE OR REPLACE VIEW) once the new version is fully built and tested. Your dashboards never break during a run.
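
In warehouse terms the swap is a single metadata operation. A conceptual sketch (not the engine's literal DDL), using the versioned names from the example above:

-- 1. Each run builds fresh, versioned tables in the background
CREATE TABLE model_orders_a1b2 AS
SELECT * FROM raw_orders_a1b2 WHERE status = 'completed';

-- 2. Only after the build succeeds is the stable view repointed
CREATE OR REPLACE VIEW orders AS SELECT * FROM model_orders_a1b2;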
↩️ Atomic Rollbacks
Instant recovery. Deployed a bug? No problem.
msh rollback orders
msh instantly swaps the view back to the previous successful version. No data needs to be re-processed.
🔌 Universal Connectivity
The Full Data Lifecycle. msh supports every flow your data needs:
- API to DB: Ingest from REST/GraphQL APIs directly into your warehouse.
- DB to DB: Replicate and transform data between databases (e.g., Postgres -> Snowflake).
- Reverse ETL: Push transformed models back to operational systems (e.g., Snowflake -> Salesforce).
🚀 Publish Command
Activate your data. Push your transformed models to external systems with a single command.
msh publish orders --to salesforce
🔀 Git-Aware Development
Isolated workspaces. When working on different git branches, msh automatically creates isolated schemas. Developers can work simultaneously without conflicts. Production deployments always use standard schemas.
⚡ Bulk Operations
Process multiple assets at once. Run, rollback, and query multiple assets with a single command. Perfect for automation and CI/CD pipelines.
🔍 Dependency Selection
Run upstream or downstream dependencies. Use `+asset_name` to run an asset together with all of its upstream dependencies, or `asset_name+` to run the asset and all of its downstream dependents.
🤖 AI-Powered Features
Get AI assistance for your data assets. Use AI to explain, review, generate, and fix your .msh files. Includes glossary management and context-aware suggestions.
🔎 Auto-Discovery
Generate .msh files automatically. Probe REST APIs or SQL databases and automatically generate configuration files with inferred schemas and types.
📊 Data Sampling & Preview
Preview your data before running. Sample data from assets to verify transformations and test queries without running the full pipeline.
📚 Glossary Management
Define and link business terms. Create a shared glossary of business terms, metrics, and dimensions. Link them to assets and columns for better documentation and AI context.
🔒 Schema Contracts
Control schema evolution. Define how schemas should evolve (`freeze` or `evolve`) to prevent unexpected changes or allow controlled growth.
Usage Examples
Git-Aware Development
# Each developer gets isolated schemas automatically
git checkout feature/new-api
msh run # Uses: main_feature_new_api
git checkout bugfix/issue-123
msh run # Uses: main_bugfix_issue_123
# Production always uses standard schemas
msh run --env prod # Uses: main
Bulk Operations
# Run all assets
msh run --all
# Rollback multiple assets
msh rollback orders,revenue,users
# Rollback all assets
msh rollback --all
# Get JSON output for automation
msh status --format json
Dependency Selection
# Run asset and all upstream dependencies
msh run +fct_orders
# Run asset and all downstream dependents
msh run fct_orders+
# Run specific asset only
msh run fct_orders
Command Aliases
# All asset commands can be accessed via 'msh asset'
msh asset run orders
msh asset rollback orders
msh asset status
msh asset sample orders --size 10
msh asset versions orders
AI Commands
# Explain what an asset does
msh ai explain models/orders.msh
# Review asset for risks and issues
msh ai review models/orders.msh
# Generate a new asset from description
msh ai new --name customer_metrics
# Fix a broken asset
msh ai fix models/orders.msh
# Generate tests for an asset
msh ai tests models/orders.msh
# Generate context pack for AI
msh ai context --asset orders --json
Auto-Discovery
# Discover REST API and generate .msh file
msh discover https://api.github.com/repos/dlt-hub/dlt/issues --name github_issues
# Discover SQL database and generate .msh file
msh discover postgresql://user:pass@host:5432/db --name customers --table public.users
# Write to file automatically
msh discover https://api.example.com/data --name api_data --write
Data Sampling
# Sample 10 rows from an asset
msh sample orders --size 10
# Sample with specific environment
msh sample orders --env prod --size 100
Glossary Management
# Add a glossary term
msh glossary add-term "Customer Lifetime Value" --description "Total revenue from a customer"
# Link term to asset and column
msh glossary link-term "Customer Lifetime Value" --asset customers --column customer_id
# List all glossary terms
msh glossary list --json
# Export glossary for AI context
msh glossary export --json
Schema Contracts
# models/orders.msh
name: orders
ingest:
  type: sql_database
  source: prod_db
  table: orders
contract:
  evolution: freeze  # Prevent new columns from being added
transform: |
  SELECT * FROM {{ source }}
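
To allow controlled growth instead of freezing, the same block can opt into evolution (assuming the same contract syntax):

contract:
  evolution: evolve  # Allow new columns as the source schema grows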
Layered Projects (dbt-style)
Build complex DAGs with staging → intermediate → marts layers:
# msh.yaml
sources:
  - name: prod_db
    type: sql_database
    credentials: "${DB_PROD_CREDENTIALS}"
    schema: public
    tables:
      - name: orders
      - name: customers

# models/staging/stg_orders.msh
name: stg_orders
ingest:
  source: prod_db
  table: orders
transform: |
  SELECT
    id as order_id,
    customer_id,
    amount,
    created_at
  FROM {{ source }}

# models/intermediate/int_order_customer.msh
name: int_order_customer
transform: |
  SELECT
    o.order_id,
    o.amount,
    c.name as customer_name
  FROM {{ ref('stg_orders') }} o
  JOIN {{ ref('stg_customers') }} c ON o.customer_id = c.customer_id

# models/marts/fct_orders.msh
name: fct_orders
transform: |
  SELECT
    order_id,
    customer_name,
    amount,
    created_at
  FROM {{ ref('int_order_customer') }}
Dependency Resolution:
- Use `{{ ref('model_name') }}` to reference other `.msh` files
- msh automatically builds the DAG and runs models in the correct order
- Run upstream dependencies: `msh run +fct_orders` (runs fct_orders and all of its dependencies)
- Run downstream: `msh run fct_orders+` (runs fct_orders and its dependents)
Architecture
msh acts as the Control Plane for best-in-class open source tools:
- Extract/Load: Powered by dlt (Data Load Tool).
- Transform: Powered by dbt (Data Build Tool).
- Orchestrate: Powered by msh-engine.
You get the power of the ecosystem without the boilerplate.
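
For a sense of the boilerplate this removes, a single `msh run orders` stands in for a hand-wired sequence along these lines (script names are illustrative; the engine's actual internals differ):

# What you would otherwise wire together by hand:
python ingest_orders.py        # dlt extract/load script -> raw_orders
dbt run --select orders        # dbt transformation -> model_orders
python deploy_views.py         # custom blue/green swap and rollback bookkeeping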
Installation & Quickstart
1. Install
pip install msh-cli
2. Initialize a Project
msh init
cd my_msh_project
3. Run the Pipeline
msh run
4. View the Dashboard
msh ui
Command Reference
Core Commands
- `msh init` - Initialize a new msh project
- `msh run [asset]` - Run assets (use `--all` for all assets)
- `msh rollback [asset]` - Rollback to the previous version
- `msh status` - Show deployment status
- `msh plan` - Show the execution plan without running
- `msh doctor` - Diagnose project configuration issues
Asset Commands (Aliases)
- `msh asset run [asset]` - Run assets
- `msh asset rollback [asset]` - Rollback assets
- `msh asset status` - Show status
- `msh asset sample [asset]` - Sample data from an asset
- `msh asset versions [asset]` - Show version history
- `msh asset preview [asset]` - Preview transformation SQL
AI Commands
- `msh ai explain <asset>` - Explain what an asset does
- `msh ai review <asset>` - Review an asset for risks
- `msh ai new` - Generate a new asset from a description
- `msh ai fix <asset>` - Fix a broken asset
- `msh ai tests <asset>` - Generate tests for an asset
- `msh ai context` - Generate an AI context pack
Discovery & Development
- `msh discover <source>` - Auto-discover and generate a .msh file
- `msh sample <asset>` - Sample data from an asset
- `msh validate` - Validate .msh file syntax
- `msh fmt` - Format .msh files
Glossary Commands
- `msh glossary add-term <name>` - Add a glossary term
- `msh glossary link-term <term> --asset <asset>` - Link a term to an asset
- `msh glossary list` - List all terms
- `msh glossary export` - Export the glossary as JSON
Utility Commands
- `msh ui` - Launch the web dashboard
- `msh lineage` - Show the asset lineage graph
- `msh manifest` - Generate a project manifest
- `msh config` - Configure msh settings
Supported Destinations
msh supports all major data warehouses and databases:
- Snowflake - Full support with optimized connection handling, schema sanitization, and error handling
- PostgreSQL - Native support with connection pooling
- DuckDB - Default local development database
- BigQuery - Google Cloud BigQuery support
- Redshift - Amazon Redshift support
- MySQL - MySQL database support
- SQLite - SQLite support for testing
Configuration
Environment Variables
For Snowflake:
# dlt / msh (Ingestion & Orchestration)
export DESTINATION__SNOWFLAKE__CREDENTIALS__DATABASE="ANALYTICS"
export DESTINATION__SNOWFLAKE__CREDENTIALS__PASSWORD="secure_password"
export DESTINATION__SNOWFLAKE__CREDENTIALS__USERNAME="MSH_USER"
export DESTINATION__SNOWFLAKE__CREDENTIALS__HOST="xyz123.snowflakecomputing.com"
export DESTINATION__SNOWFLAKE__CREDENTIALS__WAREHOUSE="COMPUTE_WH"
export DESTINATION__SNOWFLAKE__CREDENTIALS__ROLE="TRANSFORMER"
# dbt (Transformation)
export SNOWFLAKE_ACCOUNT="xyz123"
export SNOWFLAKE_USER="MSH_USER"
export SNOWFLAKE_PASSWORD="secure_password"
export SNOWFLAKE_ROLE="TRANSFORMER"
export SNOWFLAKE_DATABASE="ANALYTICS"
export SNOWFLAKE_WAREHOUSE="COMPUTE_WH"
For PostgreSQL:
export DESTINATION__POSTGRES__CREDENTIALS="postgresql://user:pass@host:5432/db"
export POSTGRES_HOST="localhost"
export POSTGRES_USER="postgres"
export POSTGRES_PASSWORD="password"
export POSTGRES_DB="analytics"
License
msh is licensed under the Business Source License (BSL 1.1). You may use this software for non-production or development purposes. Production use requires a commercial license.