dbt-cube-sync
A powerful synchronization tool that creates a seamless pipeline from dbt models to Cube.js schemas and BI tools (Superset, Tableau, PowerBI).
Features
- dbt → Cube.js: Auto-generate Cube.js schemas from dbt models with metrics
- Flexible Data Type Source: Get column types from the catalog OR directly from the database via SQLAlchemy
- Model Filtering: Process specific models instead of all models
- Cube.js → BI Tools: Sync schemas to multiple BI platforms
- Extensible Architecture: Plugin-based connector system for easy BI tool integration
- Docker Support: Containerized execution with orchestration support
- CLI Interface: Simple command-line tools for automation
Supported BI Tools
- Apache Superset - Full implementation
- Tableau - Placeholder (coming soon)
- PowerBI - Placeholder (coming soon)
Installation
Using Poetry (Development)
cd dbt-cube-sync
poetry install
poetry run dbt-cube-sync --help
Database Drivers (for SQLAlchemy URI feature)
If you want to use the --sqlalchemy-uri option to fetch column types directly from your database, you'll need to install the appropriate database driver:
# PostgreSQL
poetry add psycopg2-binary
# MySQL
poetry add pymysql
# Snowflake
poetry add snowflake-sqlalchemy
# BigQuery
poetry add sqlalchemy-bigquery
# Redshift
poetry add sqlalchemy-redshift
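If you want to verify a driver and URI before pointing the tool at them, SQLAlchemy's inspector performs the same kind of reflection the --sqlalchemy-uri feature relies on. A minimal sketch (fetch_column_types is illustrative, not the tool's internal API; the demo uses in-memory SQLite so it runs without a server):

```python
from sqlalchemy import create_engine, inspect, text

def fetch_column_types(engine, table, schema=None):
    """Return {column_name: type_string} for one table via SQLAlchemy reflection."""
    inspector = inspect(engine)
    return {c["name"]: str(c["type"]) for c in inspector.get_columns(table, schema=schema)}

# Demo against an in-memory SQLite database; in practice the engine would be
# created from the same URI you pass to --sqlalchemy-uri, e.g.
# postgresql://user:password@localhost:5432/mydb (needs psycopg2-binary).
engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE orders (id INTEGER, amount NUMERIC, placed_at TEXT)"))
print(fetch_column_types(engine, "orders"))
# {'id': 'INTEGER', 'amount': 'NUMERIC', 'placed_at': 'TEXT'}
```

If this prints your table's columns and types, the same URI should work with the CLI.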
Using Docker
docker build -t dbt-cube-sync .
docker run --rm dbt-cube-sync --help
Quick Start
1. Generate Cube.js Schemas from dbt
Option A: Using catalog file (traditional method)
dbt-cube-sync dbt-to-cube \
--manifest ./target/manifest.json \
--catalog ./target/catalog.json \
--output ./cube_output
Option B: Using database connection (no catalog needed)
dbt-cube-sync dbt-to-cube \
--manifest ./target/manifest.json \
--sqlalchemy-uri postgresql://user:password@localhost:5432/mydb \
--output ./cube_output
Option C: Filter specific models
dbt-cube-sync dbt-to-cube \
--manifest ./target/manifest.json \
--sqlalchemy-uri postgresql://user:password@localhost:5432/mydb \
--models orders,customers,products \
--output ./cube_output
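The --models filter works against the model entries in manifest.json. A sketch of that selection logic (select_models is a hypothetical helper, not the package's code; the manifest layout shown matches dbt's artifact format):

```python
import json, tempfile

def select_models(manifest_path, models=None):
    """List dbt model names from manifest.json, optionally filtered
    the way --models orders,customers,products filters them."""
    with open(manifest_path) as f:
        manifest = json.load(f)
    wanted = set(models.split(",")) if models else None
    return sorted(
        node["name"]
        for node in manifest["nodes"].values()
        if node["resource_type"] == "model"
        and (wanted is None or node["name"] in wanted)
    )

# Tiny stand-in manifest; a real one is produced by dbt in ./target/
fake = {"nodes": {
    "model.shop.orders":    {"resource_type": "model", "name": "orders"},
    "model.shop.customers": {"resource_type": "model", "name": "customers"},
    "seed.shop.countries":  {"resource_type": "seed",  "name": "countries"},
}}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(fake, f)
    path = f.name
print(select_models(path))                     # ['customers', 'orders']
print(select_models(path, "orders,products"))  # ['orders'] - unknown names are skipped
```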
2. Sync to BI Tool (Optional)
# Sync to Superset
dbt-cube-sync cube-to-bi superset \
--cube-files ./cube_output \
--url http://localhost:8088 \
--username admin \
--password admin \
--cube-connection-name Cube
Configuration
Sample Configuration (sync-config.yaml)
connectors:
  superset:
    type: superset
    url: http://localhost:8088
    username: admin
    password: admin
    database_name: Cube
  tableau:
    type: tableau
    url: https://your-tableau-server.com
    username: your-username
    password: your-password
  powerbi:
    type: powerbi
    # PowerBI specific configuration
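At runtime a config like this only needs to be parsed and the right connector section picked out. A sketch of that lookup, assuming PyYAML is installed (connector_config is a hypothetical helper, not part of the package):

```python
import yaml  # PyYAML

SAMPLE = """
connectors:
  superset:
    type: superset
    url: http://localhost:8088
    username: admin
    password: admin
    database_name: Cube
"""

def connector_config(raw_yaml, name):
    """Return one connector's settings from a sync-config.yaml document."""
    connectors = yaml.safe_load(raw_yaml)["connectors"]
    if name not in connectors:
        raise KeyError(f"no connector named {name!r}; have {sorted(connectors)}")
    return connectors[name]

print(connector_config(SAMPLE, "superset")["url"])  # http://localhost:8088
```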
CLI Commands
Quick Reference
| Command | Description |
|---|---|
| sync-all | Ultimate command - incremental sync: dbt → Cube.js → Superset → RAG |
| dbt-to-cube | Generate Cube.js schemas from dbt models (with incremental support) |
| cube-to-bi | Sync Cube.js schemas to BI tools (Superset, Tableau, PowerBI) |
sync-all (Recommended)
Ultimate incremental sync command - handles the complete pipeline with state tracking.
# Basic incremental sync (Cube.js only)
dbt-cube-sync sync-all -m manifest.json -c catalog.json -o ./cube_output
# Full pipeline: dbt → Cube.js → Superset
dbt-cube-sync sync-all -m manifest.json -c catalog.json -o ./cube_output \
--superset-url http://localhost:8088 \
--superset-username admin \
--superset-password admin
# Full pipeline: dbt → Cube.js → Superset → RAG embeddings
dbt-cube-sync sync-all -m manifest.json -c catalog.json -o ./cube_output \
--superset-url http://localhost:8088 \
--superset-username admin \
--superset-password admin \
--rag-api-url http://localhost:8000
# Force full rebuild (ignore state)
dbt-cube-sync sync-all -m manifest.json -c catalog.json -o ./cube_output --force-full-sync
Options:
| Option | Required | Description |
|---|---|---|
| --manifest, -m | Yes | Path to dbt manifest.json |
| --catalog, -c | No* | Path to dbt catalog.json |
| --sqlalchemy-uri, -s | No* | Database URI for column types |
| --output, -o | Yes | Output directory for Cube.js files |
| --state-path | No | State file path (default: .dbt-cube-sync-state.json) |
| --force-full-sync | No | Force full rebuild, ignore state |
| --superset-url | No | Superset URL |
| --superset-username | No | Superset username |
| --superset-password | No | Superset password |
| --cube-connection-name | No | Cube database name in Superset (default: Cube) |
| --rag-api-url | No | RAG API URL for embedding updates |
*Either --catalog or --sqlalchemy-uri is required.
How Incremental Sync Works:
- Reads the state file (.dbt-cube-sync-state.json) with model checksums
- Compares against the current manifest to detect changes
- Only processes added or modified models
- Deletes Cube.js files for removed models
- Updates the state file with new checksums
dbt-to-cube
Generate Cube.js schema files from dbt models with incremental support.
Options:
- --manifest/-m: Path to dbt manifest.json file (required)
- --catalog/-c: Path to dbt catalog.json file
- --sqlalchemy-uri/-s: SQLAlchemy database URI for fetching column types
- --models: Comma-separated list of model names to process
- --output/-o: Output directory for Cube.js files (required)
- --template-dir/-t: Directory containing Cube.js templates (default: ./cube/templates)
- --state-path: State file for incremental sync (default: .dbt-cube-sync-state.json)
- --force-full-sync: Force full regeneration, ignore cached state
- --no-state: Disable state tracking (legacy behavior)
Examples:
# Incremental sync (default)
dbt-cube-sync dbt-to-cube -m manifest.json -c catalog.json -o output/
# Force full rebuild
dbt-cube-sync dbt-to-cube -m manifest.json -c catalog.json -o output/ --force-full-sync
# Using database connection (no catalog needed)
dbt-cube-sync dbt-to-cube -m manifest.json -s postgresql://user:pass@localhost/db -o output/
# Filter specific models
dbt-cube-sync dbt-to-cube -m manifest.json -c catalog.json -o output/ --models users,orders
cube-to-bi
Sync Cube.js schemas to BI tool datasets.
Arguments:
- bi_tool: BI tool type (superset, tableau, powerbi)
Options:
- --cube-files/-c: Directory containing Cube.js files (required)
- --url/-u: BI tool URL (required)
- --username/-n: BI tool username (required)
- --password/-p: BI tool password (required)
- --cube-connection-name/-d: Name of the Cube database connection in the BI tool (default: Cube)
Example:
dbt-cube-sync cube-to-bi superset -c cube_output/ -u http://localhost:8088 -n admin -p admin -d Cube
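The connector's input is simply the directory of generated .js schema files. As an illustration of what it has to work with, cube names can be pulled out of the cube(`Name`, …) declarations; this regex-based parsing is a sketch, not how the package actually reads the files:

```python
import re, tempfile
from pathlib import Path

CUBE_NAME = re.compile(r"cube\(\s*[`'\"](\w+)[`'\"]")

def list_cubes(cube_dir):
    """Map each generated schema file to the cube name it declares."""
    out = {}
    for path in sorted(Path(cube_dir).glob("*.js")):
        match = CUBE_NAME.search(path.read_text())
        if match:
            out[path.name] = match.group(1)
    return out

# Demo with a throwaway directory standing in for ./cube_output
tmp = Path(tempfile.mkdtemp())
(tmp / "Users.js").write_text("cube(`Users`, { sql: `SELECT * FROM users` });")
print(list_cubes(tmp))  # {'Users.js': 'Users'}
```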
State File
The state file (.dbt-cube-sync-state.json) tracks:
{
  "version": "1.0",
  "last_sync_timestamp": "2024-01-15T10:30:00Z",
  "manifest_path": "/path/to/manifest.json",
  "models": {
    "model.project.users": {
      "checksum": "abc123...",
      "has_metrics": true,
      "last_generated": "2024-01-15T10:30:00Z",
      "output_file": "./cube_output/Users.js"
    }
  }
}
Delete this file to force a full rebuild, or use --force-full-sync.
Architecture
dbt models (with metrics)
        ↓
dbt-cube-sync dbt-to-cube
        ↓
Cube.js schemas
        ↓
dbt-cube-sync cube-to-bi [connector]
        ↓
BI Tool Datasets (Superset/Tableau/PowerBI)
Project Structure
dbt-cube-sync/
├── dbt_cube_sync/
│   ├── cli.py                # CLI interface
│   ├── config.py             # Configuration management
│   ├── core/
│   │   ├── dbt_parser.py     # dbt manifest parser
│   │   ├── db_inspector.py   # Database column type inspector (SQLAlchemy)
│   │   ├── cube_generator.py # Cube.js generator
│   │   └── models.py         # Pydantic data models
│   └── connectors/
│       ├── base.py           # Abstract base connector
│       ├── superset.py       # Superset implementation
│       ├── tableau.py        # Tableau placeholder
│       └── powerbi.py        # PowerBI placeholder
├── Dockerfile                # Container definition
├── pyproject.toml            # Poetry configuration
└── README.md
Adding New BI Connectors
- Create a new connector class inheriting from BaseConnector
- Implement the required abstract methods
- Register the connector using ConnectorRegistry.register()
Example:
from .base import BaseConnector, ConnectorRegistry

class MyBIConnector(BaseConnector):
    def _validate_config(self):
        # Validation logic
        pass

    def connect(self):
        # Connection logic
        pass

    def sync_cube_schemas(self, cube_dir):
        # Sync implementation
        pass

# Register the connector
ConnectorRegistry.register('mybi', MyBIConnector)
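For illustration, a registry of this shape could look like the following standalone sketch. The real ConnectorRegistry lives in dbt_cube_sync/connectors/base.py; the create method and EchoConnector here are hypothetical and only show the pattern:

```python
class ConnectorRegistry:
    """Minimal class-registry pattern: map a CLI name to a connector class."""
    _connectors = {}

    @classmethod
    def register(cls, name, connector_cls):
        cls._connectors[name] = connector_cls

    @classmethod
    def create(cls, name, **config):  # hypothetical factory method
        try:
            return cls._connectors[name](**config)
        except KeyError:
            raise ValueError(f"unknown connector: {name!r}") from None

class EchoConnector:  # stand-in for a real BaseConnector subclass
    def __init__(self, url):
        self.url = url

ConnectorRegistry.register("echo", EchoConnector)
print(ConnectorRegistry.create("echo", url="http://localhost:8088").url)
# http://localhost:8088
```

Registering by string name is what lets the CLI dispatch `cube-to-bi superset` (or any future tool) without hard-coding connector classes.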
Docker Integration
The tool is designed to work in containerized environments with proper dependency orchestration:
- dbt docs: Runs dbt build, then serves documentation
- dbt-cube-sync: Runs the sync pipeline after dbt and Cube.js are ready
- BI Tools: Receive synced datasets after the sync completes
Contributing
- Fork the repository
- Create a feature branch
- Implement your changes
- Add tests if applicable
- Submit a pull request
License
MIT License - see LICENSE file for details.