Version Control for Database Schemas
Datatrack - Version Control for Databases
Datatrack is a lightweight and open-source CLI tool that brings Git-like version control to your database schemas. Built for Data Engineers, Analytics Engineers, and Platform Teams, it automates:
- Schema snapshots
- Diffs across versions
- Linting for naming and structure
- Verification against custom rules
- Exporting to JSON/YAML
In modern data systems, your schema is your contract: when it breaks silently, everything that depends on it breaks too.
Features
- Snapshot schemas from any SQL-compatible DB
- Lint schema naming issues
- Enforce verification rules
- Compare schema snapshots (diff)
- Export to JSON/YAML for auditing or CI
- Full pipeline in one command
Performance & Cost Savings
Datatrack’s parallel, batched snapshot engine ran 2× to 3.5× faster than serial snapshots in the benchmarks below. Benchmarks were run in August 2025 on a MacBook Pro M2 with Python 3.11, using SQLite and PostgreSQL.
| Database Size | Tables | Serial Time | Parallel Time | Speedup | Time Saved (per 1k runs) | Time Saved (per 50k runs) |
|---|---|---|---|---|---|---|
| Small | 12 | 0.18 s | 0.09 s | 2× | 90 s | 75 min |
| Medium | 75 | 0.95 s | 0.32 s | 3× | 630 s (10.5 min) | 8.75 hrs |
| Large | 250 | 2.80 s | 0.80 s | 3.5× | 2,000 s (~33 min) | ~27.8 hrs |
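The derived columns follow directly from the two timing columns. As a quick check, this short sketch recomputes the speedup, the percentage reduction, and the time saved from the serial and parallel timings in the table. The function name `time_saved_seconds` is only illustrative and is not part of Datatrack.

```python
def time_saved_seconds(serial_s: float, parallel_s: float, runs: int) -> float:
    """Total seconds saved over `runs` snapshots."""
    return (serial_s - parallel_s) * runs


# (serial, parallel) timings in seconds, copied from the benchmark table.
benchmarks = {
    "Small": (0.18, 0.09),
    "Medium": (0.95, 0.32),
    "Large": (2.80, 0.80),
}

for name, (serial_s, parallel_s) in benchmarks.items():
    speedup = serial_s / parallel_s
    reduction = 1 - parallel_s / serial_s
    per_1k = time_saved_seconds(serial_s, parallel_s, 1_000)
    per_50k_hours = time_saved_seconds(serial_s, parallel_s, 50_000) / 3600
    print(
        f"{name}: {speedup:.1f}x faster, {reduction:.0%} less time, "
        f"{per_1k:.0f} s saved per 1k runs, {per_50k_hours:.2f} h per 50k runs"
    )
```

For the Large row, 2.0 s saved per run over 50,000 runs is 100,000 s, or just under 28 hours.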
Key Takeaways
- Snapshot time reduced by about 66% for the medium database and 71% for the large one.
- Savings scale linearly with the number of runs: the more snapshots you take, the more time you save.
- Faster developer feedback: reduced CI/CD wait times, fewer timeouts.
- Lower infrastructure costs: less CPU time means direct savings on cloud compute.
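This section does not show how the parallel, batched engine works internally. As a rough illustration of the general technique only, and not Datatrack's actual code, the sketch below splits the tables of a SQLite database into batches and introspects each batch on its own connection in a thread pool. All function names, `batch_size`, and `max_workers` are hypothetical.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from contextlib import closing


def list_tables(db_path: str) -> list[str]:
    """Return the user-defined table names in a SQLite file, sorted by name."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


def snapshot_batch(db_path: str, tables: list[str]) -> dict[str, list[dict]]:
    """Introspect one batch of tables on a connection owned by this worker."""
    result = {}
    with closing(sqlite3.connect(db_path)) as conn:
        for table in tables:
            # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk).
            # Simplification: table names containing double quotes are not handled.
            columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
            result[table] = [
                {
                    "name": col[1],
                    "type": col[2],
                    "nullable": not col[3],
                    "primary_key": bool(col[5]),
                }
                for col in columns
            ]
    return result


def snapshot_parallel(db_path: str, batch_size: int = 25, max_workers: int = 4) -> dict:
    """Snapshot every table, one batch per worker task."""
    tables = list_tables(db_path)
    batches = [tables[i:i + batch_size] for i in range(0, len(tables), batch_size)]
    schema = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for partial in pool.map(lambda batch: snapshot_batch(db_path, batch), batches):
            schema.update(partial)
    return schema
```

Each worker opens its own connection because a `sqlite3` connection should not be shared across threads by default. Batching keeps the number of connections small when a database has many tables.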
Real-World Impact
For a team running 50,000 large snapshots per month, Datatrack saves about 28 hours of snapshot time each month (50,000 runs × 2.0 s saved per run). At typical cloud compute rates, this translates into hundreds of dollars per year in savings. The bigger win, however, is developer productivity and reliability: faster pipelines, earlier error detection, and less risk of schema-related outages.
Installation
Option 1: Install from PyPI (production use)
pip install datatrack-core
This is the easiest and recommended way to use Datatrack as a CLI tool in your workflows.
Option 2: Install from GitHub (for development)
git clone https://github.com/nrnavaneet/datatrack.git
cd datatrack
pip install -r requirements.txt
pip install -e .
This method is ideal if you want to contribute or modify the tool.
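After either install path, you may want to confirm which version ended up in the active environment. A minimal sketch using only the standard library is shown here. It assumes the distribution name `datatrack-core` from the `pip install` command above; the helper `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


print(installed_version("datatrack-core") or "datatrack-core is not installed")
```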
Helpful Commands
Datatrack comes with built-in help and guidance for every command. Use this to quickly learn syntax and options:
datatrack --help
or
datatrack -h
How to Use
1. Initialize Tracking
datatrack init
Creates .datatrack/, .databases/, and optional initial files.
2. Connect to a Database
Save your DB connection for future use:
MySQL
datatrack connect mysql+pymysql://root:<password>@localhost:3306/<database-name>
PostgreSQL
datatrack connect postgresql+psycopg2://postgres:<password>@localhost:5432/<database-name>
SQLite
datatrack connect sqlite:///.databases/<database-name>
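The connection strings above follow the `dialect+driver://user:password@host:port/database` layout. The sketch below, which uses only the standard library, shows how such a URL breaks down and how to percent-encode a password that contains characters like `@` or `/`. The helpers `describe_url` and `build_url` are hypothetical, and whether Datatrack itself requires encoded passwords is an assumption worth checking against its documentation.

```python
from urllib.parse import quote_plus, urlsplit


def describe_url(url: str) -> dict:
    """Split a database URL into its named parts."""
    parts = urlsplit(url)
    dialect, _, driver = parts.scheme.partition("+")
    return {
        "dialect": dialect,
        "driver": driver or None,
        "username": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/") or None,
    }


def build_url(scheme: str, user: str, password: str, host: str, port: int, database: str) -> str:
    """Build a URL, percent-encoding the password so '@' or '/' cannot break parsing."""
    return f"{scheme}://{user}:{quote_plus(password)}@{host}:{port}/{database}"


print(describe_url("mysql+pymysql://root:secret@localhost:3306/shop"))
print(describe_url("sqlite:///.databases/app.db"))
print(build_url("postgresql+psycopg2", "postgres", "p@ss/word", "localhost", 5432, "analytics"))
```

Note that the SQLite form has no host, so three slashes follow the scheme and the rest is a file path.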
3. Take a Schema Snapshot
datatrack snapshot
Saves the current schema to .databases/exports/<db_name>/snapshots/.
4. Lint the Schema
datatrack lint
Detects issues in naming and structure.
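The exact checks that `datatrack lint` applies are not listed here. To give a feel for what naming lint typically involves, this sketch flags names that are not snake_case, that collide with a few SQL reserved words, or that exceed a length limit. The rule set, the `RESERVED` list, and the 30-character limit are all assumptions for illustration, not Datatrack's real rules.

```python
import re

SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")
# A small, illustrative subset of SQL reserved words.
RESERVED = {"select", "table", "order", "group", "user"}


def lint_name(name: str, kind: str, max_len: int = 30) -> list[str]:
    """Return human-readable issues for a single table or column name."""
    issues = []
    if not SNAKE_CASE.match(name):
        issues.append(f"{kind} '{name}' is not snake_case")
    if name.lower() in RESERVED:
        issues.append(f"{kind} '{name}' is a reserved word")
    if len(name) > max_len:
        issues.append(f"{kind} '{name}' is longer than {max_len} characters")
    return issues


def lint_schema(schema: dict[str, list[str]]) -> list[str]:
    """Lint a schema given as {table_name: [column_name, ...]}."""
    issues = []
    for table, columns in schema.items():
        issues += lint_name(table, "table")
        for column in columns:
            issues += [f"{table}: {msg}" for msg in lint_name(column, "column")]
    return issues


for issue in lint_schema({"UserData": ["id", "order"], "orders": ["id", "customer_id"]}):
    print(issue)
```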
5. Verify Schema Rules
datatrack verify
Validates schema against schema_rules.yaml.
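The format of `schema_rules.yaml` is not described in this document, so the following is only a sketch of how rule-based verification can work in general. The rule keys `require_primary_key` and `required_columns` are invented for the example and should not be read as Datatrack's actual rule names; the rules are written as a Python dict to keep the example free of third-party dependencies.

```python
# Hypothetical rules, standing in for what a rules file might express.
RULES = {
    "require_primary_key": True,
    "required_columns": ["created_at"],
}


def verify(schema: dict[str, list[dict]], rules: dict) -> list[str]:
    """Check a schema of {table: [{"name": ..., "primary_key": ...}, ...]} against rules."""
    violations = []
    for table, columns in schema.items():
        names = {col["name"] for col in columns}
        has_pk = any(col.get("primary_key") for col in columns)
        if rules.get("require_primary_key") and not has_pk:
            violations.append(f"{table}: no primary key")
        for required in rules.get("required_columns", []):
            if required not in names:
                violations.append(f"{table}: missing required column '{required}'")
    return violations


schema = {
    "orders": [
        {"name": "id", "primary_key": True},
        {"name": "created_at", "primary_key": False},
    ],
    "events": [{"name": "payload", "primary_key": False}],
}
for violation in verify(schema, RULES):
    print(violation)
```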
6. View Schema Differences
datatrack diff
Shows table and column changes between the latest two snapshots.
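Conceptually, a schema diff is a set comparison between two snapshots. The sketch below assumes a simplified snapshot shape of `{table: {column: type}}`, which may differ from the files Datatrack writes, and reports added, removed, and changed tables. The function name `diff_snapshots` and the result keys are hypothetical.

```python
def diff_snapshots(old: dict[str, dict[str, str]], new: dict[str, dict[str, str]]) -> dict:
    """Compare two snapshots shaped as {table: {column: type}}."""
    changed = {}
    for table in sorted(old.keys() & new.keys()):
        before, after = old[table], new[table]
        delta = {
            "added_columns": sorted(after.keys() - before.keys()),
            "removed_columns": sorted(before.keys() - after.keys()),
            "retyped_columns": sorted(
                col for col in before.keys() & after.keys() if before[col] != after[col]
            ),
        }
        # Only report tables where at least one list is non-empty.
        if any(delta.values()):
            changed[table] = delta
    return {
        "added_tables": sorted(new.keys() - old.keys()),
        "removed_tables": sorted(old.keys() - new.keys()),
        "changed_tables": changed,
    }


old = {"users": {"id": "INTEGER", "name": "TEXT"}, "legacy": {"id": "INTEGER"}}
new = {"users": {"id": "BIGINT", "email": "TEXT"}, "orders": {"id": "INTEGER"}}
print(diff_snapshots(old, new))
```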
7. Export Snapshots or Diffs
Export latest snapshot as YAML (default)
datatrack export
Explicitly export snapshot as YAML
datatrack export --type snapshot --format yaml
Export latest diff as JSON
datatrack export --type diff --format json
Output is saved in .databases/exports/<db_name>/.
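If you post-process exports in CI, stable output makes file-level comparisons easier. As a sketch of that idea, and not a description of how Datatrack writes its files, this helper serializes a dict to JSON with sorted keys so that identical data always produces identical bytes. The function name, file naming, and directory layout are assumptions.

```python
import json
from pathlib import Path


def export_json(data: dict, out_dir: str, name: str) -> Path:
    """Write `data` as deterministic, pretty-printed JSON and return the file path."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    path = target / f"{name}.json"
    # sort_keys gives a stable key order, so unchanged data yields an unchanged file.
    path.write_text(json.dumps(data, indent=2, sort_keys=True) + "\n", encoding="utf-8")
    return path
```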
8. View Snapshot History
datatrack history
Displays all snapshot timestamps and table counts.
9. Run the Full Pipeline
datatrack pipeline run
Runs lint, snapshot, verify, diff, and export together.
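In a CI job you would usually want the build to fail when the pipeline fails. A minimal sketch of a wrapper is shown here. It assumes that `datatrack` signals failure with a non-zero exit code, which this document does not confirm, and the helper `run_step` is hypothetical.

```python
import subprocess
import sys


def run_step(command: list[str]) -> int:
    """Run a command, echo its output, and return its exit code."""
    completed = subprocess.run(command, capture_output=True, text=True)
    if completed.stdout:
        print(completed.stdout, end="")
    if completed.returncode != 0:
        # Surface stderr only on failure to keep passing runs quiet.
        print(completed.stderr, end="", file=sys.stderr)
    return completed.returncode
```

A CI script could then call `sys.exit(run_step(["datatrack", "pipeline", "run"]))` so the job's status mirrors the command's exit code.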
For advanced use cases and integration into CI/CD, visit:
File details
Details for the file datatrack_core-1.1.5.tar.gz.
File metadata
- Download URL: datatrack_core-1.1.5.tar.gz
- Upload date:
- Size: 20.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e804040c65ac4cff67185030fbc4a430dfda18849010e39fcc4f42a42160b9f4 |
| MD5 | ec8f3038300aef8a29c6abc691923a81 |
| BLAKE2b-256 | 9515336d7f77bb7a880eaa5f94a4f1f260ab3e531022dd028a31ef0b207ca136 |
File details
Details for the file datatrack_core-1.1.5-py3-none-any.whl.
File metadata
- Download URL: datatrack_core-1.1.5-py3-none-any.whl
- Upload date:
- Size: 21.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 25e0fa0259b0f82237b8ba4c054852b879bed279613ce1b6dcbb38debb931e5f |
| MD5 | d6da2984cc771b34106e8f225fef80a8 |
| BLAKE2b-256 | 8818ca3811f73565288591d5ac914bf69d803e145e261dbcd2bddb3d20f4ca44 |