Minimal data version control - a lightweight wrapper around DVC
DVX - Minimal Data Version Control
DVX is a lightweight wrapper around DVC that provides only the core data versioning functionality, without pipelines, experiments, metrics, params, or plots.
Why DVX?
DVC is a powerful tool, but its feature set has grown significantly. If you only need to:
- Track large files with `.dvc` files
- Push/pull data to remote storage (S3, GCS, etc.)
- Version data alongside your code
...then DVX gives you exactly that, with a simpler interface and smaller surface area.
Installation
pip install dvx
# With S3 support (quoted so shells like zsh don't expand the brackets)
pip install "dvx[s3]"
# With all remote backends
pip install "dvx[all]"
Usage
CLI
# Initialize
dvx init
# Track files (parallel-safe, lock-free)
dvx add data/
dvx add model.pkl
dvx add -r output.parquet # auto-add stale deps first
# Configure remote
dvx remote add -d myremote s3://mybucket/dvc
# Push to remote
dvx push
# Pull from remote
dvx pull
# Check status (shows data vs dep freshness)
dvx status
dvx status -v # also show fresh files
dvx status --json # JSON output
dvx status -j4 data/ # parallel checking
Python API
from dvx import Repo
# Initialize
repo = Repo.init()
# Or open existing
with Repo() as repo:
    repo.add("data/")
    repo.push()
    status = repo.status()
    diff = repo.diff("HEAD~1")
Commands
DVX exposes these DVC commands:
| Command | Description |
|---|---|
| `init` | Initialize a DVX/DVC repository |
| `add` | Track file(s) with DVX |
| `push` | Upload data to remote storage |
| `pull` | Download data from remote storage |
| `fetch` | Download data to cache (no checkout) |
| `checkout` | Restore data files from cache |
| `status` | Show freshness of tracked files (data & deps) |
| `diff` | Show changes between revisions |
| `gc` | Garbage collect unused cache |
| `remove` | Stop tracking file(s) |
| `move` | Move tracked file(s) |
| `import` | Import from another DVC repo |
| `import-url` | Import from a URL |
| `get` | Download without tracking |
| `get-url` | Download URL without tracking |
| `config` | Configure settings (delegates to DVC) |
| `remote` | Manage remotes (delegates to DVC) |
| `cache` | Manage cache (delegates to DVC) |
What's NOT included
DVX intentionally excludes:
- Pipelines (`dvc.yaml`, `dvc run`, `dvc repro`, `dvc dag`)
- Experiments (`dvc exp`, experiment tracking)
- Metrics (`dvc metrics`)
- Params (`dvc params`)
- Plots (`dvc plots`)
- Stages (`dvc stage`)
If you need these features, use DVC directly.
Freshness Model
DVX tracks two types of freshness for each artifact:
- Data freshness: Does the actual data match the hash in the `.dvc` file?
- Dep freshness: Do recorded dependency hashes match the deps' `.dvc` files?
This mirrors git's model - each .dvc file declares what it expects, with no transitivity. If a dependency's data differs from its own .dvc file, that's a separate issue for that dependency.
$ dvx status s3/output/
✗ s3/output/result.parquet.dvc (data changed (abc123... vs def456...))
✗ s3/output/summary.json.dvc (dep changed: s3/input/data.parquet)
✓ s3/output/metadata.json.dvc (up-to-date)
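The two checks can be illustrated with a small, self-contained sketch. This is not DVX's implementation: the `Record` class, the `check` function, the in-memory `files`/`records` dicts, and the use of MD5 are all assumptions made for illustration. It only shows the non-transitive rule described above, where dep freshness compares recorded hashes against the deps' own records rather than against the deps' data on disk.

```python
import hashlib
from dataclasses import dataclass, field


def md5_of(data: bytes) -> str:
    """Hash helper; the choice of MD5 is an assumption for this sketch."""
    return hashlib.md5(data).hexdigest()


@dataclass
class Record:
    """Hypothetical stand-in for the contents of one .dvc file."""
    data_hash: str  # hash this record expects for its own data
    dep_hashes: dict[str, str] = field(default_factory=dict)  # dep path -> hash recorded at add time


def check(path: str, records: dict[str, Record], files: dict[str, bytes]) -> list[str]:
    """Return the reasons `path` is stale; an empty list means fresh."""
    rec = records[path]
    problems = []
    # Data freshness: does the actual data match the recorded hash?
    if md5_of(files[path]) != rec.data_hash:
        problems.append("data changed")
    # Dep freshness: compare recorded dep hashes with the deps' own records.
    # The deps' data on disk is deliberately not consulted here (no transitivity).
    for dep, recorded in rec.dep_hashes.items():
        if records[dep].data_hash != recorded:
            problems.append(f"dep changed: {dep}")
    return problems


if __name__ == "__main__":
    files = {"input": b"v1", "output": b"result"}
    records = {
        "input": Record(md5_of(b"v1")),
        "output": Record(md5_of(b"result"), {"input": md5_of(b"v1")}),
    }
    files["input"] = b"v2"  # edit the dep's data without re-adding it
    print("input:", check("input", records, files))
    print("output:", check("output", records, files))
```

In this sketch, editing a dependency's data without re-adding it makes only the dependency stale. The output becomes stale once the dependency's record is updated to a hash that differs from the one the output recorded.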
Provenance Tracking
When adding an output with deps, DVX ensures accurate provenance:
- Deps must be fresh: `dvx add` errors if any dep's file hash differs from its `.dvc` hash
- Recursive add: Use `dvx add -r` to auto-add stale deps first (depth-first)
- Accurate recording: Recorded dep hashes always match what was actually used
$ dvx add output.parquet
Error: Cannot add output.parquet: 1 stale dep(s):
input.parquet: .dvc=abc123... file=def456...
Run `dvx add` on deps first, or use --recursive
$ dvx add -r output.parquet # adds input.parquet first, then output.parquet
Added input.parquet (def456...)
Added output.parquet (xyz789...)
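A minimal sketch of the depth-first behavior, under stated assumptions: the `add` function, `StaleDepError`, and the in-memory `files`, `deps`, and `recorded` dicts are hypothetical and do not reflect DVX's internals or its Python API. It also assumes the dependency graph is acyclic.

```python
import hashlib


class StaleDepError(Exception):
    """Raised when a dep's current file hash differs from its recorded hash."""


def file_hash(data: bytes) -> str:
    """Hash helper; the choice of MD5 is an assumption for this sketch."""
    return hashlib.md5(data).hexdigest()


def add(path, files, deps, recorded, recursive=False, added=None):
    """Record `path`, refusing (or first re-adding) stale direct deps.

    files:    path -> current bytes on "disk"
    deps:     path -> list of direct dependency paths
    recorded: path -> {"hash": ..., "deps": {...}}, a stand-in for .dvc files
    Returns the list of paths added, in the order they were added.
    """
    added = [] if added is None else added
    direct = deps.get(path, [])
    # A dep is stale if it was never recorded or its data no longer matches its record.
    stale = [d for d in direct
             if d not in recorded or recorded[d]["hash"] != file_hash(files[d])]
    if stale and not recursive:
        raise StaleDepError(
            f"Cannot add {path}: {len(stale)} stale dep(s): {', '.join(stale)}"
        )
    # Depth-first: bring each stale dep (and its own stale deps) up to date first.
    for dep in stale:
        add(dep, files, deps, recorded, recursive=True, added=added)
    # Every direct dep is now fresh, so the recorded dep hashes match what was used.
    recorded[path] = {
        "hash": file_hash(files[path]),
        "deps": {d: recorded[d]["hash"] for d in direct},
    }
    added.append(path)
    return added
```

Because the stale check runs before anything is written, a failed non-recursive add leaves the existing records untouched.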
Performance
DVX is optimized for large repos:
- Mtime caching: Skips hash computation when file mtime unchanged (SQLite-backed)
- Batched git lookups: Uses `git ls-tree -r` for all blob SHAs in one call
- Lock-free adds: Parallel-safe cache operations via atomic file writes
- Parallel status: Check many files concurrently with `-j`/`--jobs`
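The mtime caching idea can be sketched with the standard library alone. This is an illustration, not DVX's code: the `HashCache` class, its table schema, the MD5 hash, and the additional file-size check are all assumptions.

```python
import hashlib
import os
import sqlite3


class HashCache:
    """Hypothetical SQLite-backed cache keyed on path, mtime, and size."""

    def __init__(self, db_path=":memory:"):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS hashes ("
            "path TEXT PRIMARY KEY, mtime_ns INTEGER, size INTEGER, md5 TEXT)"
        )
        self.computed = 0  # number of times a file was actually hashed

    def get(self, path):
        """Return the file's MD5, reusing the cached value when mtime and size match."""
        st = os.stat(path)
        row = self.db.execute(
            "SELECT mtime_ns, size, md5 FROM hashes WHERE path = ?", (path,)
        ).fetchone()
        if row and row[0] == st.st_mtime_ns and row[1] == st.st_size:
            return row[2]  # cache hit: the file's contents are not read
        # Cache miss: hash the file in 1 MiB chunks to bound memory use.
        md5 = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                md5.update(chunk)
        digest = md5.hexdigest()
        self.computed += 1
        with self.db:  # commits on success
            self.db.execute(
                "INSERT OR REPLACE INTO hashes VALUES (?, ?, ?, ?)",
                (path, st.st_mtime_ns, st.st_size, digest),
            )
        return digest
```

Calling `get` twice on an unchanged file hashes it once. The trade-off of any mtime-based cache is that a change which preserves both mtime and size would go unnoticed.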
Compatibility
- DVX uses `.dvc` files - fully compatible with DVC
- DVX repos are DVC repos - you can use `dvc` commands too
- DVC plugins (dvc-s3, dvc-gs, etc.) work with DVX
License
Apache 2.0