# DataEval Workflows

Workflow orchestration for DataEval with GPU support.

## Quick Start
```shell
# 1. Build the CUDA 11.8 container
docker build -f docker/Dockerfile.cu118 -t dataeval:cu118 .

# 2. Show help
docker run dataeval:cu118

# 3. Run with data and output mounts
docker run --gpus all \
  --mount type=bind,source=/path/to/data,target=/dataeval,readonly \
  --mount type=bind,source=/path/to/output,target=/output \
  dataeval:cu118
```
## Requirements
| Requirement | Version |
|---|---|
| Docker | >= 20.10 |
| NVIDIA GPU | Any (for GPU mode) |
| NVIDIA Driver | >= 520 (for GPU mode) |
| CUDA | 11.8.0 (for GPU mode) |
## Verify GPU Access

```shell
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```
## Volume Mounts

| Path | Mode | Purpose |
|---|---|---|
| `/dataeval` | ro | Data directory: datasets, models, configs (required) |
| `/output` | rw | Results (required) |
| `/cache` | rw | Computation cache (optional) |
## File Permissions

The container runs as a non-root user (`dataeval`, UID 1000). Directories mounted at `/output` and `/cache` must be writable by the container process. There are two approaches:
### Option 1: Pass your host UID (recommended)

Use `--user` to run the container as your host user, so mounted directories are naturally writable:

```shell
docker run --gpus all \
  --user "$(id -u):$(id -g)" \
  --mount type=bind,source=/path/to/data,target=/dataeval,readonly \
  --mount type=bind,source=/path/to/output,target=/output \
  dataeval:cu118
```
### Option 2: Open directory permissions

Make the output and cache directories world-writable on the host:

```shell
chmod 777 /path/to/output /path/to/cache
```

Then run without `--user`. This is simpler but less secure: any user on the host can write to these directories.
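Either way, a quick pre-flight check on the host can save a failed run. A minimal sketch (the `check_writable` helper is ours for illustration, not part of dataeval-flow):

```python
import os
import tempfile

def check_writable(path: str) -> bool:
    """Return True if `path` is an existing directory writable by this process."""
    return os.path.isdir(path) and os.access(path, os.W_OK)

# Demonstrate on a temporary directory (stand-in for /path/to/output).
demo_dir = tempfile.mkdtemp()
print(check_writable(demo_dir))        # a fresh temp dir owned by us is writable
print(check_writable("/no/such/dir"))  # a missing directory fails the check
```

Run this with the actual host paths you intend to mount before starting the container.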
## Custom Data Root

The data root path can be overridden via the `DATAEVAL_DATA` environment variable:

```shell
docker run --gpus all \
  -e DATAEVAL_DATA=/data \
  --mount type=bind,source=/path/to/data,target=/data,readonly \
  --mount type=bind,source=/path/to/output,target=/output \
  dataeval:cu118
```
## Configuration

Config files (YAML or JSON) can be placed anywhere in your data directory. By default, all YAML/JSON files at the root of the data mount are auto-discovered and merged.
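The discover-and-merge behavior can be pictured with a small sketch. Assumptions to note: the real tool also reads YAML, and its merge order and depth are not specified here; this sketch shallow-merges top-level JSON files in sorted filename order, with later files overriding earlier keys:

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory

def discover_and_merge(data_root: Path) -> dict:
    """Shallow-merge all top-level JSON config files in sorted filename order.

    Sketch only: dataeval-flow also handles YAML, and its actual merge
    semantics may differ.
    """
    merged: dict = {}
    for path in sorted(data_root.glob("*.json")):
        merged.update(json.loads(path.read_text()))
    return merged

with TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.json").write_text(json.dumps({"dataset": "cifar10_test", "batch_size": 32}))
    (root / "b.json").write_text(json.dumps({"batch_size": 64}))
    print(discover_and_merge(root))  # later file wins: batch_size is 64
```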
To specify a config path explicitly:

```shell
# Config folder within data directory
docker run --gpus all \
  --mount type=bind,source=/path/to/data,target=/dataeval,readonly \
  --mount type=bind,source=/path/to/output,target=/output \
  dataeval:cu118 --config config/

# Single config file
docker run --gpus all \
  --mount type=bind,source=/path/to/data,target=/dataeval,readonly \
  --mount type=bind,source=/path/to/output,target=/output \
  dataeval:cu118 --config params.yaml
```
Dataset and model paths in config files are resolved relative to the data root (`/dataeval` by default).
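As a sketch of that resolution rule (the function name and the pass-through behavior for absolute paths are our assumptions, not dataeval-flow API):

```python
from pathlib import Path

def resolve_data_path(value: str, data_root: Path = Path("/dataeval")) -> Path:
    """Resolve a config path against the data root; leave absolute paths as-is.

    Illustrative only; the real resolver's edge-case behavior is not
    documented here.
    """
    p = Path(value)
    return p if p.is_absolute() else data_root / p

print(resolve_data_path("cifar10_test"))        # /dataeval/cifar10_test
print(resolve_data_path("/mnt/shared/models"))  # absolute path passes through
```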
## Dataset Formats

Currently supported dataset structures:

| Format | Structure | Example |
|---|---|---|
| `Dataset` | Single split, used directly | `cifar10_test/` |
| `DatasetDict` | Multiple splits (dict), configured via config YAML | `cifar10_full/` |
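If the directories follow the Hugging Face `datasets` `save_to_disk` layout, a `DatasetDict` leaves a `dataset_dict.json` marker at its root while a single `Dataset` does not, so the two can be told apart with a file check. This is a guess at the on-disk layout, not documented dataeval-flow behavior:

```python
from pathlib import Path
from tempfile import TemporaryDirectory

def dataset_kind(root: Path) -> str:
    """Guess whether a saved dataset directory holds a DatasetDict or a Dataset.

    Assumes Hugging Face `save_to_disk` conventions; sketch only.
    """
    return "DatasetDict" if (root / "dataset_dict.json").exists() else "Dataset"

with TemporaryDirectory() as tmp:
    full = Path(tmp) / "cifar10_full"
    full.mkdir()
    (full / "dataset_dict.json").write_text('{"splits": ["train", "test"]}')
    single = Path(tmp) / "cifar10_test"
    single.mkdir()
    print(dataset_kind(full))    # DatasetDict
    print(dataset_kind(single))  # Dataset
```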
## CPU Fallback

For machines without an NVIDIA GPU:

```shell
docker build -f docker/Dockerfile.cpu -t dataeval:cpu .

docker run dataeval:cpu  # Shows help

docker run \
  --mount type=bind,source=/path/to/data,target=/dataeval,readonly \
  --mount type=bind,source=/path/to/output,target=/output \
  dataeval:cpu
```
## CLI Modes

DataEval Flow has three modes:

| Command | Purpose |
|---|---|
| `dataeval-flow [opts]` | Headless execution: for automation and CI/CD pipelines |
| `dataeval-flow app` | Interactive TUI dashboard: configure, execute, and view results |
| `dataeval-flow config` | Simple CLI config builder: create/edit configs without the TUI |
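A three-mode CLI like this is commonly built with `argparse` subparsers, where no subcommand means headless execution. A minimal sketch of the shape, not dataeval-flow's actual implementation:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Sketch of a three-mode CLI: headless by default, plus `app` and
    `config` subcommands. Illustrative only."""
    parser = argparse.ArgumentParser(prog="dataeval-flow")
    parser.add_argument("--config", help="config file or folder")
    sub = parser.add_subparsers(dest="mode")  # mode stays None for headless runs
    sub.add_parser("app", help="interactive TUI dashboard")
    sub.add_parser("config", help="CLI config builder")
    return parser

args = build_parser().parse_args(["app"])
print(args.mode)  # app
```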
### Interactive TUI (`app`)

Installation:

```shell
uv sync --extra app  # or: pip install dataeval-flow[app]
```

Usage:

```shell
# Launch with a blank config
python -m dataeval_flow app

# Load an existing config for editing
python -m dataeval_flow app --config /path/to/params.yaml
```
The TUI provides a three-pane dashboard for config editing, task execution, and result viewing. It auto-discovers available torchvision transforms, dataeval selection classes, and workflow types, generating dynamic parameter forms from their schemas.
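Generating parameter forms from class schemas can be illustrated with `inspect`. The sample classes below are stand-ins for discovered transforms (they are not torchvision code), and `parameter_schema` is a hypothetical helper:

```python
import inspect

# Stand-ins for auto-discovered classes (e.g. torchvision transforms).
class Resize:
    def __init__(self, size: int, antialias: bool = True): ...

class Normalize:
    def __init__(self, mean: float = 0.0, std: float = 1.0): ...

def parameter_schema(cls) -> dict:
    """Build a name -> (type name, default) map from a class's __init__,
    the kind of metadata a dynamic parameter form is built from."""
    schema = {}
    for name, p in inspect.signature(cls.__init__).parameters.items():
        if name == "self":
            continue
        default = None if p.default is inspect.Parameter.empty else p.default
        schema[name] = (p.annotation.__name__, default)
    return schema

print(parameter_schema(Resize))  # {'size': ('int', None), 'antialias': ('bool', True)}
```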
### Simple CLI Config Builder (`config`)

For environments without the TUI dependency:

```shell
python -m dataeval_flow config
python -m dataeval_flow config --config /path/to/params.yaml
```

Configs can be saved as YAML or JSON.
## Dependencies

- `dataeval` – Core evaluation library
- `datasets` – Hugging Face datasets library
- `maite-datasets` – MAITE protocol adapter
- `maite` – MAITE protocol library
- `pydantic` – Structural typing and schema validation
## Troubleshooting

### Build appears stuck at `uv sync`

The Docker build may appear frozen during the `uv sync` step:

```
 => [builder 7/7] RUN uv sync --frozen --no-dev --no-install-project  1139.3s
```

This is normal. The step downloads ~2GB of dependencies (PyTorch, scipy, etc.) with no progress indicator.
| Network Speed | Expected Build Time |
|---|---|
| 100 Mbps | ~10 minutes |
| 30 Mbps | ~20 minutes |
| 10 Mbps | ~45 minutes |
**Tip:** The first build is slow; subsequent builds use the Docker cache and complete in seconds.
## Running Without Container

The `dataeval_flow` package can be used standalone, without Docker.

Installation:

```shell
git clone https://gitlab.jatic.net/jatic/aria/dataeval-flow.git
cd dataeval-flow
uv sync
```

CLI usage:

```shell
python -m dataeval_flow --config /path/to/config --output /path/to/output
python -m dataeval_flow --data /path/to/data --output /path/to/output
```

Python API usage:

```python
from pathlib import Path

from dataeval_flow import load_config, run_tasks

config = load_config(Path("/path/to/data/config.yaml"))
results = run_tasks(config, data_dir=Path("/path/to/data"))
print(results[0].report())
```

Development:

```shell
uv sync --group dev
nox
```
## License

MIT