
DataEval Workflows container for data evaluation


DataEval Workflows

Workflow orchestration for DataEval with GPU support.

Quick Start

# 1. Build CUDA 11.8 container
docker build -f docker/Dockerfile.cu118 -t dataeval:cu118 .

# 2. Show help
docker run dataeval:cu118

# 3. Run with data and output
docker run --gpus all \
  --mount type=bind,source=/path/to/data,target=/dataeval,readonly \
  --mount type=bind,source=/path/to/output,target=/output \
  dataeval:cu118

Requirements

| Requirement   | Version                  |
|---------------|--------------------------|
| Docker        | >= 20.10                 |
| NVIDIA GPU    | Any (for GPU mode)       |
| NVIDIA Driver | >= 520 (for GPU mode)    |
| CUDA          | 11.8.0 (for GPU mode)    |

Verify GPU Access

docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

Volume Mounts

| Path      | Mode | Purpose                                              |
|-----------|------|------------------------------------------------------|
| /dataeval | ro   | Data directory: datasets, models, configs (required) |
| /output   | rw   | Results (required)                                   |
| /cache    | rw   | Computation cache (optional)                         |
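For runs that benefit from a persistent cache, the optional /cache mount can be added alongside the two required mounts (the source paths are placeholders):

```shell
docker run --gpus all \
  --mount type=bind,source=/path/to/data,target=/dataeval,readonly \
  --mount type=bind,source=/path/to/output,target=/output \
  --mount type=bind,source=/path/to/cache,target=/cache \
  dataeval:cu118
```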

File Permissions

The container runs as a non-root user (dataeval, UID 1000). The host directories mounted at /output and /cache must be writable by the container process. There are two approaches:

Option 1: Pass your host UID (recommended)

Use --user to run the container as your host user, so mounted directories are naturally writable:

docker run --gpus all \
  --user "$(id -u):$(id -g)" \
  --mount type=bind,source=/path/to/data,target=/dataeval,readonly \
  --mount type=bind,source=/path/to/output,target=/output \
  dataeval:cu118

Option 2: Open directory permissions

Make the output and cache directories world-writable on the host:

chmod 777 /path/to/output /path/to/cache

Then run without --user. This is simpler but less secure.

Custom Data Root

The data root path can be overridden via the DATAEVAL_DATA environment variable:

docker run --gpus all \
  -e DATAEVAL_DATA=/data \
  --mount type=bind,source=/path/to/data,target=/data,readonly \
  --mount type=bind,source=/path/to/output,target=/output \
  dataeval:cu118

Configuration

Config files (YAML or JSON) can be placed anywhere in your data directory. By default, all YAML/JSON files at the root of the data mount are auto-discovered and merged.
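As an illustration, a config file might look like the sketch below. The field names (`dataset`, `workflow`) are hypothetical placeholders, not the tool's documented schema; consult the config builder or TUI for the real fields:

```yaml
# Hypothetical sketch -- field names are illustrative only.
dataset: cifar10_test   # resolved relative to the data root
workflow: drift         # made-up workflow name for illustration
```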

To specify a config path explicitly:

# Config folder within data directory
docker run --gpus all \
  --mount type=bind,source=/path/to/data,target=/dataeval,readonly \
  --mount type=bind,source=/path/to/output,target=/output \
  dataeval:cu118 --config config/

# Single config file
docker run --gpus all \
  --mount type=bind,source=/path/to/data,target=/dataeval,readonly \
  --mount type=bind,source=/path/to/output,target=/output \
  dataeval:cu118 --config params.yaml

Dataset and model paths in config files are resolved relative to the data root (/dataeval by default).
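The resolution rule above can be sketched in a few lines. `resolve_data_path` is a hypothetical helper written for illustration, not part of the dataeval_flow API:

```python
from pathlib import Path

def resolve_data_path(value: str, data_root: Path = Path("/dataeval")) -> Path:
    """Resolve a path from a config file against the data root.

    Illustrates the documented behaviour: relative paths are joined
    to the data root, absolute paths pass through unchanged.
    """
    path = Path(value)
    return path if path.is_absolute() else data_root / path

print(resolve_data_path("cifar10_test"))   # relative -> joined under /dataeval
print(resolve_data_path("/mnt/other/ds"))  # absolute -> unchanged
```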

Dataset Formats

Currently supported dataset structures:

| Format      | Structure                                         | Example       |
|-------------|---------------------------------------------------|---------------|
| Dataset     | Single split, used directly                       | cifar10_test/ |
| DatasetDict | Multiple splits (dict), configured via config YAML | cifar10_full/ |
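Assuming these are Hugging Face datasets written with `save_to_disk` (an assumption based on the `datasets` dependency; the actual on-disk layout this tool expects may differ), the two structures look roughly like:

```
cifar10_test/                 # Dataset: one split, used directly
├── dataset_info.json
├── state.json
└── data-00000-of-00001.arrow

cifar10_full/                 # DatasetDict: one subdirectory per split
├── dataset_dict.json
├── train/
└── test/
```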

CPU Fallback

For machines without NVIDIA GPU:

docker build -f docker/Dockerfile.cpu -t dataeval:cpu .
docker run dataeval:cpu  # Shows help
docker run \
  --mount type=bind,source=/path/to/data,target=/dataeval,readonly \
  --mount type=bind,source=/path/to/output,target=/output \
  dataeval:cpu

CLI Modes

DataEval Flow has three modes:

| Command              | Purpose                                                          |
|----------------------|------------------------------------------------------------------|
| dataeval-flow [opts] | Headless execution, for automation and CI/CD pipelines           |
| dataeval-flow app    | Interactive TUI dashboard: configure, execute, and view results  |
| dataeval-flow config | Simple CLI config builder: create/edit configs without the TUI   |

Interactive TUI (app)

Installation:

uv sync --extra app          # or: pip install dataeval-flow[app]

Usage:

# Launch with a blank config
python -m dataeval_flow app

# Load an existing config for editing
python -m dataeval_flow app --config /path/to/params.yaml

The TUI provides a three-pane dashboard for config editing, task execution, and result viewing. It auto-discovers available torchvision transforms, dataeval selection classes, and workflow types, generating dynamic parameter forms from their schemas.
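The schema-to-form idea can be sketched as follows. This stdlib sketch uses a dataclass where the real tool reads pydantic schemas, and `ResizeParams` is a made-up stand-in for a discovered transform:

```python
import dataclasses

@dataclasses.dataclass
class ResizeParams:
    """Stand-in for a discovered transform's parameter schema."""
    size: int = 224
    antialias: bool = True

def form_fields(schema) -> list:
    """Turn a schema into (name, type, default) rows for a parameter form."""
    return [(f.name, f.type.__name__, f.default)
            for f in dataclasses.fields(schema)]

# Render a dynamic parameter form from the schema.
for name, typ, default in form_fields(ResizeParams):
    print(f"{name} ({typ}) = {default}")
```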

Simple CLI Config Builder (config)

For environments without the TUI dependency:

python -m dataeval_flow config
python -m dataeval_flow config --config /path/to/params.yaml

Configs can be saved as YAML or JSON.

Dependencies

  • dataeval - Core evaluation library
  • datasets - Hugging Face datasets library
  • maite-datasets - MAITE protocol adapter
  • maite - MAITE protocol library
  • pydantic - Structural typing and schema validation

Troubleshooting

Build appears stuck at uv sync

The Docker build may appear frozen during the uv sync step:

=> [builder 7/7] RUN uv sync --frozen --no-dev --no-install-project    1139.3s

This is normal. The step downloads ~2GB of dependencies (PyTorch, scipy, etc.) with no progress indicator.

| Network Speed | Expected Build Time |
|---------------|---------------------|
| 100 Mbps      | ~10 minutes         |
| 30 Mbps       | ~20 minutes         |
| 10 Mbps       | ~45 minutes         |

Tip: First build is slow; subsequent builds use Docker cache and complete in seconds.

Running Without Container

The dataeval_flow package can be used standalone without Docker.

Installation:

git clone https://gitlab.jatic.net/jatic/aria/dataeval-flow.git
cd dataeval-flow
uv sync

CLI Usage:

python -m dataeval_flow --config /path/to/config --output /path/to/output
python -m dataeval_flow --data /path/to/data --output /path/to/output

Python API Usage:

from pathlib import Path
from dataeval_flow import load_config, run_tasks

config = load_config(Path("/path/to/data/config.yaml"))
results = run_tasks(config, data_dir=Path("/path/to/data"))
print(results[0].report())

Development:

uv sync --group dev
nox

License

MIT
