NVIDIA TAO Dataset Annotation Format Toolkit - A comprehensive toolkit for organizing, validating, and loading annotated datasets for computer vision and vision-language models

These details have not been verified by PyPI

Project links

Project description

NVIDIA TAO DAFT: Dataset Annotation Format Toolkit

NVIDIA TAO DAFT

A toolkit for vision-language dataset formats — JSON-schema specs plus CLI/Python tools to validate and convert between them.

Overview

VLM dataset workflows have a contract-drift problem. Annotation pipelines emit data in one shape, training pipelines expect another, and the glue between them is ad-hoc adapter code that silently goes stale. Field renames, new optional values, schema-vs-data mismatches — these surface as training-time bugs rather than at the producer / consumer boundary where they belong.

What DAFT is:

Schemas for vision-language dataset shapes — both annotation (what producers emit) and training (what consumers expect).
A CLI + validator so anyone holding a dataset can check it against its schema before handing it off.
Converters between annotation and training shapes — explicit, deterministic, with optional flags for media handling.
A reference Python adapter that plugs one of the training shapes into cosmos-rl SFT.

New formats, validators, converters, and adapters are welcome; the same registration pattern that wires the built-ins works for your own extensions.

Value, by audience:

For…	DAFT gives you…
Producers (annotation pipelines, human annotators)	Target one of these schemas and your output is consumable by any downstream tool that speaks the same schema.
Consumers (training pipelines, researchers)	Validate your input dataset before launching a training run. If it passes, your loader contract holds.

Quick start

# Install (direct from git)
pip install git+https://gitlab-master.nvidia.com/nvidia-tao-toolkit/experimental/nvidia-tao-daft.git

# Install (from wheel)
pip install nvidia-tao-daft

# Verify
tao-daft --help

For runnable examples, see examples/ and the CLI reference.

Documentation

Area	What's there	Link
Formats	Format registry, per-format specs (metropolis-v3.0, cosmos-reason-v1.0, tao-vl-reason-v1.0), versioning policy	formats
CLI	`tao-daft validate` / `convert` reference	cli
Validators	Validation engine	validators
Converters	Conversion pairs and pair-specific options	converters
Datasets	Training-loop adapters (cosmos-rl)	datasets
Examples	Working datasets per format	examples

Repository structure

nvidia-tao-daft/
├── examples/datasets/        # Working datasets, one subdir per format
│
├── tests/                    # Test suite (schemas, validators, converters, CLI, doc consistency)
│
└── src/nvidia_tao_daft/
    ├── cli/                  # tao-daft entry point (validate, convert)
    ├── formats/              # Format specifications + JSON schemas
    ├── validators/           # Validation engine
    ├── converters/           # Format converters (pairs/)
    └── datasets/             # Training-loop adapters

Requirements

Python 3.10 – 3.13. Runtime dependencies: jsonschema, pydantic. Dev dependencies: see pyproject.toml.

Contributing

See CONTRIBUTING.md for the DCO sign-off requirement.

License

Apache 2.0.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

7.0.0

May 27, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

nvidia_tao_daft-7.0.0-py3-none-any.whl (202.1 kB view details)

Uploaded May 27, 2026 Python 3

File details

Details for the file nvidia_tao_daft-7.0.0-py3-none-any.whl.

File metadata

Download URL: nvidia_tao_daft-7.0.0-py3-none-any.whl
Upload date: May 27, 2026
Size: 202.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for nvidia_tao_daft-7.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7fbac0a1d3ff398df3a846f0791ba7fc127f8bcd089964d1a445ad985c1766c1`
MD5	`208e6196d84109adcf604e3c6a3ca9a9`
BLAKE2b-256	`99409c94550b0a3d5017b9f312c1227a211c34263ebab9a2333a60734c7a42e8`

See more details on using hashes here.

nvidia-tao-daft 7.0.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

NVIDIA TAO DAFT: Dataset Annotation Format Toolkit

Overview

Quick start

Documentation

Repository structure

Requirements

Contributing

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distributions

Built Distribution

File details

File metadata

File hashes