
VD3Storage

Content database library and CLI for VisData 3. Manages video and imageset assets, annotations, and worksets backed by MP4/JSON media and CSV-based metadata, with DVC-managed remote storage.

Installation

# Install project dependencies (for development from a checkout)
uv sync

To use as a dependency:

# pyproject.toml
[project]
dependencies = ["vd3"]

Quick Start

# Initialize a content database in the current directory
vd3 init

# ...or in a specific directory
vd3 init /path/to/mydb

# Add a video under a datasource
vd3 add video clip.mp4 -d my-datasource -p /path/to/mydb

# Add multiple videos with a glob (quote to prevent shell expansion)
vd3 add video '*.mp4' -d my-datasource -p /path/to/mydb

# List assets
vd3 list assets -p /path/to/mydb

# Show media availability
vd3 media status -p /path/to/mydb

Core Concepts

  • Datasource — groups assets by origin (e.g. dashcam-2024, test-data). Required when importing.
  • Asset — a single video (MP4 + JSON metadata) or imageset (directory of images).
  • Workset — a named subset of assets, optionally organized into packages (folders). Independent of storage layout.
  • Annotation layer — detections or tracks attached to an asset, with a key (e.g. gt, det/yolo-v8) and a human or machine source.
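These relationships can be sketched as plain data shapes (illustrative only — the field names below are modeled on the descriptions above, not on VD3's actual schema):

```python
from dataclasses import dataclass

@dataclass
class Asset:
    name: str
    datasource: str   # origin grouping, e.g. "dashcam-2024"
    asset_type: str   # "video" (MP4 + JSON) or "imageset" (directory of images)

@dataclass
class AnnotationLayer:
    asset_name: str
    key: str          # e.g. "gt" or "det/yolo-v8"
    source: str       # "human" or a machine identifier

clip = Asset("clip-001", datasource="dashcam-2024", asset_type="video")
gt = AnnotationLayer(clip.name, key="gt", source="human")
```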

Adding Assets

Videos

# Single file
vd3 add video clip.mp4 -d dashcam

# Glob (recursive)
vd3 add video 'rawdata/**/*.mp4' -d dashcam

# Force re-import of a duplicate (matched by SHA-256)
vd3 add video clip.mp4 -d dashcam --force

# Add and assign to a workset/package
vd3 add video clip.mp4 -d dashcam -w my-workset -k batch1
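Duplicate matching is by SHA-256 of file content, so byte-identical files are treated as the same asset regardless of filename. The digest itself can be reproduced with the standard library (a sketch; VD3's exact hashing granularity is an internal detail):

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 without loading it all into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Two byte-identical files hash to the same digest, so a re-import is detected
with tempfile.TemporaryDirectory() as tmp:
    a = Path(tmp, "clip.mp4")
    b = Path(tmp, "copy.mp4")
    a.write_bytes(b"\x00fake video bytes")
    b.write_bytes(b"\x00fake video bytes")
    assert sha256_of(a) == sha256_of(b)
```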

Imagesets

# Directory of images
vd3 add imageset /path/to/images -d my-datasource

# Tar archive
vd3 add imageset images.tar -d my-datasource

Annotation results

Import VD3 JSON detections/tracks into an existing asset:

vd3 add result results.json -a clip -p /path/to/mydb

COCO

Import COCO annotations into an existing imageset:

vd3 add coco annotations.json -a my-imageset \
    --layer gt --source human --reviewed-all

Import a full COCO dataset (creates the imageset and imports annotations):

vd3 add coco-dataset annotations.json -d my-datasource \
    --image-root /path/to/images --layer gt
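Both commands consume a standard COCO annotation file: top-level images, annotations, and categories arrays. A minimal hand-built example, using field names from the public COCO format (the values are placeholders, not VD3-specific):

```python
import json

# Smallest useful COCO annotation file: one image, one box, one category
coco = {
    "images": [
        {"id": 1, "file_name": "frame_000001.jpg", "width": 1920, "height": 1080}
    ],
    "annotations": [{
        "id": 1,
        "image_id": 1,
        "category_id": 1,
        "bbox": [100.0, 200.0, 50.0, 80.0],  # [x, y, width, height] in pixels
        "area": 50.0 * 80.0,
        "iscrowd": 0,
    }],
    "categories": [{"id": 1, "name": "car"}],
}

with open("annotations.json", "w") as f:
    json.dump(coco, f, indent=2)
```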

Worksets

# Create
vd3 workset create "My Experiment"

# Add assets by name or ID
vd3 workset add-asset my-experiment clip-001 clip-002

# ...or by media-path glob (run from the database root; files must be on disk)
cd /path/to/mydb
vd3 workset add-asset my-experiment 'db/media/videos/fc/*.mp4'

# Inspect
vd3 workset list
vd3 workset show my-experiment

# Remove an asset / delete the workset
vd3 workset remove-asset my-experiment clip-001
vd3 workset delete my-experiment
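Conceptually, a workset is just a named collection of asset references, optionally grouped into packages, and is independent of the on-disk layout. A toy in-memory model of that relationship (illustrative only, not VD3's internal representation):

```python
from collections import defaultdict

class Workset:
    """Named subset of assets; packages are plain folder-like labels."""

    def __init__(self, name: str):
        self.name = name
        self.packages: dict[str, set[str]] = defaultdict(set)

    def add_asset(self, asset_id: str, package: str = "") -> None:
        self.packages[package].add(asset_id)

    def remove_asset(self, asset_id: str) -> None:
        # Removing from the workset never touches the asset itself
        for members in self.packages.values():
            members.discard(asset_id)

    def assets(self) -> set[str]:
        return set().union(*self.packages.values()) if self.packages else set()

ws = Workset("my-experiment")
ws.add_asset("clip-001", package="batch1")
ws.add_asset("clip-002", package="batch1")
ws.remove_asset("clip-001")
```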

Remote Storage

Media files are tracked by DVC. A content database has a single configured remote.

# Set the remote (replaces any existing one)
vd3 media remote set gs://my-bucket/vd3-data
vd3 media remote show

# Sync
vd3 media push
vd3 media pull
vd3 media status

Supported backends:

  • Google Cloud Storage — gs://bucket/path (authenticate with gcloud auth application-default login)
  • Amazon S3 — s3://bucket/path (standard AWS credential chain)
  • Azure Blob Storage — azure://container/path
  • Google Drive — gdrive://folder-id (via dvc-gdrive)
  • Local / NAS — /mnt/nas/vd3-backup
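Which backend DVC uses is determined by the URL scheme. A sketch of scheme detection with the standard library (the mapping mirrors the table above; the dispatch itself is illustrative, not VD3 code):

```python
from urllib.parse import urlparse

# Scheme -> backend, mirroring the supported-backends table above
BACKENDS = {
    "gs": "Google Cloud Storage",
    "s3": "Amazon S3",
    "azure": "Azure Blob Storage",
    "gdrive": "Google Drive",
}

def backend_for(url: str) -> str:
    scheme = urlparse(url).scheme
    # No scheme means a plain local / NAS path
    return BACKENDS.get(scheme, "Local / NAS")

print(backend_for("gs://my-bucket/vd3-data"))   # Google Cloud Storage
print(backend_for("/mnt/nas/vd3-backup"))       # Local / NAS
```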

Listing & Inspection

vd3 list assets             # all assets (filterable)
vd3 list datasources        # all datasources
vd3 list layers -a clip     # annotation layers on an asset
vd3 show clip               # asset details
vd3 info                    # database overview
vd3 query "SELECT ..."      # raw DuckDB SQL against the CSV tables
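vd3 query runs DuckDB SQL over the CSV metadata tables. DuckDB may not be installed everywhere, so as a stand-in illustration of the same idea — SQL over a CSV-backed table — here is a stdlib sqlite3 version (the column names are hypothetical, modeled on the examples in this README):

```python
import csv
import io
import sqlite3

# Hypothetical slice of the assets table, as CSV text
assets_csv = """name,asset_type,frame_count
clip-001,video,300
clip-002,video,450
set-001,imageset,120
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assets (name TEXT, asset_type TEXT, frame_count INTEGER)")
rows = list(csv.DictReader(io.StringIO(assets_csv)))
conn.executemany(
    "INSERT INTO assets VALUES (:name, :asset_type, :frame_count)", rows
)

# Same shape of query as the CLI example above
result = conn.execute(
    "SELECT name, frame_count FROM assets WHERE asset_type = 'video'"
).fetchall()
print(result)  # [('clip-001', 300), ('clip-002', 450)]
```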

Exporting

# Extract frames from a video or images from an imageset
vd3 export frames clip -o ./out

Library API

The CLI is a thin wrapper around VD3Storage, which is also usable directly.

from vd3storage import VD3Storage

# Open an existing database (or use VD3Storage.init(path) to create one)
storage = VD3Storage("/path/to/mydb")

# Browse assets
for a in storage.list_assets(datasource="dashcam"):
    print(f"{a.name} ({a.asset_type}): {a.frame_count} frames @ {a.nominal_fps} fps")

# Look up by (datasource, name) or by ID
clip = storage.get_asset("dashcam", "clip-001")
clip = storage.get_asset_by_id("3f1a...")

# Import a video
asset = storage.import_video("clip.mp4", datasource="dashcam")

# Resolve where the media file lives on disk
storage.resolve_media_path(clip)

# Annotation layers
storage.list_annotation_layers(clip.asset_id)
storage.read_annotation_layer(clip.asset_id, "gt")

# Worksets
ws = storage.create_workset("My Experiment")
storage.add_asset_to_workset(ws.workset_id, clip.asset_id, package="batch1")
storage.list_workset_assets(ws.workset_id)

# Raw DuckDB SQL against the underlying CSV tables
rows = storage.execute_sql("SELECT name, frame_count FROM assets WHERE asset_type = 'video'")

Other useful methods: import_imageset, import_coco, import_coco_dataset, import_result, export_coco, open_video, open_imageset, get_frame_image, add_tag, is_media_available, pull, push. Inspect help(VD3Storage) for the full surface.

CLI Reference

vd3 --help              Top-level help
vd3 <command> --help    Help for a specific command
  • init — Initialize a content database (defaults to cwd)
  • info — Show database overview
  • show — Show asset details
  • query — Run raw DuckDB SQL against the CSV tables
  • remove — Delete an asset
  • add video — Import video files
  • add imageset — Import an imageset (directory or tar)
  • add result — Import VD3 JSON detections/tracks
  • add coco — Import COCO annotations into an existing imageset
  • add coco-dataset — Import a COCO dataset (imageset + annotations)
  • list assets — List assets
  • list datasources — List datasources
  • list layers — List annotation layers for an asset
  • workset create — Create a workset
  • workset list — List worksets
  • workset show — Show workset details
  • workset add-asset — Add assets to a workset
  • workset remove-asset — Remove an asset from a workset
  • workset delete — Delete a workset (assets are kept)
  • media status — Show media availability
  • media push — Push media to remote storage
  • media pull — Pull media from remote storage
  • media remote set — Set the remote storage URL
  • media remote show — Show the configured remote
  • export frames — Extract frames from a video or imageset

Download files

Download the file for your platform.

Source Distribution

vd3-0.2.0.tar.gz (218.1 kB)

Uploaded Source

Built Distribution


vd3-0.2.0-py3-none-any.whl (66.7 kB)

Uploaded Python 3

File details

Details for the file vd3-0.2.0.tar.gz.

File metadata

  • Download URL: vd3-0.2.0.tar.gz
  • Upload date:
  • Size: 218.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for vd3-0.2.0.tar.gz:

  • SHA256 — 1fa006249f5824da87e52e4699617ca9be038f4f45d10426e9c8d5c955282070
  • MD5 — dab9a6751ac59b524d6f19ad43a61cf2
  • BLAKE2b-256 — c5be52b44e2c808d2882d41605991f7376cbda45227349ac7ad0d903ed3dc9a7


File details

Details for the file vd3-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: vd3-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 66.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for vd3-0.2.0-py3-none-any.whl:

  • SHA256 — c9fc7a2e823c9eb38d29d70da6db045a7620009140f6944f4cad7e8626c82922
  • MD5 — 6f106edc34d1b99337fd788ff38a4787
  • BLAKE2b-256 — b4bf0d85bca13b4af2d9c3836971e3e86a31e474075f36f9a89a94ead4b5ae3c

