DataFlow-CV

Where Vibe Coding meets CV data. 🌊 Convert & visualize datasets. Built with the flow of Claude Code.

A data processing library for computer vision datasets, focusing on format conversion and visualization between LabelMe, COCO, and YOLO formats. Provides both a CLI and Python API.

Project Structure

dataflow/
├── __init__.py              # Package exports and convenience functions
├── cli.py                   # Command-line interface
├── config.py                # Configuration management
├── convert/                 # Format conversion module
│   ├── __init__.py
│   ├── base.py              # Converter base class
│   ├── coco_and_yolo.py     # COCO ↔ YOLO converters
│   ├── coco_and_labelme.py  # COCO ↔ LabelMe converters
│   └── yolo_and_labelme.py  # YOLO ↔ LabelMe converters
├── visualize/               # Annotation visualization module
│   ├── __init__.py
│   ├── base.py              # Visualizer base class
│   ├── generic.py           # Generic visualizer base class using label handlers
│   ├── yolo.py              # YOLO annotation visualizer
│   ├── coco.py              # COCO annotation visualizer
│   └── labelme.py           # LabelMe annotation visualizer
└── label/                   # Label format handlers module
    ├── __init__.py
    ├── yolo.py              # YOLO format handler
    ├── coco.py              # COCO format handler
    └── labelme.py           # LabelMe format handler
tests/
├── __init__.py
├── convert/                 # Conversion tests
│   ├── __init__.py
│   ├── test_coco_to_yolo.py
│   ├── test_yolo_to_coco.py
│   ├── test_coco_to_labelme.py
│   ├── test_labelme_to_coco.py
│   ├── test_labelme_to_yolo.py
│   └── test_yolo_to_labelme.py
├── visualize/               # Visualization tests
│   ├── __init__.py
│   ├── test_yolo.py
│   ├── test_coco.py
│   ├── test_labelme.py
│   └── test_generic.py      # Generic visualizer tests
└── run_tests.py             # Test runner
samples/
├── __init__.py
├── example_usage.py         # Quick usage demonstration
├── template.py              # Example template for creating new examples
├── cli/                     # CLI usage examples
│   ├── __init__.py
│   ├── convert/
│   │   ├── cli_coco_to_yolo.py
│   │   ├── cli_yolo_to_coco.py
│   │   ├── cli_coco_to_labelme.py
│   │   ├── cli_labelme_to_coco.py
│   │   ├── cli_labelme_to_yolo.py
│   │   └── cli_yolo_to_labelme.py
│   └── visualize/
│       ├── cli_yolo.py
│       ├── cli_coco.py
│       └── cli_labelme.py
└── api/                     # Python API examples
    ├── __init__.py
    ├── convert/
    │   ├── api_coco_to_yolo.py
    │   ├── api_yolo_to_coco.py
    │   ├── api_coco_to_labelme.py
    │   ├── api_labelme_to_coco.py
    │   ├── api_labelme_to_yolo.py
    │   └── api_yolo_to_labelme.py
    └── visualize/
        ├── api_yolo.py
        ├── api_coco.py
        └── api_labelme.py
docs/                        # Data format documentation
├── README.md                # Documentation index
├── yolo.md                  # YOLO format specification
├── labelme.md               # LabelMe format specification
└── coco.md                  # COCO format specification

Requirements

Core Dependencies

  • Python 3.8 or higher
  • Linux environment (POSIX compatible, assumes POSIX paths)
  • click >= 8.1.0 – CLI framework
  • numpy >= 2.0.0 – numerical operations
  • opencv-python >= 4.8.0 – image processing (optional, used for some image operations)
  • Pillow >= 10.0.0 – image reading (optional, used for reading image dimensions)

Quick Start

Installation

# Regular installation from source
pip install .

# Install from PyPI
pip install dataflow-cv

Editable Installation (Development Mode)

Because of a setuptools compatibility issue, use python setup.py develop instead of pip install -e .:

# Editable installation (development mode)
python setup.py develop

# After editable installation, use python -m dataflow.cli instead of the dataflow command
python -m dataflow.cli --help

Build System

The project uses setuptools with a pyproject.toml configuration. Distribution packages are built with python -m build.

# Build wheel and source distribution
python -m build

# Install from built wheel
pip install dist/dataflow_cv-*.whl

Command Line Usage

Global options: --verbose (-v) for progress output, --overwrite to replace existing files.

# COCO to YOLO conversion (use --segmentation for polygon annotations)
dataflow convert coco2yolo annotations.json output_dir/
dataflow convert coco2yolo annotations.json output_dir/ --segmentation

# YOLO to COCO conversion
dataflow convert yolo2coco images/ labels/ classes.names output.json

# COCO to LabelMe conversion (use --segmentation for polygon annotations)
dataflow convert coco2labelme annotations.json output_dir/
dataflow convert coco2labelme annotations.json output_dir/ --segmentation

# LabelMe to COCO conversion
dataflow convert labelme2coco labels/ classes.names output.json

# LabelMe to YOLO conversion (use --segmentation for polygon annotations)
dataflow convert labelme2yolo labels/ output_dir/
dataflow convert labelme2yolo labels/ output_dir/ --segmentation

# YOLO to LabelMe conversion
dataflow convert yolo2labelme images/ labels/ classes.names output_dir/

# Visualize YOLO annotations (use --save to export images)
dataflow visualize yolo images/ labels/ classes.names
dataflow visualize yolo images/ labels/ classes.names --save output_dir/

# Visualize COCO annotations (use --save to export images)
dataflow visualize coco images/ annotations.json
dataflow visualize coco images/ annotations.json --save output_dir/

# Visualize LabelMe annotations (use --save to export images)
dataflow visualize labelme images/ labels/
dataflow visualize labelme images/ labels/ --save output_dir/

# Show configuration
dataflow config

# Get help
dataflow --help
dataflow convert coco2yolo --help
dataflow visualize yolo --help
dataflow visualize labelme --help

See the CLI Reference below for detailed usage.

Python API Usage

import dataflow

# COCO to YOLO conversion (pass segmentation=True for polygon annotations)
result = dataflow.coco_to_yolo("annotations.json", "output_dir")
result = dataflow.coco_to_yolo("annotations.json", "output_dir", segmentation=True)
print(f"Processed {result['images_processed']} images")

# YOLO to COCO conversion
result = dataflow.yolo_to_coco("images/", "labels/", "classes.names", "output.json")
print(f"Generated {result['annotations_processed']} annotations")

# Additional conversions (import converters directly)
from dataflow.convert import (
    CocoToLabelMeConverter,
    LabelMeToCocoConverter,
    LabelMeToYoloConverter,
    YoloToLabelMeConverter
)

# COCO to LabelMe conversion
converter = CocoToLabelMeConverter()
result = converter.convert("annotations.json", "output_dir/", segmentation=True)
print(f"Converted {result['images_processed']} images to LabelMe format")

# LabelMe to COCO conversion
converter = LabelMeToCocoConverter()
result = converter.convert("labels/", "classes.names", "output.json")
print(f"Converted {result['annotations_processed']} annotations to COCO format")

# LabelMe to YOLO conversion
converter = LabelMeToYoloConverter()
result = converter.convert("labels/", "output_dir/")
print(f"Converted {result['images_processed']} images to YOLO format")

# YOLO to LabelMe conversion
converter = YoloToLabelMeConverter()
result = converter.convert("images/", "labels/", "classes.names", "output_dir/")
print(f"Converted {result['images_processed']} images to LabelMe format")

# Visualize YOLO annotations (save_dir is optional)
result = dataflow.visualize_yolo("images/", "labels/", "classes.names")
result = dataflow.visualize_yolo("images/", "labels/", "classes.names", save_dir="output_dir/")
print(f"Visualized {result['images_processed']} images")

# Visualize COCO annotations (save_dir is optional)
result = dataflow.visualize_coco("images/", "annotations.json")
result = dataflow.visualize_coco("images/", "annotations.json", save_dir="output_dir/")
print(f"Visualized {result['images_processed']} images")

# Visualize LabelMe annotations (save_dir is optional)
result = dataflow.visualize_labelme("images/", "labels/")
result = dataflow.visualize_labelme("images/", "labels/", save_dir="output_dir/")
print(f"Visualized {result['images_processed']} images")
print(f"Classes found: {result['classes_found']}")

CLI Reference

The CLI follows a hierarchical structure: dataflow <main-task> <sub-task> [arguments]. Global options can be placed before the main task.

Global Options

  • --verbose, -v: Enable verbose output (progress information)
  • --overwrite: Overwrite existing files

Conversion Commands

COCO to YOLO

dataflow convert coco2yolo COCO_JSON_PATH OUTPUT_DIR [--segmentation]
  • COCO_JSON_PATH: Path to COCO JSON annotation file
  • OUTPUT_DIR: Directory where labels/ and class.names will be created
  • --segmentation, -s: Handle segmentation annotations (polygon format)
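
The box arithmetic behind this kind of conversion can be sketched as follows. This is a minimal illustration, not DataFlow-CV's actual implementation: the function name coco_bbox_to_yolo is hypothetical, and the sketch assumes the standard COCO box layout [x_min, y_min, width, height] in pixels together with the normalized YOLO detection layout described in this README.

```python
def coco_bbox_to_yolo(bbox, img_w, img_h):
    """Convert a COCO pixel box [x_min, y_min, w, h] to YOLO (x_center, y_center, w, h).

    Hypothetical helper: all returned values are normalized by the image size.
    """
    x_min, y_min, w, h = bbox
    x_center = (x_min + w / 2) / img_w
    y_center = (y_min + h / 2) / img_h
    return x_center, y_center, w / img_w, h / img_h


# A 200x100 box whose top-left corner sits at (100, 50) in a 640x480 image
xc, yc, w, h = coco_bbox_to_yolo([100, 50, 200, 100], 640, 480)
print(f"0 {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")  # class id 0 is a placeholder
```

The image width and height are needed because COCO stores pixel coordinates while YOLO stores fractions of the image size.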

YOLO to COCO

dataflow convert yolo2coco IMAGE_DIR YOLO_LABELS_DIR YOLO_CLASS_PATH COCO_JSON_PATH
  • IMAGE_DIR: Directory containing image files
  • YOLO_LABELS_DIR: Directory containing YOLO label files (.txt)
  • YOLO_CLASS_PATH: Path to YOLO class names file (e.g., class.names)
  • COCO_JSON_PATH: Path to save COCO JSON file
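
The reverse direction needs each image's pixel size, which is presumably why this command takes IMAGE_DIR. A minimal sketch of the arithmetic, assuming the YOLO detection layout described in this README; yolo_to_coco_bbox is a hypothetical helper, not part of the DataFlow-CV API, and the image size is hard-coded here instead of being read from a file.

```python
def yolo_to_coco_bbox(xc, yc, w, h, img_w, img_h):
    """Convert normalized YOLO (x_center, y_center, w, h) to a COCO pixel box.

    Hypothetical helper: returns [x_min, y_min, width, height] in pixels.
    """
    box_w = w * img_w
    box_h = h * img_h
    x_min = xc * img_w - box_w / 2
    y_min = yc * img_h - box_h / 2
    return [x_min, y_min, box_w, box_h]


# One line of a YOLO label file: class_id followed by four normalized values
line = "0 0.3125 0.25 0.3125 0.25"
class_id, *coords = line.split()
bbox = yolo_to_coco_bbox(*map(float, coords), img_w=640, img_h=480)
print(class_id, bbox)
```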

COCO to LabelMe

dataflow convert coco2labelme COCO_JSON_PATH OUTPUT_DIR [--segmentation]
  • COCO_JSON_PATH: Path to COCO JSON annotation file
  • OUTPUT_DIR: Directory where LabelMe JSON files will be created
  • --segmentation, -s: Handle segmentation annotations (polygon format)

LabelMe to COCO

dataflow convert labelme2coco LABEL_DIR CLASSES_PATH OUTPUT_JSON_PATH [--segmentation]
  • LABEL_DIR: Directory containing LabelMe JSON files
  • CLASSES_PATH: Path to class names file (e.g., class.names)
  • OUTPUT_JSON_PATH: Path to save COCO JSON file
  • --segmentation, -s: Handle segmentation annotations (polygon format)

LabelMe to YOLO

dataflow convert labelme2yolo LABEL_DIR OUTPUT_DIR [--segmentation]
  • LABEL_DIR: Directory containing LabelMe JSON files
  • OUTPUT_DIR: Directory where labels/ and class.names will be created
  • --segmentation, -s: Handle segmentation annotations (polygon format)
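
A LabelMe rectangle is stored as two corner points, so a conversion of this kind has to cope with either corner order before normalizing. The sketch below illustrates that step under stated assumptions: labelme_rect_to_yolo is a hypothetical name, and how DataFlow-CV itself assigns class ids when it writes class.names is not covered here.

```python
def labelme_rect_to_yolo(points, img_w, img_h):
    """Convert two LabelMe rectangle corners to YOLO (x_center, y_center, w, h).

    Hypothetical helper: the corners may be given in any order.
    """
    (x1, y1), (x2, y2) = points
    x_min, x_max = sorted((x1, x2))
    y_min, y_max = sorted((y1, y2))
    w = x_max - x_min
    h = y_max - y_min
    return (
        (x_min + w / 2) / img_w,
        (y_min + h / 2) / img_h,
        w / img_w,
        h / img_h,
    )


# Corners listed bottom-right first; the result matches the top-left-first order
rect = labelme_rect_to_yolo([[300, 150], [100, 50]], 640, 480)
same = labelme_rect_to_yolo([[100, 50], [300, 150]], 640, 480)
print(rect == same)
```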

YOLO to LabelMe

dataflow convert yolo2labelme IMAGE_DIR LABEL_DIR CLASSES_PATH OUTPUT_DIR [--segmentation]
  • IMAGE_DIR: Directory containing image files
  • LABEL_DIR: Directory containing YOLO label files (.txt)
  • CLASSES_PATH: Path to YOLO class names file (e.g., class.names)
  • OUTPUT_DIR: Directory where LabelMe JSON files will be created
  • --segmentation, -s: Handle segmentation annotations (polygon format)

Visualization Commands

Visualize YOLO annotations

dataflow visualize yolo IMAGE_DIR LABEL_DIR CLASS_PATH [--save SAVE_DIR]
  • IMAGE_DIR: Directory containing image files
  • LABEL_DIR: Directory containing YOLO label files (.txt)
  • CLASS_PATH: Path to class names file (e.g., class.names)
  • --save SAVE_DIR: Optional directory to save visualized images

Visualize COCO annotations

dataflow visualize coco IMAGE_DIR ANNOTATION_JSON [--save SAVE_DIR]
  • IMAGE_DIR: Directory containing image files
  • ANNOTATION_JSON: Path to COCO JSON annotation file
  • --save SAVE_DIR: Optional directory to save visualized images

Visualize LabelMe annotations

dataflow visualize labelme IMAGE_DIR LABEL_DIR [--save SAVE_DIR]
  • IMAGE_DIR: Directory containing image files
  • LABEL_DIR: Directory containing LabelMe JSON files
  • --save SAVE_DIR: Optional directory to save visualized images

Configuration Command

dataflow config

Shows the current configuration (file extensions, default values, CLI context).

Getting Help

dataflow --help
dataflow convert --help
dataflow convert coco2yolo --help
dataflow convert yolo2coco --help
dataflow visualize --help
dataflow visualize yolo --help
dataflow visualize coco --help
dataflow visualize labelme --help

Segmentation Support

DataFlow-CV supports both bounding box and polygon segmentation annotations across all formats:

YOLO Segmentation Format

  • Detection format: class_id x_center y_center width height (normalized coordinates)
  • Segmentation format: class_id x1 y1 x2 y2 ... (polygon vertices, normalized)
  • YOLO segmentation files have the same .txt extension as detection files
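
Because both layouts share the .txt extension, a reader has to tell them apart from the line itself. One common heuristic, shown here as a sketch and not as DataFlow-CV's actual logic, is to count the values after the class id: four values indicate a box, and an even count of six or more indicates a polygon. The name parse_yolo_line is hypothetical.

```python
def parse_yolo_line(line):
    """Classify one YOLO label line as a box or a polygon (heuristic sketch)."""
    parts = line.split()
    class_id = int(parts[0])
    values = [float(v) for v in parts[1:]]
    if len(values) == 4:
        # x_center, y_center, width, height
        return class_id, "bbox", values
    if len(values) >= 6 and len(values) % 2 == 0:
        # Pair up x and y values into polygon vertices
        points = list(zip(values[0::2], values[1::2]))
        return class_id, "polygon", points
    raise ValueError(f"Unrecognized YOLO label line: {line!r}")


print(parse_yolo_line("0 0.5 0.5 0.2 0.4"))
print(parse_yolo_line("1 0.1 0.1 0.9 0.1 0.5 0.8"))
```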

COCO Segmentation Format

  • Polygon coordinates in segmentation field (list of [x1, y1, x2, y2, ...])
  • Both single-polygon and multi-polygon annotations are supported
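
To make the layout concrete, the sketch below normalizes a COCO polygon into the YOLO segmentation coordinate order. It is an illustration under stated assumptions: coco_polygon_to_yolo is a hypothetical name, the annotation dict is made up for the example, and treating each polygon of a multi-polygon annotation separately is a choice of this sketch, not a documented DataFlow-CV behavior.

```python
def coco_polygon_to_yolo(polygon, img_w, img_h):
    """Normalize a flat COCO polygon [x1, y1, x2, y2, ...] by the image size."""
    normalized = []
    for x, y in zip(polygon[0::2], polygon[1::2]):
        normalized.extend([x / img_w, y / img_h])
    return normalized


# "segmentation" is a list of polygons; each polygon is a flat pixel list
annotation = {"category_id": 1, "segmentation": [[64, 48, 320, 48, 192, 240]]}
polygons = [
    coco_polygon_to_yolo(polygon, 640, 480)
    for polygon in annotation["segmentation"]
]
for coords in polygons:
    print(" ".join(f"{c:.6f}" for c in coords))
```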

LabelMe Segmentation Format

  • Rectangle shapes (shape_type: "rectangle") for bounding box annotations
  • Polygon shapes (shape_type: "polygon") for segmentation annotations
  • Each JSON file contains shapes array with annotation data
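
A small example of what such a file looks like and how the shapes array can be read. The field names (shapes, label, shape_type, points, imageWidth, imageHeight) follow the LabelMe JSON layout; the file contents are made up for illustration, and the grouping code is a sketch, not DataFlow-CV's label handler.

```python
import json

# Made-up LabelMe file contents with one rectangle and one polygon
labelme_text = """
{
  "imagePath": "img.jpg",
  "imageWidth": 640,
  "imageHeight": 480,
  "shapes": [
    {"label": "cat", "shape_type": "rectangle",
     "points": [[100, 50], [300, 150]]},
    {"label": "dog", "shape_type": "polygon",
     "points": [[64, 48], [320, 48], [192, 240]]}
  ]
}
"""

data = json.loads(labelme_text)

# Group labels by shape type: rectangles carry 2 corner points,
# polygons carry 3 or more vertices
by_type = {}
for shape in data["shapes"]:
    by_type.setdefault(shape["shape_type"], []).append(shape["label"])

print(by_type)
```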

Usage Examples

# Convert COCO to YOLO with segmentation annotations
dataflow convert coco2yolo annotations.json output_dir/ --segmentation

# Visualize YOLO annotations in strict segmentation mode (only polygons)
dataflow visualize yolo images/ labels/ classes.names --segmentation

# Visualize COCO annotations in strict segmentation mode
dataflow visualize coco images/ annotations.json --segmentation

# Visualize LabelMe annotations in strict segmentation mode (only polygons)
dataflow visualize labelme images/ labels/ --segmentation

Python API

# Convert COCO to YOLO with segmentation
result = dataflow.coco_to_yolo("annotations.json", "output_dir", segmentation=True)

# Visualize in strict segmentation mode
result = dataflow.visualize_yolo("images/", "labels/", "classes.names", segmentation=True)
result = dataflow.visualize_labelme("images/", "labels/", segmentation=True)

Notes

  • Without the --segmentation flag, both bounding boxes and polygons are processed automatically
  • With the --segmentation flag, only valid polygon annotations are processed (strict mode)
  • YOLO segmentation format requires at least 3 points (6 coordinates)
  • COCO segmentation polygons are automatically converted to YOLO normalized coordinates
  • LabelMe format supports both rectangle (shape_type: "rectangle") and polygon (shape_type: "polygon") shapes
  • In segmentation mode, LabelMe visualizer rejects rectangle shapes and only accepts polygon shapes
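
The difference between the default and strict modes described in these notes can be pictured as a filter over LabelMe-style shapes. This is a sketch under stated assumptions: select_shapes is a hypothetical function, it silently drops non-polygon shapes in strict mode (the library may report or reject them differently), and it does not model any validation that happens in default mode.

```python
def select_shapes(shapes, segmentation=False):
    """Pick the shapes to process; strict mode keeps only valid polygons."""
    if not segmentation:
        # Default mode: rectangles and polygons are both kept
        return list(shapes)
    # Strict mode: polygons with at least 3 points (6 coordinates)
    return [
        s for s in shapes
        if s["shape_type"] == "polygon" and len(s["points"]) >= 3
    ]


shapes = [
    {"label": "cat", "shape_type": "rectangle", "points": [[100, 50], [300, 150]]},
    {"label": "dog", "shape_type": "polygon", "points": [[64, 48], [320, 48], [192, 240]]},
    {"label": "bad", "shape_type": "polygon", "points": [[10, 10], [20, 20]]},
]
default_labels = [s["label"] for s in select_shapes(shapes)]
strict_labels = [s["label"] for s in select_shapes(shapes, segmentation=True)]
print(default_labels, strict_labels)
```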

Running Tests

# Run all tests
python tests/run_tests.py

# Run a specific test
python tests/run_tests.py --test TestCocoToYoloConverter

# With verbose output
python tests/run_tests.py -v

Examples

Check the samples/ directory for detailed usage examples:

  • samples/cli/convert/ - CLI conversion examples
  • samples/cli/visualize/ - CLI visualization examples
  • samples/api/convert/ - Python API conversion examples
  • samples/api/visualize/ - Python API visualization examples

Documentation

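Detailed data format specifications are available in the docs/ directory:

  • docs/README.md - Documentation index
  • docs/yolo.md - YOLO format specification
  • docs/labelme.md - LabelMe format specification
  • docs/coco.md - COCO format specification

These documents describe the annotation formats supported by DataFlow-CV, without covering tool usage.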

Development

For development guidelines, architecture details, and contribution instructions, see CLAUDE.md. This file provides guidance for working with the codebase, including common development commands, architectural patterns, and writing principles.

License

MIT License © 2026 zjykzj
