
A computer vision dataset processing library


DataFlow-CV

Where Vibe Coding meets CV data. 🌊 Convert & visualize datasets. Built with the flow of Claude Code.

Python 3.8+ | License: MIT | Platforms: Linux, Windows, macOS

A computer vision dataset processing library for seamless format conversion and visualization between LabelMe, COCO, and YOLO annotation formats. Designed for researchers and developers working with multi-format annotation pipelines.

Features

  • Bidirectional Conversion: Convert between LabelMe, COCO, and YOLO formats in any direction
  • Multi-format Support: Handle object detection bounding boxes and instance segmentation polygons
  • Lossless Round-trip: Preserve original coordinates through conversion chains
  • Visualization: Visualize annotations with OpenCV, supporting both display and save modes
  • Command-line Interface: User-friendly CLI with convert and visualize subcommands
  • Python API: Programmatic access for integration into larger pipelines
  • Verbose Logging: Detailed logging with file output for debugging
  • Cross-platform: Full support for Windows, Linux, and macOS
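Handling both bounding boxes and polygons starts with telling them apart in a YOLO label line. The sketch below assumes the widely used convention of a class ID followed either by four normalized box values (cx cy w h) or by an even number of normalized polygon coordinates. parse_yolo_line is a hypothetical helper for illustration, not part of DataFlow-CV's API.

```python
from typing import List, Tuple, Union

BBox = Tuple[float, float, float, float]  # (cx, cy, w, h), all normalized to 0-1
Polygon = List[Tuple[float, float]]       # [(x1, y1), (x2, y2), ...], normalized


def parse_yolo_line(line: str) -> Tuple[int, str, Union[BBox, Polygon]]:
    """Split one YOLO label line into (class_id, kind, geometry).

    Assumed convention: exactly four values after the class ID form a
    bounding box; an even count of six or more forms a polygon.
    """
    tokens = line.split()
    if not tokens:
        raise ValueError("empty label line")
    class_id = int(tokens[0])
    values = [float(t) for t in tokens[1:]]
    if len(values) == 4:
        cx, cy, w, h = values
        return class_id, "bbox", (cx, cy, w, h)
    if len(values) >= 6 and len(values) % 2 == 0:
        # Pair up consecutive values: x1 y1 x2 y2 ... -> (x1, y1), (x2, y2), ...
        return class_id, "polygon", list(zip(values[0::2], values[1::2]))
    raise ValueError(f"expected 4 box values or >= 3 polygon points, got {len(values)} values")
```

For example, "0 0.5 0.5 0.2 0.4" parses as a box, while "2 0.1 0.1 0.9 0.1 0.5 0.8" parses as a three-point polygon.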


Installation

From PyPI

pip install dataflow-cv

From Source

# Clone the repository
git clone https://github.com/zjykzj/DataFlow-CV.git
cd DataFlow-CV

# Regular installation
pip install .

# Editable installation (for development)
pip install -e .

Note: When installed in editable mode, use python -m dataflow.cli instead of the dataflow-cv command.

Optional Dependencies

  • pycocotools: Required for COCO RLE segmentation support
    pip install pycocotools
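For context on what RLE output means: COCO stores segmentation masks as run-length encodings. pycocotools produces the compressed string form; the minimal pure-Python sketch below shows the simpler uncompressed form, assuming a mask given as rows of 0/1 values. encode_uncompressed_rle is a hypothetical helper for illustration and is not how DataFlow-CV or pycocotools implement encoding.

```python
from typing import Dict, List


def encode_uncompressed_rle(mask: List[List[int]]) -> Dict[str, List[int]]:
    """Encode a binary mask (a list of rows of 0/1) as COCO uncompressed RLE.

    COCO scans pixels in column-major order, and "counts" alternates
    between runs of 0s and runs of 1s, always starting with a run of 0s
    (which is 0 when the first pixel is foreground).
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    counts: List[int] = []
    current = 0  # value of the run being measured; COCO starts with background
    run = 0
    for x in range(width):       # outer loop over columns: column-major order
        for y in range(height):
            pixel = 1 if mask[y][x] else 0
            if pixel == current:
                run += 1
            else:
                counts.append(run)  # close the current run and flip value
                current = pixel
                run = 1
    counts.append(run)
    return {"size": [height, width], "counts": counts}
```

Note the column-major scan: for the 3x2 mask [[0, 0], [1, 1], [0, 1]] the counts are [1, 1, 2, 2], whereas a row-major scan would give [2, 2, 1, 1].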
    

Quick Start

Command-line Interface

All required parameters (image directories, label directories, class files, output paths) are positional arguments for better usability. Use --help on any subcommand for detailed usage.

Format Conversion

# YOLO to COCO
dataflow-cv convert yolo2coco images/ yolo_labels/ classes.txt coco_annotations.json

# YOLO to COCO with RLE encoding (requires pycocotools)
dataflow-cv convert yolo2coco images/ yolo_labels/ classes.txt coco_annotations.json --do-rle

# YOLO to LabelMe
dataflow-cv convert yolo2labelme images/ yolo_labels/ classes.txt labelme_json/

# COCO to YOLO
dataflow-cv convert coco2yolo coco_annotations.json yolo_labels/

# COCO to LabelMe
dataflow-cv convert coco2labelme coco_annotations.json labelme_json/

# LabelMe to YOLO
dataflow-cv convert labelme2yolo labelme_json/ classes.txt yolo_labels/

# LabelMe to COCO
dataflow-cv convert labelme2coco labelme_json/ classes.txt coco_annotations.json

# LabelMe to COCO with RLE encoding (requires pycocotools)
dataflow-cv convert labelme2coco labelme_json/ classes.txt coco_annotations.json --do-rle

# Enable verbose logging
dataflow-cv convert yolo2coco images/ yolo_labels/ classes.txt coco_annotations.json --verbose

Visualization

# Visualize YOLO annotations
dataflow-cv visualize yolo images/ yolo_labels/ classes.txt --save visualized/

# Visualize COCO annotations
dataflow-cv visualize coco images/ coco_annotations.json --save visualized/

# Visualize LabelMe annotations
dataflow-cv visualize labelme images/ labelme_json/ --save visualized/

# Enable verbose logging for detailed debug output
dataflow-cv visualize yolo --verbose images/ yolo_labels/ classes.txt --save visualized/

Python API

from dataflow.convert import YoloAndCocoConverter
from dataflow.visualize import YOLOVisualizer

# Convert YOLO to COCO
converter = YoloAndCocoConverter(source_to_target=True, verbose=True, strict_mode=True)
result = converter.convert(
    source_path="yolo_labels/",
    target_path="coco_annotations.json",
    class_file="classes.txt",
    image_dir="images/",
    do_rle=False  # Set to True for RLE encoding
)

# Visualize YOLO annotations
visualizer = YOLOVisualizer(
    label_dir="yolo_labels/",
    image_dir="images/",
    class_file="classes.txt",
    is_show=True,
    is_save=True,
    output_dir="visualized/",
    verbose=True,
    strict_mode=True
)
result = visualizer.visualize()
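The class_file argument (classes.txt) is not specified further here. The sketch below assumes the common YOLO convention of one class name per line, with the 0-based line index as the class ID; parse_class_names and load_class_names are hypothetical helpers, handy for sanity-checking a class file before passing it to a converter or visualizer.

```python
from pathlib import Path
from typing import Dict


def parse_class_names(text: str) -> Dict[int, str]:
    """Map class IDs to names, one name per line (line index = class ID).

    Trailing blank lines are ignored; every other line keeps its position,
    so a stray blank line yields an empty name instead of shifting later IDs.
    """
    lines = text.splitlines()
    while lines and not lines[-1].strip():  # drop trailing blank lines only
        lines.pop()
    return {class_id: name.strip() for class_id, name in enumerate(lines)}


def load_class_names(class_file: str) -> Dict[int, str]:
    """Read and parse a class file such as classes.txt."""
    return parse_class_names(Path(class_file).read_text(encoding="utf-8"))
```

Keeping interior blank lines as empty names, rather than skipping them, makes a malformed file visible instead of silently renumbering every class after the gap.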

See the samples/ directory for complete examples:

  • samples/visualize/yolo_demo.py - YOLO visualization example
  • samples/visualize/labelme_demo.py - LabelMe visualization example
  • samples/visualize/coco_demo.py - COCO visualization example
  • samples/convert/ - Conversion examples

Documentation

  • CLAUDE.md: Detailed architecture and development guide
  • docs/formats/: Format specifications (YOLO, COCO, LabelMe)
  • docs/specs/: Module specifications and design documents
  • CHANGELOG.md: Version history and breaking changes

Key Concepts

  • Normalized Coordinates: All internal coordinates are normalized to the 0-1 range relative to image width and height
  • Original Data Preservation: Lossless round-trip conversion through OriginalData system
  • Strict Mode: Validation errors raise exceptions. Strict mode is enabled by default in the CLI; in the Python API, disable it by passing strict_mode=False
  • Verbose Logging: When --verbose is used, detailed debug logs are saved to files, and the CLI prints "Verbose log saved to: …"
  • Keyboard Shortcuts: During visualization, press q or ESC to exit early; any other key continues
  • Missing Image Handling: Missing images are skipped with warnings, allowing processing to continue
  • RLE Mask Visualization: COCO RLE masks are displayed with semi-transparent fills for better visibility
  • Color Management: Each class ID gets a unique color from a palette of 1000 distinct colors for consistent visualization
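The consistent per-class coloring described above can be approximated with a seeded palette. This is a minimal sketch under that assumption; build_palette and color_for_class are hypothetical names, and DataFlow-CV's actual palette may be generated differently. Colors are in BGR order because that is what OpenCV's drawing functions expect.

```python
import random
from typing import List, Set, Tuple

# (B, G, R): OpenCV drawing functions interpret color tuples in BGR order
Color = Tuple[int, int, int]


def build_palette(size: int = 1000, seed: int = 0) -> List[Color]:
    """Return `size` distinct BGR colors, identical on every call with the same seed."""
    if size > 256 ** 3:
        raise ValueError("cannot have more distinct colors than 24-bit color allows")
    rng = random.Random(seed)  # private, seeded generator: reproducible, no global state
    palette: List[Color] = []
    seen: Set[Color] = set()
    while len(palette) < size:
        color = (rng.randint(0, 255), rng.randint(0, 255), rng.randint(0, 255))
        if color not in seen:  # "distinct" here means no exact duplicates
            seen.add(color)
            palette.append(color)
    return palette


PALETTE = build_palette()


def color_for_class(class_id: int) -> Color:
    """Look up a class color; IDs past the end of the palette wrap around."""
    return PALETTE[class_id % len(PALETTE)]
```

Seeding the generator is what keeps a class's color stable across images and runs. Note that random colors are only guaranteed unequal, not perceptually well separated.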

Development

For detailed developer guidance including advanced test commands, debugging, and architecture overview, see CLAUDE.md.

Testing

# Run all tests
pytest

# Run tests with coverage
pytest --cov=dataflow

# Run specific test module
pytest tests/convert/test_yolo_and_coco.py
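Step 4 of Contributing asks for tests. Because normalized coordinates are floats, a pixel-to-normalized-to-pixel round trip is exact only up to floating-point error, so tests should compare with a tolerance (plausibly also a reason the OriginalData system preserves source coordinates rather than recomputing them). A pytest-style sketch with hypothetical helpers to_normalized and to_pixels, which are not part of the package; saved as a test_*.py file under tests/, pytest would collect both functions:

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def to_normalized(points: List[Point], img_w: int, img_h: int) -> List[Point]:
    """Hypothetical helper: pixel coordinates -> 0-1 range."""
    return [(x / img_w, y / img_h) for x, y in points]


def to_pixels(points: List[Point], img_w: int, img_h: int) -> List[Point]:
    """Hypothetical helper: 0-1 range -> pixel coordinates."""
    return [(x * img_w, y * img_h) for x, y in points]


def test_polygon_round_trip() -> None:
    # Odd image size and fractional coordinates exercise floating-point rounding.
    img_w, img_h = 1283, 721
    polygon = [(10.5, 20.25), (1200.0, 33.3), (640.75, 700.1)]
    restored = to_pixels(to_normalized(polygon, img_w, img_h), img_w, img_h)
    for (x0, y0), (x1, y1) in zip(polygon, restored):
        assert math.isclose(x0, x1, abs_tol=1e-9)
        assert math.isclose(y0, y1, abs_tol=1e-9)


def test_normalized_values_in_unit_range() -> None:
    img_w, img_h = 640, 480
    polygon = [(0.0, 0.0), (639.0, 479.0), (320.0, 240.0)]
    for x, y in to_normalized(polygon, img_w, img_h):
        assert 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0
```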

Linting and Formatting

# Install development dependencies (quotes stop shells such as zsh from treating [dev] as a glob)
pip install -e ".[dev]"

# Format code
black dataflow tests samples

# Sort imports
isort dataflow tests samples

# Type checking
mypy dataflow

# Linting
flake8 dataflow tests samples

Project Structure

dataflow/
├── label/          # Annotation handlers (YOLO, LabelMe, COCO)
├── convert/        # Format converters
├── visualize/      # Visualization modules
├── util/           # Utilities (logging, file operations)
└── cli/            # Command-line interface
tests/              # Comprehensive test suite
samples/            # Usage examples
assets/             # Sample data for testing

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Before contributing, review CLAUDE.md for architecture and development patterns.

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add or update tests as needed
  5. Ensure code passes formatting and linting checks
  6. Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Thanks to the creators of LabelMe, COCO, and YOLO formats for establishing these annotation standards
  • Built with OpenCV, NumPy, and Click
  • Inspired by the need for seamless format conversion in multi-tool CV pipelines



Download files

Download the file for your platform.

Source Distribution

dataflow_cv-0.6.2.tar.gz (61.2 kB)

Uploaded Source

Built Distribution


dataflow_cv-0.6.2-py3-none-any.whl (74.1 kB)

Uploaded Python 3

File details

Details for the file dataflow_cv-0.6.2.tar.gz.

File metadata

  • Download URL: dataflow_cv-0.6.2.tar.gz
  • Upload date:
  • Size: 61.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dataflow_cv-0.6.2.tar.gz
  • SHA256: 3b088a4907c4179794f2a4631c876fac8e2131a1f3fac110072fdfb95e015f4f
  • MD5: 4493502ef6c8a82f2c0868644e60bc64
  • BLAKE2b-256: 5e16fc53ac902837a3a94e3612d807bc0e5fd54509275a69c784f1ebdc7d0980
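To check a downloaded archive against these digests without extra tools, Python's standard hashlib module is enough. A minimal sketch, assuming the file sits in the current directory under its published name; the helper names are hypothetical:

```python
import hashlib
from typing import Iterable

# SHA256 digest published for dataflow_cv-0.6.2.tar.gz
EXPECTED_SHA256 = "3b088a4907c4179794f2a4631c876fac8e2131a1f3fac110072fdfb95e015f4f"


def sha256_hex(chunks: Iterable[bytes]) -> str:
    """Hash a stream of byte chunks and return the hex digest."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files never sit fully in memory."""
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (EOF)
        return sha256_hex(iter(lambda: f.read(chunk_size), b""))


def verify_download(path: str, expected: str = EXPECTED_SHA256) -> bool:
    return file_sha256(path) == expected.strip().lower()
```

verify_download("dataflow_cv-0.6.2.tar.gz") should return True for an intact download; for the wheel, pass its own SHA256 digest as expected. pip can also enforce hashes itself through its hash-checking mode (--hash=sha256:... entries in a requirements file, or --require-hashes).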


Provenance

The following attestation bundles were made for dataflow_cv-0.6.2.tar.gz:

Publisher: python-publish.yml on zjykzj/DataFlow-CV

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file dataflow_cv-0.6.2-py3-none-any.whl.

File metadata

  • Download URL: dataflow_cv-0.6.2-py3-none-any.whl
  • Upload date:
  • Size: 74.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dataflow_cv-0.6.2-py3-none-any.whl
  • SHA256: 022e1ab1e6021c1dd02d8f0061494677b235117fc5ee3009e118c4d3651199de
  • MD5: dc9a5b6b9852c2689d9ea7b962e2938d
  • BLAKE2b-256: 02712270c3602f4c612e083dbe4f2fcdb1b05566228536e62ca5f977203a5615


Provenance

The following attestation bundles were made for dataflow_cv-0.6.2-py3-none-any.whl:

Publisher: python-publish.yml on zjykzj/DataFlow-CV

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
