A computer vision dataset processing library

DataFlow-CV

Where Vibe Coding meets CV data. 🌊 Convert & visualize datasets. Built with the flow of Claude Code.

Requires Python 3.8+. Runs on Linux, Windows, and macOS.

A computer vision dataset processing library that converts between the LabelMe, COCO, and YOLO annotation formats and visualizes annotations in each of them. Designed for researchers and developers working with multi-format annotation pipelines.

Features

  • Bidirectional Conversion: Convert between LabelMe, COCO, and YOLO formats in any direction
  • Multi-format Support: Handle object detection bounding boxes and instance segmentation polygons
  • Lossless Round-trip: Preserve original coordinates through conversion chains
  • Visualization: Visualize annotations with OpenCV, supporting both display and save modes
  • Command-line Interface: User-friendly CLI with convert and visualize subcommands
  • Python API: Programmatic access for integration into larger pipelines
  • Verbose Logging: Detailed logging with file output for debugging
  • Cross-platform: Full support for Windows, Linux, and macOS

Installation

From PyPI

pip install dataflow-cv

From Source

# Clone the repository
git clone https://github.com/zjykzj/DataFlow-CV.git
cd DataFlow-CV

# Regular installation
pip install .

# Editable installation (for development)
pip install -e .

Note: When installed in editable mode, use python -m dataflow.cli instead of the dataflow-cv command.

Optional Dependencies

  • pycocotools: Required for COCO RLE segmentation support
    pip install pycocotools
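
For readers unfamiliar with RLE, the following is a minimal sketch of the uncompressed run-length layout used by the COCO format: the mask is walked column by column, and the counts alternate between runs of background and foreground, starting with background. It is only an illustration in plain Python. The function names are hypothetical, and this is not how DataFlow-CV or pycocotools implement encoding (pycocotools additionally compresses the counts into a string).

```python
from typing import Dict, List


def mask_to_uncompressed_rle(mask: List[List[int]]) -> Dict[str, list]:
    """Encode a binary mask (rows of 0/1) as COCO-style uncompressed RLE."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    counts: List[int] = []
    previous = 0  # by convention the first run counts background pixels
    run = 0
    # COCO RLE traverses the mask in column-major order.
    for x in range(width):
        for y in range(height):
            value = 1 if mask[y][x] else 0
            if value != previous:
                counts.append(run)  # close the current run
                run = 0
                previous = value
            run += 1
    counts.append(run)
    return {"size": [height, width], "counts": counts}


def uncompressed_rle_to_mask(rle: Dict[str, list]) -> List[List[int]]:
    """Decode the structure produced by mask_to_uncompressed_rle."""
    height, width = rle["size"]
    flat: List[int] = []
    value = 0
    for run in rle["counts"]:
        flat.extend([value] * run)
        value = 1 - value  # runs alternate between 0 and 1
    # flat is column-major, so pixel (y, x) lives at index x * height + y.
    return [[flat[x * height + y] for x in range(width)] for y in range(height)]


if __name__ == "__main__":
    example = [[0, 1], [1, 1]]
    rle = mask_to_uncompressed_rle(example)
    print(rle)
    print(uncompressed_rle_to_mask(rle) == example)
```

A mask whose first pixel is foreground produces a leading count of 0, which keeps the alternation unambiguous.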
    

Quick Start

Command-line Interface

All required inputs (image directories, label directories, class files, output paths) are passed as positional arguments. Run any subcommand with --help for detailed usage.

Format Conversion

# YOLO to COCO
dataflow-cv convert yolo2coco images/ yolo_labels/ classes.txt coco_annotations.json

# YOLO to COCO with RLE encoding
dataflow-cv convert yolo2coco images/ yolo_labels/ classes.txt coco_annotations.json --do-rle

# YOLO to LabelMe
dataflow-cv convert yolo2labelme images/ yolo_labels/ classes.txt labelme_json/

# COCO to YOLO
dataflow-cv convert coco2yolo coco_annotations.json yolo_labels/

# COCO to LabelMe
dataflow-cv convert coco2labelme coco_annotations.json labelme_json/

# LabelMe to YOLO
dataflow-cv convert labelme2yolo labelme_json/ classes.txt yolo_labels/

# LabelMe to COCO
dataflow-cv convert labelme2coco labelme_json/ classes.txt coco_annotations.json

# LabelMe to COCO with RLE encoding
dataflow-cv convert labelme2coco labelme_json/ classes.txt coco_annotations.json --do-rle

# Enable verbose logging
dataflow-cv convert yolo2coco images/ yolo_labels/ classes.txt coco_annotations.json --verbose
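
To make the YOLO to COCO direction more concrete, here is a small sketch of the coordinate arithmetic implied by the two formats. A YOLO detection line stores `class cx cy w h` normalized by image size, while a COCO `bbox` stores `[x_min, y_min, width, height]` in pixels. The function name is hypothetical and the code only illustrates the format relationship. It is not taken from DataFlow-CV, and how the library assigns COCO category IDs is not covered here.

```python
from typing import List, Tuple


def yolo_line_to_coco_bbox(
    line: str, image_width: int, image_height: int
) -> Tuple[int, List[float]]:
    """Convert one YOLO detection line to (class_id, COCO-style pixel bbox)."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 'class cx cy w h', got {len(parts)} fields")
    class_id = int(parts[0])
    cx, cy, w, h = (float(value) for value in parts[1:])
    # Scale normalized sizes to pixels.
    box_width = w * image_width
    box_height = h * image_height
    # YOLO stores the box center; COCO stores the top-left corner.
    x_min = cx * image_width - box_width / 2
    y_min = cy * image_height - box_height / 2
    return class_id, [x_min, y_min, box_width, box_height]


if __name__ == "__main__":
    print(yolo_line_to_coco_bbox("0 0.5 0.5 0.5 0.5", 640, 480))
```

The image size is needed for this step, which is presumably why the YOLO conversions take an image directory while the COCO ones do not.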

Visualization

# Visualize YOLO annotations
dataflow-cv visualize yolo images/ yolo_labels/ classes.txt --save visualized/

# Visualize COCO annotations
dataflow-cv visualize coco images/ coco_annotations.json --save visualized/

# Visualize LabelMe annotations
dataflow-cv visualize labelme images/ labelme_json/ --save visualized/

# Enable verbose logging for detailed debug output
dataflow-cv visualize yolo --verbose images/ yolo_labels/ classes.txt --save visualized/
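
Before visualizing a large dataset, it can help to check which label files have no matching image. The sketch below is a standalone pre-flight check and not part of DataFlow-CV. It assumes the common YOLO convention that an image and its label file share the same file stem, and the set of image extensions is also an assumption.

```python
from pathlib import Path
from typing import List, Union

# Assumed image extensions; adjust to match your dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def find_labels_without_images(
    image_dir: Union[str, Path], label_dir: Union[str, Path]
) -> List[str]:
    """Return names of .txt label files that have no image with the same stem."""
    image_stems = {
        path.stem
        for path in Path(image_dir).iterdir()
        if path.suffix.lower() in IMAGE_SUFFIXES
    }
    return sorted(
        path.name
        for path in Path(label_dir).glob("*.txt")
        if path.stem not in image_stems
    )


if __name__ == "__main__":
    # Directory names mirror the CLI examples above.
    for name in find_labels_without_images("images/", "yolo_labels/"):
        print(f"no image found for {name}")
```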

Python API

from dataflow.convert import YoloAndCocoConverter
from dataflow.visualize import YOLOVisualizer

# Convert YOLO to COCO
converter = YoloAndCocoConverter(source_to_target=True, verbose=True, strict_mode=True)
result = converter.convert(
    source_path="yolo_labels/",
    target_path="coco_annotations.json",
    class_file="classes.txt",
    image_dir="images/",
    do_rle=False  # Set to True for RLE encoding
)

# Visualize YOLO annotations
visualizer = YOLOVisualizer(
    label_dir="yolo_labels/",
    image_dir="images/",
    class_file="classes.txt",
    is_show=True,
    is_save=True,
    output_dir="visualized/",
    verbose=True,
    strict_mode=True
)
result = visualizer.visualize()

See the samples/ directory for complete examples:

  • samples/visualize/yolo_demo.py - YOLO visualization example
  • samples/visualize/labelme_demo.py - LabelMe visualization example
  • samples/visualize/coco_demo.py - COCO visualization example
  • samples/convert/ - Conversion examples

Documentation

  • CLAUDE.md: Detailed architecture and development guide
  • docs/formats/: Format specifications (YOLO, COCO, LabelMe)
  • docs/specs/: Module specifications and design documents
  • CHANGELOG.md: Version history and breaking changes

Key Concepts

  • Normalized Coordinates: All internal coordinates are normalized to the range [0, 1]
  • Original Data Preservation: Lossless round-trip conversion through the OriginalData system
  • Strict Mode: Validation errors raise exceptions. Enabled by default in the CLI; in the Python API, pass strict_mode=False to disable it
  • Verbose Logging: Detailed debug logs saved to files when --verbose is used
  • Keyboard Shortcuts: During visualization, press q or ESC to exit early; any other key continues
  • Missing Image Handling: Missing images are skipped with warnings, allowing processing to continue
  • RLE Mask Visualization: COCO RLE masks are displayed with semi-transparent fills for better visibility
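
The difference between strict and non-strict handling can be illustrated with a short sketch. The function below is hypothetical and is not DataFlow-CV's validator. It assumes a YOLO detection line (`class cx cy w h`) and shows one plausible pattern: in strict mode an invalid annotation raises, otherwise it is logged as a warning and skipped so processing can continue.

```python
import logging
from typing import List, Optional

logger = logging.getLogger("strict_mode_sketch")


def parse_yolo_box(line: str, strict_mode: bool = True) -> Optional[List[float]]:
    """Parse 'class cx cy w h' and check that coordinates lie in [0, 1].

    Returns [cx, cy, w, h], or None when the line is invalid and
    strict_mode is False.
    """
    try:
        parts = line.split()
        if len(parts) != 5:
            raise ValueError(f"expected 5 fields, got {len(parts)}")
        values = [float(value) for value in parts[1:]]
        if any(not 0.0 <= value <= 1.0 for value in values):
            raise ValueError(f"coordinates outside [0, 1]: {values}")
        return values
    except ValueError as error:
        if strict_mode:
            raise  # strict: surface the validation error to the caller
        logger.warning("Skipping invalid annotation %r: %s", line, error)
        return None


if __name__ == "__main__":
    print(parse_yolo_box("0 0.5 0.5 0.2 0.2"))
    print(parse_yolo_box("0 1.5 0.5 0.2 0.2", strict_mode=False))
```

Strict mode suits dataset validation, where a bad annotation should stop the run. Non-strict mode suits exploratory work on noisy data.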

Development

For detailed developer guidance including advanced test commands, debugging, and architecture overview, see CLAUDE.md.

Testing

# Run all tests
pytest

# Run tests with coverage
pytest --cov=dataflow

# Run specific test module
pytest tests/convert/test_yolo_and_coco.py

Linting and Formatting

# Install development dependencies
pip install -e ".[dev]"

# Format code
black dataflow tests samples

# Sort imports
isort dataflow tests samples

# Type checking
mypy dataflow

# Linting
flake8 dataflow tests samples

Project Structure

dataflow/
├── label/          # Annotation handlers (YOLO, LabelMe, COCO)
├── convert/        # Format converters
├── visualize/      # Visualization modules
├── util/           # Utilities (logging, file operations)
└── cli/            # Command-line interface
tests/              # Comprehensive test suite
samples/            # Usage examples
assets/             # Sample data for testing

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Before contributing, review CLAUDE.md for architecture and development patterns.

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add or update tests as needed
  5. Ensure code passes formatting and linting checks
  6. Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Thanks to the creators of LabelMe, COCO, and YOLO formats for establishing these annotation standards
  • Built with OpenCV, NumPy, and Click
  • Inspired by the need for seamless format conversion in multi-tool CV pipelines
