DataFlow-CV
Where Vibe Coding meets CV data. 🌊 Convert & visualize datasets. Built with the flow of Claude Code.
A computer vision dataset processing library for seamless format conversion and visualization between LabelMe, COCO, and YOLO annotation formats. Designed for researchers and developers working with multi-format annotation pipelines.
Features
- Bidirectional Conversion: Convert between LabelMe, COCO, and YOLO formats in any direction
- Multi-format Support: Handle object detection bounding boxes and instance segmentation polygons
- Lossless Round-trip: Preserve original coordinates through conversion chains
- Visualization: Visualize annotations with OpenCV, supporting both display and save modes
- Command-line Interface: User-friendly CLI with convert and visualize subcommands
- Python API: Programmatic access for integration into larger pipelines
- Verbose Logging: Detailed logging with file output for debugging
- Cross-platform: Full support for Windows, Linux, and macOS
Installation
From PyPI
pip install dataflow-cv
From Source
# Clone the repository
git clone https://github.com/zjykzj/DataFlow-CV.git
cd DataFlow-CV
# Regular installation
pip install .
# Editable installation (for development)
pip install -e .
Note: When installed in editable mode, use python -m dataflow.cli instead of the dataflow-cv command.
Optional Dependencies
pycocotools: Required for COCO RLE segmentation support
pip install pycocotools
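Because pycocotools is optional, a script may want to check for it before requesting RLE output. The following is a minimal sketch that is not part of DataFlow-CV: the helper name rle_available is hypothetical, and it relies only on the standard library's importlib.util.find_spec.

```python
from importlib.util import find_spec


def rle_available() -> bool:
    """Return True if the optional pycocotools package is installed."""
    # find_spec returns None when the top-level package cannot be found.
    return find_spec("pycocotools") is not None


if __name__ == "__main__":
    # Hypothetical usage: only request RLE encoding when the dependency exists.
    do_rle = rle_available()
    print(f"RLE encoding enabled: {do_rle}")
```

The resulting flag could then decide whether to pass --do-rle on the command line or do_rle=True in the Python API.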
Quick Start
Command-line Interface
All required parameters (image directories, label directories, class files, output paths) are positional arguments for better usability. Use --help on any subcommand for detailed usage.
Format Conversion
# YOLO to COCO
dataflow-cv convert yolo2coco images/ yolo_labels/ classes.txt coco_annotations.json
# With RLE encoding
dataflow-cv convert yolo2coco images/ yolo_labels/ classes.txt coco_annotations.json --do-rle
# YOLO to LabelMe
dataflow-cv convert yolo2labelme images/ yolo_labels/ classes.txt labelme_json/
# COCO to YOLO
dataflow-cv convert coco2yolo coco_annotations.json yolo_labels/
# COCO to LabelMe
dataflow-cv convert coco2labelme coco_annotations.json labelme_json/
# LabelMe to YOLO
dataflow-cv convert labelme2yolo labelme_json/ classes.txt yolo_labels/
# LabelMe to COCO
dataflow-cv convert labelme2coco labelme_json/ classes.txt coco_annotations.json
# With RLE encoding
dataflow-cv convert labelme2coco labelme_json/ classes.txt coco_annotations.json --do-rle
# Enable verbose logging
dataflow-cv convert yolo2coco images/ yolo_labels/ classes.txt coco_annotations.json --verbose
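When several dataset splits need converting, the command lines above can be assembled from a script. The sketch below only builds and prints the commands, using the positional order and the --do-rle and --verbose flags shown above. The function name yolo2coco_command and the train/val directory layout are assumptions made for illustration, not part of DataFlow-CV.

```python
import shlex
from typing import List


def yolo2coco_command(image_dir: str, label_dir: str, class_file: str,
                      output_json: str, do_rle: bool = False,
                      verbose: bool = False) -> List[str]:
    """Build the argument list for `dataflow-cv convert yolo2coco`."""
    # Positional order follows the CLI examples: images, labels, classes, output.
    cmd = ["dataflow-cv", "convert", "yolo2coco",
           image_dir, label_dir, class_file, output_json]
    if do_rle:
        cmd.append("--do-rle")
    if verbose:
        cmd.append("--verbose")
    return cmd


if __name__ == "__main__":
    # Hypothetical dataset splits; adjust to your own directory layout.
    for split in ("train", "val"):
        cmd = yolo2coco_command(f"{split}/images/", f"{split}/labels/",
                                "classes.txt", f"{split}_coco.json")
        # Print a shell-safe command line instead of executing it.
        print(shlex.join(cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) to execute the conversion once the dataflow-cv command is on the PATH.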
Visualization
# Visualize YOLO annotations
dataflow-cv visualize yolo images/ yolo_labels/ classes.txt --save visualized/
# Visualize COCO annotations
dataflow-cv visualize coco images/ coco_annotations.json --save visualized/
# Visualize LabelMe annotations
dataflow-cv visualize labelme images/ labelme_json/ --save visualized/
# Enable verbose logging for detailed debug output
dataflow-cv visualize yolo --verbose images/ yolo_labels/ classes.txt --save visualized/
Python API
from dataflow.convert import YoloAndCocoConverter
from dataflow.visualize import YOLOVisualizer
# Convert YOLO to COCO
converter = YoloAndCocoConverter(source_to_target=True, verbose=True, strict_mode=True)
result = converter.convert(
    source_path="yolo_labels/",
    target_path="coco_annotations.json",
    class_file="classes.txt",
    image_dir="images/",
    do_rle=False  # Set to True for RLE encoding
)
# Visualize YOLO annotations
visualizer = YOLOVisualizer(
    label_dir="yolo_labels/",
    image_dir="images/",
    class_file="classes.txt",
    is_show=True,
    is_save=True,
    output_dir="visualized/",
    verbose=True,
    strict_mode=True
)
result = visualizer.visualize()
See the samples/ directory for complete examples:
- samples/visualize/yolo_demo.py - YOLO visualization example
- samples/visualize/labelme_demo.py - LabelMe visualization example
- samples/visualize/coco_demo.py - COCO visualization example
- samples/convert/ - Conversion examples
Documentation
- CLAUDE.md: Detailed architecture and development guide
- docs/formats/: Format specifications (YOLO, COCO, LabelMe)
- docs/specs/: Module specifications and design documents
- CHANGELOG.md: Version history and breaking changes
Key Concepts
- Normalized Coordinates: All internal coordinates are in 0-1 range
- Original Data Preservation: Lossless round-trip conversion through the OriginalData system
- Strict Mode: Validation errors raise exceptions (default: enabled in CLI, can be disabled via the strict_mode=False parameter in Python API)
- Verbose Logging: Detailed debug logs saved to files when --verbose is used
- Keyboard Shortcuts: During visualization, press q or ESC to exit early; any other key continues
- Missing Image Handling: Missing images are skipped with warnings, allowing processing to continue
- RLE Mask Visualization: COCO RLE masks are displayed with semi-transparent fills for better visibility
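To make the normalized-coordinate idea concrete, here is a minimal sketch of the standard mapping between a YOLO box (normalized center x, center y, width, height) and a COCO box (pixel x_min, y_min, width, height). It illustrates the formats only and is not DataFlow-CV's implementation; the function names are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def yolo_to_coco_bbox(box: Box, img_w: int, img_h: int) -> Box:
    """Normalized (cx, cy, w, h) -> pixel (x_min, y_min, w, h)."""
    cx, cy, w, h = box
    # Shift from box center to top-left corner, then scale to pixels.
    return ((cx - w / 2) * img_w, (cy - h / 2) * img_h, w * img_w, h * img_h)


def coco_to_yolo_bbox(box: Box, img_w: int, img_h: int) -> Box:
    """Pixel (x_min, y_min, w, h) -> normalized (cx, cy, w, h)."""
    x, y, w, h = box
    # Shift from top-left corner to box center, then divide by image size.
    return ((x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h)


if __name__ == "__main__":
    yolo_box = (0.5, 0.5, 0.25, 0.5)
    coco_box = yolo_to_coco_bbox(yolo_box, img_w=640, img_h=480)
    print(coco_box)                                # (240.0, 120.0, 160.0, 240.0)
    print(coco_to_yolo_bbox(coco_box, 640, 480))   # (0.5, 0.5, 0.25, 0.5)
```

These values round-trip exactly because they are simple binary fractions. Arbitrary coordinates can pick up floating-point or text-rounding error on each pass, which is presumably why a lossless round trip needs the original values kept alongside the normalized ones.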
Development
For detailed developer guidance including advanced test commands, debugging, and architecture overview, see CLAUDE.md.
Testing
# Run all tests
pytest
# Run tests with coverage
pytest --cov=dataflow
# Run specific test module
pytest tests/convert/test_yolo_and_coco.py
Linting and Formatting
# Install development dependencies
pip install -e .[dev]
# Format code
black dataflow tests samples
# Sort imports
isort dataflow tests samples
# Type checking
mypy dataflow
# Linting
flake8 dataflow tests samples
Project Structure
dataflow/
├── label/ # Annotation handlers (YOLO, LabelMe, COCO)
├── convert/ # Format converters
├── visualize/ # Visualization modules
├── util/ # Utilities (logging, file operations)
└── cli/ # Command-line interface
tests/ # Comprehensive test suite
samples/ # Usage examples
assets/ # Sample data for testing
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Before contributing, review CLAUDE.md for architecture and development patterns.
- Fork the repository
- Create a feature branch
- Make your changes
- Add or update tests as needed
- Ensure code passes formatting and linting checks
- Submit a pull request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Thanks to the creators of LabelMe, COCO, and YOLO formats for establishing these annotation standards
- Built with OpenCV, NumPy, and Click
- Inspired by the need for seamless format conversion in multi-tool CV pipelines