# VisionConverter

This project consists of a library and a CLI for converting datasets between annotation formats.

## Description
VisionConverter is a library for converting object detection annotation datasets between popular formats. It simplifies dataset interoperability for machine learning and computer vision projects.
Key Features:
- Bidirectional conversion between supported formats
- Unified internal representation ensures consistent and reliable transformations
Conversion Process:

1. Load the input dataset from the specified path
2. Transform it to the internal representation
3. Convert from the internal representation to the target output format
4. Save the converted dataset to the desired output location
## Installation

### Requirements

### Install from Source
Clone the repository and install the package:

```bash
git clone https://github.com/GCousido/VisionConverter.git
cd VisionConverter
pip install .
```
### Development Installation

For development in editable mode, including the testing dependencies:

```bash
git clone https://github.com/GCousido/VisionConverter.git
cd VisionConverter
pip install -e ".[dev]"
```
## How to Use

### Library Usage

You can use VisionConverter as a Python library to convert datasets programmatically.
#### Example

```python
from vision_converter import YoloFormat, YoloConverter, CocoFormat, CocoConverter, NeutralFormat

# Load a YOLO dataset, convert it to the neutral internal representation,
# then convert that to COCO and save the result
yolo_dataset: YoloFormat = YoloFormat.read_from_folder("./dataset/yolo")
internal_dataset: NeutralFormat = YoloConverter.toNeutral(yolo_dataset)
coco_dataset: CocoFormat = CocoConverter.fromNeutral(internal_dataset)
coco_dataset.save("./dataset/coco")
```
### Command Line Interface

The CLI provides a simple interface for converting datasets.

#### Basic Usage

```bash
vconverter --input-format <INPUT_FORMAT> --input-path <INPUT_PATH> --output-format <OUTPUT_FORMAT> --output-path <OUTPUT_PATH> <OPTIONS>
```
#### Required Arguments

- `--input-format`: Source format
- `--input-path`: Path to the folder containing the input dataset
- `--output-format`: Target format
- `--output-path`: Path to save the converted dataset
#### Options

- `--copy-images`: Copy image files to the output directory
- `--symlink-images`: Create symbolic links to the original images in the output directory
#### Examples

Convert a YOLO dataset to COCO:

```bash
vconverter --input-format yolo --input-path ./datasets/yolo --output-format coco --output-path ./datasets/coco
```

Convert Pascal VOC to YOLO:

```bash
vconverter --input-format pascal_voc --input-path ./datasets/pascalvoc --output-format yolo --output-path ./datasets/yolo
```

Convert COCO to Pascal VOC, copying the images:

```bash
vconverter --input-format coco --input-path ./datasets/coco --output-format pascal_voc --output-path ./datasets/pascalvoc --copy-images
```
## Supported Formats

| Format | Input | Output | Parameter Value | Description |
|---|---|---|---|---|
| YOLO | ✅ | ✅ | yolo | YOLO format (.txt files with normalized coordinates and classes.txt for class names) |
| COCO | ✅ | ✅ | coco | Microsoft COCO format (.json with absolute coordinates) |
| Pascal VOC | ✅ | ✅ | pascal_voc | Pascal Visual Object Classes format (.xml files with absolute coordinates) |
| CreateML | ✅ | ✅ | createml | Apple CreateML format (.json with centered bounding boxes and absolute coordinates) |
| TensorFlow CSV | ✅ | ✅ | tensorflow_csv | TensorFlow Object Detection CSV format (.csv with absolute coordinates) |
| LabelMe | ✅ | ✅ | labelme | LabelMe JSON format (.json files with shape annotations and optional embedded image data) |
| VGG | ✅ | ✅ | vgg | VGG Image Annotator format (.json with multiple shape types and region attributes) |
## Format Specifications

### YOLO Format

- File Structure: One `.txt` file per image, with the same basename as the image
- Annotation Format: `<class_id> <x_center> <y_center> <width> <height>`
- Coordinates: Normalized values between 0 and 1 (relative to the image size)
- Additional Files: `classes.txt` containing class names, one per line

```
EXPECTED INPUT FILE STRUCTURE        GENERATED OUTPUT FILE STRUCTURE
dataset/                             dataset/
├── images/                          ├── images/
│   img1.jpg                         │
│   img2.jpg                         │
├── labels/                          ├── labels/
│   img1.txt                         │   img1.txt
│   img2.txt                         │   img2.txt
│   classes.txt                      │   classes.txt
```
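To make the normalized coordinates concrete, here is a small self-contained sketch (not part of the library; the helper name is made up) that parses one YOLO label line and converts it to absolute pixel corners:

```python
def yolo_to_corners(line: str, img_w: int, img_h: int):
    """Parse one YOLO label line -> (class_id, xmin, ymin, xmax, ymax) in pixels."""
    class_id, xc, yc, w, h = line.split()
    # Denormalize: YOLO stores center/size as fractions of the image dimensions
    xc, yc = float(xc) * img_w, float(yc) * img_h
    w, h = float(w) * img_w, float(h) * img_h
    return int(class_id), xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2

# A box centered at (320, 240) covering half the image in each dimension:
print(yolo_to_corners("0 0.5 0.5 0.5 0.5", 640, 480))  # (0, 160.0, 120.0, 480.0, 360.0)
```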
### COCO Format

- File Structure: Single `.json` file containing all annotations
- Annotation Format: JSON with `images`, `annotations` and `categories` arrays
- Coordinates: Absolute pixel values `[x, y, width, height]`
- Metadata: Includes dataset `info`, `licenses`, and `category` definitions

```
EXPECTED INPUT FILE STRUCTURE        GENERATED OUTPUT FILE STRUCTURE
dataset/                             dataset/
├── images/                          ├── images/
│   img1.jpg                         │
│   img2.jpg                         │
├── annotations.json                 ├── annotations.json
```
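The array layout described above can be sketched as a minimal hand-written file (illustrative only; the image, category, and box values are hypothetical, not produced by the library):

```python
import json

# Minimal COCO-style annotation file: one image, one category, one box.
# "bbox" is [x, y, width, height] in absolute pixels, (x, y) = top-left corner.
coco = {
    "images": [{"id": 1, "file_name": "img1.jpg", "width": 640, "height": 480}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 1,
                     "bbox": [160, 120, 320, 240], "area": 320 * 240, "iscrowd": 0}],
    "categories": [{"id": 1, "name": "dog"}],
}
print(json.dumps(coco, indent=2))
```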
### Pascal VOC Format

- File Structure: One `.xml` file per image, sharing the basename with the image file
- Annotation Format: XML structure with bounding box coordinates and class names
- Coordinates: Absolute pixel values `<xmin>, <ymin>, <xmax>, <ymax>`
- Metadata: Rich annotation metadata, including image `size`, object attributes (`difficult`, `truncated`, `occluded`), and `source` info

```
EXPECTED INPUT FILE STRUCTURE        GENERATED OUTPUT FILE STRUCTURE
dataset/                             dataset/
├── JPEGImages/                      ├── JPEGImages/
│   img1.jpg                         │
│   img2.jpg                         │
├── Annotations/                     ├── Annotations/
│   img1.xml                         │   img1.xml
│   img2.xml                         │   img2.xml
├── ImageSets/                       ├── ImageSets/
```
### CreateML Format

- File Structure: Single `.json` file containing all annotations, and an `images/` folder with the image files
- Annotation Format: JSON array with one entry per image, each containing the image filename and an annotations array
- Coordinates: Absolute pixel values, with bounding boxes defined by center coordinates and dimensions `{x_center, y_center, width, height}`

```
EXPECTED INPUT FILE STRUCTURE        GENERATED OUTPUT FILE STRUCTURE
dataset/                             dataset/
├── images/                          ├── images/
│   img1.jpg                         │
│   img2.jpg                         │
├── annotations.json                 ├── annotations.json
```
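Because CreateML boxes are center-based while formats like Pascal VOC are corner-based, a conversion along these lines is needed (a hypothetical helper, not the library's API; the key names follow the center/size convention described above):

```python
def createml_box_to_corners(box: dict):
    """Convert a center-based CreateML-style box to (xmin, ymin, xmax, ymax)."""
    return (box["x"] - box["width"] / 2, box["y"] - box["height"] / 2,
            box["x"] + box["width"] / 2, box["y"] + box["height"] / 2)

print(createml_box_to_corners({"x": 320, "y": 240, "width": 320, "height": 240}))
# (160.0, 120.0, 480.0, 360.0)
```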
### TensorFlow Object Detection CSV Format

- File Structure: Single `.csv` file containing all annotations
- Annotation Format: CSV structure with specific columns for image metadata and bounding box coordinates
- Coordinates: Absolute pixel values `<xmin>, <ymin>, <xmax>, <ymax>`
- Required Columns: `filename`, `width`, `height`, `class`, `xmin`, `ymin`, `xmax`, `ymax`
- Features: Human-readable format, direct compatibility with the TensorFlow Object Detection API, support for multiple objects per image

```
EXPECTED INPUT FILE STRUCTURE        GENERATED OUTPUT FILE STRUCTURE
dataset/                             dataset/
├── images/                          ├── images/
│   img1.jpg                         │
│   img2.jpg                         │
├── annotations.csv                  ├── annotations.csv
```
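Using only the standard library, a file with the required columns can be produced like this (the values are hypothetical; note that multiple objects in one image are simply multiple rows):

```python
import csv
import io

fieldnames = ["filename", "width", "height", "class", "xmin", "ymin", "xmax", "ymax"]
rows = [
    {"filename": "img1.jpg", "width": 640, "height": 480, "class": "dog",
     "xmin": 160, "ymin": 120, "xmax": 480, "ymax": 360},
    # A second object in the same image is another row with the same filename:
    {"filename": "img1.jpg", "width": 640, "height": 480, "class": "cat",
     "xmin": 10, "ymin": 20, "xmax": 110, "ymax": 220},
]
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```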
### LabelMe JSON Format

- File Structure: One `.json` file per image, containing annotations and image metadata
- Annotation Format: JSON with a `shapes` array, each shape having `label`, `points`, `shape_type`, `group_id`, `flags`, and an optional `description`
- Coordinates: Absolute pixel values for the `points` defining shapes (e.g., polygons, rectangles)
- Image Data: Optional `base64`-encoded image data embedded in the `imageData` field
- Metadata: Includes `version`, `flags`, `imagePath`, `imageHeight`, `imageWidth`

```
EXPECTED INPUT FILE STRUCTURE        GENERATED OUTPUT FILE STRUCTURE
dataset/                             dataset/
├── img1.jpg                         ├── img1.jpg
├── img1.json                        ├── img1.json
├── img2.jpg                         ├── img2.jpg
├── img2.json                        ├── img2.json
```
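A LabelMe rectangle shape stores two diagonal corner points in `points`; normalizing them to an ordered bounding box looks like this (a hypothetical helper with invented sample values, not part of the library):

```python
def labelme_rect_to_bbox(shape: dict):
    """Normalize a LabelMe rectangle's two diagonal points to (xmin, ymin, xmax, ymax)."""
    (x1, y1), (x2, y2) = shape["points"]
    # The two points may come in any order, so sort each axis explicitly
    return min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)

shape = {"label": "dog", "shape_type": "rectangle",
         "points": [[480.0, 120.0], [160.0, 360.0]], "group_id": None, "flags": {}}
print(labelme_rect_to_bbox(shape))  # (160.0, 120.0, 480.0, 360.0)
```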
### VGG Image Annotator Format

- File Structure: Single `.json` file containing all annotations, using the VIA metadata structure
- Annotation Format: JSON with `_via_img_metadata` containing image entries, each with a `regions` array for shape annotations
- Coordinates: Absolute pixel values, with support for 6 shape types: `rect`, `circle`, `ellipse`, `polygon`, `polyline`, `point`
- Shape Types:
  - Rectangle: `{x, y, width, height}` - top-left corner and dimensions
  - Circle: `{cx, cy, r}` - center coordinates and radius
  - Ellipse: `{cx, cy, rx, ry, theta}` - center, radii, and rotation angle
  - Polygon: `{all_points_x[], all_points_y[]}` - arrays of vertex coordinates
  - Polyline: `{all_points_x[], all_points_y[]}` - arrays of line point coordinates
  - Point: `{cx, cy}` - single point coordinates
- Metadata: Includes `file_attributes` for image-level data, `region_attributes` for annotation-level data, and optional VIA project settings

```
EXPECTED INPUT FILE STRUCTURE        GENERATED OUTPUT FILE STRUCTURE
dataset/                             dataset/
├── images/                          ├── images/
│   img1.jpg                         │
│   img2.jpg                         │
├── annotations.json                 ├── annotations.json
```
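As a concrete instance of the `rect` shape attributes above, here is a sketch (hypothetical helper; the region values are invented) that converts a VIA rectangle region to corner coordinates:

```python
def via_rect_to_corners(shape_attrs: dict):
    """Convert VIA 'rect' shape attributes (top-left corner + size) to (xmin, ymin, xmax, ymax)."""
    x, y = shape_attrs["x"], shape_attrs["y"]
    return x, y, x + shape_attrs["width"], y + shape_attrs["height"]

# A region pairs geometry (shape_attributes) with labels (region_attributes)
region = {"shape_attributes": {"name": "rect", "x": 160, "y": 120, "width": 320, "height": 240},
          "region_attributes": {"class": "dog"}}
print(via_rect_to_corners(region["shape_attributes"]))  # (160, 120, 480, 360)
```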
## License

This project is licensed under the MIT License - see the LICENSE file for details.