This project consists of a library and a CLI for converting datasets between annotation formats.

Project description

VisionConverter

Description

VisionConverter is a library for converting object detection annotation datasets between popular formats. It simplifies dataset interoperability for machine learning and computer vision projects.

Key Features:

  • Bidirectional conversion between supported formats
  • Unified internal representation ensures consistent and reliable transformations

Conversion Process:

  1. Load the input dataset from the specified path
  2. Transform it into the unified internal representation
  3. Convert from the internal representation to the target output format
  4. Save the converted dataset to the desired output location

Installation

Requirements

Install from Source

Clone the repository and install the package:

git clone https://github.com/GCousido/VisionConverter.git
cd VisionConverter
pip install .

Development Installation

For development, install in editable mode with the testing dependencies included:

git clone https://github.com/GCousido/VisionConverter.git
cd VisionConverter
pip install -e ".[dev]"

How to Use

Library Usage

You can use VisionConverter as a Python library to convert datasets programmatically.

Example

from vision_converter import YoloFormat, YoloConverter, CocoFormat, CocoConverter, NeutralFormat

# Load the YOLO dataset from disk
yolo_dataset: YoloFormat = YoloFormat.read_from_folder("./dataset/yolo")

# Transform it into the unified internal representation
internal_dataset: NeutralFormat = YoloConverter.toNeutral(yolo_dataset)

# Convert from the internal representation to COCO
coco_dataset: CocoFormat = CocoConverter.fromNeutral(internal_dataset)

# Save the converted dataset
coco_dataset.save("./dataset/coco")

Command Line Interface

The CLI provides a simple interface for converting datasets:

Basic Usage

vconverter --input-format <INPUT_FORMAT> --input-path <INPUT_PATH> --output-format <OUTPUT_FORMAT> --output-path <OUTPUT_PATH> <OPTIONS>

Required Arguments

  • --input-format: Source format
  • --input-path: Path to the folder containing the input dataset
  • --output-format: Target format
  • --output-path: Path to save the converted dataset

Options

  • --copy-images: Copy image files to the output directory.
  • --symlink-images: Create symbolic links to the original images in the output directory.

Examples

Convert a YOLO dataset to COCO:

vconverter --input-format yolo --input-path ./datasets/yolo --output-format coco --output-path ./datasets/coco

Convert Pascal VOC to YOLO:

vconverter --input-format pascal_voc --input-path ./datasets/pascalvoc --output-format yolo --output-path ./datasets/yolo

Convert COCO to Pascal VOC with images:

vconverter --input-format coco --input-path ./datasets/coco --output-format pascal_voc --output-path ./datasets/pascalvoc --copy-images

Supported Formats

All formats listed below are supported for both input and output.

| Format         | Parameter Value | Description |
| -------------- | --------------- | ----------- |
| YOLO           | yolo            | YOLO format (.txt files with normalized coordinates and classes.txt for class names) |
| COCO           | coco            | Microsoft COCO format (.json with absolute coordinates) |
| Pascal VOC     | pascal_voc      | Pascal Visual Object Classes format (.xml files with absolute coordinates) |
| CreateML       | createml        | Apple CreateML format (.json with centered bounding boxes and absolute coordinates) |
| TensorFlow CSV | tensorflow_csv  | TensorFlow Object Detection CSV format (.csv with absolute coordinates) |
| LabelMe        | labelme         | LabelMe JSON format (.json files with shape annotations and optional embedded image data) |
| VGG            | vgg             | VGG Image Annotator format (.json with multiple shape types and region attributes) |

Format Specifications

YOLO Format

  • File Structure: One .txt file per image with same basename as the image
  • Annotation Format: <class_id> <x_center> <y_center> <width> <height>
  • Coordinates: Normalized values between 0 and 1 (relative to the image size)
  • Additional Files: classes.txt containing class names, one per line
EXPECTED INPUT FILE STRUCTURE                      GENERATED OUTPUT FILE STRUCTURE
      dataset/                                           dataset/
        ├── images/                                        ├── images/
        │     img1.jpg                                     │
        │     img2.jpg                                     │
        ├── labels/                                        ├── labels/
        │     img1.txt                                     │     img1.txt
        │     img2.txt                                     │     img2.txt
        │     classes.txt                                  │     classes.txt
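As an illustrative sketch (not part of VisionConverter's API), a YOLO annotation line can be denormalized to absolute pixel coordinates like this:

```python
def yolo_to_pixels(line: str, img_w: int, img_h: int):
    """Convert one YOLO line (<class_id> <x_center> <y_center> <width> <height>,
    all normalized to [0, 1]) into a class id and absolute (xmin, ymin, xmax, ymax)."""
    class_id, xc, yc, w, h = line.split()
    xc, yc = float(xc) * img_w, float(yc) * img_h
    w, h = float(w) * img_w, float(h) * img_h
    return int(class_id), (xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)

# A box centered in a 640x480 image, covering half of each dimension
print(yolo_to_pixels("0 0.5 0.5 0.5 0.5", 640, 480))
# → (0, (160.0, 120.0, 480.0, 360.0))
```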

COCO Format

  • File Structure: Single .json file containing all annotations
  • Annotation Format: JSON with images, annotations and categories arrays
  • Coordinates: Absolute pixel values [x, y, width, height]
  • Metadata: Includes dataset info, licenses, and category definitions
EXPECTED INPUT FILE STRUCTURE                      GENERATED OUTPUT FILE STRUCTURE
      dataset/                                           dataset/
        ├── images/                                        ├── images/
        │     img1.jpg                                     │
        │     img2.jpg                                     │
        ├── annotations.json                               ├── annotations.json
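A minimal COCO-style annotations.json, shown here as an illustrative sketch (values are made up; the field names follow the COCO specification):

```python
import json

coco = {
    "info": {"description": "example dataset"},
    "images": [{"id": 1, "file_name": "img1.jpg", "width": 640, "height": 480}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 1,
                     "bbox": [160, 120, 320, 240],  # [x, y, width, height] in absolute pixels
                     "area": 320 * 240, "iscrowd": 0}],
    "categories": [{"id": 1, "name": "person"}],
}

annotations_json = json.dumps(coco, indent=2)
```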

Pascal VOC Format

  • File Structure: One .xml file per image, sharing the basename with the image file
  • Annotation Format: XML structure with bounding box coordinates and class names
  • Coordinates: Absolute pixel values <xmin>, <ymin>, <xmax>, <ymax>
  • Metadata: Rich annotation metadata, including image size, object attributes (difficult, truncated, occluded), and source info
EXPECTED INPUT FILE STRUCTURE                      GENERATED OUTPUT FILE STRUCTURE
      dataset/                                           dataset/
        ├── JPEGImages/                                    ├── JPEGImages/
        │     img1.jpg                                     │     
        │     img2.jpg                                     │     
        ├── Annotations/                                   ├── Annotations/
        │     img1.xml                                     │     img1.xml
        │     img2.xml                                     │     img2.xml
        ├── ImageSets/                                     ├── ImageSets/
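The XML layout can be sketched with the standard library (illustrative only, not the library's own writer):

```python
import xml.etree.ElementTree as ET

# Build a minimal Pascal VOC annotation for one image
root = ET.Element("annotation")
ET.SubElement(root, "filename").text = "img1.jpg"
size = ET.SubElement(root, "size")
for tag, val in (("width", "640"), ("height", "480"), ("depth", "3")):
    ET.SubElement(size, tag).text = val
obj = ET.SubElement(root, "object")
ET.SubElement(obj, "name").text = "person"
bbox = ET.SubElement(obj, "bndbox")
for tag, val in (("xmin", "160"), ("ymin", "120"), ("xmax", "480"), ("ymax", "360")):
    ET.SubElement(bbox, tag).text = val

xml_text = ET.tostring(root, encoding="unicode")
```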

CreateML Format

  • File Structure: Single .json file containing all annotations and an images/ folder with image files
  • Annotation Format: JSON array with entries for each image, each containing image filename and annotations array
  • Coordinates: Absolute pixel values with bounding boxes defined by center coordinates and dimensions {x_center, y_center, width, height}
EXPECTED INPUT FILE STRUCTURE                      GENERATED OUTPUT FILE STRUCTURE
      dataset/                                           dataset/
        ├── images/                                        ├── images/
        │     img1.jpg                                     │     
        │     img2.jpg                                     │     
        ├── annotations.json                               ├── annotations.json
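Because CreateML boxes are center-based, converting one to corner coordinates takes a small transform. A hedged sketch (the entry layout mirrors Apple's format; the helper name is made up):

```python
def createml_to_corners(box: dict) -> tuple:
    """Convert a CreateML {x, y, width, height} box (center-based, absolute
    pixels) to (xmin, ymin, xmax, ymax)."""
    return (box["x"] - box["width"] / 2, box["y"] - box["height"] / 2,
            box["x"] + box["width"] / 2, box["y"] + box["height"] / 2)

# One entry of the CreateML JSON array, for a single image
entry = {"image": "img1.jpg",
         "annotations": [{"label": "person",
                          "coordinates": {"x": 320, "y": 240,
                                          "width": 320, "height": 240}}]}
print(createml_to_corners(entry["annotations"][0]["coordinates"]))
# → (160.0, 120.0, 480.0, 360.0)
```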

TensorFlow Object Detection CSV Format

  • File Structure: Single .csv file containing all annotations
  • Annotation Format: CSV structure with specific columns for image metadata and bounding box coordinates
  • Coordinates: Absolute pixel values <xmin>, <ymin>, <xmax>, <ymax>
  • Required Columns: filename, width, height, class, xmin, ymin, xmax, ymax
  • Features: Human-readable format, direct compatibility with TensorFlow Object Detection API, supports multiple objects per image
EXPECTED INPUT FILE STRUCTURE                      GENERATED OUTPUT FILE STRUCTURE
      dataset/                                           dataset/
        ├── images/                                        ├── images/
        │     img1.jpg                                     │     
        │     img2.jpg                                     │     
        ├── annotations.csv                                ├── annotations.csv
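Writing the required columns with the standard csv module can be sketched as follows (illustrative values; the column names are the ones listed above):

```python
import csv, io

rows = [
    {"filename": "img1.jpg", "width": 640, "height": 480, "class": "person",
     "xmin": 160, "ymin": 120, "xmax": 480, "ymax": 360},
]

# Write the annotations to an in-memory CSV buffer
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["filename", "width", "height",
                                         "class", "xmin", "ymin", "xmax", "ymax"])
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()
```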

LabelMe JSON Format

  • File Structure: One .json file per image containing annotations and image metadata
  • Annotation Format: JSON with shapes array, each shape having label, points, shape_type, group_id, flags, and optional description
  • Coordinates: Absolute pixel values for points defining shapes (e.g., polygons, rectangles)
  • Image Data: Optional base64 encoded image data embedded in imageData field
  • Metadata: Includes dataset version, flags, imagePath, imageHeight, imageWidth
EXPECTED INPUT FILE STRUCTURE                      GENERATED OUTPUT FILE STRUCTURE
      dataset/                                           dataset/
        ├── img1.jpg                                       ├── img1.jpg
        ├── img1.json                                      ├── img1.json
        ├── img2.jpg                                       ├── img2.jpg
        ├── img2.json                                      ├── img2.json
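LabelMe stores a rectangle as two opposite corner points, in no guaranteed order, so extracting a bounding box requires normalizing them. An illustrative sketch (the helper name is made up; the shape fields follow LabelMe's JSON):

```python
def labelme_rect_to_bbox(shape: dict) -> tuple:
    """Normalize a LabelMe rectangle's two corner points to (xmin, ymin, xmax, ymax)."""
    (x1, y1), (x2, y2) = shape["points"]
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))

# Corner points deliberately given out of order
shape = {"label": "person", "shape_type": "rectangle",
         "points": [[480, 120], [160, 360]], "group_id": None, "flags": {}}
print(labelme_rect_to_bbox(shape))
# → (160, 120, 480, 360)
```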

VGG Image Annotator Format

  • File Structure: Single .json file containing all annotations with VIA metadata structure
  • Annotation Format: JSON with _via_img_metadata containing image entries, each with regions array for shape annotations
  • Coordinates: Absolute pixel values with support for 6 shape types: rect, circle, ellipse, polygon, polyline, point
  • Shape Types:
    • Rectangle: {x, y, width, height} - top-left corner and dimensions
    • Circle: {cx, cy, r} - center coordinates and radius
    • Ellipse: {cx, cy, rx, ry, theta} - center, radii, and rotation angle
    • Polygon: {all_points_x[], all_points_y[]} - arrays of vertex coordinates
    • Polyline: {all_points_x[], all_points_y[]} - arrays of line point coordinates
    • Point: {cx, cy} - single point coordinates
  • Metadata: Includes file_attributes for image-level data, region_attributes for annotation-level data, and optional VIA project settings
EXPECTED INPUT FILE STRUCTURE                      GENERATED OUTPUT FILE STRUCTURE
      dataset/                                           dataset/
        ├── images/                                        ├── images/
        │     img1.jpg                                     │
        │     img2.jpg                                     │
        ├── annotations.json                               ├── annotations.json
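Converting VIA region shapes to bounding boxes can be sketched as below (illustrative only; ellipse rotation via theta is ignored here, so this is not an exact enclosure for rotated ellipses):

```python
def vgg_region_bbox(shape: dict) -> tuple:
    """Compute an enclosing (xmin, ymin, xmax, ymax) for common VIA shape types."""
    name = shape["name"]
    if name == "rect":
        return (shape["x"], shape["y"],
                shape["x"] + shape["width"], shape["y"] + shape["height"])
    if name in ("circle", "point"):
        r = shape.get("r", 0)  # a point is treated as a zero-radius circle
        return (shape["cx"] - r, shape["cy"] - r, shape["cx"] + r, shape["cy"] + r)
    if name in ("polygon", "polyline"):
        xs, ys = shape["all_points_x"], shape["all_points_y"]
        return (min(xs), min(ys), max(xs), max(ys))
    raise ValueError(f"unsupported shape: {name}")

print(vgg_region_bbox({"name": "circle", "cx": 100, "cy": 100, "r": 30}))
# → (70, 70, 130, 130)
```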

License

This project is licensed under the MIT License - see the LICENSE file for details.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vision_converter-0.1.0.tar.gz (41.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

vision_converter-0.1.0-py3-none-any.whl (54.4 kB view details)

Uploaded Python 3

File details

Details for the file vision_converter-0.1.0.tar.gz.

File metadata

  • Download URL: vision_converter-0.1.0.tar.gz
  • Upload date:
  • Size: 41.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.3

File hashes

Hashes for vision_converter-0.1.0.tar.gz
Algorithm Hash digest
SHA256 f4dbd4a07ee5e9f714120cac33e28c2e87d39841ae6721b11cc03ffcc17178bf
MD5 faab1e5079e8eaf93a81256c5238050b
BLAKE2b-256 474c444cb780b06b967e1b31383a1fbe17571687d9df15611f08d5a7afae7c72

See more details on using hashes here.
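To check a downloaded file against the SHA256 digest above, one possible approach with the standard library:

```python
import hashlib

def sha256_of(path: str) -> str:
    """Stream a file in chunks and return its hex SHA-256 digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare the result with the published digest, e.g.:
# sha256_of("vision_converter-0.1.0.tar.gz") == "f4dbd4a07ee5..."
```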

File details

Details for the file vision_converter-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for vision_converter-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 9c251a7ce1bca4fea64c484c1d4295083a977f65347e68b95602618f57a3e6b7
MD5 84e33ab73a912f495910f9a4477a41e0
BLAKE2b-256 01ed0cadedf77d09391f697a637f82e94dc3f72c07a85728c871d75c1e807adc

See more details on using hashes here.
