This project consists of a library and a CLI for converting datasets between annotation formats.

Project description

VisionConverter

Description

VisionConverter is a library for converting object detection annotation datasets between popular formats. It simplifies dataset interoperability for machine learning and computer vision projects.

Key Features:

  • Bidirectional conversion between supported formats
  • Unified internal representation ensures consistent and reliable transformations

Conversion Process:

  1. Load the input dataset from the specified path
  2. Transform it into the unified internal representation
  3. Convert from the internal representation to the target output format
  4. Save the converted dataset to the desired output location

Installation

Requirements

Install from Source

Clone the repository and install the package:

git clone https://github.com/GCousido/VisionConverter.git
cd VisionConverter
pip install .

Development Installation

For development, install in editable mode with the testing dependencies included:

git clone https://github.com/GCousido/VisionConverter.git
cd VisionConverter
pip install -e ".[dev]"

How to Use

Library Usage

You can use VisionConverter as a Python library to convert datasets programmatically.

Example

from vision_converter import YoloFormat, YoloConverter, CocoFormat, CocoConverter, NeutralFormat

# Load the YOLO dataset from disk
yolo_dataset: YoloFormat = YoloFormat.read_from_folder("./dataset/yolo")

# Transform it into the unified internal representation
internal_dataset: NeutralFormat = YoloConverter.toNeutral(yolo_dataset)

# Convert from the internal representation to COCO
coco_dataset: CocoFormat = CocoConverter.fromNeutral(internal_dataset)

# Save the converted dataset
coco_dataset.save("./dataset/coco")

Command Line Interface

The CLI provides a simple interface for converting datasets:

Basic Usage

vconverter --input-format <INPUT_FORMAT> --input-path <INPUT_PATH> --output-format <OUTPUT_FORMAT> --output-path <OUTPUT_PATH> <OPTIONS>

Required Arguments

  • --input-format: Source format
  • --input-path: Path to the folder containing the input dataset
  • --output-format: Target format
  • --output-path: Path to save the converted dataset

Options

  • --copy-images: Copy image files to the output directory.
  • --symlink-images: Create symbolic links to the original images in the output directory.

Examples

Convert a YOLO dataset to COCO:

vconverter --input-format yolo --input-path ./datasets/yolo --output-format coco --output-path ./datasets/coco

Convert Pascal VOC to YOLO:

vconverter --input-format pascal_voc --input-path ./datasets/pascalvoc --output-format yolo --output-path ./datasets/yolo

Convert COCO to Pascal VOC with images:

vconverter --input-format coco --input-path ./datasets/coco --output-format pascal_voc --output-path ./datasets/pascalvoc --copy-images

Supported Formats

All formats listed below are supported for both input and output.

| Format         | Parameter Value | Description |
| -------------- | --------------- | ----------- |
| YOLO           | yolo            | YOLO format (.txt files with normalized coordinates and classes.txt for class names) |
| COCO           | coco            | Microsoft COCO format (.json with absolute coordinates) |
| Pascal VOC     | pascal_voc      | Pascal Visual Object Classes format (.xml files with absolute coordinates) |
| CreateML       | createml        | Apple CreateML format (.json with centered bounding boxes and absolute coordinates) |
| TensorFlow CSV | tensorflow_csv  | TensorFlow Object Detection CSV format (.csv with absolute coordinates) |
| LabelMe        | labelme         | LabelMe JSON format (.json files with shape annotations and optional embedded image data) |
| VGG            | vgg             | VGG Image Annotator format (.json with multiple shape types and region attributes) |

Format Specifications

YOLO Format

  • File Structure: One .txt file per image with same basename as the image
  • Annotation Format: <class_id> <x_center> <y_center> <width> <height>
  • Coordinates: Normalized values between 0 and 1 (relative to the image size)
  • Additional Files: classes.txt containing class names, one per line
EXPECTED INPUT FILE STRUCTURE                      GENERATED OUTPUT FILE STRUCTURE
      dataset/                                           dataset/
        ├── images/                                        ├── images/
        │     img1.jpg                                     │
        │     img2.jpg                                     │
        ├── labels/                                        ├── labels/
        │     img1.txt                                     │     img1.txt
        │     img2.txt                                     │     img2.txt
        │     classes.txt                                  │     classes.txt
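As an illustrative sketch (not part of VisionConverter's API), a YOLO annotation line can be denormalized to absolute pixel coordinates like this:

```python
def yolo_to_pixels(line: str, img_w: int, img_h: int):
    """Convert one YOLO line (<class_id> <x_center> <y_center> <width> <height>,
    all normalized to [0, 1]) into a class id and absolute (xmin, ymin, xmax, ymax)."""
    class_id, xc, yc, w, h = line.split()
    xc, yc = float(xc) * img_w, float(yc) * img_h
    w, h = float(w) * img_w, float(h) * img_h
    return int(class_id), (xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)

# A box centered in a 640x480 image, covering half of each dimension
print(yolo_to_pixels("0 0.5 0.5 0.5 0.5", 640, 480))
# → (0, (160.0, 120.0, 480.0, 360.0))
```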

COCO Format

  • File Structure: Single .json file containing all annotations
  • Annotation Format: JSON with images, annotations and categories arrays
  • Coordinates: Absolute pixel values [x, y, width, height]
  • Metadata: Includes dataset info, licenses, and category definitions
EXPECTED INPUT FILE STRUCTURE                      GENERATED OUTPUT FILE STRUCTURE
      dataset/                                           dataset/
        ├── images/                                        ├── images/
        │     img1.jpg                                     │
        │     img2.jpg                                     │
        ├── annotations.json                               ├── annotations.json
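A minimal COCO-style annotations.json, shown here as an illustrative sketch (values are made up; the field names follow the COCO specification):

```python
import json

coco = {
    "info": {"description": "example dataset"},
    "images": [{"id": 1, "file_name": "img1.jpg", "width": 640, "height": 480}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 1,
                     "bbox": [160, 120, 320, 240],  # [x, y, width, height] in absolute pixels
                     "area": 320 * 240, "iscrowd": 0}],
    "categories": [{"id": 1, "name": "person"}],
}

annotations_json = json.dumps(coco, indent=2)
```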

Pascal VOC Format

  • File Structure: One .xml file per image, sharing the basename with the image file
  • Annotation Format: XML structure with bounding box coordinates and class names
  • Coordinates: Absolute pixel values <xmin>, <ymin>, <xmax>, <ymax>
  • Metadata: Rich annotation metadata, including image size, object attributes (difficult, truncated, occluded), and source info
EXPECTED INPUT FILE STRUCTURE                      GENERATED OUTPUT FILE STRUCTURE
      dataset/                                           dataset/
        ├── JPEGImages/                                    ├── JPEGImages/
        │     img1.jpg                                     │     
        │     img2.jpg                                     │     
        ├── Annotations/                                   ├── Annotations/
        │     img1.xml                                     │     img1.xml
        │     img2.xml                                     │     img2.xml
        ├── ImageSets/                                     ├── ImageSets/
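The XML layout can be sketched with the standard library (illustrative only, not the library's own writer):

```python
import xml.etree.ElementTree as ET

# Build a minimal Pascal VOC annotation for one image
root = ET.Element("annotation")
ET.SubElement(root, "filename").text = "img1.jpg"
size = ET.SubElement(root, "size")
for tag, val in (("width", "640"), ("height", "480"), ("depth", "3")):
    ET.SubElement(size, tag).text = val
obj = ET.SubElement(root, "object")
ET.SubElement(obj, "name").text = "person"
bbox = ET.SubElement(obj, "bndbox")
for tag, val in (("xmin", "160"), ("ymin", "120"), ("xmax", "480"), ("ymax", "360")):
    ET.SubElement(bbox, tag).text = val

xml_text = ET.tostring(root, encoding="unicode")
```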

CreateML Format

  • File Structure: Single .json file containing all annotations and an images/ folder with image files
  • Annotation Format: JSON array with entries for each image, each containing image filename and annotations array
  • Coordinates: Absolute pixel values with bounding boxes defined by center coordinates and dimensions {x_center, y_center, width, height}
EXPECTED INPUT FILE STRUCTURE                      GENERATED OUTPUT FILE STRUCTURE
      dataset/                                           dataset/
        ├── images/                                        ├── images/
        │     img1.jpg                                     │     
        │     img2.jpg                                     │     
        ├── annotations.json                               ├── annotations.json
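Because CreateML boxes are center-based, converting one to corner coordinates takes a small transform. A hedged sketch (the entry layout mirrors Apple's format; the helper name is made up):

```python
def createml_to_corners(box: dict) -> tuple:
    """Convert a CreateML {x, y, width, height} box (center-based, absolute
    pixels) to (xmin, ymin, xmax, ymax)."""
    return (box["x"] - box["width"] / 2, box["y"] - box["height"] / 2,
            box["x"] + box["width"] / 2, box["y"] + box["height"] / 2)

# One entry of the CreateML JSON array, for a single image
entry = {"image": "img1.jpg",
         "annotations": [{"label": "person",
                          "coordinates": {"x": 320, "y": 240,
                                          "width": 320, "height": 240}}]}
print(createml_to_corners(entry["annotations"][0]["coordinates"]))
# → (160.0, 120.0, 480.0, 360.0)
```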

TensorFlow Object Detection CSV Format

  • File Structure: Single .csv file containing all annotations
  • Annotation Format: CSV structure with specific columns for image metadata and bounding box coordinates
  • Coordinates: Absolute pixel values <xmin>, <ymin>, <xmax>, <ymax>
  • Required Columns: filename, width, height, class, xmin, ymin, xmax, ymax
  • Features: Human-readable format, direct compatibility with TensorFlow Object Detection API, supports multiple objects per image
EXPECTED INPUT FILE STRUCTURE                      GENERATED OUTPUT FILE STRUCTURE
      dataset/                                           dataset/
        ├── images/                                        ├── images/
        │     img1.jpg                                     │     
        │     img2.jpg                                     │     
        ├── annotations.csv                                ├── annotations.csv
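Writing the required columns with the standard csv module can be sketched as follows (illustrative values; the column names are the ones listed above):

```python
import csv, io

rows = [
    {"filename": "img1.jpg", "width": 640, "height": 480, "class": "person",
     "xmin": 160, "ymin": 120, "xmax": 480, "ymax": 360},
]

# Write the annotations to an in-memory CSV buffer
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["filename", "width", "height",
                                         "class", "xmin", "ymin", "xmax", "ymax"])
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()
```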

LabelMe JSON Format

  • File Structure: One .json file per image containing annotations and image metadata
  • Annotation Format: JSON with shapes array, each shape having label, points, shape_type, group_id, flags, and optional description
  • Coordinates: Absolute pixel values for points defining shapes (e.g., polygons, rectangles)
  • Image Data: Optional base64 encoded image data embedded in imageData field
  • Metadata: Includes dataset version, flags, imagePath, imageHeight, imageWidth
EXPECTED INPUT FILE STRUCTURE                      GENERATED OUTPUT FILE STRUCTURE
      dataset/                                           dataset/
        ├── img1.jpg                                       ├── img1.jpg
        ├── img1.json                                      ├── img1.json
        ├── img2.jpg                                       ├── img2.jpg
        ├── img2.json                                      ├── img2.json
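LabelMe stores a rectangle as two opposite corner points, in no guaranteed order, so extracting a bounding box requires normalizing them. An illustrative sketch (the helper name is made up; the shape fields follow LabelMe's JSON):

```python
def labelme_rect_to_bbox(shape: dict) -> tuple:
    """Normalize a LabelMe rectangle's two corner points to (xmin, ymin, xmax, ymax)."""
    (x1, y1), (x2, y2) = shape["points"]
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))

# Corner points deliberately given out of order
shape = {"label": "person", "shape_type": "rectangle",
         "points": [[480, 120], [160, 360]], "group_id": None, "flags": {}}
print(labelme_rect_to_bbox(shape))
# → (160, 120, 480, 360)
```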

VGG Image Annotator Format

  • File Structure: Single .json file containing all annotations with VIA metadata structure
  • Annotation Format: JSON with _via_img_metadata containing image entries, each with regions array for shape annotations
  • Coordinates: Absolute pixel values with support for 6 shape types: rect, circle, ellipse, polygon, polyline, point
  • Shape Types:
    • Rectangle: {x, y, width, height} - top-left corner and dimensions
    • Circle: {cx, cy, r} - center coordinates and radius
    • Ellipse: {cx, cy, rx, ry, theta} - center, radii, and rotation angle
    • Polygon: {all_points_x[], all_points_y[]} - arrays of vertex coordinates
    • Polyline: {all_points_x[], all_points_y[]} - arrays of line point coordinates
    • Point: {cx, cy} - single point coordinates
  • Metadata: Includes file_attributes for image-level data, region_attributes for annotation-level data, and optional VIA project settings
EXPECTED INPUT FILE STRUCTURE                      GENERATED OUTPUT FILE STRUCTURE
      dataset/                                           dataset/
        ├── images/                                        ├── images/
        │     img1.jpg                                     │
        │     img2.jpg                                     │
        ├── annotations.json                               ├── annotations.json
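Converting VIA region shapes to bounding boxes can be sketched as below (illustrative only; ellipse rotation via theta is ignored here, so this is not an exact enclosure for rotated ellipses):

```python
def vgg_region_bbox(shape: dict) -> tuple:
    """Compute an enclosing (xmin, ymin, xmax, ymax) for common VIA shape types."""
    name = shape["name"]
    if name == "rect":
        return (shape["x"], shape["y"],
                shape["x"] + shape["width"], shape["y"] + shape["height"])
    if name in ("circle", "point"):
        r = shape.get("r", 0)  # a point is treated as a zero-radius circle
        return (shape["cx"] - r, shape["cy"] - r, shape["cx"] + r, shape["cy"] + r)
    if name in ("polygon", "polyline"):
        xs, ys = shape["all_points_x"], shape["all_points_y"]
        return (min(xs), min(ys), max(xs), max(ys))
    raise ValueError(f"unsupported shape: {name}")

print(vgg_region_bbox({"name": "circle", "cx": 100, "cy": 100, "r": 30}))
# → (70, 70, 130, 130)
```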

License

This project is licensed under the MIT License - see the LICENSE file for details.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vision_converter-0.1.0.tar.gz (41.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

vision_converter-0.1.0-py3-none-any.whl (54.4 kB view details)

Uploaded Python 3

File details

Details for the file vision_converter-0.1.0.tar.gz.

File metadata

  • Download URL: vision_converter-0.1.0.tar.gz
  • Upload date:
  • Size: 41.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.3

File hashes

Hashes for vision_converter-0.1.0.tar.gz
Algorithm Hash digest
SHA256 f4dbd4a07ee5e9f714120cac33e28c2e87d39841ae6721b11cc03ffcc17178bf
MD5 faab1e5079e8eaf93a81256c5238050b
BLAKE2b-256 474c444cb780b06b967e1b31383a1fbe17571687d9df15611f08d5a7afae7c72

See more details on using hashes here.
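To check a downloaded file against the SHA256 digest above, one possible approach with the standard library:

```python
import hashlib

def sha256_of(path: str) -> str:
    """Stream a file in chunks and return its hex SHA-256 digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare the result with the published digest, e.g.:
# sha256_of("vision_converter-0.1.0.tar.gz") == "f4dbd4a07ee5..."
```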

File details

Details for the file vision_converter-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for vision_converter-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 9c251a7ce1bca4fea64c484c1d4295083a977f65347e68b95602618f57a3e6b7
MD5 84e33ab73a912f495910f9a4477a41e0
BLAKE2b-256 01ed0cadedf77d09391f697a637f82e94dc3f72c07a85728c871d75c1e807adc

See more details on using hashes here.
