
A package for generating synthetic image data from objects annotated with labelme.


Synthetic Data Generation

A bunch of scripts to generate synthetic images for YOLO.

Install

  1. Install the package:
pip install syndatagenyolo

Tools

Extract Labelme Objects

With this script, you can extract objects that are annotated with labelme from images.

syndatagenyolo extract --input_dir INPUT_DIR --output_dir OUTPUT_DIR --margin MARGIN
  • INPUT_DIR: The directory where the images are stored.
  • OUTPUT_DIR: The directory where the extracted objects will be stored.
  • MARGIN: The margin around the object. Useful for adding some space around the object so that it can be blended into the background.
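
To illustrate what the margin does, here is a minimal sketch that computes a padded crop box from a labelme-style polygon. The function name `bbox_with_margin` is hypothetical and the logic is an assumption about how such a margin could be applied; it is not taken from the package's source. Only the labelme JSON keys (`shapes`, `points`, `imageWidth`, `imageHeight`) follow the labelme file format.

```python
import json
import math


def bbox_with_margin(points, margin, image_width, image_height):
    """Return (left, top, right, bottom) around the polygon, padded by
    `margin` pixels and clamped to the image borders."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    left = max(0, math.floor(min(xs)) - margin)
    top = max(0, math.floor(min(ys)) - margin)
    right = min(image_width, math.ceil(max(xs)) + margin)
    bottom = min(image_height, math.ceil(max(ys)) + margin)
    return left, top, right, bottom


# A tiny labelme-style annotation, inlined for the example.
annotation = json.loads("""
{
  "imageWidth": 100,
  "imageHeight": 80,
  "shapes": [
    {"label": "object_1",
     "shape_type": "polygon",
     "points": [[10.0, 20.0], [50.0, 20.0], [50.0, 60.0], [10.0, 60.0]]}
  ]
}
""")

for shape in annotation["shapes"]:
    box = bbox_with_margin(
        shape["points"],
        margin=15,
        image_width=annotation["imageWidth"],
        image_height=annotation["imageHeight"],
    )
    print(shape["label"], box)  # object_1 (0, 5, 65, 75)
```

The left edge is clamped to 0 because the requested margin would otherwise reach outside the image.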

Generate Synthetic Images

With this script, you can generate synthetic images with the extracted objects and corresponding backgrounds.

Minimal example:

syndatagenyolo generate --input_dir INPUT_DIR --output_dir OUTPUT_DIR --image_number IMAGE_NUMBER
  • INPUT_DIR: The directory where the extracted objects are stored.
  • OUTPUT_DIR: The directory where the synthetic images will be stored.
  • IMAGE_NUMBER: The number of synthetic images to generate.
INPUT_DIR

The input directory should have the following structure:

INPUT_DIR
├── backgrounds
│   ├── background_1.jpg
│   ├── background_2.jpg
│   └── ...
├── foregrounds
│   ├── object_1
│   │   ├── object_1_1.png
│   │   ├── object_1_2.png
│   │   └── ...
│   ├── object_2
│   │   ├── object_2_1.png
│   │   ├── object_2_2.png
│   │   └── ...
│   └── ...
└── labels (optional - use with '--yolo_input')
    ├── background_1.txt
    ├── background_2.txt
    └── ...
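
Before starting a long generation run, it can help to check that the directory matches this layout. The following is a minimal sketch; `check_input_dir` is a hypothetical helper, not part of syndatagenyolo, and it assumes that foreground objects are stored as `.png` files, as in the tree above.

```python
from pathlib import Path
import tempfile


def check_input_dir(input_dir, yolo_input=False):
    """Return a list of layout problems; an empty list means the layout looks fine."""
    root = Path(input_dir)
    problems = []

    backgrounds = root / "backgrounds"
    if not backgrounds.is_dir() or not any(backgrounds.iterdir()):
        problems.append("backgrounds/ is missing or empty")

    foregrounds = root / "foregrounds"
    if not foregrounds.is_dir():
        problems.append("foregrounds/ is missing")
    else:
        class_dirs = sorted(d for d in foregrounds.iterdir() if d.is_dir())
        if not class_dirs:
            problems.append("foregrounds/ contains no object folders")
        for class_dir in class_dirs:
            if not any(class_dir.glob("*.png")):
                problems.append(f"foregrounds/{class_dir.name}/ contains no .png files")

    # labels/ is only needed when the backgrounds carry YOLO annotations.
    if yolo_input and not (root / "labels").is_dir():
        problems.append("labels/ is missing but yolo_input was requested")

    return problems


# Build a throwaway directory with the expected layout and check it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "backgrounds").mkdir()
    (root / "backgrounds" / "background_1.jpg").touch()
    (root / "foregrounds" / "object_1").mkdir(parents=True)
    (root / "foregrounds" / "object_1" / "object_1_1.png").touch()
    print(check_input_dir(root))                   # no problems expected
    print(check_input_dir(root, yolo_input=True))  # reports the missing labels/ folder
```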

Maximal example:

syndatagenyolo generate --input_dir INPUT_DIR --output_dir OUTPUT_DIR --image_number IMAGE_NUMBER --augmentation_path AUGMENTATION_PATH --max_objects_per_image MAX_OBJECTS_PER_IMAGE --image_width IMAGE_WIDTH --image_height IMAGE_HEIGHT --fixed_image_sizes --scale_foreground_by_background_size --scaling_factors SCALING_FACTORS SCALING_FACTORS --avoid_collisions --parallelize --yolo_input --yolo --color_harmon_alpha COLOR_HARMON_ALPHA --color_harmon_random --gaussian_options GAUSSIAN_OPTIONS GAUSSIAN_OPTIONS --debug --blending_methods BLENDING_METHODS BLENDING_METHODS --pyramid_blending_levels PYRAMID_BLENDING_LEVELS --distractor_objects DISTRACTOR_OBJECTS DISTRACTOR_OBJECTS
  • AUGMENTATION_PATH: Path to an Albumentations augmentation file.
  • MAX_OBJECTS_PER_IMAGE: The maximum number of objects per image.
  • IMAGE_WIDTH: The width of the generated images.
  • IMAGE_HEIGHT: The height of the generated images.
  • FIXED_IMAGE_SIZES: If set, the images will have the same size.
  • SCALE_FOREGROUND_BY_BACKGROUND_SIZE: If set, the foreground objects will be scaled by the background size.
  • SCALING_FACTORS: The scaling factors for the foreground objects.
  • AVOID_COLLISIONS: If set, the objects will be placed in a way that they do not overlap.
  • PARALLELIZE: If set, the generation will be parallelized using multiple processes.
  • YOLO_INPUT: If set, the background images can contain YOLO annotations.
  • YOLO: If set, the generated images will have YOLO annotations. Otherwise, COCO annotations will be used.
  • COLOR_HARMON_ALPHA: The alpha value for the color harmonization.
  • COLOR_HARMON_RANDOM: If set, the color harmonization will be random.
  • GAUSSIAN_OPTIONS: The options for Gaussian blending: kernel_size and sigma (e.g. 5 1).
  • DEBUG: If set, the debug mode will be activated.
  • BLENDING_METHODS: The blending methods for the foreground objects. See below.
  • PYRAMID_BLENDING_LEVELS: The number of pyramid blending levels.
  • DISTRACTOR_OBJECTS: The names of foreground objects which should be used as distractor objects. (Not implemented yet, but will exclude these objects from the annotation file.)
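
For readers unfamiliar with the YOLO annotation format selected by `--yolo`, here is a minimal sketch of how a pixel bounding box maps to one YOLO label line (class index followed by the normalized box center, width and height). The helper `to_yolo_line` is hypothetical and only illustrates the format; how the package writes its annotation files is not shown here.

```python
def to_yolo_line(class_id, left, top, right, bottom, image_width, image_height):
    """Convert a pixel box (left, top, right, bottom) to a YOLO label line.

    YOLO stores: class x_center y_center width height, all normalized to [0, 1].
    """
    x_center = (left + right) / 2 / image_width
    y_center = (top + bottom) / 2 / image_height
    width = (right - left) / image_width
    height = (bottom - top) / image_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# An object placed at (100, 50)-(300, 250) in a 640x480 image.
print(to_yolo_line(0, 100, 50, 300, 250, 640, 480))
# 0 0.312500 0.312500 0.312500 0.416667
```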

Blending Methods

The blending methods are defined as follows:

  • 'ALPHA': Alpha blending.
  • 'GAUSSIAN': Gaussian blending.
  • 'PYRAMID': Pyramid blending.
  • 'POISSON_NORMAL': Poisson blending with normal blending (using the cv2.seamlessClone() function).
  • 'POISSON_MIXED': Poisson blending with mixed blending (using the cv2.seamlessClone() function).

Multiple blending methods can be passed, separated by spaces (e.g. --blending_methods ALPHA GAUSSIAN).
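
As a rough illustration of the simplest method, alpha blending computes each output pixel as a weighted sum of the foreground and background pixels. The sketch below uses plain Python lists for small grayscale images so that it has no dependencies; `alpha_blend` is a hypothetical name, and the package's own implementation may differ.

```python
def alpha_blend(foreground, background, alpha):
    """Blend two equally sized grayscale images given as lists of rows.

    `alpha` is a per-pixel mask in [0, 1]: 1 keeps the foreground pixel,
    0 keeps the background pixel.
    """
    return [
        [a * f + (1.0 - a) * b for f, b, a in zip(f_row, b_row, a_row)]
        for f_row, b_row, a_row in zip(foreground, background, alpha)
    ]


foreground = [[200, 200], [200, 200]]
background = [[100, 100], [100, 100]]
alpha = [[1.0, 0.5], [0.0, 0.25]]

print(alpha_blend(foreground, background, alpha))
# [[200.0, 150.0], [100.0, 125.0]]
```

Gaussian blending can be thought of as the same formula with a blurred mask, which softens the object border.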

Mix Datasets

syndatagenyolo mix --input_dirs INPUT_DIRS --output_dir OUTPUT_DIR --output_splits OUTPUT_SPLITS --percent_sets PERCENT_SETS --test_dataset TEST_DATASET --fixed_data_path FIXED_DATA_PATH --class_names CLASS_NAMES
  • INPUT_DIRS: The directories where the datasets are stored.
  • OUTPUT_DIR: The directory where the mixed dataset will be stored.
  • OUTPUT_SPLITS: The output splits for the mixed dataset. (e.g. 0.8 0.2 => 80% train, 20% validation)
  • PERCENT_SETS: The percentage of the datasets which should be used. (e.g. 0.5 0.5 => 50% of each dataset)
  • TEST_DATASET: The dataset which should be used as test dataset.
  • FIXED_DATA_PATH: Use an absolute path in the data.yaml file.
  • CLASS_NAMES: The class names for the mixed dataset. (e.g. class_1 class_2 class_3)
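
The interplay of PERCENT_SETS and OUTPUT_SPLITS can be sketched as follows: first a fraction of each dataset is sampled, then the combined selection is shuffled and cut into splits. This is an assumption about the general idea, not the package's code; `sample_and_split`, its `seed` parameter and the rounding behaviour are all hypothetical.

```python
import random


def sample_and_split(datasets, percent_sets, output_splits, seed=0):
    """Sample a fraction of each dataset, then divide the result into splits.

    datasets:      list of lists of file names
    percent_sets:  fraction to take from each dataset (e.g. [0.5, 0.5])
    output_splits: fraction per output split (e.g. [0.8, 0.2])
    """
    rng = random.Random(seed)

    selected = []
    for items, percent in zip(datasets, percent_sets):
        count = round(len(items) * percent)
        selected.extend(rng.sample(items, count))
    rng.shuffle(selected)

    splits = []
    start = 0
    for index, fraction in enumerate(output_splits):
        # The last split takes the remainder so that no file is dropped by rounding.
        if index == len(output_splits) - 1:
            end = len(selected)
        else:
            end = start + round(len(selected) * fraction)
        splits.append(selected[start:end])
        start = end
    return splits


real = [f"real_{i}.jpg" for i in range(10)]
synthetic = [f"synthetic_{i}.jpg" for i in range(20)]

train, val = sample_and_split([real, synthetic], [0.5, 0.5], [0.8, 0.2])
print(len(train), len(val))  # 12 3
```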

Demo

Streamlit Demo

Contribute

  1. Clone the repository:
git clone
  2. Install the required packages:
pip install -r requirements.txt
  3. Install the package in editable mode:
pip install -e .

Publish

  1. Update the version in pyproject.toml.

  2. Update the CHANGELOG.md.

  3. Build the package:

uv pip install --upgrade build
uv run -m build
  4. Publish the package:
uv pip install --upgrade twine
uv run -m twine upload dist/*
