YOLO Dataset Quality Analysis Tool powered by FiftyOne

🔍 YoloScout — YOLO Dataset quality analysis tool

A comprehensive tool for analyzing and visualizing YOLO dataset quality using FiftyOne

🚀 Quick Start

Installation

pip install yolo-scout

Basic Usage

# Option 1: Command-line only (no config file)
yolo scout --dataset-path /path/to/dataset --dataset-task detect

# Option 2: Config file only (more details below)
yolo scout --config my_config.yaml

# Option 3: Config file + overrides
yolo scout --config default.yaml --batch-size 8

# Option 4: Force reload of an existing dataset
yolo scout --dataset-path /path/to/dataset --dataset-task detect --reload

Note: installing yolo-scout registers its own yolo command that wraps ultralytics — all existing yolo train, yolo detect, etc. commands continue to work unchanged.

To use the configuration file option, create a config file (e.g., my_config.yaml) with the following structure:

dataset:
  path: "/path/to/your/dataset"
  name: "my_dataset"  # optional, auto-generated if not set
  task: "detect"  # detect, segment, classify, pose, obb
  reload: false

embeddings:
  skip: false
  model: "openai_clip"
  batch_size: 16
  mask_background: true  # Disable with --skip-mask-background if needed

thumbnails:
  dir: "./thumbnails"
  width: 800

quality:
  skip: false

port: 5151
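
As a rough illustration of how "config file + overrides" (Option 3) could be layered, the sketch below merges three nested dictionaries: defaults, values from a config file, and command-line overrides. The `DEFAULTS` dictionary mirrors the YAML structure above; the `deep_merge` helper and the precedence order (command line over file over defaults) are assumptions made for this example, not a description of YoloScout's internals.

```python
import copy

# Defaults mirroring the YAML structure shown above.
# "name": None stands for "auto-generate" (an assumption of this sketch).
DEFAULTS = {
    "dataset": {"path": None, "name": None, "task": "detect", "reload": False},
    "embeddings": {
        "skip": False,
        "model": "openai_clip",
        "batch_size": 16,
        "mask_background": True,
    },
    "thumbnails": {"dir": "./thumbnails", "width": 800},
    "quality": {"skip": False},
    "port": 5151,
}


def deep_merge(base, override):
    """Return a new dict where values in `override` replace those in `base`.

    Nested dicts are merged key by key; neither input is modified.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical inputs: what a parsed config file and CLI flags might contain.
file_config = {"dataset": {"path": "/path/to/your/dataset"}, "port": 5152}
cli_overrides = {"embeddings": {"batch_size": 8}}

config = deep_merge(deep_merge(DEFAULTS, file_config), cli_overrides)
```

In this sketch, keys that are absent from both the file and the command line keep their default values.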

Command-Line Arguments

Argument	Type	Default	Description
`--config`	`str`	`None`	Path to config YAML file. Overrides default settings.
`--dataset-path`	`str`	`None`	Path to your dataset, which must follow the YOLO format. Required unless provided in the config file.
`--dataset-task`	`str`	`'detect'`	Task type: `classify`, `detect`, `segment`, `pose`, `obb`. Required unless in config. More info on the tasks below.
`--dataset-name`	`str`	`'default'`	Name for the FiftyOne dataset. Auto-generated from path if not set.
`--reload`	`bool`	`false`	Force reload of the dataset even if it already exists. The current dataset will be deleted and recreated.
`--skip-embeddings`	`bool`	`false`	Skip CLIP embedding computation (useful for quick visualization).
`--embeddings-model`	`str`	`'openai_clip'`	Embeddings model to use. Possible values: `openai_clip`, `metaclip_400m`, `metaclip_fullcc`, `siglip_base_224`.
`--batch-size`	`int`	`16`	Batch size used during CLIP embedding computation.
`--skip-mask-background`	`bool`	`false`	Skip background masking for patch crops in segmentation/OBB tasks. Masking is enabled by default, replacing background with gray (114, 114, 114).
`--thumbnail-width`	`int`	`800`	Width (in pixels) of the generated image thumbnails in FiftyOne. The height is adjusted automatically to maintain aspect ratio. Set to `-1` to disable thumbnail saving.
`--thumbnail-dir`	`str`	`'./thumbnails'`	Path to the directory where the thumbnails are saved.
`--port`	`int`	`5151`	Port to launch the FiftyOne app on.
`--skip-quality`	`bool`	`false`	Skip image quality metrics computation (blurriness, brightness, aspect_ratio, entropy).
`--skip-launch`	`bool`	`false`	Skip launching the FiftyOne app after processing.
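
To make the flag types and defaults in the table concrete, here is a minimal `argparse` sketch that mirrors a subset of the documented arguments. It is only an illustration of the table: the `build_parser` function is hypothetical, and the real CLI may be implemented differently.

```python
import argparse

TASKS = ("classify", "detect", "segment", "pose", "obb")
MODELS = ("openai_clip", "metaclip_400m", "metaclip_fullcc", "siglip_base_224")


def build_parser():
    """Build a parser for a subset of the flags listed in the table above."""
    parser = argparse.ArgumentParser(prog="yolo scout")
    parser.add_argument("--config", type=str, default=None)
    parser.add_argument("--dataset-path", type=str, default=None)
    parser.add_argument("--dataset-task", type=str, default="detect", choices=TASKS)
    parser.add_argument(
        "--embeddings-model", type=str, default="openai_clip", choices=MODELS
    )
    parser.add_argument("--batch-size", type=int, default=16)
    parser.add_argument("--thumbnail-width", type=int, default=800)
    parser.add_argument("--port", type=int, default=5151)
    # Boolean switches default to False and become True when passed.
    for flag in ("--reload", "--skip-embeddings", "--skip-quality", "--skip-launch"):
        parser.add_argument(flag, action="store_true")
    return parser


args = build_parser().parse_args(
    ["--dataset-path", "/path/to/dataset", "--batch-size", "8", "--skip-quality"]
)
```

Dashes in flag names become underscores on the parsed namespace, so `--batch-size` is read back as `args.batch_size`.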

📊 Supported tasks and image metadata

For each expected task format, the following metadata will be computed and available in FiftyOne for each annotation:

Task	Available parameters when using the UI
`classify`	`cls_label.label`
`detect`	`area`, `width`, `height`, `iou_score`
`segment`	`area`, `num_keypoints`, `width`, `height`, `iou_score`
`obb`	`area`, `width`, `height`, `iou_score`
`pose`	`area`, `num_keypoints`, `width`, `height`, `iou_score`
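
For readers unfamiliar with these fields, the sketch below shows how `width`, `height`, and `area` can be derived from a standard YOLO detection label line (`class x_center y_center width height`, all normalized to the image size), and how a standard intersection-over-union between two boxes is computed. How YoloScout itself defines `iou_score` is not stated here, so treat the `iou` function as a generic illustration; the helper names are hypothetical.

```python
def parse_yolo_box(line, img_w, img_h):
    """Convert one normalized YOLO detection label line to pixel values."""
    cls, xc, yc, w, h = line.split()[:5]
    w_px = float(w) * img_w
    h_px = float(h) * img_h
    x1 = float(xc) * img_w - w_px / 2  # left edge
    y1 = float(yc) * img_h - h_px / 2  # top edge
    return {
        "cls": int(cls),
        "x1": x1,
        "y1": y1,
        "x2": x1 + w_px,
        "y2": y1 + h_px,
        "width": w_px,
        "height": h_px,
        "area": w_px * h_px,
    }


def iou(a, b):
    """Intersection over union of two boxes produced by parse_yolo_box."""
    inter_w = max(0.0, min(a["x2"], b["x2"]) - max(a["x1"], b["x1"]))
    inter_h = max(0.0, min(a["y2"], b["y2"]) - max(a["y1"], b["y1"]))
    inter = inter_w * inter_h
    union = a["area"] + b["area"] - inter
    return inter / union if union > 0 else 0.0


# Two example annotations on a 100x200 pixel image.
box_a = parse_yolo_box("0 0.5 0.5 0.5 0.5", img_w=100, img_h=200)
box_b = parse_yolo_box("1 0.125 0.125 0.25 0.25", img_w=100, img_h=200)
```

A high IoU between two annotations in the same image often points to duplicated or heavily overlapping labels, which is one plausible reason to inspect this value.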

Also, for each image, the following metadata will be computed:

Image Metadata	Description
`object_count`	Number of objects in the image
`metadata.size_bytes`	Size of the image file in bytes
`metadata.width`	Width of the image in pixels
`metadata.height`	Height of the image in pixels
`metadata.mime_type`	MIME type of the image (e.g., `image/jpeg`)
`metadata.num_channels`	Number of color channels (e.g., 3 for RGB)

The following quality metrics are computed unless --skip-quality is passed. All metrics operate on grayscale pixel values and are available at both image and patch level.

Metric	Description
`blurriness`	Inverse of the Laplacian variance. A score close to `1` indicates a blurry image, while a score close to `0` indicates a sharp one
`brightness`	Mean pixel intensity normalized between `0` and `1`. A score of `0` is fully dark and a score of `1` is fully bright
`aspect_ratio`	Width-to-height ratio of the image or patch crop. Values greater than `1` are wider than tall, values less than `1` are taller than wide
`entropy`	Shannon entropy of the pixel intensity histogram. A low score indicates a flat or visually repetitive image
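
The sketch below shows one way these four metrics could be computed on an 8-bit grayscale image stored as a list of rows. It uses only the standard library for clarity. The exact kernel and normalization YoloScout uses are not documented here, so the 4-neighbor Laplacian and the `1 / (1 + variance)` mapping for `blurriness` are assumptions chosen to match the described behavior (near `1` for blurry, near `0` for sharp).

```python
import math
from collections import Counter


def brightness(gray):
    """Mean pixel intensity scaled to [0, 1] for 8-bit grayscale values."""
    pixels = [p for row in gray for p in row]
    return sum(pixels) / (255.0 * len(pixels))


def entropy(gray):
    """Shannon entropy (in bits) of the pixel intensity histogram."""
    pixels = [p for row in gray for p in row]
    total = len(pixels)
    counts = Counter(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def aspect_ratio(gray):
    """Width divided by height."""
    return len(gray[0]) / len(gray)


def laplacian_variance(gray):
    """Variance of a 4-neighbor Laplacian over interior pixels.

    Requires an image of at least 3x3 pixels.
    """
    h, w = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def blurriness(gray):
    """Assumed mapping: 1 / (1 + Laplacian variance), so flat images score 1."""
    return 1.0 / (1.0 + laplacian_variance(gray))


# A uniform image (no edges) and a checkerboard (edges everywhere).
flat = [[128] * 4 for _ in range(4)]
checker = [[255 * ((x + y) % 2) for x in range(4)] for y in range(4)]
```

With these inputs, the uniform image gets the maximum blurriness score and zero entropy, while the checkerboard scores close to `0` for blurriness.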

⭐️ Supported Models

All models use 224x224 input resolution. This is a constraint imposed by FiftyOne's OpenCLIP integration: higher-resolution variants (384, 512) cause preprocessing errors when computing embeddings. The 224x224 resolution works well for most computer vision tasks while maintaining compatibility with FiftyOne's model zoo.

Model	Description	Training Dataset
openai_clip	Original OpenAI CLIP model with ViT-B/32 architecture. Hosted on GitHub releases for offline usage. This is the default model and works without internet connection after first download.	OpenAI CLIP
metaclip_400m	MetaCLIP model trained on curated 400M image-text pairs. Offers improved data quality and better embeddings compared to OpenAI CLIP while maintaining the same speed and architecture.	MetaCLIP
metaclip_fullcc	MetaCLIP model trained on the full CommonCrawl dataset. Provides the highest quality embeddings among MetaCLIP variants with more diverse training data.	MetaCLIP
siglip_base_224	SigLIP (Sigmoid Loss for Language-Image Pre-training) base model. Uses improved sigmoid loss function for better performance with smaller batch sizes and more efficient training.	SigLIP

Model Selection Guide

Use openai_clip if you want the most widely used embeddings model (this is the default)
Use metaclip_400m for better quality embeddings (recommended)
Use metaclip_fullcc when you need the highest quality embeddings
Use siglip_base_224 as an alternative to CLIP-based models

All models have similar inference speed and produce 512-dimensional embeddings with full support for FiftyOne visualization and analysis features.

🧩 Additional Installed Plugins

This tool ships with a custom-built FiftyOne plugin that is automatically installed at startup. No manual setup required.

Plugin	Description	Icon	How to use?
`@ultralytics/image-adjuster`	Custom plugin to adjust image brightness, contrast, and label overlay opacity.		Open a sample, then click the slider icon in the bottom-left corner.

⚒️ Dataset Structure

This tool supports two common YOLO dataset directory structures:

Format 1: Type-First Structure

dataset/
├── images/
│   ├── train/
│   │   ├── image1.jpg
│   │   ├── image2.jpg
│   │   └── ...
│   ├── val/
│   │   ├── image1.jpg
│   │   └── ...
│   └── test/
│       └── ...
└── labels/
    ├── train/
    │   ├── image1.txt
    │   ├── image2.txt
    │   └── ...
    ├── val/
    │   ├── image1.txt
    │   └── ...
    └── test/
        └── ...

In this format, images and labels are organized by type first, then by split.

Format 2: Split-First Structure

dataset/
├── train/
│   ├── images/
│   │   ├── image1.jpg
│   │   ├── image2.jpg
│   │   └── ...
│   └── labels/
│       ├── image1.txt
│       ├── image2.txt
│       └── ...
├── val/
│   ├── images/
│   │   └── ...
│   └── labels/
│       └── ...
└── test/
    ├── images/
    │   └── ...
    └── labels/
        └── ...

In this format, the dataset is organized by split first, then by type (images/labels).
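
To check which of the two layouts a directory follows before running the tool, a small helper along these lines can be used. This is a sketch under stated assumptions: the function names, the list of image extensions, and the detection rule (a top-level `images/` and `labels/` pair means type-first) are illustrative and not taken from YoloScout.

```python
from pathlib import Path

SPLITS = ("train", "val", "test")
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed set of extensions


def detect_layout(root):
    """Return "type-first", "split-first", or None for an unknown layout."""
    root = Path(root)
    if (root / "images").is_dir() and (root / "labels").is_dir():
        return "type-first"
    if any((root / split / "images").is_dir() for split in SPLITS):
        return "split-first"
    return None


def pair_images_with_labels(root, split):
    """List (image, label) pairs for a split; label is None if missing."""
    root = Path(root)
    layout = detect_layout(root)
    if layout == "type-first":
        img_dir, lbl_dir = root / "images" / split, root / "labels" / split
    elif layout == "split-first":
        img_dir, lbl_dir = root / split / "images", root / split / "labels"
    else:
        raise ValueError(f"Unrecognized dataset layout under {root}")
    if not img_dir.is_dir():
        return []
    pairs = []
    for img in sorted(img_dir.iterdir()):
        if img.suffix.lower() in IMAGE_EXTS:
            label = lbl_dir / (img.stem + ".txt")
            pairs.append((img, label if label.exists() else None))
    return pairs
```

Calling `pair_images_with_labels("/path/to/dataset", "train")` returns the images in the train split together with their label files, which makes images without a matching `.txt` file easy to spot.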

⌨️ FiftyOne commands

Once you have used this tool at least once to visualize a dataset, you can use the following commands to interact with the FiftyOne datasets and application:

# List all the datasets
fiftyone datasets list

# Delete a specific dataset using its name
fiftyone datasets delete <dataset_name>

# Delete all datasets
python -c "import fiftyone as fo; [fo.delete_dataset(name) for name in fo.list_datasets()]"

# Launch the FiftyOne app
fiftyone app launch

# Launch the FiftyOne app and pre-select a dataset using its name
fiftyone app launch <dataset_name>

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Built with FiftyOne by Voxel51
Inspired by the Ultralytics YOLO ecosystem
CLIP models from OpenAI

Made with ❤️ for the YOLO community

Project details

Release history

1.1.6

Apr 2, 2026

1.1.5

Mar 30, 2026

1.1.4

Mar 24, 2026

1.1.3

Mar 18, 2026

1.1.2

Mar 16, 2026

1.1.1

Mar 16, 2026

This version

1.1.0

Mar 16, 2026

Download files

Source Distribution

yolo_scout-1.1.0.tar.gz (1.3 MB)

Uploaded Mar 16, 2026 Source

Built Distribution

yolo_scout-1.1.0-py3-none-any.whl (42.7 kB)

Uploaded Mar 16, 2026 Python 3

File details

Details for the file yolo_scout-1.1.0.tar.gz.

File metadata

Download URL: yolo_scout-1.1.0.tar.gz
Upload date: Mar 16, 2026
Size: 1.3 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for yolo_scout-1.1.0.tar.gz
Algorithm	Hash digest
SHA256	`35522da047246243eca2e02042c04591aa662a3abc3f5977bcb7d105ea3c515c`
MD5	`59ced5db466d49cca031fb00183b0954`
BLAKE2b-256	`8dbf8407b3e36cbefab6fe7e44e4a9c54972257c97f4df42c2f0fae4f7badd98`

File details

Details for the file yolo_scout-1.1.0-py3-none-any.whl.

File metadata

Download URL: yolo_scout-1.1.0-py3-none-any.whl
Upload date: Mar 16, 2026
Size: 42.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for yolo_scout-1.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`05504b5b963b71269760f4e20ea071ee2bc5ecb70c4be13daebd921301daccde`
MD5	`c20227e51baef25bbaf62be9899986fe`
BLAKE2b-256	`30610bec3d928934d3fe7fb9a9a5629ab7e40c7bcbf41712ea0cd2a99bcc4d42`
