YOLO Dataset Quality Analysis Tool powered by FiftyOne

🔍 YoloScout — YOLO Dataset quality analysis tool

A comprehensive tool for analyzing and visualizing YOLO dataset quality using a custom FiftyOne wrapper.

🚀 Quick Start

Installation

# Install the package from PyPI
pip install yolo-scout

Basic Usage

# Option 1: Local directory
yolo-scout data=/path/to/dataset task=detect

# Option 2: data.yaml file (resolves to parent directory automatically)
yolo-scout data=/path/to/dataset/data.yaml task=detect

# Option 3: Ultralytics Hub dataset
ULTRALYTICS_API_KEY=<your_key> yolo-scout data=ul://username/datasets/my-dataset task=detect

# Option 4: Config file only (more details below)
yolo-scout config=my_config.yaml

# Option 5: Config file + overrides
yolo-scout config=default.yaml batch=8

# Option 6: Force reload of an existing dataset
yolo-scout data=/path/to/dataset task=detect reload=True

To use the configuration file option, create a config file (e.g., my_config.yaml) with the following structure. All keys are optional, and any key you set overrides its default:

data: "/path/to/your/dataset"  # directory, data.yaml, or ul://username/datasets/slug
task: "detect"  # detect, segment, classify, pose, obb
name: "my_dataset"  # auto-generated from path if not set
reload: false
dataset_dir: "yolo_scout/datasets"  # where URL-sourced datasets are downloaded

skip_embeddings: false
model: "openai_clip"
batch: 16
mask_background: true

thumbnail_dir: "yolo_scout/thumbnails"
thumbnail_width: 800

skip_quality: false

port: 5151
skip_launch: false
verbose: false
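
The way defaults, a config file, and command-line overrides combine can be pictured with a short Python sketch. It assumes a precedence of defaults, then config file values, then `key=value` overrides, which is what "Config file + overrides" suggests; the function names and the coercion rules are hypothetical and are not taken from YoloScout's source.

```python
# Subset of the documented defaults, used only for illustration.
DEFAULTS = {"task": "detect", "batch": 16, "reload": False, "port": 5151}


def parse_overrides(tokens):
    """Split `key=value` tokens into a dict of raw strings."""
    pairs = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {token!r}")
        pairs[key] = value
    return pairs


def coerce(raw, default):
    """Cast a raw string to the type of the default it replaces (assumed rule)."""
    if isinstance(default, bool):  # test bool first: bool is a subclass of int
        return raw.strip().lower() in {"true", "1", "yes"}
    if isinstance(default, int):
        return int(raw)
    return raw


def resolve(defaults, config, tokens):
    """Merge settings with precedence: defaults < config file < CLI overrides."""
    merged = {**defaults, **config}
    for key, raw in parse_overrides(tokens).items():
        merged[key] = coerce(raw, defaults[key]) if key in defaults else raw
    return merged


# Mirrors `yolo-scout config=my_config.yaml batch=8 reload=True`, with the
# config file contents represented as an already-parsed dict.
settings = resolve(DEFAULTS, {"batch": 32, "task": "segment"}, ["batch=8", "reload=True"])
```

Here the config file raises `batch` to 32, and the command-line override then lowers it to 8, while `port` keeps its default.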

Command-Line Arguments

Argument	Type	Default	Description
`config`	`str`	`None`	Path to config YAML file. Overrides default settings.
`data`	`str`	`None`	Path to your dataset. Required unless provided in config file. See Supported data sources for all accepted formats.
`task`	`str`	`'detect'`	Task type: `classify`, `detect`, `segment`, `pose`, `obb`. Required unless in config. More info on the tasks below.
`name`	`str`	`None`	Name for the FiftyOne dataset. Auto-generated from path if not set.
`dataset_dir`	`str`	`'yolo_scout/datasets'`	Destination directory for datasets downloaded from a URL. Only used when `data` is a URL.
`reload`	`bool`	`False`	Force reload of the dataset even if it already exists. The current dataset will be deleted and recreated.
`skip_embeddings`	`bool`	`False`	Skip CLIP embedding computation (useful for quick visualization).
`model`	`str`	`'openai_clip'`	Embeddings model to use. See Supported Models for available options and a selection guide.
`batch`	`int`	`16`	Batch size used during CLIP embedding computation.
`mask_background`	`bool`	`True`	Mask background in patch crops for segmentation/OBB tasks. When enabled, background is replaced with gray (114, 114, 114). Set to `False` to disable.
`thumbnail_width`	`int`	`800`	Width (in pixels) of the generated image thumbnails in FiftyOne. The height is adjusted automatically to maintain aspect ratio. Set to `-1` to disable thumbnail saving.
`thumbnail_dir`	`str`	`'yolo_scout/thumbnails'`	Path to the directory where the thumbnails are saved.
`port`	`int`	`5151`	Port to launch the FiftyOne app on.
`skip_quality`	`bool`	`False`	Skip image quality metrics computation (blurriness, brightness, aspect_ratio, entropy).
`skip_launch`	`bool`	`False`	Skip launching the FiftyOne app after processing.
`verbose`	`bool`	`False`	Enable debug logging.
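
When driving the tool from a script, the `key=value` argument style above can be assembled programmatically. The helper below is a hypothetical convenience, not part of YoloScout: it only builds the argument list, and it assumes that booleans are written as `True`/`False`, as in the `reload=True` example.

```python
import shlex


def build_command(**options):
    """Build a yolo-scout argument list from keyword options.

    Options set to None are skipped so the tool's own defaults apply.
    """
    args = ["yolo-scout"]
    for key, value in options.items():
        if value is None:
            continue
        args.append(f"{key}={value}")  # str(True) gives "True", as in reload=True
    return args


command = build_command(data="/path/to/dataset", task="detect", reload=True, name=None)
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run(command, check=True)` once the package is installed.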

📊 Supported tasks and image metadata

For each expected task format, the following metadata will be computed and available in FiftyOne for each annotation:

Task	Available parameters when using the UI
`classify`	`cls_label.label`
`detect`	`area`, `width`, `height`, `iou_score`
`segment`	`area`, `num_keypoints`, `width`, `height`, `iou_score`
`obb`	`area`, `width`, `height`, `iou_score`
`pose`	`area`, `num_keypoints`, `width`, `height`, `iou_score`

Also, for each image, the following metadata will be computed:

Image Metadata	Description
`object_count`	Number of objects in the image
`metadata.size_bytes`	Size of the image file in bytes
`metadata.width`	Width of the image in pixels
`metadata.height`	Height of the image in pixels
`metadata.mime_type`	MIME type of the image (e.g., `image/jpeg`)
`metadata.num_channels`	Number of color channels (e.g., 3 for RGB)

The following quality metrics are computed unless skip_quality is enabled. All metrics operate on grayscale pixel values and are available at both image and patch level.

Metric	Description
`blurriness`	Inverse of the Laplacian variance. A score close to `1` indicates a blurry image, while a score close to `0` indicates a sharp one
`brightness`	Mean pixel intensity normalized between `0` and `1`. A score of `0` is fully dark and a score of `1` is fully bright
`aspect_ratio`	Width-to-height ratio of the image or patch crop. Values greater than `1` are wider than tall, values less than `1` are taller than wide
`entropy`	Shannon entropy of the pixel intensity histogram. A low score indicates a flat or visually repetitive image
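
As a rough illustration of the metrics above, the following pure-Python sketch computes them on a grayscale image given as a list of rows of 8-bit values. The function names are hypothetical, and both the 4-neighbour Laplacian kernel and the `1 / (1 + variance)` mapping for blurriness are assumptions chosen to match the described behaviour (close to 1 when blurry, close to 0 when sharp). YoloScout's actual implementation and normalization may differ.

```python
import math
from collections import Counter


def brightness(gray):
    """Mean pixel intensity scaled to [0, 1] for 8-bit grayscale values."""
    pixels = [p for row in gray for p in row]
    return sum(pixels) / (255 * len(pixels))


def entropy(gray):
    """Shannon entropy, in bits, of the pixel intensity histogram."""
    pixels = [p for row in gray for p in row]
    total = len(pixels)
    counts = Counter(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def aspect_ratio(gray):
    """Width divided by height."""
    return len(gray[0]) / len(gray)


def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian over interior pixels (needs >= 3x3)."""
    h, w = len(gray), len(gray[0])
    values = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1] - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def blurriness(gray):
    """Assumed mapping: 1 / (1 + Laplacian variance), which lies in (0, 1]."""
    return 1.0 / (1.0 + laplacian_variance(gray))


# A flat image (no edges) and a checkerboard (edges everywhere) as extremes.
flat = [[128] * 6 for _ in range(4)]
checker = [[255 * ((x + y) % 2) for x in range(4)] for y in range(4)]
```

The flat image has no intensity variation, so it scores as maximally blurry with zero entropy, while the checkerboard scores as very sharp.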

⭐️ Supported Models

All models use 224x224 input resolution. This is a constraint imposed by FiftyOne's OpenCLIP integration: higher-resolution variants (384, 512) cause preprocessing errors when computing embeddings. The 224x224 resolution is sufficient for most computer vision tasks while maintaining compatibility with FiftyOne's model zoo.

Model	Description	Training Dataset
openai_clip	Original OpenAI CLIP model with ViT-B/32 architecture. Hosted on GitHub releases for offline usage. This is the default model and works without internet connection after first download.	OpenAI CLIP
metaclip_400m	MetaCLIP model trained on curated 400M image-text pairs. Offers improved data quality and better embeddings compared to OpenAI CLIP while maintaining the same speed and architecture.	MetaCLIP
metaclip_fullcc	MetaCLIP model trained on the full CommonCrawl dataset. Provides the highest quality embeddings among MetaCLIP variants with more diverse training data.	MetaCLIP
siglip_base_224	SigLIP (Sigmoid Loss for Language-Image Pre-training) base model. Uses improved sigmoid loss function for better performance with smaller batch sizes and more efficient training.	SigLIP

Model Selection Guide

Use openai_clip if you want to use the most common embeddings model
Use metaclip_400m for better quality embeddings (recommended, although openai_clip remains the tool's default)
Use metaclip_fullcc when you need the highest quality embeddings
Use siglip_base_224 as an alternative to CLIP-based models

All models have similar inference speed and produce 512-dimensional embeddings with full support for FiftyOne visualization and analysis features.

🌐 Supported data sources

Format	Example	Notes
Local directory	`data=/path/to/dataset`	Standard YOLO directory structure
YAML file	`data=/path/to/data.yaml`	Resolves to the parent directory automatically
NDJSON file	`data=/path/to/file.ndjson`	Pre-downloaded Ultralytics Platform export; images are downloaded and converted to YOLO layout
URL	`data=<url>`	See supported URL schemes below

The supported URL schemes for the data argument are:

Scheme	Example	Notes
`ul://`	`ul://<username>/datasets/<slug>`	Ultralytics Platform, requires `ULTRALYTICS_API_KEY`
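
The two tables above amount to a simple dispatch on the shape of the `data` value. The sketch below is one hypothetical way to express that dispatch. The function name and returned labels are invented for illustration, and accepting a `.yml` suffix alongside `.yaml` is an assumption; it does not reflect how YoloScout resolves sources internally.

```python
from pathlib import PurePosixPath


def classify_source(data: str) -> str:
    """Return a label for the kind of source a `data` value points to."""
    if "://" in data:
        scheme = data.split("://", 1)[0]
        # Only the ul:// scheme is listed as supported.
        return "ultralytics-platform" if scheme == "ul" else "unsupported-url"
    suffix = PurePosixPath(data).suffix.lower()
    if suffix in {".yaml", ".yml"}:  # .yml is an assumption, only data.yaml is documented
        return "yaml-file"
    if suffix == ".ndjson":
        return "ndjson-file"
    return "local-directory"


for example in (
    "/path/to/dataset",
    "/path/to/data.yaml",
    "/path/to/file.ndjson",
    "ul://username/datasets/my-dataset",
):
    print(example, "->", classify_source(example))
```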

🧩 Additional installed plugins

This tool ships with a custom-built FiftyOne plugin that is automatically installed at startup. No manual setup required.

Plugin	Description	Icon	How to use?
`@ultralytics/image-adjuster`	Custom plugin to adjust image brightness, contrast, and label overlay opacity.		Open a sample, then click the slider icon in the bottom-left corner.

⚒️ Dataset Structure

This tool supports two common YOLO dataset directory structures:

Format 1: Type-First Structure

dataset/
├── images/
│   ├── train/
│   │   ├── image1.jpg
│   │   ├── image2.jpg
│   │   └── ...
│   ├── val/
│   │   ├── image1.jpg
│   │   └── ...
│   └── test/
│       └── ...
└── labels/
    ├── train/
    │   ├── image1.txt
    │   ├── image2.txt
    │   └── ...
    ├── val/
    │   ├── image1.txt
    │   └── ...
    └── test/
        └── ...

In this format, images and labels are organized by type first, then by split.

Format 2: Split-First Structure

dataset/
├── train/
│   ├── images/
│   │   ├── image1.jpg
│   │   ├── image2.jpg
│   │   └── ...
│   └── labels/
│       ├── image1.txt
│       ├── image2.txt
│       └── ...
├── val/
│   ├── images/
│   │   └── ...
│   └── labels/
│       └── ...
└── test/
    ├── images/
    │   └── ...
    └── labels/
        └── ...

In this format, the dataset is organized by split first, then by type (images/labels).
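
To check which of the two layouts a dataset follows before running the tool, a small helper along these lines can be used. It is a sketch under stated assumptions: the function name is hypothetical, the split names are limited to `train`, `val`, and `test` as shown in the trees above, and only the `images` directories are inspected.

```python
from pathlib import Path

SPLITS = ("train", "val", "test")


def detect_layout(root):
    """Return "type-first", "split-first", or None for an unrecognized layout."""
    root = Path(root)
    # Format 1: dataset/images/<split>/
    if any((root / "images" / split).is_dir() for split in SPLITS):
        return "type-first"
    # Format 2: dataset/<split>/images/
    if any((root / split / "images").is_dir() for split in SPLITS):
        return "split-first"
    return None
```

For example, `detect_layout("/path/to/dataset")` returns `"type-first"` when `/path/to/dataset/images/train` exists.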

⌨️ FiftyOne commands

Once you have used this tool at least once to visualize a dataset, you can use the following commands to interact with the FiftyOne datasets and application:

# List all the datasets
fiftyone datasets list

# Delete a specific dataset using its name
fiftyone datasets delete <dataset_name>

# Delete all datasets
python -c "import fiftyone as fo; [fo.delete_dataset(name) for name in fo.list_datasets()]"

# Launch the FiftyOne app
fiftyone app launch

# Launch the FiftyOne app and pre-select a dataset using its name
fiftyone app launch <dataset_name>

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Built with FiftyOne by Voxel51
Inspired by Ultralytics YOLO ecosystem
CLIP models from OpenAI

Made with ❤️ for the YOLO community

Release history

Version	Release date
1.1.6	Apr 2, 2026
1.1.5	Mar 30, 2026
1.1.4 (this version)	Mar 24, 2026
1.1.3	Mar 18, 2026
1.1.2	Mar 16, 2026
1.1.1	Mar 16, 2026
1.1.0	Mar 16, 2026