
Find issues in image datasets


CleanVision automatically detects potential issues in image datasets like images that are: blurry, under/over-exposed, (near) duplicates, etc. This data-centric AI package is a quick first step for any computer vision project to find problems in the dataset, which you want to address before applying machine learning. CleanVision is super simple -- run the same couple lines of Python code to audit any image dataset!


Installation

pip install cleanvision

Quickstart

Download an example dataset (optional). Or just use any collection of image files you have.

wget -nc 'https://cleanlab-public.s3.amazonaws.com/CleanVision/image_files.zip'
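The download is a zip archive, so it needs to be extracted before it can serve as an image folder. A minimal sketch using only the Python standard library follows; the helper name `extract_images`, the `IMAGE_EXTENSIONS` set, and the destination folder name are illustrative assumptions, not part of CleanVision.

```python
import zipfile
from pathlib import Path

# Assumption: extensions treated as images here are illustrative only,
# not CleanVision's official list of supported formats.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".tiff", ".webp"}


def extract_images(zip_path, dest_dir):
    """Extract a zip archive into dest_dir and return the image files found there."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    # Walk the extracted tree and keep only files with an image-like extension
    return sorted(
        p for p in dest.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


# Only runs if the archive from the wget command is present in the working directory
if Path("image_files.zip").exists():
    files = extract_images("image_files.zip", "FOLDER_WITH_IMAGES")
    print(f"Extracted {len(files)} image files")
```

The returned list is a quick sanity check that the folder actually contains images before auditing it.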

Run CleanVision to audit the images.

from cleanvision.imagelab import Imagelab

# Specify path to folder containing the image files in your dataset
imagelab = Imagelab(data_path="FOLDER_WITH_IMAGES/")

# Automatically check for a predefined list of issues within your dataset
imagelab.find_issues()

# Produce a neat report of the issues found in your dataset
imagelab.report()

CleanVision diagnoses many types of issues, but you can also check for only specific issues.

issue_types = {"dark": {}, "blurry": {}}

imagelab.find_issues(issue_types=issue_types)

# Produce a report with only the specified issue_types
imagelab.report(issue_types=issue_types)
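A mistyped issue key is easy to miss when this dictionary is written by hand. The sketch below builds the same kind of dictionary from a list of keys and rejects unknown ones. `build_issue_types` and `KNOWN_ISSUE_KEYS` are hypothetical names, not part of CleanVision; the keys are copied from the issue table in this README, and each issue maps to an empty settings dict as in the example above.

```python
# Assumption: this set mirrors the "Issue Key" column of the issue table in this README.
KNOWN_ISSUE_KEYS = {
    "exact_duplicates",
    "near_duplicates",
    "blurry",
    "low_information",
    "dark",
    "light",
    "grayscale",
    "odd_aspect_ratio",
}


def build_issue_types(keys):
    """Return an issue_types dict ({key: {}}) after validating every key."""
    unknown = sorted(set(keys) - KNOWN_ISSUE_KEYS)
    if unknown:
        raise ValueError(f"Unknown issue keys: {unknown}")
    # Each issue maps to an empty dict, matching the form used in the Quickstart
    return {key: {} for key in keys}


issue_types = build_issue_types(["dark", "blurry"])
print(issue_types)
```

The resulting dictionary can be passed as `issue_types` to `imagelab.find_issues()` and `imagelab.report()` exactly as shown above.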

More resources on how to use CleanVision

Clean your data for better Computer Vision

The quality of machine learning models hinges on the quality of the data used to train them, but it is hard to manually identify all of the low-quality data in a big dataset. CleanVision helps you automatically identify common types of data issues lurking in image datasets.

This package currently detects issues in the raw images themselves, making it a useful tool for any computer vision task such as: classification, segmentation, object detection, pose estimation, keypoint detection, generative modeling, etc. To detect issues in the labels of your image data, you can instead use the cleanlab package.

In any collection of image files (most formats supported), CleanVision can detect the following types of issues:

|   | Issue Type       | Description                                               | Issue Key        |
|---|------------------|-----------------------------------------------------------|------------------|
| 1 | Exact Duplicates | Images that are identical to each other                   | exact_duplicates |
| 2 | Near Duplicates  | Images that are visually almost identical                 | near_duplicates  |
| 3 | Blurry           | Images where details are fuzzy (out of focus)             | blurry           |
| 4 | Low Information  | Images lacking content (little entropy in pixel values)   | low_information  |
| 5 | Dark             | Irregularly dark images (underexposed)                    | dark             |
| 6 | Light            | Irregularly bright images (overexposed)                   | light            |
| 7 | Grayscale        | Images lacking color                                      | grayscale        |
| 8 | Odd Aspect Ratio | Images with an unusual aspect ratio (overly skinny/wide)  | odd_aspect_ratio |
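To make a few of these descriptions concrete, here is a rough sketch of how scores for low information, dark/light, and odd aspect ratio could be computed from raw pixel data. This is an illustration under stated assumptions, not CleanVision's actual implementation: the function names are hypothetical, pixels are assumed to be a flat sequence of 8-bit grayscale values, and no thresholds are implied.

```python
import math
from collections import Counter


def pixel_entropy(pixels):
    """Shannon entropy (in bits) of pixel values; low values suggest little content."""
    if not pixels:
        raise ValueError("pixels must not be empty")
    counts = Counter(pixels)
    total = len(pixels)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def mean_brightness(pixels):
    """Mean pixel value scaled to [0, 1]; near 0 is dark, near 1 is bright."""
    if not pixels:
        raise ValueError("pixels must not be empty")
    return sum(pixels) / (255 * len(pixels))


def aspect_ratio_score(width, height):
    """Ratio of the shorter side to the longer side; near 0 means very skinny or wide."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return min(width, height) / max(width, height)


# A uniformly gray 8x8 image carries no information in its pixel values
flat_image = [128] * 64
print(pixel_entropy(flat_image), mean_brightness(flat_image), aspect_ratio_score(8, 8))
```

Scores like these only become issue flags once compared against a cutoff, and a sensible cutoff depends on the dataset.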

This package is still a work in progress, so expect sharp edges. Feel free to report any bugs you find, or request functionality you would like, by opening an issue!

CleanVision supports Linux, macOS, and Windows and runs on Python 3.7+.

Join our community

License

Copyright (c) 2022 Cleanlab Inc.

cleanvision is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

cleanvision is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

See the GNU Affero General Public License (LICENSE) for details.
