Python package clustimage is for unsupervised clustering of images.
clustimage
- clustimage is a python package for unsupervised clustering of images.
The number of available images has grown enormously over time, and (deep) neural networks are ideal for predictive tasks on them.
However, grouping a set of highly similar images in an unsupervised manner, or identifying the "unique" images in a directory, can be quite complex.
With clustimage I want to overcome these challenges by providing a generic approach to unsupervised clustering of images.
clustimage is fun because:
- It does not require a learning process.
- It can group any set of images.
- It can return only the unique images, via unique().
- It offers many plots to improve understanding of the feature space and sample-to-sample relationships.
Installation
- Install clustimage from PyPI (recommended). clustimage is compatible with Python 3.6+ and runs on Linux, MacOS X and Windows.
- A new environment can be created as follows:
conda create -n env_clustimage python=3.8
conda activate env_clustimage
- Install from PyPI:
pip install -U clustimage
Import the clustimage package
from clustimage import Clustimage
Example 1: Digit images.
In this example we will be using a flattened grayscale image array loaded from sklearn. The array is NxM, where N is the number of samples and M is the length of the flattened raw RGB/grayscale image.
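To make the NxM layout concrete, here is a minimal sketch in plain Python that does not depend on clustimage. The helper names `flatten` and `unflatten` and the tiny 2x3 images are hypothetical and chosen only for illustration; the digits in this example are 8x8, so M would be 64 there.

```python
def flatten(image):
    """Turn a 2-D image (a list of rows) into one flat row of pixel values."""
    return [pixel for row in image for pixel in row]


def unflatten(flat, height, width):
    """Rebuild the 2-D image from a flat row, reading it row by row."""
    if len(flat) != height * width:
        raise ValueError("flat row length does not match height * width")
    return [flat[r * width:(r + 1) * width] for r in range(height)]


# Two made-up 2x3 grayscale images stand in for the 8x8 digits.
images = [
    [[0, 5, 0], [9, 0, 1]],
    [[7, 7, 7], [0, 0, 0]],
]

# N x M array: one row per image, M = height * width columns.
X_demo = [flatten(img) for img in images]

# Reshaping a row recovers the original image, as reshape(8,8) does for a digit.
restored = unflatten(X_demo[0], height=2, width=3)
```

Each row of `X_demo` is one sample, which is the same shape of input that the digits example passes to `fit_transform`.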
# Load library
import matplotlib.pyplot as plt
from clustimage import Clustimage
# init
cl = Clustimage()
# Load example digit data
X = cl.import_example(data='digits')
print(X)
# array([[ 0., 0., 5., ..., 0., 0., 0.],
# [ 0., 0., 0., ..., 10., 0., 0.],
# [ 0., 0., 0., ..., 16., 9., 0.],
# ...,
# [ 0., 0., 0., ..., 9., 0., 0.],
# [ 0., 0., 0., ..., 4., 0., 0.],
# [ 0., 0., 6., ..., 6., 0., 0.]])
#
# Each row is an image that can be plotted after reshaping:
plt.imshow(X[0,:].reshape(8,8), cmap='binary')
#
# Preprocessing and feature extraction
results = cl.fit_transform(X)
# Let's examine the results.
print(results.keys())
# ['feat', 'xycoord', 'pathnames', 'filenames', 'labels']
#
# feat : Extracted features
# xycoord : Coordinates of samples in the embedded space.
# filenames : Name of the files
# pathnames : Absolute location of the files
# labels : Cluster labels in the same order as the input
# Get the unique images
unique_samples = cl.unique()
#
print(unique_samples.keys())
# ['labels', 'idx', 'xycoord_center', 'pathnames']
#
# Collect the unique images from the input
X[unique_samples['idx'],:]
Plot the unique images.
cl.plot_unique()
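One common way to pick a single representative image per cluster is to take the sample closest to the cluster center. The sketch below illustrates that idea in plain Python; it is an assumption about the general technique, not a copy of what `unique()` does internally, and the function name `closest_to_center` and the demo data are hypothetical. It needs Python 3.8+ for `math.dist`.

```python
import math
from collections import defaultdict


def closest_to_center(features, labels):
    """Return {label: index of the sample nearest to that cluster's mean}."""
    members = defaultdict(list)
    for idx, label in enumerate(labels):
        members[label].append(idx)

    representatives = {}
    for label, idxs in members.items():
        dims = len(features[idxs[0]])
        # Cluster center: per-dimension mean over the cluster's samples.
        center = [sum(features[i][d] for i in idxs) / len(idxs) for d in range(dims)]
        # Representative: the member with the smallest distance to the center.
        representatives[label] = min(idxs, key=lambda i: math.dist(features[i], center))
    return representatives


# Made-up 2-D features for six samples in two clusters.
feat_demo = [[0.0, 0.0], [1.0, 0.0], [0.4, 0.1],
             [10.0, 10.0], [10.0, 12.0], [10.0, 11.1]]
labels_demo = [0, 0, 0, 1, 1, 1]

reps = closest_to_center(feat_demo, labels_demo)
```

The returned indices play the same role as `unique_samples['idx']` above: they can be used to select the representative rows from the input.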
Scatter samples based on the embedded space.
# The scatterplot is coloured by cluster label. The cluster labels should match the unique labels.
# Cluster 1 contains digit 4
# Cluster 5 contains digit 2
# etc
#
# No images in scatterplot
cl.scatter(zoom=None)
# Include images in the scatterplot
cl.scatter(zoom=4)
Plot the clustered images
# Plot all images per cluster
cl.plot(cmap='binary')
# Plot the images in a specific cluster
cl.plot(cmap='binary', labels=[1,5])
Dendrogram
# The dendrogram is based on the high-dimensional feature space.
cl.dendrogram()
Make various other plots
# Plot the explained variance
cl.pca.plot()
# Make scatter plot of PC1 vs PC2
cl.pca.scatter(legend=False, label=False)
# Plot the evaluation of the number of clusters
cl.clusteval.plot()
# Make silhouette plot
cl.clusteval.scatter(cl.results['xycoord'])
Example 2: Flower images.
In this example I will be using flower images that are stored on disk and referenced by their path locations.
# Load library
from clustimage import Clustimage
# init
cl = Clustimage(method='pca')
# load example with flowers
pathnames = cl.import_example(data='flowers')
# The pathnames are stored in a list
print(pathnames[0:2])
# ['C:\\temp\\flower_images\\0001.png', 'C:\\temp\\flower_images\\0002.png']
# Preprocessing, feature extraction and clustering. Lets set a minimum of 1-
results = cl.fit_transform(pathnames)
# Let's first evaluate the number of detected clusters.
# This looks pretty good because there is a clear distinction between the peak at 5 clusters and the scores for the cluster counts that follow.
cl.clusteval.plot()
cl.clusteval.scatter(cl.results['xycoord'])
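The evaluation above picks the number of clusters at the peak of a score curve. As a rough illustration, and assuming the silhouette score is the metric being plotted, the sketch below computes the mean silhouette coefficient for two candidate labelings of a made-up data set and keeps the better one. The function name `silhouette_score`, the points, and the labelings are hypothetical, and the real evaluation is done by the library, not by this code.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient; samples in singleton clusters score 0."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_dist(i, idxs):
        return sum(math.dist(points[i], points[j]) for j in idxs) / len(idxs)

    total = 0.0
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            continue  # a singleton cluster contributes 0
        a = mean_dist(i, own)  # cohesion: mean distance within the own cluster
        b = min(mean_dist(i, idxs)  # separation: nearest other cluster
                for other, idxs in clusters.items() if other != label)
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Two visibly separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
candidates = {
    2: [0, 0, 0, 1, 1, 1],  # the natural split
    3: [0, 0, 2, 1, 1, 1],  # one group split further
}
scores = {k: silhouette_score(points, labels) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)
```

A large gap between the best score and the runners-up is what makes a peak convincing, which is the situation described for 5 clusters in the flower example.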
Scatter
cl.scatter(dotsize=50, zoom=None)
cl.scatter(dotsize=50, zoom=0.5)
cl.scatter(dotsize=50, zoom=0.5, img_mean=False)
Plot the clustered images
# Plot unique images
cl.plot_unique()
cl.plot_unique(img_mean=False)
# Plot all images per cluster
cl.plot()
# Plot the images in a specific cluster
cl.plot(labels=3)
# Plot dendrogram
cl.dendrogram()
# Plot clustered images
cl.plot()
Make prediction for unseen input image.
# Find images that are significantly similar to the unseen input image.
results_find = cl.find(pathnames[0:2], alpha=0.05)
cl.plot_find()
# Map the unseen images in existing feature-space.
cl.scatter()
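The `alpha` argument above suggests a significance threshold on similarity. One generic way to do this, shown here purely as an assumption and not as the library's method, is an empirical p-value: compare the distance from the new sample to each fitted sample against the distribution of all pairwise distances in the fitted set, and keep matches whose left-tail probability is at most `alpha`. The function name `find_similar` and the demo features are hypothetical.

```python
import math
from itertools import combinations


def find_similar(new_feat, fitted_feats, alpha=0.05):
    """Return [(index, p_value)] of fitted samples unusually close to new_feat."""
    # Reference distribution: all pairwise distances among the fitted samples.
    reference = [math.dist(a, b) for a, b in combinations(fitted_feats, 2)]
    hits = []
    for idx, feat in enumerate(fitted_feats):
        d = math.dist(new_feat, feat)
        # Empirical left-tail p-value: share of reference distances <= d.
        p = sum(1 for r in reference if r <= d) / len(reference)
        if p <= alpha:
            hits.append((idx, p))
    return hits


# Ten made-up samples spaced one unit apart on a line.
fitted = [[float(i), 0.0] for i in range(10)]

# A new sample lying between samples 3 and 4, closer to both than any fitted pair.
hits = find_similar([3.1, 0.0], fitted, alpha=0.05)
hit_indices = [idx for idx, _ in hits]
```

A smaller `alpha` makes the match stricter, so fewer fitted images are reported as similar.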
Example 3: Cluster the faces on images.
from clustimage import Clustimage
# Initialize with grayscale and extract HOG features.
cl = Clustimage(method='hog', grayscale=True)
# Load example with faces
pathnames = cl.import_example(data='faces')
# First we need to detect and extract the faces from the images
face_results = cl.detect_faces(pathnames)
# The detected faces are extracted and stored in face_results. We can now easily provide the pathnames of the faces, which are stored in pathnames_face.
results = cl.fit_transform(face_results['pathnames_face'])
# Plot the evaluation of the number of clusters. As you can see, the maximum number of clusters evaluated is 24, which may be too small.
cl.clusteval.plot()
# Let's increase the maximum number of clusters and run only the clustering. Note that you do not need to run fit_transform() again; the clustering step alone is enough.
cl.cluster(max_clust=35)
# And plot again. As you can see, the score keeps increasing, which means that a clear maximum may no longer be found.
# When looking at the graph, we see a local maximum at 12 clusters. Let's go for that.
cl.cluster(min_clust=12, max_clust=13)
# Let's plot the 12 unique clusters that contain the faces
cl.plot_unique()
# Scatter
cl.scatter(zoom=None)
cl.scatter(zoom=0.2)
# Make plot
cl.plot(show_hog=True, labels=[1,7])
# Plot faces
cl.plot_faces()
# Dendrogram depicts the clustering of the faces
cl.dendrogram()
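In the face example, the local maximum at 12 clusters is read off the evaluation plot by eye. If the scores per cluster count are available as numbers, the same check can be sketched programmatically. The helper `local_maxima` and the scores below are hypothetical and made up to mimic a curve that peaks locally and then keeps rising; they are not output from clustimage.

```python
def local_maxima(scores):
    """Return the cluster counts whose score beats both neighbouring counts."""
    ks = sorted(scores)
    return [k for prev, k, nxt in zip(ks, ks[1:], ks[2:])
            if scores[k] > scores[prev] and scores[k] > scores[nxt]]


# Made-up scores: a local peak at 12, then a slow rise towards the upper bound.
demo_scores = {10: 0.20, 11: 0.23, 12: 0.27, 13: 0.24,
               14: 0.25, 15: 0.26, 16: 0.28}

peaks = local_maxima(demo_scores)
```

The end points of the range are never reported, because they have only one neighbour; this matches the reasoning above that a curve still rising at the maximum has no usable peak there.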
References
Citation
Please cite clustimage in your publications if it is useful for your research (see citation).
Maintainers
- Erdogan Taskesen, github: erdogant
Contribute
- All kinds of contributions are welcome!
- If you wish to buy me a Coffee for this work, it is very appreciated :)
Licence
See LICENSE for details.