An image feature extraction library for python based on WND-CHARM
CharmFeatures
CharmFeatures is a ctypes library, python module and command-line utility for extracting Wnd-Charm image features from large collections of TIFF files using massively parallel cooperative distributed multitasking. It can compute any subset of the full feature set, or new combinations of transforms and feature algorithms, and it can access image files remotely (e.g. from AWS S3).
Dependencies
Three external libraries: libtiff, libfftw, and libhdf5
- Debian, Ubuntu, etc.:
sudo apt install -y libtiff-dev libfftw3-dev libhdf5-dev
- RedHat, CentOS, Fedora:
sudo yum install -y libtiff-devel fftw-devel hdf5-devel
N.B.: See the note below on arm64 vs amd64 compatibility.
charm-image-features
Command-line utility based on CharmFeatures.ProcessImages for extracting charm features from directories
of local or remote tiff files while saving the feature files locally for later re-use.
This utility will consume all CPU resources available on the machine it is launched on. It does not use or benefit from GPUs. It can be launched multiple times on different machines sharing the same file system to compute large feature sets on a common directory structure of tiff files, which may be stored on the shared filesystem or accessed remotely. This uncoordinated distributed processing requires the shared filesystem to implement POSIX-style file locking (sometimes called fcntl-style file locking).
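The uncoordinated scheme described above depends on advisory file locks. The following is a minimal sketch of how independent workers might claim a unit of work with fcntl-style locks from Python's standard library. It is not the CharmFeatures implementation; the try_claim and release helpers and the idea of one lock file per unit of work are assumptions made for illustration.

```python
import fcntl
import os


def try_claim(lock_path):
    """Try to take an exclusive, non-blocking POSIX lock on lock_path.

    Returns an open file descriptor if the lock was acquired, else None.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)  # another process already holds the lock
        return None
    return fd


def release(fd):
    """Release the lock and close the descriptor."""
    fcntl.lockf(fd, fcntl.LOCK_UN)
    os.close(fd)
```

A worker that gets None back would simply move on to the next file. Because these locks are advisory and owned per process, they only coordinate processes that all use the same locking calls, which is why the shared filesystem has to support them.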
Synopsis
charm-image-features -t4 -n -o test-t4.npz ../images
Recursively descend the ../images directory tree, calculating features for 4x4 tiles (-t4) of every TIFF file
encountered, using directory names of tiff files as labels. Images are normalized to STDs (i.e. z-scores) prior to
tiling (-n). Per-image feature vectors are saved in numpy "npz" format alongside tiff files with sample names
as keys (file/tile names) and feature vectors as values. The entire set of samples is assembled into the outfile
test-t4.npz (-o) with samples, features and labels keys and values containing the sample name vector,
feature matrix and label vector.
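As a rough illustration of the outfile layout described above, the sketch below reads the three keys (samples, features, labels) with numpy.load. It builds a small synthetic stand-in file rather than using real output, so the sample names, labels, dtypes and matrix size are assumptions.

```python
import os
import tempfile

import numpy as np

# Build a stand-in for test-t4.npz with the three documented keys.
# The sample names, labels and the tiny feature matrix are made up.
path = os.path.join(tempfile.mkdtemp(), "test-t4.npz")
np.savez(
    path,
    samples=np.array(["a.tiff_0_0", "a.tiff_0_1", "b.tiff_0_0"]),
    features=np.arange(12, dtype=np.double).reshape(3, 4),
    labels=np.array(["cntrl", "cntrl", "cond1"]),
)

with np.load(path) as data:
    samples = data["samples"]  # sample name vector
    X = data["features"]       # feature matrix, one row per sample
    y = data["labels"]         # label vector

# One feature row and one label per sample.
assert X.shape[0] == len(samples) == len(y)
```

If a real outfile stores Python objects rather than plain arrays, numpy.load may need allow_pickle=True.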
See charm-image-features -h for additional options.
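The -n and -t4 options in the synopsis amount to a z-score normalization of the whole image followed by a split into a 4x4 grid. The numpy sketch below shows that order of operations. The zscore and tile helpers are hypothetical, and how CharmFeatures handles image sizes that do not divide evenly is not stated here, so the cropping of remainders is an assumption.

```python
import numpy as np


def zscore(image):
    """Normalize an image to zero mean and unit standard deviation."""
    image = np.asarray(image, dtype=np.double)
    std = image.std()
    if std == 0:
        return np.zeros_like(image)  # constant image: avoid dividing by zero
    return (image - image.mean()) / std


def tile(image, rows, cols):
    """Split a 2D image into rows x cols equal tiles, cropping any remainder."""
    h, w = image.shape
    th, tw = h // rows, w // cols
    return [
        image[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
        for r in range(rows)
        for c in range(cols)
    ]


img = np.arange(64, dtype=np.double).reshape(8, 8)
tiles = tile(zscore(img), 4, 4)  # normalize first, then tile
```

Normalizing before tiling means the statistics come from the whole image, so individual tiles generally do not have zero mean themselves.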
ProcessImages
ProcessImages is a module designed for ease of integration between wnd-charm features and other AI/ML libraries (e.g. scikit-learn) for feature normalization, selection, classifier or regressor training, etc. Please note that by default, 2895 features are computed per image sample in the current version, so a good feature selection strategy is paramount.
Scikit-learn integration example
from CharmFeatures import ProcessImages as pi
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.feature_selection import SelectKBest, mutual_info_classif
# Gather image features. Feature vectors will not be recomputed in subsequent calls with same parameters
ip = pi.ProcessImages(normalize=True, tile= (4,4), in_paths='images/', outfile='image-features-t4.npz')
# Conventional naming for feature matrix and label vector
X, y, grp = ip.features_mat, ip.labels, ip.groups
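# Split dataset into test and train sets, grouped by image so that no
# image contributes tiles to both the training and the test set.
# Assumes X, y and grp are numpy arrays of equal length.
from sklearn.model_selection import GroupShuffleSplit
train_idx, test_idx = next(GroupShuffleSplit(n_splits=1, test_size=0.25).split(X, y, groups=grp))
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]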
clf_selected = make_pipeline(
SelectKBest(mutual_info_classif, k=150), MinMaxScaler(),
RandomForestClassifier(max_depth=25, n_estimators=75)
)
clf_selected.fit(X_train, y_train)
print('Classification accuracy: {:.3f}'.format(clf_selected.score(X_test, y_test)))
Other options
Separate calls for more control over processing. The constructor will not compute features automatically if no paths are specified.
from CharmFeatures import ProcessImages as pi
ip = pi.ProcessImages(normalize=True, tile= (4,4))
The process_paths() method can be used to filter out files or assign labels based on things other than
directory names.
ip.process_paths (paths = ['../images/cond1','../images/cntrl','../images/cond2'],
filters = ['_ch01'], labels = {'cntrl': ['cond1','cntrl'],'trtmnt':'cond2'}
)
Here three separate directories are traversed, and any file containing _ch01 anywhere in its name or path is
ignored. Files containing cond1 or cntrl anywhere in their name or path are assigned the cntrl label,
and those containing cond2 are assigned the trtmnt label.
process_paths() can be called multiple times to accumulate files to process.
The sample name vector, feature matrix, label vector and group vector must be retrieved explicitly
after process_paths():
(sn, X, y, gr) = ip.get_feature_matrix()
If necessary, the features will be computed in parallel in this call, using all available processors. Or, if feature vector files with the same parameters have been computed previously, they will be used instead.
CharmFeatures
Python ctypes interface to wnd-charm image features.
Synopsis:
from libtiff import TIFF
from CharmFeatures import CharmFeatures
import numpy as np
# Initialize computing of standard long feature vectors (2895 features in version 5)
cf = CharmFeatures()
# A contiguous 2D array of doubles will avoid copying
image = TIFF.open('foo.tiff').read_image().astype(np.double)
# Compute image features, returning a 1D numpy array of 2895 doubles.
# N.B.: If any image pre-processing is done, better numerical stability
# is achieved by down-casting to float32 prior to feature computation
fv = cf.get_features(image.astype(np.float32))
feature_names = cf.get_feature_names ()
Notes
- feature_names are the fully expanded names of the features returned by the get_features() call (or python -m CharmFeatures.ProcessImages --feature-names-full). With few exceptions, each feature extraction algorithm produces multiple features in the vector, all following the <feature_name> [d] convention, where 'd' is an integer from 0-n.
- get_features() can be supplied with a slice of a pre-allocated feature matrix using the featurevec parameter.
- The get_features() method is meant to be called repeatedly on the same cf object with different images for efficiently computing features for many images.
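The <feature_name> [d] naming convention makes it straightforward to group features by the algorithm and transform that produced them. The sketch below counts features per group with a regular expression. The group_sizes helper is hypothetical, and the example names are illustrative stand-ins for what get_feature_names() would return.

```python
import re
from collections import Counter

# Matches "<feature_name> [d]" where d is a non-negative integer.
NAME_RE = re.compile(r"^(?P<group>.+) \[(?P<index>\d+)\]$")


def group_sizes(feature_names):
    """Count how many features each algorithm+transform group contributes."""
    counts = Counter()
    for name in feature_names:
        m = NAME_RE.match(name)
        # Names without a trailing index are counted under their full name.
        counts[m.group("group") if m else name] += 1
    return counts


# Illustrative names only; real names come from cf.get_feature_names().
names = [
    "Tamura Textures () [0]",
    "Tamura Textures () [1]",
    "Tamura Textures (Edge (Fourier ())) [0]",
]
sizes = group_sizes(names)
```

Grouping of this kind can be handy when selecting or discarding whole feature families rather than individual columns.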
Optional CharmFeatures() parameters
- f_names can be used to specify a subset of feature extraction algorithms and transforms to run. All features produced by the specified algorithm+transform will be computed.
cf = CharmFeatures (f_names=[ 'Tamura Textures ()', 'Tamura Textures (Edge (Fourier ()))' ])
This will compute only Tamura Textures on the raw image and on an Edge transform of a Fourier transform of the raw image. The full set of default feature algorithms and transforms can be obtained from charm-image-features --feature-names. The list of strings supplied to f_names can be any combination of feature algorithms and transforms, even if that combination does not exist in the standard feature set (there is no Tamura Textures (Edge (Fourier ())) in the default feature set).
- forking_executor, forking_gabor, and forking_haralick control forking (all true by default)
- The short parameter will produce a 1047-long feature vector if true
- verbosity controls the amount of output written, from 0 to 7; default=2.
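Since f_names entries nest transforms inside parentheses, a small helper can make custom combinations less error-prone to type. The sketch below is hypothetical and not part of CharmFeatures; the string format is inferred only from the two examples given above.

```python
def f_name(algorithm, transforms=()):
    """Build an f_names entry from an algorithm and a chain of transforms.

    transforms is ordered outermost first, so ("Edge", "Fourier") means
    an Edge transform of a Fourier transform of the raw image.
    """
    chain = ""
    for t in reversed(transforms):
        chain = f"{t} ({chain})"
    return f"{algorithm} ({chain})"


# The two entries from the example above, built programmatically.
wanted = [
    f_name("Tamura Textures"),
    f_name("Tamura Textures", ("Edge", "Fourier")),
]
```

The resulting list could then be passed as the f_names argument.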
Docker
A Dockerfile and compose.yml are provided for portability and ease of deployment. The image produced (charmfeatures) does not specify an entry point, so the container must be run with the charm-image-features command specified explicitly:
docker compose run -v /data/images:/data/images charmfeatures charm-image-features -n -t4 /data/images
This mounts the local folder /data/images inside the charmfeatures container and then uses the charm-image-features command to compute features for all of the image files found in there with normalization and 4x4 tiling.
Compatibility b/w arm64 and amd64
The features produced on arm64 architecture are not all bit-wise identical to those produced on amd64. For a fraction of the image tiles, a minority of the features (~1%) are identical to 4 significant figures, and a larger minority (~10%) is identical to 5 significant figures, while the majority (~90%) are bit-wise identical. The differing features involve those that use Fourier transforms. It is not believed that the differences will cause significant (or even observable) differences in AI performance as long as all of the features are computed on the same architecture. Because of the occasional differences, the feature vector version now contains '-arm64' when computed on arm architecture. Features computed on amd64 do not reflect this in the version string to maintain backward compatibility.
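To check how two feature vectors computed on different architectures compare, one option is to count exact matches and matches to a given number of significant figures. The sketch below uses only the standard library. The helper names are hypothetical, the example values are made up, and the comparison rounds both values, so pairs that straddle a rounding boundary are reported as differing even when they are very close.

```python
import math


def agree_to_sig_figs(a, b, n):
    """True if a and b are equal after rounding both to n significant figures."""
    if a == b:
        return True
    if a == 0 or b == 0 or not (math.isfinite(a) and math.isfinite(b)):
        return False
    return float(f"{a:.{n - 1}e}") == float(f"{b:.{n - 1}e}")


def summarize(fv_a, fv_b, n=4):
    """Fractions of features that compare equal, and that agree to n sig figs."""
    pairs = list(zip(fv_a, fv_b))
    identical = sum(a == b for a, b in pairs) / len(pairs)
    close = sum(agree_to_sig_figs(a, b, n) for a, b in pairs) / len(pairs)
    return identical, close


# Made-up values standing in for the same tile computed on two machines.
fv_amd64 = [1.0, 0.123456, 250.0]
fv_arm64 = [1.0, 0.123457, 250.1]
identical, close = summarize(fv_amd64, fv_arm64, n=4)
```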
Remote image access
CharmFeatures supports flexible configuration classes to manage how images are retrieved and how .npy feature files are stored.
All remote access classes share the same API: init_worker() and fetch_image().
RemoteS3Config (Remote access to AWS S3)
Used to connect to an AWS S3 bucket to fetch images and save derived features.
Authentication using the user's AWS configuration & profiles or unauthenticated public S3 bucket access is possible.
By default, feature files will be saved in a local directory tree mirroring the remote image directory tree. The root of this
directory tree is the current working directory unless specified by the -f cli flag or features_root parameter to ProcessImages.
from CharmFeatures import ProcessImages as pi
# S3 paths to images
image_links = ['pub/example_image1.tiff','pub/example_image2.tiff']
# AWS S3 access settings.
# NB: specifying aws_access_key_id/aws_secret_access_key not currently implemented
remote_config = {
'bucket_name' : 'myBucket', # AWS S3 bucket name
# public s3 buckets can be used by setting 'public' : True
# if the profile is not specified, then the default profile will be used
'profile' : 'myAWSProfile' # usually in ~/.aws/config
}
rc = pi.RemoteS3Config(remote_config = remote_config)
ip = pi.ProcessImages(normalize=True, tile=(4,4), image_links=image_links, remote_config=rc)
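The default mirroring of the remote directory tree can be pictured as a simple path mapping. The sketch below is a hypothetical illustration with pathlib, not the library's code: the local_feature_path helper, the .npz suffix and the rejection of keys containing '..' are all assumptions.

```python
from pathlib import Path, PurePosixPath


def local_feature_path(remote_key, features_root=None, suffix=".npz"):
    """Map a remote image key to a local feature-file path.

    The local tree mirrors the remote one under features_root, which
    defaults to the current working directory.
    """
    root = Path(features_root) if features_root is not None else Path.cwd()
    key = PurePosixPath(remote_key)
    if key.is_absolute() or ".." in key.parts:
        raise ValueError(f"unsafe remote key: {remote_key!r}")
    return root.joinpath(*key.parts).with_suffix(suffix)


target = local_feature_path("pub/example_image1.tiff", "/tmp/features")
```

With the default features_root, the same key would resolve under the current working directory instead.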
Your own remote API
Your own API access can be implemented by inheriting from RemoteS3Config.
This example uses a remote API based on a hypothetical FooSession from the MyRemoteAPI package.
Here, the hypothetical session object has a fetch_blob_as_numpy() method, but any type of blob/image fetching
is possible including the use of local temporary file storage (as is done by RemoteS3Config).
The init_worker() method is called internally once per parallelized worker in a worker pool to establish
a remote session using settings in the remote_config dict, and store the resulting session object
in the remote_session field. The fetch_image() method is then called once per image, reusing the remote_session object.
from CharmFeatures import ProcessImages as pi
from MyRemoteAPI import FooSession # Note this doesn't actually exist
import numpy as np
# inherit from RemoteS3Config
class RemoteFooConfig (pi.RemoteS3Config):
    # The super().__init__() method just copies the provided configuration dict
    # to the class remote_config field. No need to override it normally.
    # The remote_config dict must be picklable, so it should just contain
    # simple strings to initiate the session.
    def init_worker (self):
        # initiate a session using values stored in the remote_config dict
        self.remote_session = FooSession (self.remote_config['foo_profile'])
        return (self.remote_session)
    def fetch_image (self, path):
        # The numpy matrix returned should be a 2D matrix of doubles
        img_mat = self.remote_session.fetch_blob_as_numpy (path, np_type = np.double)
        return (img_mat)
# initialize the parameters necessary to establish a remote session
rc = RemoteFooConfig(remote_config = {'foo_profile' : 'myProfileName'})
ip = pi.ProcessImages(normalize=True, tile=(4,4), image_links=image_links, remote_config=rc)
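Because the remote_config dict must be picklable, it can be worth checking a configuration before handing it to ProcessImages. The sketch below is a hypothetical pre-flight check using the standard pickle module. The check_picklable helper is not part of CharmFeatures.

```python
import pickle


def check_picklable(remote_config):
    """Return a round-tripped copy of remote_config, or raise TypeError."""
    try:
        return pickle.loads(pickle.dumps(remote_config))
    except (pickle.PicklingError, TypeError, AttributeError) as exc:
        raise TypeError(f"remote_config is not picklable: {exc}") from exc


# Plain strings round-trip without trouble.
cfg = check_picklable({"foo_profile": "myProfileName"})
```

Objects such as open sessions, sockets or locks would fail this check, which is consistent with the advice to store only simple strings in the dict and to create the session inside init_worker().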