
An image feature extraction library for Python based on WND-CHARM

CharmFeatures

CharmFeatures is a ctypes library, module and command-line utility for extracting Wnd-Charm image features from large collections of TIFF files. It can compute any subset of the full feature set, or new combinations of transforms and feature algorithms.

Dependencies

  • libtiff: sudo apt-get install libtiff-dev
  • fftw3: sudo apt-get install libfftw3-dev

N.B.: See the note below on arm64 vs amd64 compatibility.

charm-image-features

Command-line utility based on CharmFeatures.ProcessImages for extracting charm features from large collections of tiff files and saving intermediates for later re-use.

This utility will consume all CPU resources available on the machine it is launched on. It does not use or benefit from GPUs. It can be launched multiple times on different machines on a cluster sharing the same file system in order to cooperatively compute large feature sets on a common directory tree of tiff files. This cooperative distributed multiprocessing requires POSIX-style file locking (sometimes called fcntl-style file locking).
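As an illustration of what POSIX (fcntl-style) locking provides, the sketch below claims a work item by taking a non-blocking exclusive lock on a lock file. This is not the utility's own implementation: the try_claim() and release() helpers and the lock file name are hypothetical, and only the standard-library fcntl module is used. On a shared file system, the same call made by a second process on another machine would fail while the first holds the lock, provided the file system supports this kind of locking.

```python
import fcntl
import os
import tempfile

def try_claim(lock_path):
    """Try to take an exclusive, non-blocking POSIX lock on lock_path.

    Returns the open file object if the lock was acquired (keep it open
    while working), or None if another process already holds the lock.
    """
    f = open(lock_path, "a+")
    try:
        fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        # Another process holds the lock; skip this work item.
        f.close()
        return None
    return f

def release(f):
    """Release the lock and close the lock file."""
    fcntl.lockf(f, fcntl.LOCK_UN)
    f.close()

# Demo with a hypothetical lock file in a temporary directory.
lock_path = os.path.join(tempfile.mkdtemp(), "image1.lock")
handle = try_claim(lock_path)
claimed = handle is not None
if claimed:
    # ... compute and save features for this item here ...
    release(handle)
```

Note that these locks are held per process, so contention only appears between different processes, not between two calls in the same process.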

Synopsis

charm-image-features -t4 -n -o test-t4.npz ../images

Recursively descend the ../images directory tree, calculating features for 4x4 tiles (-t4) of every TIFF file encountered, using directory names of tiff files as labels. Images are normalized to STDs (i.e. z-scores) prior to tiling (-n). Intermediate feature vectors are saved in numpy "npz" format alongside tiff files with sample names as keys (file/tile names) and feature vectors as values. The entire set of samples is assembled into the outfile test-t4.npz (-o) with samples, features and labels keys and values containing the sample name vector, feature matrix and label vector.
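The effect of -n and -t4 can be pictured with the NumPy sketch below. It is only an approximation of the described behavior: the normalize_and_tile() helper is hypothetical, and both the (rows, columns) interpretation of the tile counts and the dropping of remainder pixels at the edges are assumptions, not documented behavior of the utility.

```python
import numpy as np

def normalize_and_tile(image, tiles=(4, 4)):
    """Z-score a 2D image, then cut it into tiles[0] x tiles[1] tiles.

    Assumption: remainder pixels that do not fit evenly are dropped.
    """
    image = np.asarray(image, dtype=np.double)
    std = image.std()
    centered = image - image.mean()
    # Avoid dividing by zero for a constant image.
    z = centered / std if std > 0 else centered
    rows, cols = tiles
    h, w = z.shape[0] // rows, z.shape[1] // cols
    return [z[r * h:(r + 1) * h, c * w:(c + 1) * w]
            for r in range(rows) for c in range(cols)]

# Stand-in image; a real one would be read from a TIFF file.
image = np.arange(64 * 48, dtype=np.double).reshape(64, 48)
tile_list = normalize_and_tile(image)
```

Because normalization happens before tiling, each tile keeps its brightness relative to the whole image rather than being rescaled on its own.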

See charm-image-features -h for additional options.
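The assembled outfile can be read back with NumPy using the samples, features and labels keys described above. The sketch below is a minimal example; the load_feature_set() helper is hypothetical, and a small stand-in file with made-up sample names is written first so that the example runs without any images. It assumes the arrays are stored as plain (non-object) arrays; otherwise np.load would need allow_pickle=True.

```python
import os
import tempfile
import numpy as np

def load_feature_set(path):
    """Return (sample names, feature matrix, labels) from an outfile."""
    with np.load(path) as data:
        return data["samples"], data["features"], data["labels"]

# Stand-in for a file such as test-t4.npz (contents are made up).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.npz")
np.savez(demo_path,
         samples=np.array(["a.tiff_0", "a.tiff_1"]),
         features=np.zeros((2, 3)),
         labels=np.array(["cntrl", "cntrl"]))

names, X, y = load_feature_set(demo_path)
```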

ProcessImages

ProcessImages is a module designed for ease of integration between wnd-charm features and other AI/ML libraries (e.g. scikit-learn) for feature normalization, selection, classifier or regressor training, etc. Please note that 2895 features are computed per image sample in the current version, so a good feature selection strategy is paramount.

Scikit-learn integration example

from CharmFeatures import ProcessImages as pi

import numpy as np
from sklearn.model_selection import GroupShuffleSplit
from sklearn.preprocessing import MinMaxScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Gather image features. Feature vectors will not be recomputed in subsequent calls with same parameters
ip = pi.ProcessImages(normalize=True, tile=(4,4), in_paths='images/', outfile='image-features-t4.npz')
# Conventional naming for feature matrix and label vector
X, y, grp = np.asarray(ip.features_mat), np.asarray(ip.labels), np.asarray(ip.groups)

# Split dataset into train and test sets by image group, so that no image
# has tiles in both sets. (Passing the groups to train_test_split's stratify
# parameter would instead spread every image's tiles across both sets.)
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25)
train_idx, test_idx = next(splitter.split(X, y, groups=grp))
X_train, X_test, y_train, y_test = X[train_idx], X[test_idx], y[train_idx], y[test_idx]

clf_selected = make_pipeline(
        SelectKBest(mutual_info_classif, k=150), MinMaxScaler(),
        RandomForestClassifier(max_depth=25, n_estimators=75)
)
clf_selected.fit(X_train, y_train)
print('Classification accuracy: {:.3f}'.format(clf_selected.score(X_test, y_test)))
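Whether a split really keeps every image on one side can be checked directly from the group vector. The NumPy-only sketch below is not part of CharmFeatures: it assumes the group vector holds one image identifier per tile, uses made-up group ids, and the group_split() helper is hypothetical.

```python
import numpy as np

def group_split(groups, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) so that each group lands in one set only."""
    groups = np.asarray(groups)
    unique = np.unique(groups)
    rng = np.random.default_rng(seed)
    rng.shuffle(unique)
    n_test = max(1, int(round(test_fraction * len(unique))))
    test_mask = np.isin(groups, unique[:n_test])
    return np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# Made-up example: 8 images, 16 tiles each (as with 4x4 tiling).
grp = np.repeat(np.arange(8), 16)
train_idx, test_idx = group_split(grp)

# Images that contribute tiles to both sets; should be empty.
overlap = set(grp[train_idx]) & set(grp[test_idx])
```

Tiles cut from the same image tend to be highly correlated, so a non-empty overlap would usually inflate the reported test accuracy.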

Other options

Construction and processing can be done in separate calls for more control. The constructor does not compute features automatically if no paths are specified.

from CharmFeatures import ProcessImages as pi
ip = pi.ProcessImages(normalize=True, tile= (4,4))

The process_paths() method can be used to filter out files or assign labels based on things other than directory names.

ip.process_paths (paths = ['../images/cond1','../images/cntrl','../images/cond2'], filters = ['_ch01'], labels = {'cntrl': ['cond1','cntrl'],'trtmnt':'cond2'})

Here three separate directories are traversed, and any file containing _ch01 anywhere in its name or path is ignored. Files containing cond1 or cntrl anywhere in their name or path are assigned the cntrl label, and those containing cond2 are assigned the trtmnt label.
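The substring matching described above can be sketched in plain Python. This only mirrors the stated rule and is not the library's code: the is_ignored() and match_label() helpers and the file names are hypothetical, and what happens to a file that matches no label is left open here (None is returned).

```python
def is_ignored(path, filters):
    """True if any filter string occurs anywhere in the path."""
    return any(f in path for f in filters)

def match_label(path, labels):
    """Return the first label whose pattern(s) occur in the path, else None."""
    for label, patterns in labels.items():
        if isinstance(patterns, str):
            patterns = [patterns]  # a single string is treated as one pattern
        if any(p in path for p in patterns):
            return label
    return None

filters = ['_ch01']
labels = {'cntrl': ['cond1', 'cntrl'], 'trtmnt': 'cond2'}
# Hypothetical file names, for illustration only.
paths = ['../images/cond1/a_ch00.tiff', '../images/cond1/a_ch01.tiff',
         '../images/cntrl/b_ch00.tiff', '../images/cond2/c_ch00.tiff']

assigned = {p: match_label(p, labels) for p in paths if not is_ignored(p, filters)}
```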

process_paths() can be called multiple times to accumulate files to process.

The sample name vector, feature matrix, label vector and group vector must be retrieved explicitly after process_paths():

(sn, X, y, gr) = ip.get_feature_matrix()

If necessary, the features are computed in parallel during this call, using all available processors. Feature vector files previously computed with the same parameters are reused instead of being recomputed.

CharmFeatures

Python ctypes interface to wnd-charm image features.

Synopsis:

from libtiff import TIFF
from CharmFeatures import CharmFeatures
import numpy as np
# Initialize computing of standard long feature vectors (2895 features in version 5)
cf = CharmFeatures()
# A contiguous 2D array of doubles will avoid copying
image = TIFF.open('foo.tiff').read_image().astype(np.double)
# Compute image features, returning a 1D numpy array of 2895 doubles.
fv = cf.get_features(image)
feature_names = cf.get_feature_names ()

Notes

  • feature_names are the fully expanded names of the features returned by the get_features() call. With few exceptions, each feature extraction algorithm produces multiple features in the vector, all following the '<feature_name> [d]' convention, where 'd' is an integer from 0-n.
  • get_features() can be supplied with a slice of a pre-allocated feature matrix using the featurevec parameter.
  • The get_features() method is meant to be called repeatedly on the same cf object with different images for efficiently computing features for many images.
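Because several vector entries share one base name under the '<feature_name> [d]' convention, it can be handy to group column indices by base name, for example to select all outputs of one algorithm. The sketch below is a hypothetical helper in plain Python; the example names only assume the convention described in the notes and are not copied from real get_feature_names() output.

```python
import re
from collections import defaultdict

# '<feature_name> [d]' : base name, a space, then an integer in brackets.
NAME_RE = re.compile(r"^(?P<base>.+) \[(?P<index>\d+)\]$")

def group_feature_columns(feature_names):
    """Map each base feature name to the list of its column indices."""
    groups = defaultdict(list)
    for column, name in enumerate(feature_names):
        m = NAME_RE.match(name)
        # Names without a trailing '[d]' form their own single-column group.
        base = m.group("base") if m else name
        groups[base].append(column)
    return dict(groups)

# Assumed example names following the convention above.
example = ['Tamura Textures () [0]',
           'Tamura Textures () [1]',
           'Tamura Textures (Fourier (Edge ())) [0]']
columns = group_feature_columns(example)
```

The resulting index lists can be used to slice the feature vector or the columns of a feature matrix.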

Optional CharmFeatures() parameters

  • f_names can be used to specify a subset of feature extraction algorithms and transforms to run (all features produced by the specified algorithm+transform will be computed).

    cf = CharmFeatures (f_names=['Tamura Textures ()', 'Tamura Textures (Fourier (Edge ()))'])
    

    This will compute only Tamura Textures on the raw image and on a Fourier transform of an Edge transform of the raw image. The full set of default feature algorithms and transforms can be obtained from CharmFeatures().get_feature_names(). The list of strings supplied to f_names can be any combination of feature algorithms and transforms.

  • forking_executor, forking_gabor, and forking_haralick control forking (all true by default)

  • short parameter will produce a 1047-long feature vector if true

  • verbosity controls the amount of output written, from 0 to 7; default=2.
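The f_names strings shown above nest transforms inside parentheses, with the transform applied first innermost. A small helper can build such strings from an algorithm name and an ordered list of transforms. The feature_spec() function below is hypothetical and only reproduces the string format seen in the f_names example; whether a given algorithm and transform combination is accepted by CharmFeatures is not checked.

```python
def feature_spec(algorithm, transforms=()):
    """Build an f_names entry; transforms are listed in order of application."""
    inner = "()"  # '()' denotes the raw image
    for transform in transforms:
        inner = f"({transform} {inner})"
    return f"{algorithm} {inner}"

f_names = [
    feature_spec("Tamura Textures"),
    feature_spec("Tamura Textures", ["Edge", "Fourier"]),
]
```

The resulting list matches the two strings used in the f_names example and could be passed as CharmFeatures(f_names=f_names).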

Docker

A Dockerfile and compose.yml are provided for portability and ease of deployment. The image produced (charmfeatures) does not specify an entry point, so the container must be run with the charm-image-features command specified explicitly:

docker compose run -v /data/images:/data/images charmfeatures charm-image-features -n -t4 /data/images

This mounts the local folder /data/images inside the charmfeatures container and then uses the charm-image-features command to compute features for all of the image files found in there with normalization and 4x4 tiling.

Compatibility between arm64 and amd64

The features produced on arm64 architecture are not all bit-wise identical to those produced on amd64. For a fraction of the image tiles, a minority of the features (~1%) are identical to 4 significant figures, and a larger minority (~10%) are identical to 5 significant figures, while the majority (~90%) are bit-wise identical. The differing features are those that involve Fourier transforms. It is not believed that the differences will cause significant (or even observable) differences in AI performance as long as all of the features are computed on the same architecture. Because of the occasional differences, the feature vector version now contains '-arm64' when computed on arm64 architecture. Features computed on amd64 do not reflect this in the version string, to maintain backward compatibility.
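To quantify such differences for your own data, two feature vectors computed from the same tile on different architectures can be compared at a chosen number of significant figures. The NumPy sketch below is one way to do this and is not part of CharmFeatures: the round_sig() and agreement() helpers are hypothetical, and the two vectors are made-up values rather than real features.

```python
import numpy as np

def round_sig(x, n):
    """Round each element of x to n significant figures (zeros stay zero)."""
    x = np.asarray(x, dtype=np.double)
    out = np.zeros_like(x)
    nz = x != 0
    exponent = np.floor(np.log10(np.abs(x[nz])))
    scale = 10.0 ** (n - 1 - exponent)
    out[nz] = np.round(x[nz] * scale) / scale
    return out

def agreement(fv_a, fv_b, n):
    """Fraction of features that are equal after rounding to n figures."""
    return float(np.mean(round_sig(fv_a, n) == round_sig(fv_b, n)))

# Made-up feature values standing in for the two architectures.
fv_amd64 = np.array([1.234567, 100.0, 0.0, -2.5e-3])
fv_arm64 = np.array([1.234561, 100.0, 0.0, -2.5e-3])

bitwise = float(np.mean(fv_amd64 == fv_arm64))   # exact equality
four_figs = agreement(fv_amd64, fv_arm64, 4)
seven_figs = agreement(fv_amd64, fv_arm64, 7)
```

Rounding-based comparison can report a mismatch for two values that straddle a rounding boundary even when they are very close, so a relative tolerance (for example via np.isclose) may be a better fit in some cases.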

Remote image access

CharmFeatures supports flexible configuration classes to manage how images are retrieved and how .npy feature files are stored.
All remote access classes share the same API: init_worker() and fetch_image().

RemoteS3Config (Remote access to AWS S3)

Used to connect to an AWS S3 bucket to fetch images and save derived features. Authentication using the user's AWS configuration & profiles or unauthenticated public S3 bucket access is possible. By default, feature files will be saved in a local directory tree mirroring the remote image directory tree. The root of this directory tree is the current working directory unless specified by the -f cli flag or features_root parameter to ProcessImages.

from CharmFeatures import ProcessImages as pi

# S3 paths to images
image_links = ['pub/example_image1.tiff','pub/example_image2.tiff']
# AWS S3 access settings.
# NB: specifying aws_access_key_id/aws_secret_access_key not currently implemented
remote_config = {
    'bucket_name' : 'myBucket',       # AWS S3 bucket name
    # public s3 buckets can be used by setting 'public' : True
    # if the profile is not specified, then the default profile will be used
    'profile' : 'myAWSProfile'        # usually in ~/.aws/config
}
rc = pi.RemoteS3Config(remote_config = remote_config)
ip = pi.ProcessImages(normalize=True, tile=(4,4), image_links=image_links, remote_config=rc)

Your own remote API

Your own API access can be implemented by inheriting from RemoteS3Config. This example uses a remote API based on a hypothetical FooSession from the MyRemoteAPI package.

Here, the hypothetical session object has a fetch_blob_as_numpy() method, but any type of blob/image fetching is possible, including the use of local temporary file storage (as is done by RemoteS3Config). The init_worker() method is called internally once per parallelized worker in a worker pool to establish a remote session using settings in the remote_config dict, and to store the resulting session object in the remote_session field. The fetch_image() method is then called once per image, reusing the remote_session object.

from CharmFeatures import ProcessImages as pi
from MyRemoteAPI import FooSession # Note this doesn't actually exist
import numpy as np
# inherit from RemoteS3Config
class RemoteFooConfig (pi.RemoteS3Config):
    # The super().__init__() method just copies the provided configuration dict
    # to the class remote_config field. No need to override it normally.
    # The remote_config dict must be picklable, so it should just contain
    # simple strings to initiate the session.
    def init_worker (self):
        # initiate a session using values stored in the remote_config dict
        self.remote_session = FooSession (self.remote_config['foo_profile'])
        return (self.remote_session)
    def fetch_image (self, path):
        # The numpy matrix returned should be a 2D matrix of doubles
        img_mat = self.remote_session.fetch_blob_as_numpy (path, np_type = np.double)
        return (img_mat)
# initialize the parameters necessary to establish a remote session
rc = RemoteFooConfig(remote_config = {'foo_profile' : 'myProfileName'})
ip = pi.ProcessImages(normalize=True, tile=(4,4), image_links=image_links, remote_config=rc)
