
An image feature extraction library for Python based on WND-CHARM

Project description

CharmFeatures

CharmFeatures is a ctypes library, python module and command-line utility for extracting Wnd-Charm image features from large collections of TIFF files using massively parallel cooperative distributed multitasking. It can compute any subset of the full feature set, or new combinations of transforms and feature algorithms, and it can access image files remotely (e.g. from AWS S3).

Dependencies

Three external libraries: libtiff, libfftw, and libhdf5

  • Debian, Ubuntu, etc: sudo apt install -y libtiff-dev libfftw3-dev libhdf5-dev
  • RedHat, CentOS, Fedora: sudo yum install -y libtiff-devel fftw-devel hdf5-devel

N.B.: See note below on arm64 vs amd64 compatibility

charm-image-features

Command-line utility based on CharmFeatures.ProcessImages for extracting charm features from directories of local or remote tiff files while saving the feature files locally for later re-use.

This utility will consume all CPU resources available on the machine it is launched on. It does not use or benefit from GPUs. It can be launched multiple times on different machines sharing the same file system to compute large feature sets on a common directory structure of tiff files, which may be stored on the shared filesystem or accessed remotely. This uncoordinated distributed processing requires the shared filesystem to implement POSIX-style file locking (sometimes called fcntl-style file locking).
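The locking requirement can be sketched with Python's standard fcntl module on a POSIX system. This illustrates the general mechanism only, not CharmFeatures' internal implementation, and the lock-file naming below is made up:

```python
import fcntl
import os
import tempfile

# fcntl-style advisory locking: the worker that acquires the exclusive
# lock on a sample's lock file computes that sample's features; any other
# worker gets BlockingIOError and moves on to the next sample.
lock_path = os.path.join(tempfile.mkdtemp(), "sample-0001.lock")
with open(lock_path, "w") as lockf:
    try:
        fcntl.lockf(lockf, fcntl.LOCK_EX | fcntl.LOCK_NB)  # non-blocking acquire
        acquired = True
        # ... compute and write the feature file here ...
        fcntl.lockf(lockf, fcntl.LOCK_UN)
    except BlockingIOError:
        acquired = False  # another worker owns this sample
```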

Synopsis

charm-image-features -t4 -n -o test-t4.npz ../images

Recursively descend the ../images directory tree, calculating features for 4x4 tiles (-t4) of every TIFF file encountered, using directory names of tiff files as labels. Images are normalized to STDs (i.e. z-scores) prior to tiling (-n). Per-image feature vectors are saved in numpy "npz" format alongside tiff files with sample names as keys (file/tile names) and feature vectors as values. The entire set of samples is assembled into the outfile test-t4.npz (-o) with samples, features and labels keys and values containing the sample name vector, feature matrix and label vector.

See charm-image-features -h for additional options.
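For downstream use, the assembled outfile can be read back with plain numpy. A minimal sketch with made-up sample names and random data, mimicking the samples/features/labels layout described above:

```python
import numpy as np

# Build a toy file with the same three keys the assembled outfile uses:
# a sample-name vector, a feature matrix, and a label vector.
samples = np.array(["a/img1.tiff", "a/img2.tiff"])   # made-up sample names
features = np.random.rand(2, 2895)                   # one vector per sample
labels = np.array(["a", "a"])                        # directory names as labels
np.savez("demo-t4.npz", samples=samples, features=features, labels=labels)

data = np.load("demo-t4.npz")
print(sorted(data.files))          # ['features', 'labels', 'samples']
print(data["features"].shape)      # (2, 2895)
```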

ProcessImages

ProcessImages is a module designed for ease of integration between wnd-charm features and other AI/ML libraries (e.g. scikit-learn) for feature normalization, selection, classifier or regressor training, etc. Please note that by default, 2895 features are computed per image sample in the current version, so a good feature selection strategy is paramount.

Scikit-learn integration example

from CharmFeatures import ProcessImages as pi

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Gather image features. Feature vectors will not be recomputed in
# subsequent calls with the same parameters.
ip = pi.ProcessImages(normalize=True, tile=(4,4), in_paths='images/', outfile='image-features-t4.npz')
# Conventional naming for feature matrix and label vector
X, y, grp = ip.features_mat, ip.labels, ip.groups

# Split dataset into train and test sets grouped by image, so that tiles
# of the same image never end up in both sets (plain stratification would
# not guarantee this).
from sklearn.model_selection import GroupShuffleSplit
train_idx, test_idx = next(GroupShuffleSplit(test_size=0.25).split(X, y, groups=grp))
X_train, X_test, y_train, y_test = X[train_idx], X[test_idx], y[train_idx], y[test_idx]

clf_selected = make_pipeline(
        SelectKBest(mutual_info_classif, k=150), MinMaxScaler(),
        RandomForestClassifier(max_depth=25, n_estimators=75)
)
clf_selected.fit(X_train, y_train)
print('Classification accuracy: {:.3f}'.format(clf_selected.score(X_test, y_test)))

Other options

Separate calls for more control over processing. The constructor will not compute features automatically if no paths are specified.

from CharmFeatures import ProcessImages as pi
ip = pi.ProcessImages(normalize=True, tile=(4,4))

The process_paths() method can be used to filter out files or assign labels based on things other than directory names.

ip.process_paths(paths=['../images/cond1','../images/cntrl','../images/cond2'],
    filters=['_ch01'], labels={'cntrl': ['cond1','cntrl'], 'trtmnt': 'cond2'}
)

Here three separate directories are traversed, and any file containing _ch01 anywhere in its name or path is ignored. Files containing cond1 or cntrl anywhere in their name or path are assigned the cntrl label, and those containing cond2, the trtmnt label.
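The label assignment described above boils down to substring matching against each file path. A minimal stand-in sketch of that rule (assign_label below is hypothetical, not the library's implementation; it assumes a labels value may be a string or a list of strings):

```python
def assign_label(path, labels):
    # Return the first label whose substring(s) occur anywhere in the path.
    for label, patterns in labels.items():
        if isinstance(patterns, str):
            patterns = [patterns]
        if any(p in path for p in patterns):
            return label
    return None  # unlabeled: no pattern matched

labels = {'cntrl': ['cond1', 'cntrl'], 'trtmnt': 'cond2'}
print(assign_label('../images/cond2/img.tiff', labels))  # trtmnt
```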

process_paths() can be called multiple times to accumulate files to process.

The sample name vector, feature matrix, label vector and group vector must be retrieved explicitly after process_paths():

(sn, X, y, gr) = ip.get_feature_matrix()

If necessary, the features will be computed in parallel in this call, using all available processors. Or, if feature vector files with the same parameters have been computed previously, they will be used instead.

CharmFeatures

Python ctypes interface to wnd-charm image features.

Synopsis:

from libtiff import TIFF
from CharmFeatures import CharmFeatures
import numpy as np
# Initialize computing of standard long feature vectors (2895 features in version 5)
cf = CharmFeatures()
# A contiguous 2D array of doubles will avoid copying
image = TIFF.open('foo.tiff').read_image().astype(np.double)
# Compute image features, returning a 1D numpy array of 2895 doubles.
# N.B.: If any image pre-processing is done, better numerical stability
# is achieved by down-casting to float32 prior to feature computation
fv = cf.get_features(image.astype(np.float32))
feature_names = cf.get_feature_names()

Notes

  • feature_names are the fully expanded names of the features returned by the get_features() call (or python -m CharmFeatures.ProcessImages --feature-names-full). With few exceptions, each feature extraction algorithm produces multiple features in the vector, all following the <feature_name> [d] convention, where 'd' is an integer from 0-n.
  • get_features() can be supplied with a slice of a pre-allocated feature matrix using the featurevec parameter.
  • The get_features() method is meant to be called repeatedly on the same cf object with different images for efficiently computing features for many images.
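The featurevec idiom can be sketched with a stand-in extractor; get_features_stub below is hypothetical and merely fills its output, standing in for cf.get_features(), but the pre-allocation pattern is the same:

```python
import numpy as np

N_FEATURES = 2895  # length of the standard long feature vector (version 5)

def get_features_stub(image, featurevec=None):
    # Stand-in for cf.get_features(): fills the supplied slice in place
    # and returns it, allocating only when no slice is given.
    if featurevec is None:
        featurevec = np.empty(N_FEATURES)
    featurevec[:] = image.mean()  # placeholder for the real computation
    return featurevec

images = [np.random.rand(64, 64) for _ in range(3)]
# Pre-allocate one feature matrix and pass row views via featurevec,
# avoiding a per-image allocation and copy.
fmat = np.empty((len(images), N_FEATURES))
for i, img in enumerate(images):
    get_features_stub(img, featurevec=fmat[i])
```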

Optional CharmFeatures() parameters

  • f_names can be used to specify a subset of feature extraction algorithms and transforms to run. All features produced by the specified algorithm+transform will be computed.

cf = CharmFeatures(f_names=[
        'Tamura Textures ()', 'Tamura Textures (Edge (Fourier ()))'
    ])
    

    This will compute only Tamura Textures on the raw image and on an Edge transform of a Fourier transform of the raw image. The full set of default feature algorithms and transforms can be obtained from charm-image-features --feature-names. The list of strings supplied to f_names can be any combination of feature algorithms and transforms, even if that combination does not exist in the standard feature set (there is no Tamura Textures (Edge (Fourier ())) in the default feature set).

  • forking_executor, forking_gabor, and forking_haralick control forking (all true by default)

  • short parameter will produce a 1047-long feature vector if true

  • verbosity controls the amount of output written, from 0 to 7; default=2.

Docker

A Dockerfile and compose.yml are provided for portability and ease of deployment. The image produced (charmfeatures) does not specify an entry point, so the container must be run with the charm-image-features command specified explicitly:

docker compose run -v /data/images:/data/images charmfeatures charm-image-features -n -t4 /data/images

This mounts the local folder /data/images inside the charmfeatures container and then uses the charm-image-features command to compute features for all of the image files found in there with normalization and 4x4 tiling.

Compatibility between arm64 and amd64

The features produced on arm64 architecture are not all bit-wise identical to those produced on amd64. For a fraction of the image tiles, a minority of the features (~1%) agree only to 4 significant figures, a larger minority (~10%) agree to 5 significant figures, and the majority (~90%) are bit-wise identical. The differing features are those that use Fourier transforms. The differences are not believed to cause significant (or even observable) differences in AI performance as long as all of the features are computed on the same architecture. Because of the occasional differences, the feature vector version now contains '-arm64' when computed on arm architecture. Features computed on amd64 do not reflect this in the version string, to maintain backward compatibility.
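A practical consequence: feature vectors produced on different architectures should be compared with a tolerance rather than bitwise. A sketch with synthetic vectors (the values below are made up):

```python
import numpy as np

# Two feature vectors that differ only in low-order digits, mimicking the
# arm64/amd64 discrepancy seen in Fourier-based features.
fv_amd64 = np.array([1.234567890, 2.345678901, 3.456789012])
fv_arm64 = fv_amd64 * (1 + 1e-7)   # relative difference well below 5 sig figs

bitwise_equal = np.array_equal(fv_amd64, fv_arm64)          # False
close_to_5sig = np.allclose(fv_amd64, fv_arm64, rtol=1e-5)  # True
```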

Remote image access

CharmFeatures supports flexible configuration classes to manage how images are retrieved and how .npy feature files are stored.
All remote access classes share the same API: init_worker() and fetch_image().

RemoteS3Config (Remote access to AWS S3)

Used to connect to an AWS S3 bucket to fetch images and save derived features. Authentication can use the user's AWS configuration and profiles, or public S3 buckets can be accessed without authentication. By default, feature files will be saved in a local directory tree mirroring the remote image directory tree. The root of this directory tree is the current working directory unless specified by the -f CLI flag or the features_root parameter to ProcessImages.

from CharmFeatures import ProcessImages as pi

# S3 paths to images
image_links = ['pub/example_image1.tiff','pub/example_image2.tiff']
# AWS S3 access settings.
# NB: specifying aws_access_key_id/aws_secret_access_key not currently implemented
remote_config = {
    'bucket_name' : 'myBucket',       # AWS S3 bucket name
    # public s3 buckets can be used by setting 'public' : True
    # if the profile is not specified, then the default profile will be used
    'profile' : 'myAWSProfile'        # usually in ~/.aws/config
}
rc = pi.RemoteS3Config(remote_config = remote_config)
ip = pi.ProcessImages(normalize=True, tile=(4,4), image_links=image_links, remote_config=rc)

Your own remote API

Your own API access can be implemented by inheriting from RemoteS3Config. This example uses a remote API based on a hypothetical FooSession from the MyRemoteAPI package.

Here, the hypothetical session object has a fetch_blob_as_numpy() method, but any type of blob/image fetching is possible, including the use of local temporary file storage (as is done by RemoteS3Config). The init_worker() method is called internally once per parallelized worker in a worker pool to establish a remote session using settings in the remote_config dict, and to store the resulting session object in the remote_session field. The fetch_image() method is then called once per image, reusing the remote_session object.

from CharmFeatures import ProcessImages as pi
from MyRemoteAPI import FooSession # Note this doesn't actually exist
import numpy as np
# inherit from RemoteS3Config
class RemoteFooConfig(pi.RemoteS3Config):
    # The super().__init__() method just copies the provided configuration dict
    # to the class remote_config field. No need to override it normally.
    # The remote_config dict must be picklable, so it should just contain
    # simple strings to initiate the session.
    def init_worker(self):
        # initiate a session using values stored in the remote_config dict
        self.remote_session = FooSession(self.remote_config['foo_profile'])
        return self.remote_session
    def fetch_image(self, path):
        # The numpy matrix returned should be a 2D matrix of doubles
        img_mat = self.remote_session.fetch_blob_as_numpy(path, np_type=np.double)
        return img_mat
# initialize the parameters necessary to establish a remote session
rc = RemoteFooConfig(remote_config = {'foo_profile' : 'myProfileName'})
ip = pi.ProcessImages(normalize=True, tile=(4,4), image_links=image_links, remote_config=rc)
