Skip to main content

clustermil - clustering based multiple instance learning

Project description

clustermil

Build Status GitHub issues

Python package for multiple instance learning (MIL) for large n_instance dataset.

Features

  • support count-based multiple instance assumptions (see wikipedia)
  • support multi-class setting
  • support scikit-learn Clustering algorithms (such as MiniBatchKMeans)
  • fast even if n_instance is large

Installation

pip install clustermil

Usage

# Prepare follwing dataset
#
# - bags ... list of np.ndarray
#            (num_instance_in_the_bag * num_features)
# - lower_threshold ... np.ndarray (num_bags * num_classes)
# - upper_threshold ... np.ndarray (num_bags * num_classes)
#
# bags[i_bag] contains not less than lower_thrshold[i_bag, i_class]
# i_class instances.

# Prepare single-instance clustering algorithms
from sklearn.cluster import MiniBatchKMeans
n_clusters = 100
clustering = MiniBatchKMeans(n_clusters=n_clusters)
clusters = clustering.fit_predict(np.vstack(bags)) # flatten bags into instances

# Prepare one-hot encoder
from sklearn.preprocessing import OneHotEncoder
onehot_encoder = OneHotEncoder()
onehot_encoder.fit(clusters)

# generate ClusterMilClassifier with helper function
from clustermil import generate_mil_classifier

milclassifier = generate_mil_classifier(
            clustering,
            onehot_encoder,
            bags,
            lower_threshold,
            upper_threshold,
            n_clusters)

# after multiple instance learning,
# you can predict instance class
milclassifier.predict([instance_feature])

See tests/test_classification.py for an example of a fully working test data generation process.

License

clustermil is available under the MIT License.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

clustermil-0.2.0.tar.gz (4.3 kB view hashes)

Uploaded Source

Built Distribution

clustermil-0.2.0-py3-none-any.whl (4.7 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page