
Multiple instance learning via embedded instance selection


This Python package implements MILES (Multiple-Instance Learning via Embedded instance Selection), published in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 12, December 2006.

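The paper describes a method that maps each bag into a feature space defined by all instances in the training bags. A bag's coordinate along a given training instance is the similarity between that instance and the bag, computed with the most-likely-cause estimator.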

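The most-likely-cause estimator is defined as

$$ s(\mathbf{x}^k, B_i) = \max_j \exp\left(-\frac{\lVert \mathbf{x}_{ij} - \mathbf{x}^k \rVert^2}{\sigma^2}\right) $$

where x_ij is the j-th instance of bag B_i, x^k is the k-th instance taken from the training bags, and σ is a scaling parameter.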
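A minimal sketch of the estimator above for a single training instance and a single bag. The function name `most_likely_cause` and the `sigma` parameter (default 1.0) are hypothetical and are not pyMILES's API. Because exp is monotone, taking the maximum over exp(-d/σ²) is the same as applying exp to the minimum squared distance.

```python
import numpy as np


def most_likely_cause(x_k: np.ndarray, bag: np.ndarray, sigma: float = 1.0) -> float:
    """Hypothetical helper: similarity s(x_k, bag) between one instance and a bag.

    x_k:   shape (FEATURE_SPACE,)
    bag:   shape (N_INSTANCES, FEATURE_SPACE)
    sigma: scaling parameter (hypothetical default of 1.0)
    """
    # Squared Euclidean distance from x_k to every instance in the bag
    sq_dist = np.sum((bag - x_k) ** 2, axis=1)
    # max_j exp(-d_j / sigma^2) == exp(-min_j d_j / sigma^2)
    return float(np.exp(-np.min(sq_dist) / sigma**2))


bag = np.array([[0.0, 0.0], [5.0, 5.0]])
print(most_likely_cause(np.array([5.0, 5.0]), bag))  # 1.0, the instance is in the bag
```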

An example encoding

See embedding_test.py for an example embedding of dummy data. The dummy data is drawn from five two-dimensional normal distributions; each instance is generated by one of:

  • N1([5,5]^T, I): the normal distribution with mean [5,5] and identity covariance (unit standard deviation along each axis)
  • N2([5,-5]^T, I)
  • N3([-5,5]^T, I)
  • N4([-5,-5]^T, I)
  • N5([0,0]^T, I)
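A minimal sketch of drawing instances from these five distributions with NumPy. This is an illustration, not the code in embedding_test.py; the name `sample_instances` and the 0-based indexing (0 for N1 up to 4 for N5) are assumptions. An identity covariance means each coordinate gets independent unit-variance noise around the mean.

```python
import numpy as np

# Means of N1..N5 from the list above; the covariance of each is the 2x2 identity.
MEANS = np.array([[5, 5], [5, -5], [-5, 5], [-5, -5], [0, 0]], dtype=float)


def sample_instances(dist_ids, rng):
    """Hypothetical helper: draw one 2-D instance per entry of dist_ids (0 -> N1, ..., 4 -> N5)."""
    dist_ids = np.asarray(dist_ids)
    # Identity covariance: independent standard normal noise added to each mean
    noise = rng.standard_normal((len(dist_ids), 2))
    return MEANS[dist_ids] + noise


rng = np.random.default_rng(0)
instances = sample_instances([0, 1, 4], rng)
print(instances.shape)  # (3, 2)
```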

Each bag contains a configurable number of instances; this example uses 8 per bag. A bag is labeled positive if it contains instances from at least two different distributions among N1, N2 and N3; otherwise it is negative. The figure below shows the raw 2-D data.

[Figure: 2-D Raw Data]
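The labeling rule can be sketched as a small function over the source distribution of each instance in a bag. The name `label_bag`, the 0-based indices (0, 1, 2 for N1, N2, N3) and the uniform random choice of sources are assumptions for illustration; embedding_test.py may assign sources differently.

```python
import numpy as np

POSITIVE_SOURCES = {0, 1, 2}  # 0-based indices of N1, N2, N3


def label_bag(dist_ids) -> int:
    """Hypothetical helper: 1 if the bag has instances from at least two of N1, N2, N3, else 0."""
    present = {int(d) for d in dist_ids} & POSITIVE_SOURCES
    return 1 if len(present) >= 2 else 0


rng = np.random.default_rng(0)
dist_ids = rng.integers(0, 5, size=8)  # 8 instances per bag, sources chosen uniformly (assumption)
label = label_bag(dist_ids)
```

A bag drawn only from N1 plus N4/N5 stays negative, because a single positive source is not enough.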

A single bag has shape (N_INSTANCES, FEATURE_SPACE), where N_INSTANCES is the number of instances in the bag and FEATURE_SPACE is the dimensionality of each instance.

The positive bags together have shape (N_POSITIVE_BAGS, N_INSTANCES, FEATURE_SPACE), where N_POSITIVE_BAGS is the number of positive bags. The negative bags have shape (N_NEGATIVE_BAGS, N_INSTANCES, FEATURE_SPACE). The full training set of bags has shape (N_POSITIVE_BAGS + N_NEGATIVE_BAGS, N_INSTANCES, FEATURE_SPACE).
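A short NumPy sketch of these shapes. The bag counts are hypothetical and the bags are filled with placeholder standard normal data rather than the N1 to N5 mixture; the point is only how positive and negative bags stack into the training set.

```python
import numpy as np

N_POSITIVE_BAGS, N_NEGATIVE_BAGS = 10, 12  # hypothetical counts
N_INSTANCES, FEATURE_SPACE = 8, 2

rng = np.random.default_rng(0)
# Placeholder data: each bag is an (N_INSTANCES, FEATURE_SPACE) array
positive_bags = rng.standard_normal((N_POSITIVE_BAGS, N_INSTANCES, FEATURE_SPACE))
negative_bags = rng.standard_normal((N_NEGATIVE_BAGS, N_INSTANCES, FEATURE_SPACE))

# Stack along the bag axis to form the full training set
training_bags = np.concatenate([positive_bags, negative_bags], axis=0)
labels = np.concatenate([np.ones(N_POSITIVE_BAGS), np.zeros(N_NEGATIVE_BAGS)])

print(training_bags.shape)  # (22, 8, 2)
```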

A single bag is embedded into a vector of length (N_POSITIVE_BAGS + N_NEGATIVE_BAGS) * N_INSTANCES, i.e. one coordinate for every instance in all positive and negative training bags.
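A minimal vectorized sketch of that embedding, applying the most-likely-cause estimator against every training instance at once. The name `embed_bag` and the `sigma` default are hypothetical; pyMILES's own implementation may be organized differently.

```python
import numpy as np


def embed_bag(bag: np.ndarray, training_bags: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Hypothetical helper: embed one bag against every training instance.

    bag:           (N_INSTANCES, FEATURE_SPACE)
    training_bags: (N_BAGS, N_INSTANCES, FEATURE_SPACE)
    returns:       (N_BAGS * N_INSTANCES,)
    """
    # Flatten the training bags into one list of instances x^k
    concepts = training_bags.reshape(-1, training_bags.shape[-1])
    # Pairwise squared distances, shape (n_concepts, n_bag_instances)
    diff = concepts[:, None, :] - bag[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # For each x^k keep the closest instance in the bag: max_j exp(-d / sigma^2)
    return np.exp(-sq_dist.min(axis=1) / sigma**2)


rng = np.random.default_rng(0)
training_bags = rng.standard_normal((4, 8, 2))  # placeholder data
v = embed_bag(training_bags[0], training_bags)
print(v.shape)  # (32,)
```

Embedding a training bag against itself gives 1.0 in the coordinates of its own instances, since each of them is at distance zero.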

In this example, take three feature vectors close to the means of the distributions that define positive bags (N1, N2 and N3):

```python
import numpy as np

# Feature vectors close to the means of the "true" positive distributions N1, N2, N3
x1 = np.array([4.3, 5.2])
x2 = np.array([5.4, -3.9])
x3 = np.array([-6.0, 4.8])
```

Projecting the training data onto these vectors gives a (3, 40) matrix, visualized below.

[Figure: Linearly Separable Bags]
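A sketch of projecting a set of bags onto the three vectors x1, x2, x3 using the most-likely-cause estimator, producing one row per vector and one column per bag. The name `project_bags`, the `sigma` default and the 40 placeholder bags are assumptions for illustration, not the data or code from embedding_test.py.

```python
import numpy as np


def project_bags(bags: np.ndarray, vectors: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Hypothetical helper: (len(vectors), len(bags)) matrix of bag similarities."""
    # diff has shape (n_vectors, n_bags, n_instances, feature_space)
    diff = vectors[:, None, None, :] - bags[None, :, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Keep the closest instance for each (vector, bag) pair
    return np.exp(-sq_dist.min(axis=2) / sigma**2)


x1 = np.array([4.3, 5.2])
x2 = np.array([5.4, -3.9])
x3 = np.array([-6.0, 4.8])
vectors = np.stack([x1, x2, x3])

rng = np.random.default_rng(0)
bags = rng.standard_normal((40, 8, 2))  # 40 placeholder bags, not the dummy data
matrix = project_bags(bags, vectors)
print(matrix.shape)  # (3, 40)
```

Each column of the result is a point in 3-D, which is what makes the positive and negative bags visually separable when plotted.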

Testing

  • python -m unittest tests.embedding_test
  • python -m unittest tests.l1_svm_test

Code coverage and linting

  • pylint -r n src/tests/ src/pyMILES
  • From src directory: coverage run -m unittest tests.embedding_test
  • autopep8 --recursive --in-place src/tests/ src/pyMILES/

Building

  • Increment the build version in setup.cfg
  • python -m build .
  • python -m twine upload dist/*


Download files

  • Source distribution: pyMILES-0.0.6.tar.gz (9.9 kB)
  • Built distribution (Python 3): pyMILES-0.0.6-py3-none-any.whl (12.8 kB)
