Features Maximization Metric

Implementation of the Features Maximization Metric, an unbiased metric for estimating the quality of an unsupervised classification.

Quick description

Features Maximization (FMC) is a feature selection method described in Lamirel, J.-C., Cuxac, P., & Hajlaoui, K. (2016). A Novel Approach to Feature Selection Based on Quality Estimation Metrics. In Advances in Knowledge Discovery and Management (pp. 121–140). Springer International Publishing. https://doi.org/10.1007/978-3-319-45763-5_7.

This metric is computed by applying the following steps:

  1. Compute the Features F-Measure metric (based on Features Recall and Features Predominance metrics).

    (a) The Features Recall FR[f][c] for a given class c and a given feature f is the ratio of the sum of the vector weights of feature f over the data in class c to the sum of the vector weights of feature f over all data. It answers the question: "Can the feature f distinguish the class c from the other classes c'?"

    (b) The Features Predominance FP[f][c] for a given class c and a given feature f is the ratio of the sum of the vector weights of feature f over the data in class c to the sum of the vector weights of all features f' over the data in class c. It answers the question: "Can the feature f identify the class c better than the other features f'?"

    (c) The Features F-Measure FM[f][c] for a given class c and a given feature f is the harmonic mean of the Features Recall (a) and the Features Predominance (b). It answers the question: "How much information does the feature f contain about the class c?"

  2. Compute the Features Selection (based on F-Measure Overall Average comparison).

    (d) The F-Measure Overall Average is the average of the Features F-Measure (c) over all classes c and all features f. It answers the question: "What is the mean amount of information contained by the features across all classes?"

    (e) A feature f is Selected if and only if there exists at least one class c for which the Features F-Measure (c) FM[f][c] is greater than the F-Measure Overall Average (d). It answers the question: "Which features contain more information than the mean information in the dataset?"

    (f) A feature f is Deleted if and only if the Features F-Measure (c) FM[f][c] is not greater than the F-Measure Overall Average (d) for any class c. It answers the question: "Which features do not contain more information than the mean information in the dataset?"

  3. Compute the Features Contrast and Features Activation (based on F-Measure Marginal Averages comparison).

    (g) The F-Measure Marginal Averages value for a given feature f is the average of the Features F-Measure (c) of that feature f over all classes c. It answers the question: "What is the mean amount of information contained by the feature f across all classes?"

    (h) The Features Contrast FC[f][c] for a given class c and a given selected feature f is the ratio of the Features F-Measure (c) FM[f][c] to the F-Measure Marginal Averages (g) of the selected feature f, raised to the power of an Amplification Factor. It answers the question: "How relevant is the feature f for distinguishing the class c?"

    (i) A selected feature f is Active for a given class c if and only if the Features Contrast (h) FC[f][c] is greater than 1.0. It answers the question: "For which classes is a selected feature f relevant?"
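
The steps above can be sketched in plain Python on a toy dataset. This is only an illustration of definitions (a) to (i), not the package's own API: the function name features_maximization, its parameters, the returned dictionary keys and the toy data are all hypothetical. It also assumes non-negative weights, that a ratio with a zero denominator counts as 0.0, and that the Amplification Factor is applied to the whole ratio in (h).

```python
def features_maximization(vectors, labels, amplification_factor=1.0):
    """Hypothetical helper following steps (a) to (i); not the package API."""
    classes = sorted(set(labels))
    n_features = len(vectors[0])

    # Sum of the weights of each feature f over the data of each class c.
    weight = [{c: 0.0 for c in classes} for _ in range(n_features)]
    for vector, label in zip(vectors, labels):
        for f, value in enumerate(vector):
            weight[f][label] += value

    feature_total = [sum(weight[f].values()) for f in range(n_features)]
    class_total = {c: sum(weight[f][c] for f in range(n_features)) for c in classes}

    def ratio(numerator, denominator):
        # Assumption: an undefined ratio (zero denominator) counts as 0.0.
        return numerator / denominator if denominator else 0.0

    # (a), (b), (c): Features Recall, Features Predominance, Features F-Measure.
    fmeasure = []
    for f in range(n_features):
        row = {}
        for c in classes:
            recall = ratio(weight[f][c], feature_total[f])
            predominance = ratio(weight[f][c], class_total[c])
            row[c] = ratio(2 * recall * predominance, recall + predominance)
        fmeasure.append(row)

    # (d), (e), (f): overall average, then selected and deleted features.
    overall_average = sum(
        fmeasure[f][c] for f in range(n_features) for c in classes
    ) / (n_features * len(classes))
    selected = [
        f for f in range(n_features)
        if any(fmeasure[f][c] > overall_average for c in classes)
    ]
    deleted = [f for f in range(n_features) if f not in selected]

    # (g), (h), (i): marginal averages, contrast and activation.
    marginal = {f: sum(fmeasure[f].values()) / len(classes) for f in selected}
    contrast = {
        f: {
            # Assumption: the whole ratio is raised to the Amplification Factor.
            c: ratio(fmeasure[f][c], marginal[f]) ** amplification_factor
            for c in classes
        }
        for f in selected
    }
    active = {(f, c) for f in selected for c in classes if contrast[f][c] > 1.0}

    return {
        "fmeasure": fmeasure,
        "overall_average": overall_average,
        "selected": selected,
        "deleted": deleted,
        "contrast": contrast,
        "active": active,
    }


# Toy data: feature 0 marks class "A", feature 1 marks class "B",
# feature 2 is spread evenly over both classes.
vectors = [[3, 0, 1], [3, 0, 1], [0, 3, 1], [0, 3, 1]]
labels = ["A", "A", "B", "B"]

result = features_maximization(vectors, labels)
print(result["selected"])        # [0, 1]
print(result["deleted"])         # [2]
print(sorted(result["active"]))  # [(0, 'A'), (1, 'B')]
```

In this toy case the evenly spread feature 2 has an F-Measure of 1/3 in both classes, which is below the overall average (about 0.397), so it is deleted, while features 0 and 1 are each active for exactly one class.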

This metric is an efficient method to:

  • identify the relevant features of a dataset model;
  • describe the association between vector features and data classes;
  • increase the contrast between data classes.

Documentation

Installation

Features Maximization Metric requires Python 3.8 or above.

To install with pip:

# install package
python3 -m pip install cognitivefactory-features-maximization-metric
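
After a pip installation, one way to check that the distribution is visible to the interpreter is to query its metadata with the standard library. This is a minimal sketch: the helper name installed_version is hypothetical, and only importlib.metadata (available from Python 3.8) is used, so nothing is assumed about the package's own modules.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


# Prints the installed version, or None if the distribution is missing
# from the environment of the interpreter running this snippet.
print(installed_version("cognitivefactory-features-maximization-metric"))
```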

To install with pipx:

# install pipx
python3 -m pip install --user pipx

# install package
pipx install --python python3 cognitivefactory-features-maximization-metric

Development

To work on this project or contribute to it, please read:

References

  • Features Maximization Metric: Lamirel, J.-C., Cuxac, P., & Hajlaoui, K. (2016). A Novel Approach to Feature Selection Based on Quality Estimation Metrics. In Advances in Knowledge Discovery and Management (pp. 121–140). Springer International Publishing. https://doi.org/10.1007/978-3-319-45763-5_7
  • V-Measure: Rosenberg, A., & Hirschberg, J. (2007). V-Measure: A Conditional Entropy-Based External Cluster Evaluation Measure (pp. 410–420).

How to cite

Schild, E. (2023). cognitivefactory/features-maximization-metric. Zenodo. https://doi.org/10.5281/zenodo.7646382.
