Implementation of Features Maximization Metric, an unbiased metric aimed at estimating the quality of an unsupervised classification.

Project description

Features Maximization Metric

Quick description

Features Maximization (FMC) is a feature selection method described in Lamirel, J.-C., Cuxac, P., & Hajlaoui, K. (2016). A Novel Approach to Feature Selection Based on Quality Estimation Metrics. In Advances in Knowledge Discovery and Management (pp. 121–140). Springer International Publishing. https://doi.org/10.1007/978-3-319-45763-5_7.

This metric is computed by applying the following steps:

  1. Compute the Features F-Measure metric (based on Features Recall and Features Predominance metrics).

    (a) The Features Recall FR[f][c] for a given class c and a given feature f is the ratio between the sum of the vector weights of feature f for data in class c and the sum of the vector weights of feature f for all data. It answers the question: "Can the feature f distinguish the class c from the other classes c'?"

    (b) The Features Predominance FP[f][c] for a given class c and a given feature f is the ratio between the sum of the vector weights of feature f for data in class c and the sum of the vector weights of all features f' for data in class c. It answers the question: "Does the feature f identify the class c better than the other features f'?"

    (c) The Features F-Measure FM[f][c] for a given class c and a given feature f is the harmonic mean of the Features Recall (a) and the Features Predominance (b). It answers the question: "How much information does the feature f contain about the class c?"

  2. Compute the Features Selection (based on F-Measure Overall Average comparison).

    (d) The F-Measure Overall Average is the average of the Features F-Measure (c) over all classes c and all features f. It answers the question: "What is the mean information contained by the features across all classes?"

    (e) A feature f is Selected if and only if there exists at least one class c for which the Features F-Measure (c) FM[f][c] is greater than the F-Measure Overall Average (d). It answers the question: "Which features contain more information than the mean information in the dataset?"

    (f) A feature f is Deleted if and only if the Features F-Measure (c) FM[f][c] never exceeds the F-Measure Overall Average (d), whatever the class c. It answers the question: "Which features do not contain more information than the mean information in the dataset?"

  3. Compute the Features Contrast and Features Activation (based on F-Measure Marginal Averages comparison).

    (g) The F-Measure Marginal Averages for a given feature f is the average of the Features F-Measure (c) of that feature over all classes c. It answers the question: "What is the mean information contained by the feature f across all classes?"

    (h) The Features Contrast FC[f][c] for a given class c and a given selected feature f is the ratio between the Features F-Measure (c) FM[f][c] and the F-Measure Marginal Averages (g) for the selected feature f, raised to the power of an Amplification Factor. It answers the question: "How relevant is the feature f for distinguishing the class c?"

    (i) A selected feature f is Active for a given class c if and only if the Features Contrast (h) FC[f][c] is greater than 1.0. It answers the question: "For which classes is a selected feature f relevant?"
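
Written as formulas, the definitions (a) to (i) could read as follows. This is only a transcription of the prose above, not a quotation from the reference paper. The symbols are notation assumed here for illustration: W[d][f] is the weight of feature f in the vector of data item d, C is the set of classes, F is the set of features, and k is the Amplification Factor. The contrast is read as the ratio raised to the power k.

```latex
\begin{align*}
\mathrm{FR}[f][c] &= \frac{\sum_{d \in c} W[d][f]}{\sum_{c' \in C} \sum_{d \in c'} W[d][f]}
  && \text{(a)} \\
\mathrm{FP}[f][c] &= \frac{\sum_{d \in c} W[d][f]}{\sum_{f' \in F} \sum_{d \in c} W[d][f']}
  && \text{(b)} \\
\mathrm{FM}[f][c] &= \frac{2 \cdot \mathrm{FR}[f][c] \cdot \mathrm{FP}[f][c]}{\mathrm{FR}[f][c] + \mathrm{FP}[f][c]}
  && \text{(c)} \\
\overline{\mathrm{FM}} &= \frac{1}{|F| \, |C|} \sum_{f \in F} \sum_{c \in C} \mathrm{FM}[f][c]
  && \text{(d)} \\
f \text{ is Selected} &\iff \exists\, c \in C : \mathrm{FM}[f][c] > \overline{\mathrm{FM}}
  && \text{(e), (f)} \\
\overline{\mathrm{FM}}[f] &= \frac{1}{|C|} \sum_{c \in C} \mathrm{FM}[f][c]
  && \text{(g)} \\
\mathrm{FC}[f][c] &= \left( \frac{\mathrm{FM}[f][c]}{\overline{\mathrm{FM}}[f]} \right)^{k}
  && \text{(h)} \\
f \text{ is Active for } c &\iff \mathrm{FC}[f][c] > 1
  && \text{(i)}
\end{align*}
```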

This metric is an efficient method to:

  • identify the relevant features of a dataset model;
  • describe the association between vector features and data classes;
  • increase contrast between data classes.
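
To make the three steps concrete, here is a minimal pure-Python sketch run on a toy dataset. It does not use this package's API: the function `features_maximization`, its parameters and the keys of the returned dictionary are hypothetical names chosen for illustration. It assumes non-empty input, non-negative weights, an average in (d) taken over every (feature, class) pair, and a contrast equal to the ratio raised to the power of the amplification factor.

```python
from typing import Dict, List


def features_maximization(
    vectors: List[Dict[str, float]],
    classes: List[str],
    amplification_factor: int = 1,
) -> dict:
    """Toy version of steps 1 to 3 (hypothetical helper, not the package API)."""
    features = sorted({f for vector in vectors for f in vector})
    labels = sorted(set(classes))

    # weight[f][c]: sum of the weights of feature f over the data in class c.
    weight = {f: {c: 0.0 for c in labels} for f in features}
    for vector, label in zip(vectors, classes):
        for f, value in vector.items():
            weight[f][label] += value
    feature_total = {f: sum(weight[f].values()) for f in features}
    class_total = {c: sum(weight[f][c] for f in features) for c in labels}

    # Step 1: Features Recall (a), Features Predominance (b), F-Measure (c).
    fm = {f: {} for f in features}
    for f in features:
        for c in labels:
            fr = weight[f][c] / feature_total[f] if feature_total[f] else 0.0
            fp = weight[f][c] / class_total[c] if class_total[c] else 0.0
            fm[f][c] = 2 * fr * fp / (fr + fp) if fr + fp else 0.0

    # Step 2: overall average (d); features that are not selected (e) are deleted (f).
    overall = sum(fm[f][c] for f in features for c in labels) / (
        len(features) * len(labels)
    )
    selected = [f for f in features if any(fm[f][c] > overall for c in labels)]

    # Step 3: marginal average (g), contrast (h), activation (i).
    contrast, active = {}, {}
    for f in selected:
        marginal = sum(fm[f].values()) / len(labels)  # > 0 for a selected feature
        contrast[f] = {
            c: (fm[f][c] / marginal) ** amplification_factor for c in labels
        }
        active[f] = [c for c in labels if contrast[f][c] > 1.0]

    return {
        "fm": fm,
        "overall": overall,
        "selected": selected,
        "contrast": contrast,
        "active": active,
    }


# Toy data: "apple" and "car" are class-specific, "common" appears everywhere.
vectors = [
    {"apple": 3, "common": 1},
    {"apple": 3, "common": 1},
    {"car": 3, "common": 1},
    {"car": 3, "common": 1},
]
classes = ["fruit", "fruit", "vehicle", "vehicle"]

result = features_maximization(vectors, classes)
print(result["selected"])  # ['apple', 'car']
print(result["active"])    # {'apple': ['fruit'], 'car': ['vehicle']}
```

In this toy run the feature "common" is deleted because its F-Measure (about 0.33 in both classes) stays below the overall average (about 0.40), while "apple" and "car" are each active for exactly one class.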

Documentation

Installation

Features Maximization Metric requires Python 3.8 or above.

To install with pip:

# install package
python3 -m pip install cognitivefactory-features-maximization-metric

To install with pipx:

# install pipx
python3 -m pip install --user pipx

# install package
pipx install --python python3 cognitivefactory-features-maximization-metric

Development

To work on this project or contribute to it, please read:

References

  • Features Maximization Metric: Lamirel, J.-C., Cuxac, P., & Hajlaoui, K. (2016). A Novel Approach to Feature Selection Based on Quality Estimation Metrics. In Advances in Knowledge Discovery and Management (pp. 121–140). Springer International Publishing. https://doi.org/10.1007/978-3-319-45763-5_7
  • V-Measure: Rosenberg, A., & Hirschberg, J. (2007). V-Measure: A Conditional Entropy-Based External Cluster Evaluation Measure. In Proceedings of EMNLP-CoNLL 2007 (pp. 410–420).

How to cite

Schild, E. (2023). cognitivefactory/features-maximization-metric. Zenodo. https://doi.org/10.5281/zenodo.7646382.
