Implementation of Features Maximization Metric, an unbiased metric aimed at estimating the quality of an unsupervised classification.
Features Maximization Metric
Quick description
Features Maximization (FMC) is a feature selection method described in Lamirel, J.-C., Cuxac, P., & Hajlaoui, K. (2016). A Novel Approach to Feature Selection Based on Quality Estimation Metrics. In Advances in Knowledge Discovery and Management (pp. 121–140). Springer International Publishing. https://doi.org/10.1007/978-3-319-45763-5_7.
This metric is computed by applying the following steps:
- Compute the Features F-Measure metric (based on Features Recall and Features Predominance metrics).
  (a) The Features Recall FR[f][c] for a given class c and a given feature f is the ratio between the sum of the vector weights of the feature f for data in class c and the sum of all vector weights of the feature f for all data. It answers the question: "Can the feature f distinguish the class c from the other classes c'?"
  (b) The Features Predominance FP[f][c] for a given class c and a given feature f is the ratio between the sum of the vector weights of the feature f for data in class c and the sum of all vector weights of all features f' for data in class c. It answers the question: "Can the feature f identify the class c better than the other features f'?"
  (c) The Features F-Measure FM[f][c] for a given class c and a given feature f is the harmonic mean of the Features Recall (a) and the Features Predominance (b). It answers the question: "How much information does the feature f contain about the class c?"
- Compute the Features Selection (based on F-Measure Overall Average comparison).
  (d) The F-Measure Overall Average is the average of the Features F-Measure (c) over all classes c and all features f. It answers the question: "What is the mean amount of information contained by the features across all classes?"
  (e) A feature f is Selected if and only if there exists at least one class c for which the Features F-Measure (c) FM[f][c] is greater than the F-Measure Overall Average (d). It answers the question: "Which features contain more information than the mean information in the dataset?"
  (f) A feature f is Deleted if and only if the Features F-Measure (c) FM[f][c] is lower than the F-Measure Overall Average (d) for every class c. It answers the question: "Which features do not contain more information than the mean information in the dataset?"
- Compute the Features Contrast and Features Activation (based on F-Measure Marginal Averages comparison).
  (g) The F-Measure Marginal Averages for a given feature f is the average of the Features F-Measure (c) over all classes c for that feature f. It answers the question: "What is the mean amount of information contained by the feature f across all classes?"
  (h) The Features Contrast FC[f][c] for a given class c and a given selected feature f is the ratio between the Features F-Measure (c) FM[f][c] and the F-Measure Marginal Averages (g) of the selected feature f, raised to the power of an Amplification Factor. It answers the question: "How relevant is the feature f for distinguishing the class c?"
  (i) A selected feature f is Active for a given class c if and only if the Features Contrast (h) FC[f][c] is greater than 1.0. It answers the question: "For which classes is a selected feature f relevant?"
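The definitions (a) to (i) can be written compactly as formulas. This is a sketch under assumed notation that does not appear in the original text: w_{d,f} is the weight of feature f in the vector of data item d, D is the set of all data, D_c is the subset of data in class c, F and C are the sets of features and classes, and k is the Amplification Factor. The averages in (d) and (g) are read here as plain arithmetic means, and the ratio in (h) is read as being raised to the power k as a whole.

```latex
\begin{align*}
\mathrm{FR}[f][c] &= \frac{\sum_{d \in D_c} w_{d,f}}{\sum_{d \in D} w_{d,f}} && \text{(a)} \\
\mathrm{FP}[f][c] &= \frac{\sum_{d \in D_c} w_{d,f}}{\sum_{f' \in F} \sum_{d \in D_c} w_{d,f'}} && \text{(b)} \\
\mathrm{FM}[f][c] &= \frac{2 \, \mathrm{FR}[f][c] \, \mathrm{FP}[f][c]}{\mathrm{FR}[f][c] + \mathrm{FP}[f][c]} && \text{(c)} \\
\overline{\mathrm{FM}} &= \frac{1}{|F| \, |C|} \sum_{f \in F} \sum_{c \in C} \mathrm{FM}[f][c] && \text{(d)} \\
f \text{ is Selected} &\iff \exists\, c \in C : \mathrm{FM}[f][c] > \overline{\mathrm{FM}} && \text{(e)} \\
\overline{\mathrm{FM}}[f] &= \frac{1}{|C|} \sum_{c \in C} \mathrm{FM}[f][c] && \text{(g)} \\
\mathrm{FC}[f][c] &= \left( \frac{\mathrm{FM}[f][c]}{\overline{\mathrm{FM}}[f]} \right)^{k} && \text{(h)} \\
f \text{ is Active for } c &\iff \mathrm{FC}[f][c] > 1 && \text{(i)}
\end{align*}
```

A feature is Deleted (f) exactly when condition (e) fails for every class.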
This metric is an efficient method to:
- identify the relevant features of a dataset model;
- describe the association between vector features and data classes;
- increase contrast between data classes.
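To make these uses concrete, the steps (a) to (i) described earlier can be sketched in a few lines of pure Python on a toy dataset. This is an illustrative sketch only, not the package's API: the function name `features_maximization`, its parameters, the returned dictionary keys, and the toy data are all hypothetical, and treating a zero denominator as a ratio of 0 is an assumption made here for convenience.

```python
def features_maximization(vectors, classes, amplification_factor=1):
    """Toy computation of steps (a) to (i) on dense lists of weights.

    vectors: one list of feature weights per data item (hypothetical layout).
    classes: the class label of each data item, in the same order.
    """
    labels = sorted(set(classes))
    n_features = len(vectors[0])

    # Sum the weights of each feature inside each class, then over all data.
    class_sums = {c: [0.0] * n_features for c in labels}
    for row, c in zip(vectors, classes):
        for f, weight in enumerate(row):
            class_sums[c][f] += weight
    total_sums = [sum(class_sums[c][f] for c in labels) for f in range(n_features)]

    def ratio(numerator, denominator):
        # Assumption: an empty denominator yields 0 rather than an error.
        return numerator / denominator if denominator else 0.0

    fmeasure = {}
    for f in range(n_features):
        for c in labels:
            recall = ratio(class_sums[c][f], total_sums[f])            # (a)
            predominance = ratio(class_sums[c][f], sum(class_sums[c]))  # (b)
            fmeasure[f, c] = ratio(2 * recall * predominance,
                                   recall + predominance)              # (c)

    overall_average = sum(fmeasure.values()) / len(fmeasure)           # (d)
    # (e) keeps a feature if it beats the overall average in some class;
    # every other feature is implicitly deleted (f).
    selected = [f for f in range(n_features)
                if any(fmeasure[f, c] > overall_average for c in labels)]

    contrast, active = {}, {}
    for f in selected:
        marginal = sum(fmeasure[f, c] for c in labels) / len(labels)   # (g)
        for c in labels:
            contrast[f, c] = (fmeasure[f, c] / marginal) ** amplification_factor  # (h)
        active[f] = [c for c in labels if contrast[f, c] > 1.0]        # (i)

    return {
        "fmeasure": fmeasure,
        "overall_average": overall_average,
        "selected": selected,
        "contrast": contrast,
        "active": active,
    }


# Toy data: feature 0 only occurs in class "a", feature 1 only in class "b",
# and feature 2 is spread evenly over both classes.
toy_vectors = [[3, 0, 1], [2, 0, 1], [0, 4, 1], [0, 3, 1]]
toy_classes = ["a", "a", "b", "b"]
result = features_maximization(toy_vectors, toy_classes)
print(result["selected"], result["active"])  # [0, 1] {0: ['a'], 1: ['b']}
```

In this toy run the evenly spread feature 2 is deleted, while features 0 and 1 are selected and each is active only for the class it characterizes.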
Documentation
Installation
Features Maximization Metric requires Python 3.8 or above.
To install with pip:
# install package
python3 -m pip install cognitivefactory-features-maximization-metric
To install with pipx:
# install pipx
python3 -m pip install --user pipx
# install package
pipx install --python python3 cognitivefactory-features-maximization-metric
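After a pip installation, one way to confirm that the distribution is visible to the current interpreter is to query its metadata with the standard library. This is a minimal sketch: the helper name `installed_version` is hypothetical, and it only checks the distribution name used in the install commands, not any module of the package. A pipx installation lives in its own isolated environment, so this check may report nothing when run from another interpreter.

```python
from importlib import metadata

# Distribution name as used in the pip command.
DIST_NAME = "cognitivefactory-features-maximization-metric"


def installed_version(dist_name):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


print(installed_version(DIST_NAME))
```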
Development
To work on this project or contribute to it, please read:
- the Copier PDM template documentation;
- the Contributing page for environment setup and development help;
- the Code of Conduct page for contribution rules.
References
- Features Maximization Metric: Lamirel, J.-C., Cuxac, P., & Hajlaoui, K. (2016). A Novel Approach to Feature Selection Based on Quality Estimation Metrics. In Advances in Knowledge Discovery and Management (pp. 121–140). Springer International Publishing. https://doi.org/10.1007/978-3-319-45763-5_7
- V-Measure: Rosenberg, A., & Hirschberg, J. (2007). V-Measure: A Conditional Entropy-Based External Cluster Evaluation Measure (pp. 410–420).
How to cite
Schild, E. (2023). cognitivefactory/features-maximization-metric. Zenodo. https://doi.org/10.5281/zenodo.7646382.