AutoCorrFeatureSelection

Automatically select the most relevant features based on correlation.

How it works

The AutoCorrFeatureSelection class utilizes correlation analysis to automatically select relevant features from a given dataset. Here's a step-by-step overview of how it works:

  1. Correlation Matrix:

The first step is to calculate the correlation matrix, which measures the pairwise correlation between all features in the dataset. The correlation matrix provides insight into the relationships between the features.

               sepal.length  sepal.width  petal.length  petal.width  variety
sepal.length   1.0           -0.11        0.87          0.81         0.72
sepal.width    -0.11         1.0          -0.42         -0.36        -0.42
petal.length   0.87          -0.42        1.0           0.96         0.94
petal.width    0.81          -0.36        0.96          1.0          0.95
variety        0.72          -0.42        0.94          0.95         1.0

  2. Threshold-based Selection:

Next, the class applies a threshold (for example 0.85) to the correlation matrix and keeps only the entries above it. In the table below, each feature's correlation with itself is left out. Columns with entries above the threshold are considered highly correlated and may contain redundant or similar information.

               sepal.length  sepal.width  petal.length  petal.width  variety
sepal.length                              0.87
sepal.width
petal.length   0.87                                     0.96         0.94
petal.width                               0.96                       0.95
variety                                   0.94          0.95

  3. Selected Columns and Relationships:

The selected columns are visually represented, showcasing the relationships between the highly correlated features. This diagram helps visualize the interconnectedness of these features.

[Figure: iris_corr_diagram]
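The thresholding and relationship steps can be sketched in plain Python using the Iris correlation values listed above. This is a minimal illustration, not the package's own code: the helper name `high_corr_pairs` is hypothetical, and comparing the absolute value of each correlation against the threshold is an assumption of this sketch.

```python
# Pairwise correlations copied from the Iris correlation matrix above.
names = ["sepal.length", "sepal.width", "petal.length", "petal.width", "variety"]
corr = [
    [1.0, -0.11, 0.87, 0.81, 0.72],
    [-0.11, 1.0, -0.42, -0.36, -0.42],
    [0.87, -0.42, 1.0, 0.96, 0.94],
    [0.81, -0.36, 0.96, 1.0, 0.95],
    [0.72, -0.42, 0.94, 0.95, 1.0],
]


def high_corr_pairs(names, corr, threshold=0.85):
    """Return (a, b, r) for every pair of columns whose |r| exceeds the threshold.

    Hypothetical helper for illustration; not part of the package.
    """
    pairs = []
    for i in range(len(names)):
        # Upper triangle only: skips the diagonal and avoids listing a pair twice.
        for j in range(i + 1, len(names)):
            r = corr[i][j]
            if abs(r) > threshold:  # using |r| is an assumption of this sketch
                pairs.append((names[i], names[j], r))
    return pairs


pairs = high_corr_pairs(names, corr)
for a, b, r in pairs:
    print(f"{a} <-> {b}: {r}")
```

Each printed pair corresponds to one link between two highly correlated columns, which is the kind of relationship the diagram depicts.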

By following these steps, the AutoCorrFeatureSelection class automates the process of feature selection based on correlation analysis, enabling you to identify and focus on the most informative and non-redundant features in your dataset.
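One common way to turn such correlations into a set of non-redundant features is a greedy pass: keep a column only if it is not highly correlated with any column already kept. The sketch below illustrates that general idea under stated assumptions. It is not taken from the package, the function names and toy data are hypothetical, and the class may use a different rule.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def select_non_redundant(columns, threshold=0.85):
    """Greedily keep columns whose |r| with every kept column is <= threshold.

    Hypothetical helper: the result depends on column order, because earlier
    columns are always preferred over later ones they correlate with.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy data (hypothetical): "b" is an exact multiple of "a", "c" is only weakly related.
columns = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "c": [2.0, 1.0, 4.0, 1.0, 3.0],
}
selected = select_non_redundant(columns)
print(selected)
```

Here "b" is dropped because it duplicates the information in "a", while "c" is kept. A greedy pass is simple and deterministic, but which member of a correlated group survives depends on the order of the columns.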

Example

Examples can be found in examples/.

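# df is a pandas DataFrame holding the feature columns
# set up auto correlation
auto_corr = AutoCorrFeatureSelection(df)

# select columns using a correlation threshold of 0.85
selected_columns = auto_corr.select_columns_above_threshold(threshold=0.85)
# keep only the selected columns
filtered_df = df[selected_columns]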
