A Python library for feature selection in tabular datasets

Project description

dataclr

dataclr is a Python library for feature selection that helps machine learning engineers and data scientists identify the most useful features in tabular datasets. It combines a range of filter, wrapper, and embedded methods to improve model performance and streamline feature engineering.

Features

Comprehensive Methods:

Filter Methods: Statistical and data-driven approaches like ANOVA, MutualInformation, and VarianceThreshold.

Method	Regression	Classification
`ANOVA`	Yes	Yes
`Chi2`	No	Yes
`CumulativeDistributionFunction`	Yes	Yes
`CohensD`	No	Yes
`CramersV`	No	Yes
`DistanceCorrelation`	Yes	Yes
`Entropy`	Yes	Yes
`KendallCorrelation`	Yes	Yes
`Kurtosis`	Yes	Yes
`LinearCorrelation`	Yes	Yes
`MaximalInformationCoefficient`	Yes	Yes
`MeanAbsoluteDeviation`	Yes	Yes
`mRMR`	Yes	Yes
`MutualInformation`	Yes	Yes
`Skewness`	Yes	Yes
`SpearmanCorrelation`	Yes	Yes
`VarianceThreshold`	Yes	Yes
`VarianceInflationFactor`	Yes	Yes
`ZScore`	Yes	Yes
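
To make the idea of a filter method concrete, here is a minimal sketch that uses scikit-learn directly, not dataclr's own classes. It drops a zero-variance column and then ranks the remaining features by mutual information with the target. The toy data and the column names (`f0` ... `f5`, `constant`) are assumptions made for illustration, and dataclr's `VarianceThreshold` and `MutualInformation` may behave differently.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import VarianceThreshold, mutual_info_classif

# Toy classification data (illustrative only): 6 features, 3 of them informative
X_arr, y = make_classification(
    n_samples=300, n_features=6, n_informative=3, n_redundant=0, random_state=0
)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(6)])
X["constant"] = 1.0  # zero-variance column that a variance filter should drop

# Filter 1: remove features with zero variance
vt = VarianceThreshold(threshold=0.0)
vt.fit(X)
kept = X.columns[vt.get_support()].tolist()

# Filter 2: rank the remaining features by mutual information with the target
mi = mutual_info_classif(X[kept], y, random_state=0)
ranking = pd.Series(mi, index=kept).sort_values(ascending=False)
print(ranking)
```

Filter methods like these score each feature from the data alone, without training the downstream model, which is why they tend to be cheap compared with wrapper methods.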

Wrapper Methods: Model-based iterative methods like BorutaMethod, ShapMethod, and OptunaMethod.

Method	Regression	Classification
`BorutaMethod`	Yes	Yes
`HyperoptMethod`	Yes	Yes
`OptunaMethod`	Yes	Yes
`ShapMethod`	Yes	Yes
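
As a rough illustration of how a wrapper method differs from a filter, the sketch below runs a single Boruta-style round with scikit-learn: each feature competes against shuffled "shadow" copies, and only features whose random-forest importance beats every shadow are kept. This is a simplified assumption-laden sketch, not dataclr's `BorutaMethod`; the full Boruta procedure repeats such rounds and applies a statistical test. The function name `shadow_feature_round` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def shadow_feature_round(X, y, random_state=0):
    """One Boruta-style round: keep features that beat every shuffled copy."""
    rng = np.random.default_rng(random_state)
    # Shadow features: each column permuted independently, breaking any link to y
    shadows = pd.DataFrame(
        {f"shadow_{c}": rng.permutation(X[c].to_numpy()) for c in X.columns},
        index=X.index,
    )
    combined = pd.concat([X, shadows], axis=1)
    forest = RandomForestClassifier(n_estimators=200, random_state=random_state)
    forest.fit(combined, y)
    importances = pd.Series(forest.feature_importances_, index=combined.columns)
    # The best shadow importance acts as the bar a real feature must clear
    threshold = importances[shadows.columns].max()
    return [c for c in X.columns if importances[c] > threshold]


# Toy data (illustrative only); with shuffle=False the first 3 columns are informative
X_arr, y = make_classification(
    n_samples=400, n_features=8, n_informative=3, n_redundant=0,
    shuffle=False, random_state=0,
)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(8)])

selected = shadow_feature_round(X, y)
print(selected)
```

Because the model is retrained as part of the selection, wrapper methods generally cost more than filters but can capture interactions that per-feature statistics miss.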

Flexible and Scalable:
- Supports both regression and classification tasks.
- Handles high-dimensional datasets efficiently.
Interpretable Results:
- Provides ranked feature lists with detailed importance scores.
- Supports visualization and reporting.
Seamless Integration:
- Works with popular Python libraries like pandas, scikit-learn, and statsmodels.

Installation

Install dataclr using pip:

pip install dataclr

Getting Started

1. Load Your Dataset

Prepare your dataset as pandas DataFrames or Series and preprocess it (e.g., encode categorical features and normalize numerical values):

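import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Example dataset
X = pd.DataFrame({...})  # Replace with your feature matrix
y = pd.Series([...])     # Replace with your target variable

# Encode categorical features
X_encoded = pd.get_dummies(X)

# Split before scaling so the scaler is fitted on the training data only
X_train, X_test, y_train, y_test = train_test_split(X_encoded, y, test_size=0.2, random_state=42)

# Normalize numerical values, keeping column names and the row index
scaler = StandardScaler()
X_train = pd.DataFrame(scaler.fit_transform(X_train), columns=X_train.columns, index=X_train.index)
X_test = pd.DataFrame(scaler.transform(X_test), columns=X_test.columns, index=X_test.index)
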
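
The placeholders above can be filled with a small, made-up dataset to see what encoding and scaling produce. The column names and values below are hypothetical and only serve to make the snippet runnable.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: two numeric columns and one categorical column
X = pd.DataFrame(
    {
        "age": [25, 32, 47, 51],
        "income": [40_000, 52_000, 88_000, 61_000],
        "city": ["Oslo", "Lima", "Oslo", "Pune"],
    }
)
y = pd.Series([0, 1, 1, 0], name="target")

# One-hot encode the categorical column; numeric columns pass through unchanged
X_encoded = pd.get_dummies(X, dtype=float)

# Standardize every column to zero mean and unit variance
scaler = StandardScaler()
X_normalized = pd.DataFrame(
    scaler.fit_transform(X_encoded),
    columns=X_encoded.columns,
    index=X_encoded.index,  # keep the original row labels aligned with y
)
print(X_normalized.columns.tolist())
```

Passing `index=X_encoded.index` keeps the rows aligned with `y`, which matters once the data has been filtered or shuffled and no longer has a default integer index.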

2. Use `FeatureSelector`

The FeatureSelector is a high-level API that combines multiple methods to select the best feature subsets:

from dataclr.feature_selection import FeatureSelector

# Initialize the FeatureSelector
selector = FeatureSelector(
    model=my_model,  # Replace with your model
    metric="accuracy",
    X_train=X_train,
    X_test=X_test,
    y_train=y_train,
    y_test=y_test,
)

# Perform feature selection
selected_features = selector.select_features(n_results=5)
print(selected_features)
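
The snippet above takes `my_model` and the train/test splits as given. Below is a minimal sketch of one way to prepare them with scikit-learn, together with a helper that scores a candidate feature subset by test accuracy. The helper `subset_accuracy` is hypothetical and is not part of dataclr; it only illustrates the kind of comparison that a selector driven by `metric="accuracy"` presumably performs, which is an assumption rather than a description of dataclr's internals.

```python
import pandas as pd
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Toy data (illustrative only); with shuffle=False the first 3 columns are informative
X_arr, y_arr = make_classification(
    n_samples=400, n_features=8, n_informative=3, n_redundant=0,
    shuffle=False, random_state=0,
)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(8)])
y = pd.Series(y_arr, name="target")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Any scikit-learn style estimator could stand in for my_model
my_model = LogisticRegression(max_iter=1000)


def subset_accuracy(model, columns):
    """Fit a fresh copy of the model on the given columns and return test accuracy."""
    fitted = clone(model).fit(X_train[columns], y_train)
    return accuracy_score(y_test, fitted.predict(X_test[columns]))


all_score = subset_accuracy(my_model, list(X.columns))
informative_score = subset_accuracy(my_model, ["f0", "f1", "f2"])
print(f"all features: {all_score:.3f}, informative only: {informative_score:.3f}")
```

With these objects defined, `my_model`, `X_train`, `X_test`, `y_train`, and `y_test` have the shapes the `FeatureSelector` call expects.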

3. Use Individual Methods

For granular control, you can use individual feature selection methods:

from dataclr.methods import MutualInformation

# Initialize a method
method = MutualInformation(model=my_model, metric="accuracy")

# Fit and transform
results = method.fit_transform(X_train, X_test, y_train, y_test)
print(results)

Documentation

Explore the full documentation for detailed usage instructions, API references, and examples.

Release history

Version	Release date
0.3.0	Mar 5, 2025
0.2.0	Jan 6, 2025
0.1.3	Jan 3, 2025
0.1.0 (this version)	Jan 2, 2025

Download files

Source Distribution

dataclr-0.1.0.tar.gz (33.3 kB)

Uploaded Jan 2, 2025 Source

Built Distribution

dataclr-0.1.0-py3-none-any.whl (53.0 kB)

Uploaded Jan 2, 2025 Python 3

File details

Details for the file dataclr-0.1.0.tar.gz.

File metadata

Download URL: dataclr-0.1.0.tar.gz
Upload date: Jan 2, 2025
Size: 33.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.12.0

File hashes

Hashes for dataclr-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`29834dc1a124893f81239e51e00ceb9a193957984c94c4d36b6a1c9af4e00947`
MD5	`276dd47042b2a2da6e17b6f1a65a30d6`
BLAKE2b-256	`3e022fbb00d63b53db11aa2e5fd3c9e5217e1d7201bf1038f07aeddd53bab07a`
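
A downloaded archive can be checked against the published SHA256 digest with Python's standard `hashlib` module. The sketch below is one way to do it; the helper names and the assumption that the file sits in the current directory are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# SHA256 digest listed above for dataclr-0.1.0.tar.gz
EXPECTED_SDIST_SHA256 = "29834dc1a124893f81239e51e00ceb9a193957984c94c4d36b6a1c9af4e00947"

sdist = Path("dataclr-0.1.0.tar.gz")  # assumed local download location
if sdist.exists():
    print(matches_published_hash(sdist, EXPECTED_SDIST_SHA256))
```

Reading in chunks keeps memory use flat regardless of file size; for archives this small a single read would also work.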

File details

Details for the file dataclr-0.1.0-py3-none-any.whl.

File metadata

Download URL: dataclr-0.1.0-py3-none-any.whl
Upload date: Jan 2, 2025
Size: 53.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.12.0

File hashes

Hashes for dataclr-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`85b326c8cc348d0f74214568ee75309b2dc2934ed3007a3e312ae668f9972130`
MD5	`ce94014cc44307e33837b6294c2a7613`
BLAKE2b-256	`ff1b721049c1dc95d735607247f1d5f1cff07abf394dde63370ed74162e87aa2`
