Adaptive PCA with parallel scaling and dimensionality reduction

Project description

AdaptivePCA

AdaptivePCA is a Python package for high-performance Principal Component Analysis (PCA) with adaptive component selection. It performs dimensionality reduction on large datasets efficiently, choosing the optimal number of components against a user-specified explained-variance threshold. It supports both StandardScaler and MinMaxScaler for data preprocessing and can run in parallel or non-parallel mode.

Features

  • Automatic Component Selection: Automatically chooses the number of components needed to reach a specified variance threshold.
  • Scaler Options: Supports StandardScaler and MinMaxScaler for data scaling.
  • Parallel Processing: Uses parallel processing to speed up computations, particularly beneficial for large datasets.
  • Easy Integration: Designed to integrate seamlessly with other data science workflows.
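The scaler comparison and component search described above can be pictured with a short sketch in plain scikit-learn. This illustrates the approach, not AdaptivePCA's internal code; the threshold and component cap mirror the package's defaults:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Illustrative sketch: for each candidate scaler, find the smallest number
# of components whose cumulative explained variance reaches the threshold,
# then keep the scaler that needs the fewest components.
X = load_iris().data
variance_threshold, max_components = 0.95, 4

best = None
for name, scaler in [("StandardScaler", StandardScaler()),
                     ("MinMaxScaler", MinMaxScaler())]:
    X_scaled = scaler.fit_transform(X)
    pca = PCA(n_components=max_components).fit(X_scaled)
    cumvar = np.cumsum(pca.explained_variance_ratio_)
    n = int(np.searchsorted(cumvar, variance_threshold) + 1)
    if best is None or n < best[1]:
        best = (name, n, cumvar[n - 1])

print(f"Selected scaler: {best[0]}, components: {best[1]}, "
      f"explained variance: {best[2]:.3f}")
```

The tie-breaking rule (first scaler wins on equal component counts) is an assumption for the sketch.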

Installation

Install from PyPI:

pip install adaptivepca

Alternatively, clone the repository and install from source:

git clone https://github.com/yourusername/adaptivepca.git
cd adaptivepca
pip install .

Usage

import pandas as pd
from adaptivepca import AdaptivePCA

# Load your data (example)
data = pd.read_csv("your_dataset.csv")
X = data.drop(columns=['Label'])  # Features
y = data['Label']  # Target variable

# Initialize and fit AdaptivePCA
adaptive_pca = AdaptivePCA(variance_threshold=0.95, max_components=10)
X_reduced = adaptive_pca.fit_transform(X)

# Results
print("Optimal Components:", adaptive_pca.best_n_components)
print("Explained Variance:", adaptive_pca.best_explained_variance)

Parameters

  • variance_threshold: float, default=0.95
    The cumulative variance explained threshold to determine the optimal number of components.

  • max_components: int, default=10
    The maximum number of components to consider.
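A quick sketch of how these two parameters interact, using hypothetical explained-variance ratios (the selection rule shown is the standard cumulative-variance cutoff, assumed here rather than taken from the package source):

```python
import numpy as np

# Hypothetical explained-variance ratios for a 6-feature dataset.
ratios = np.array([0.55, 0.25, 0.10, 0.06, 0.03, 0.01])
cumulative = np.cumsum(ratios)  # [0.55, 0.80, 0.90, 0.96, 0.99, 1.00]

# The optimal component count is the first position where the cumulative
# explained variance meets variance_threshold, capped at max_components.
variance_threshold, max_components = 0.95, 10
n_components = int(np.searchsorted(cumulative, variance_threshold) + 1)
n_components = min(n_components, max_components)
print(n_components)  # -> 4
```

With these ratios the first three components explain only 90% of the variance, so the fourth (reaching 96%) is the first to clear the 0.95 threshold.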

Methods

  • fit(X): Fits the AdaptivePCA model to the data X.
  • transform(X): Transforms the data X using the fitted PCA model.
  • fit_transform(X): Fits and transforms the data in one step.
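Assuming AdaptivePCA follows the scikit-learn fit/transform contract its method names suggest, the division of labor can be sketched with plain sklearn PCA: fit learns the projection from training data only, and transform applies that same projection to any later data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

# Synthetic data: 200 samples, 8 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

pca = PCA(n_components=3)
pca.fit(X_train)                        # like adaptive_pca.fit(X_train)
X_test_reduced = pca.transform(X_test)  # like adaptive_pca.transform(X_test)

print(X_test_reduced.shape)  # -> (50, 3)
```

fit_transform(X) is simply the two steps applied to the same data in one call.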

Example

Below is a fuller example that, after fitting, also reports which scaler AdaptivePCA selected:

adaptive_pca = AdaptivePCA(variance_threshold=0.95, max_components=10)
X_reduced = adaptive_pca.fit_transform(X)

print(f"Optimal scaler: {adaptive_pca.best_scaler}")
print(f"Number of components: {adaptive_pca.best_n_components}")
print(f"Explained variance: {adaptive_pca.best_explained_variance}")

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please open an issue or submit a pull request to discuss your changes.

Acknowledgments

This project makes use of the scikit-learn, numpy, and pandas libraries for data processing and machine learning.

Download files

Download the file for your platform.

Source Distribution

adaptivepca-1.0.1.tar.gz (4.5 kB)

Built Distribution

adaptivepca-1.0.1-py3-none-any.whl (4.9 kB)

File details

Details for the file adaptivepca-1.0.1.tar.gz.

File metadata

  • Download URL: adaptivepca-1.0.1.tar.gz
  • Upload date:
  • Size: 4.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.3

File hashes

Hashes for adaptivepca-1.0.1.tar.gz
Algorithm Hash digest
SHA256 271c3f10ce157dcfaf8abc35038d04c1b535211236fdc30ceef7b9e76440e083
MD5 6fed20ae089068c2328a653e6868dba5
BLAKE2b-256 1ce4327b79b3be0bf1e7fbc1e861b9ea346fb7f569b95ed33ed53d574be9f148


File details

Details for the file adaptivepca-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: adaptivepca-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 4.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.3

File hashes

Hashes for adaptivepca-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 72e60cf01702d686ff9ca5f6125ad0da4ce93e14b3318e73248e746ee87a3221
MD5 7fe788274e9e66e19d49f344ba1e2c2a
BLAKE2b-256 f9164e24aaff007311cbc14126e7e343ab9af277331a67b7322dcb6aea0ab6ec

