# AdaptivePCA

Adaptive PCA with parallel scaling and dimensionality reduction.

AdaptivePCA is a flexible, scalable Python package for dimensionality reduction with PCA. It automatically selects the best scaler and the optimal number of components to meet a specified variance threshold. Built for efficiency, AdaptivePCA includes parallel processing to speed up large-scale data transformations, making it well suited to data scientists and machine learning practitioners working with high-dimensional datasets.
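Conceptually, the selection AdaptivePCA performs can be sketched directly with scikit-learn: try each candidate scaler, find the smallest component count whose cumulative explained variance reaches the threshold, and keep the best combination. This is an illustrative re-implementation, not the package's actual code; `select_best_pca` is a hypothetical helper.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler, StandardScaler

def select_best_pca(X, variance_threshold=0.95, max_components=10):
    """Illustrative sketch: pick the scaler/component count that reaches
    the variance threshold with the fewest components."""
    best = None  # (scaler_name, n_components, explained_variance)
    for name, scaler in [("StandardScaler", StandardScaler()),
                         ("MinMaxScaler", MinMaxScaler())]:
        X_scaled = scaler.fit_transform(X)
        n_max = min(max_components, X_scaled.shape[1])
        pca = PCA(n_components=n_max).fit(X_scaled)
        cumvar = np.cumsum(pca.explained_variance_ratio_)
        # Smallest n whose cumulative variance reaches the threshold,
        # falling back to n_max if the threshold is never reached.
        n = min(int(np.searchsorted(cumvar, variance_threshold)) + 1, n_max)
        var = float(cumvar[n - 1])
        # Prefer fewer components; break ties by higher explained variance.
        if best is None or (n, -var) < (best[1], -best[2]):
            best = (name, n, var)
    return best

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
scaler_name, n_components, explained = select_best_pca(
    X_demo, variance_threshold=0.8, max_components=5)
```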
## Features

- **Automatic Component Selection**: Selects the optimal number of principal components based on a specified variance threshold.
- **Scaler Selection**: Compares multiple scalers (StandardScaler and MinMaxScaler) to find the best fit for the data.
- **Parallel Processing**: Optional concurrent scaling for faster computations.
- **Easy Integration**: Built on top of widely used libraries like scikit-learn and numpy.

## Installation

You can install AdaptivePCA via pip:
```bash
pip install adaptivepca
```

## Usage

### Import and Initialize

```python
from adaptivepca import AdaptivePCA
import pandas as pd

# Load your dataset as a pandas DataFrame
X = pd.read_csv("your_data.csv")
```

### Basic Usage

Initialize AdaptivePCA and fit it to your data:
```python
# Initialize AdaptivePCA with the desired variance threshold and maximum components
adaptive_pca = AdaptivePCA(variance_threshold=0.95, max_components=10)

# Fit and transform the data
X_transformed = adaptive_pca.fit_transform(X)
```

### Parallel Processing

For larger datasets, enable parallel processing to speed up computations:
```python
# Fit AdaptivePCA with parallel processing
adaptive_pca.fit(X, parallel=True)
```

### Accessing Best Parameters

After fitting, you can retrieve the best scaler, the number of components, and the explained variance score:
```python
print(f"Best Scaler: {adaptive_pca.best_scaler}")
print(f"Optimal Components: {adaptive_pca.best_n_components}")
print(f"Explained Variance Score: {adaptive_pca.best_explained_variance}")
```

## Parameters

- `variance_threshold` (float): Desired variance threshold for component selection. Default is 0.95.
- `max_components` (int): Maximum number of PCA components to consider. Default is 10.

## Methods

- `fit(X, parallel=False)`: Fits AdaptivePCA to the dataset `X`. Use `parallel=True` to enable parallel processing.
- `transform(X)`: Transforms the dataset `X` using the previously fitted configuration.
- `fit_transform(X)`: Combines the fit and transform steps in one call.

## Example

```python
from adaptivepca import AdaptivePCA
import pandas as pd

# Example dataset
X = pd.DataFrame({
    'feature1': [1, 2, 3, 4, 5],
    'feature2': [10, 9, 8, 7, 6],
    'feature3': [2, 4, 6, 8, 10]
})

adaptive_pca = AdaptivePCA(variance_threshold=0.95, max_components=2)
X_transformed = adaptive_pca.fit_transform(X)

# Retrieve best configuration details
print(f"Best Scaler: {adaptive_pca.best_scaler}")
print(f"Optimal Components: {adaptive_pca.best_n_components}")
print(f"Explained Variance Score: {adaptive_pca.best_explained_variance}")
```

## Dependencies

- scikit-learn>=0.24
- numpy>=1.19
- pandas>=1.1

## License

This project is licensed under the MIT License. See the LICENSE file for details.
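The internals behind `parallel=True` are not documented above; as an illustration of the general pattern (fitting several candidate scalers concurrently before comparing them), here is a sketch using `concurrent.futures` and scikit-learn. The helper name `scale_concurrently` is hypothetical, not part of AdaptivePCA's API.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

def scale_concurrently(X, scalers):
    """Fit-transform several scalers in parallel threads and
    return the results keyed by scaler name."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(scaler.fit_transform, X)
                   for name, scaler in scalers.items()}
        return {name: fut.result() for name, fut in futures.items()}

X = np.random.default_rng(0).normal(size=(1000, 20))
scaled = scale_concurrently(X, {"standard": StandardScaler(),
                                "minmax": MinMaxScaler()})
```

Because the heavy lifting happens inside NumPy routines that release the GIL, threads are usually sufficient here; a process pool would add serialization overhead for large arrays.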