DimensionalityReductionMethods

A package for applying, comparing, and visualizing dimensionality reduction methods across various target dimensions.

Dimensionality reduction is an essential aspect of data analysis and machine learning. It allows high-dimensional data to be transformed into more manageable, interpretable forms while preserving the core structure of the data. This package aims to simplify the application of various dimensionality reduction techniques, including PCA, t-SNE, and UMAP, to datasets across a wide range of target dimensions.

With this package, users can:

  • Perform dimensionality reduction using multiple methods with minimal setup and compare methods using side-by-side evaluations.
  • Analyze results quantitatively, measuring trustworthiness (how well local relationships are preserved during reduction) and reconstruction error (the discrepancy between original and reconstructed data) to assess the performance of methods and determine the intrinsic dimensionality of the data.
  • Visualize lower dimensional projections for further insights into the data structure and relationships.

By combining automation, visualization, and flexibility, this package simplifies the exploration of high-dimensional datasets and guides users in choosing the best methods for their applications.

Installation

DimensionalityReductionMethods can be installed via pip from PyPI. The command below installs the package along with its dependencies.

pip install dimensionalityreductionmethods

Getting Started

Below is a step-by-step guide on how to use the package.

1. Import the package.

import dimensionalityreductionmethods as drm

2. Initialize the DimensionalityReductionHandler with your dataset. Ensure the dataset is a NumPy array.

drh = drm.DimensionalityReductionHandler(data)
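
Here, data can be any 2-D NumPy array of shape (n_samples, n_features). As a quick illustration, scikit-learn's digits dataset works out of the box (an example choice, not a package requirement):

from sklearn.datasets import load_digits

# 1,797 samples x 64 features, returned as a NumPy array
data = load_digits().data
print(data.shape)  # (1797, 64)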

3. Provide a list of dimensionality reduction methods to apply to the data.

The supported methods are: PCA, KPCA, Isomap, UMAP, t-SNE, Autoencoder, LLE.

drh.analyze_dimensionality_reduction(
    [
        "isomap",
        "PCA",
        "tSNE",
        "Umap",
        "kpca",
        "autoencoder",
        "lle",
    ]
)

This method applies the specified dimensionality reduction techniques to the dataset. The data is reduced to every dimension from 1 to n, where n is the original dimensionality of the dataset. For each method it computes performance metrics such as trustworthiness and reconstruction error (where applicable), helping to evaluate how well each method preserves the data's structure in lower dimensions.
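
To give a feel for what such a sweep involves, here is a rough sketch using scikit-learn directly (not the package's actual internals), reducing with PCA at every target dimension and scoring each embedding:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import trustworthiness

# `data` is the NumPy array from step 2
n = data.shape[1]
scores = []
for k in range(1, n + 1):
    pca = PCA(n_components=k)
    embedded = pca.fit_transform(data)
    # Trustworthiness: how well local neighborhoods survive the reduction
    t = trustworthiness(data, embedded, n_neighbors=5)
    # Reconstruction error: mean squared gap between original and reconstruction
    err = np.mean((data - pca.inverse_transform(embedded)) ** 2)
    scores.append(t)
    print(f"{k} dims: trustworthiness={t:.3f}, reconstruction error={err:.4f}")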

4. Plot the results of the dimensionality reduction methods.

The plot summarizes the performance of each method, such as trustworthiness and reconstruction error across dimensions.

drh.plot_results()

5. Display a summary table of the results.

The table shows the optimal components, maximum trustworthiness, minimum reconstruction error, and computation time for each dimensionality reduction method.

drh.table()

6. Visualize low-dimensional projections of the data.

By default, the data is visualized in 2D. Set plot_in_3d=True to generate a 3D visualization.

drh.visualization()
drh.visualization(plot_in_3d=True)

All steps together:

import dimensionalityreductionmethods as drm

# Initialize the handler with your data
drh = drm.DimensionalityReductionHandler(data)

# Analyze dimensionality reduction using selected methods
drh.analyze_dimensionality_reduction(
    [
        "isomap",
        "PCA",
        "tSNE",
        "Umap",
        "kpca",
        "autoencoder",
        "lle",
    ]
)

# Visualize and summarize the results
drh.plot_results()
drh.table()
drh.visualization()
drh.visualization(plot_in_3d=True)

The examples folder includes sample notebooks featuring toy datasets that serve as helpful references.

Methods Overview

This section outlines key dimensionality reduction techniques, highlighting their functionality, benefits, and limitations to help users choose the best method for their data.

PCA : Principal Component Analysis

This method preserves the maximum variance of the high-dimensional dataset while reducing the number of components to the smallest possible.

Pros:
  • Simple method
  • Works well for linear data
  • Computationally efficient
  • Easy to interpret

Cons:
  • Assumes linearity
  • Sensitive to feature scaling
  • Sensitive to noisy data
  • Cannot capture nonlinear relationships
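
A minimal scikit-learn sketch, assuming data is the NumPy array from the Getting Started section:

from sklearn.decomposition import PCA

# Keep the two highest-variance components
pca = PCA(n_components=2)
reduced = pca.fit_transform(data)
print(pca.explained_variance_ratio_)  # variance captured per component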

KPCA : Kernel Principal Component Analysis

Maximizes the variance of the high-dimensional data within a nonlinear feature space, using kernel functions to capture the dataset's complex, nonlinear relationships with as few principal components as possible. The number of retained principal components corresponds to the target dimensionality.

Pros:
  • Nonlinear method
  • Flexible with different kernels
  • Works well for complex datasets

Cons:
  • Kernel choice can be tricky
  • Computationally expensive
  • Sensitive to parameters such as kernel width
  • May not perform well if the kernel is poorly chosen
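
A sketch with scikit-learn's KernelPCA; the RBF kernel and gamma value below are illustrative choices:

from sklearn.decomposition import KernelPCA

# fit_inverse_transform=True learns an approximate inverse mapping,
# which is what makes a reconstruction-error estimate possible
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=0.1,
                 fit_inverse_transform=True)
reduced = kpca.fit_transform(data)
reconstructed = kpca.inverse_transform(reduced)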

LLE : Locally Linear Embedding

The main idea is to represent each point in the dataset as a combination of its k nearest neighbors. The method computes the weights that best reconstruct each vector from its neighbors, then generates low-dimensional representations that can be reconstructed from those same weights. In general, it preserves local relationships by minimizing the reconstruction error of each point.

Pros:
  • Effective for manifold learning
  • Captures intrinsic geometry
  • Preserves local structures

Cons:
  • Sensitive to noise and number of neighbors
  • Computationally expensive for large datasets
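
A sketch using scikit-learn's LocallyLinearEmbedding; n_neighbors=10 is an illustrative choice:

from sklearn.manifold import LocallyLinearEmbedding

# Each point is reconstructed from its 10 nearest neighbors
lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2)
reduced = lle.fit_transform(data)
print(lle.reconstruction_error_)  # reconstruction error of the embedding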

t-SNE : t-distributed Stochastic Neighbor Embedding

Models the distribution of each point's closest neighbors (controlled by the perplexity parameter) and maps the points onto a lower-dimensional space while maintaining local relationships. This reduces dimensionality while simultaneously clustering the data; the resulting clusters reflect the relationships among the high-dimensional points, visualized in the reduced space.

Pros:
  • Excels at visualizing high-dimensional data
  • Captures local clusters well
  • Widely adopted in exploratory analysis

Cons:
  • Computationally intensive
  • Hard to interpret quantitatively
  • Does not preserve global structure
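
A sketch with scikit-learn's TSNE; perplexity is the key knob, and 30 is just a common default:

from sklearn.manifold import TSNE

# Perplexity balances attention between local and global structure
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
reduced = tsne.fit_transform(data)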

ISOMAP : Isometric Feature Mapping

Preserves the geodesic structure between all data points of the high-dimensional dataset while maximizing the variance. The lower dimensions, represented as principal components, summarize the intrinsic structure of the high-dimensional data.

Pros:
  • Good for manifold learning
  • Preserves global structures
  • Effective for datasets with intrinsic geometry

Cons:
  • Sensitive to noise
  • Requires good connectivity of the graph
  • Computationally expensive for large datasets
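
A sketch with scikit-learn's Isomap, which builds a neighborhood graph and preserves geodesic distances along it:

from sklearn.manifold import Isomap

isomap = Isomap(n_neighbors=10, n_components=2)  # 10 neighbors is illustrative
reduced = isomap.fit_transform(data)
# Error in reproducing the geodesic distances of the original data
print(isomap.reconstruction_error())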

UMAP : Uniform Manifold Approximation and Projection

Minimizes a cross-entropy loss between high-dimensional and low-dimensional fuzzy topological structures.

Pros:
  • Fast and scalable
  • Preserves both local and global structures
  • Works well with noisy data
  • Versatile for visualization and clustering

Cons:
  • Hyperparameter tuning can affect results
  • Interpretation is not as straightforward as PCA
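
A sketch using the separately installed umap-learn package; n_neighbors and min_dist are the usual tuning knobs:

import umap

# n_neighbors sets the local/global balance; min_dist controls crowding
reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=2)
reduced = reducer.fit_transform(data)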

Autoencoder

Autoencoders are neural networks whose structure is used to reduce the number of variables in a dataset. An autoencoder consists of an input layer, an output layer, and several hidden layers, with the architecture depending on the desired complexity of the network. To achieve dimensionality reduction, the dataset enters the input layer, passes through the hidden layers, and reaches a bottleneck that reduces its dimensionality. The decoder (the layers after the bottleneck) then reconstructs the high-dimensional dataset originally given as input.

Pros:
  • Handles nonlinear data
  • Can be customized for different tasks
  • Scalable to large datasets

Cons:
  • Requires careful architecture tuning
  • Training can be computationally expensive
  • Prone to overfitting
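
A minimal Keras sketch of the bottleneck idea, assuming TensorFlow is available; the layer sizes are illustrative:

from tensorflow import keras

n_features = data.shape[1]

# Encoder: compress the input down to a 2-unit bottleneck
inputs = keras.Input(shape=(n_features,))
encoded = keras.layers.Dense(32, activation="relu")(inputs)
bottleneck = keras.layers.Dense(2, activation="relu")(encoded)
# Decoder: reconstruct the original input from the bottleneck
decoded = keras.layers.Dense(32, activation="relu")(bottleneck)
outputs = keras.layers.Dense(n_features, activation="linear")(decoded)

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(data, data, epochs=50, batch_size=32, verbose=0)

# The encoder alone yields the low-dimensional representation
encoder = keras.Model(inputs, bottleneck)
reduced = encoder.predict(data)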

Dimensionality Reduction Performance

In order to determine the optimal method and the appropriate number of dimensions, we must evaluate the reconstruction error and the trustworthiness of each method, where available. A method is considered optimal if its trustworthiness is close to 100% and/or its reconstruction error is low. However, not all methods provide both metrics. The table below outlines which metrics are available for each method.

Method        Trustworthiness   Reconstruction Error
PCA           No                Yes
KPCA          No                Yes
LLE           Yes               Yes
Isomap        Yes               Yes
UMAP          Yes               No
t-SNE         Yes               No
Autoencoder   Yes               Yes
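
Building on the PCA sweep sketched in the Getting Started section, the selection rule might look like this (the 0.99 threshold is an arbitrary illustrative choice):

import numpy as np

# `scores[k-1]` holds the trustworthiness of the k-dimensional embedding
curve = np.asarray(scores)
candidates = np.flatnonzero(curve >= 0.99)  # dimensions meeting the bar
optimal_k = int(candidates[0]) + 1 if candidates.size else len(curve)
print(f"Smallest dimension with trustworthiness >= 0.99: {optimal_k}")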
