XMPro Dimensionality is a Python library for dimensionality reduction.
XMdim
XMdim is a Python library designed for performing dimensionality reduction on embedding data, with a primary focus on Principal Component Analysis (PCA). It provides a flexible and extensible framework for reducing data dimensions, analyzing variance, and reconstructing data using PCA.
Features
- PCA Transformation: Perform PCA on your embedding data with customizable number of components.
- Flexible Scaling: Option to apply standard scaling or min-max scaling before PCA.
- Variance Analysis: Calculate and retrieve explained variance ratios and cumulative explained variance.
- Component Loadings: Access the loadings (principal components) of the PCA.
- Data Reconstruction: Inverse transform PCA results to reconstruct original data.
- Reconstruction Error: Calculate the mean squared error between original and reconstructed data.
- Optimal Components: Find the optimal number of components for a given variance threshold.
- New Data Projection: Project new data onto the existing PCA space.
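Under the hood, the features above correspond to a standard scale-then-PCA pipeline. As a hedged sketch of the idea (using scikit-learn directly, not XMdim's own API), reducing a small embedding matrix to two components looks like this:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Toy embedding matrix: 4 samples, 4 features
X = np.array([[1, 2, 3, 4],
              [4, 5, 6, 7],
              [7, 8, 9, 10],
              [10, 11, 12, 13]], dtype=float)

# Standard scaling before PCA (zero mean, unit variance per feature)
X_scaled = StandardScaler().fit_transform(X)

# Reduce to 2 components
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

print(X_reduced.shape)  # (4, 2)
print(pca.explained_variance_ratio_)
```

Because the toy rows here lie on a single line, the first component captures essentially all of the variance.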
Installation
Install XMdim using pip:
```shell
pip install xmdim
```
Usage
Here's a basic example of how to use XMdim:
```python
from xmdim import PCAAnalyzer, ScalingType

# Sample embeddings
embeddings = {
    'key1': [[1, 2, 3, 4], [4, 5, 6, 7], [7, 8, 9, 10], [10, 11, 12, 13]],
    'key2': [[2, 3, 4, 5], [5, 6, 7, 8], [8, 9, 10, 11], [11, 12, 13, 14]]
}

# Create a PCAAnalyzer instance
analyzer = PCAAnalyzer(embeddings)

# Perform PCA
transformed_data = analyzer.perform_pca(n_components=2, scaling=ScalingType.STANDARD)

# Get explained variance ratio
explained_variance = analyzer.get_explained_variance_ratio()

# Get cumulative explained variance
cumulative_variance = analyzer.get_cumulative_explained_variance()

print("Transformed Data:", transformed_data)
print("Explained Variance Ratio:", explained_variance)
print("Cumulative Explained Variance:", cumulative_variance)
```
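The two variance quantities printed above are simple to derive from any fitted PCA: the cumulative explained variance is just the running sum of the per-component ratios. A minimal sketch with scikit-learn and NumPy (XMdim presumably wraps something equivalent):

```python
import numpy as np
from sklearn.decomposition import PCA

# Random data: 50 samples, 4 features
X = np.random.default_rng(0).normal(size=(50, 4))

pca = PCA(n_components=4).fit(X)

explained = pca.explained_variance_ratio_  # per-component ratios, descending
cumulative = np.cumsum(explained)          # running total

print(explained)
print(cumulative)  # last entry is 1.0 when all components are kept
```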
Advanced Usage
Loadings and Data Reconstruction
```python
# Get loadings
loadings = analyzer.get_loadings()

# Inverse transform
reconstructed_data = analyzer.inverse_transform()

# Get reconstruction error
error = analyzer.get_reconstruction_error()

print("Loadings:", loadings)
print("Reconstructed Data:", reconstructed_data)
print("Reconstruction Error:", error)
```
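Reconstruction error measures how much information the discarded components held: project the data down, map it back up, and take the mean squared difference. A hedged sketch of that calculation using scikit-learn directly (not XMdim's API):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 10))  # 100 samples, 10 features

# Keep only 3 of 10 components, then reconstruct
pca = PCA(n_components=3).fit(X)
X_reduced = pca.transform(X)
X_reconstructed = pca.inverse_transform(X_reduced)

# Mean squared error between original and reconstructed data
mse = np.mean((X - X_reconstructed) ** 2)
print(mse)
```

With isotropic random data, dropping 7 of 10 components necessarily leaves a nonzero error; keeping all 10 would drive it to (numerically) zero.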
Optimal Components and New Data Projection
```python
# Get optimal number of components
optimal_components = analyzer.get_optimal_components(variance_threshold=0.95)

# Project new data
new_data = {
    'key1': [[2, 3, 4, 5], [5, 6, 7, 8]],
    'key2': [[3, 4, 5, 6], [6, 7, 8, 9]]
}
projected_data = analyzer.project_new_data(new_data)

print("Optimal Number of Components:", optimal_components)
print("Projected New Data:", projected_data)
```
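"Optimal components for a variance threshold" typically means the smallest number of leading components whose cumulative explained variance reaches the threshold, and projecting new data means applying the already-fitted transform to fresh samples. Both can be sketched with scikit-learn and NumPy (names here are scikit-learn's, not XMdim's):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Data whose variance is concentrated in the first two directions
X = rng.normal(size=(200, 5)) * np.array([10.0, 5.0, 0.5, 0.3, 0.1])

pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components whose cumulative variance >= threshold
threshold = 0.95
optimal = int(np.searchsorted(cumulative, threshold) + 1)
print(optimal)  # 2 for this data

# Project new samples onto the already-fitted PCA space
new_points = rng.normal(size=(3, 5)) * np.array([10.0, 5.0, 0.5, 0.3, 0.1])
projected = pca.transform(new_points)
print(projected.shape)
```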
Dependencies
- numpy
- scikit-learn
Contributing
We welcome contributions! Please see our contributing guidelines for more details.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contact
For any queries or support, please contact [your contact information].
File details
Details for the file xmdim-0.0.1.tar.gz (source distribution).

- Size: 6.9 kB
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.9

| Algorithm | Hash digest |
|---|---|
| SHA256 | 5adf800f47c99ad4678adbb52b13e1024018da1fc98524dba82d089c9ec8519b |
| MD5 | ac02b323a0b0f56e85528c25fbd07fb2 |
| BLAKE2b-256 | df0cc79c0a2517d9f3c347a2fa4c55443e725b1bb07564c8074a1e3d502aae15 |
Details for the file xmdim-0.0.1-py3-none-any.whl (built distribution, Python 3).

- Size: 4.4 kB
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.9

| Algorithm | Hash digest |
|---|---|
| SHA256 | cc79497763f299489ee919a15840421aef5f25f3202af4dcd66c3dc2fa08f390 |
| MD5 | b5168bec620c7cc353b3f78128cfb59a |
| BLAKE2b-256 | cb4d4857322356f0efb0fa9faaa441b6a351fa8f3488739fda87f95f96c6084e |