# ENCDR - Neural Component Dimensionality Reduction
A Python library for autoencoder-based dimensionality reduction with a scikit-learn-compatible interface.
## Features
- Scikit-learn Compatible: Implements `fit()`, `transform()`, `predict()`, and `fit_transform()` methods
- PyTorch Lightning Backend: Deep learning framework with automatic GPU support
- Configurable Architecture: Customizable encoder/decoder layers, activation functions, and training parameters
- Automatic Standardization: Optional feature scaling for improved training stability
- Validation Support: Built-in train/validation splits for monitoring training progress
- Multiple Activation Functions: Support for ReLU, Tanh, Sigmoid, LeakyReLU, ELU, and GELU
- Model Persistence: Save and load trained models with full state preservation
## Installation

```bash
uv add encdr
# or
pip install encdr
```
### Dependencies
- Python ≥ 3.12
- PyTorch Lightning ≥ 2.5.5
- scikit-learn ≥ 1.7.2
- torch ≥ 2.0.0
- numpy ≥ 1.21.0
## Quick Start

```python
from encdr import ENCDR
from sklearn.datasets import make_classification
import numpy as np
# Generate sample data
X, _ = make_classification(n_samples=1000, n_features=50, n_informative=30, random_state=42)
# Create and train autoencoder
encdr = ENCDR(
    hidden_dims=[64, 32, 16],  # Encoder layer sizes
    latent_dim=8,              # Bottleneck dimension
    max_epochs=50,             # Training epochs
    random_state=42,
)
# Fit and transform data
X_reduced = encdr.fit_transform(X)
print(f"Original shape: {X.shape}, Reduced shape: {X_reduced.shape}")
# Reconstruct original data
X_reconstructed = encdr.predict(X)
reconstruction_error = np.mean((X - X_reconstructed) ** 2)
print(f"Reconstruction MSE: {reconstruction_error:.4f}")
# Save model for later use
encdr.save("my_autoencoder.pkl")
# Load model (in a different session)
loaded_encdr = ENCDR.load("my_autoencoder.pkl")
X_reduced_loaded = loaded_encdr.transform(X[:10])  # Same results as the original model
```
## Advanced Usage

### Custom Architecture

```python
encdr = ENCDR(
    hidden_dims=[128, 64, 32, 16],  # Deep architecture
    latent_dim=10,                  # 10-dimensional latent space
    activation="tanh",              # Tanh activation
    dropout_rate=0.2,               # 20% dropout for regularization
    learning_rate=1e-3,             # Learning rate
    weight_decay=1e-4,              # L2 regularization
    batch_size=64,                  # Batch size
    max_epochs=100,                 # Training epochs
    validation_split=0.2,           # 20% validation split
    standardize=True,               # Feature standardization
    random_state=42,                # Reproducibility
    trainer_kwargs={"accelerator": "gpu", "devices": 1},  # GPU training
)
```
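The `trainer_kwargs` dictionary holds additional arguments for the PyTorch Lightning `Trainer`, so other `Trainer` options (for example `callbacks`, `logger`, or `enable_progress_bar` in Lightning 2.x) should be usable the same way; treat any specific key as something to verify against your installed Lightning version.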
### Integration with Scikit-learn Pipelines

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
# Create pipeline
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('encdr', ENCDR(latent_dim=5, max_epochs=50, standardize=False)),
])
# Split data
X_train, X_test = train_test_split(X, test_size=0.2, random_state=42)
# Fit pipeline and transform
X_train_reduced = pipeline.fit_transform(X_train)
X_test_reduced = pipeline.transform(X_test)
```
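Since `score()` returns the negative reconstruction MSE, the pipeline should also plug into scikit-learn's model selection tools. A minimal sketch, assuming `ENCDR` implements the full estimator API (`get_params()`/`set_params()`), which its scikit-learn compatibility implies but which isn't shown explicitly above:

```python
from sklearn.model_selection import GridSearchCV

# Search over the bottleneck size; with no scoring argument, GridSearchCV
# falls back to the estimator's own score() (higher, i.e. less negative, is better)
search = GridSearchCV(pipeline, param_grid={"encdr__latent_dim": [2, 5, 10]}, cv=3)
search.fit(X_train)
print(search.best_params_)
```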
### Dimensionality Reduction Workflow

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

from encdr import ENCDR
# Load iris dataset
iris = load_iris()
X, y = iris.data, iris.target
# Reduce to 2D for visualization
encdr = ENCDR(latent_dim=2, max_epochs=100, random_state=42)
X_2d = encdr.fit_transform(X)
# Plot results
plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y, cmap='viridis')
plt.title('ENCDR: Iris Dataset (2D)')
plt.xlabel('Component 1')
plt.ylabel('Component 2')
# Compare reconstruction quality
X_reconstructed = encdr.predict(X)
mse_per_feature = np.mean((X - X_reconstructed) ** 2, axis=0)
plt.subplot(1, 2, 2)
plt.bar(range(len(mse_per_feature)), mse_per_feature)
plt.title('Reconstruction Error by Feature')
plt.xlabel('Feature Index')
plt.ylabel('MSE')
plt.tight_layout()
plt.show()
```
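For a baseline, it can help to compare the 2-D autoencoder embedding against PCA on the same data. A minimal sketch, reusing `X` and `X_reconstructed` from the block above; PCA's linear reconstruction error gives a reference point for the autoencoder's nonlinear one:

```python
from sklearn.decomposition import PCA

# Linear baseline: project to 2 components and reconstruct
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
X_pca_rec = pca.inverse_transform(X_pca)

print(f"PCA reconstruction MSE:   {np.mean((X - X_pca_rec) ** 2):.4f}")
print(f"ENCDR reconstruction MSE: {np.mean((X - X_reconstructed) ** 2):.4f}")
```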
## API Reference

### ENCDR Class

#### Parameters
- `hidden_dims` (list of int, default=[64, 32]): Hidden layer dimensions for the encoder
- `latent_dim` (int, default=10): Dimension of the latent space
- `learning_rate` (float, default=1e-3): Learning rate for optimization
- `activation` (str, default='relu'): Activation function ('relu', 'tanh', 'sigmoid', 'leaky_relu', 'elu', 'gelu')
- `dropout_rate` (float, default=0.0): Dropout rate for regularization
- `weight_decay` (float, default=0.0): L2 regularization weight decay
- `batch_size` (int, default=32): Training batch size
- `max_epochs` (int, default=100): Maximum number of training epochs
- `validation_split` (float, default=0.2): Fraction of data held out for validation
- `standardize` (bool, default=True): Whether to standardize features
- `random_state` (int, optional): Random seed for reproducibility
- `trainer_kwargs` (dict, optional): Additional PyTorch Lightning Trainer arguments
#### Methods

- `fit(X, y=None)`: Train the autoencoder on data X
- `transform(X)`: Transform data to its latent representation
- `fit_transform(X, y=None)`: Fit and transform in one step
- `inverse_transform(X)`: Reconstruct data from a latent representation
- `predict(X)`: Reconstruct input data (encode followed by decode)
- `score(X, y=None)`: Return the negative reconstruction MSE
- `save(filepath)`: Save a fitted model to disk
- `load(filepath)`: Load a saved model from disk (class method)
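The round-trip relationships among these methods can be spot-checked directly. A minimal sketch of the documented semantics, assuming a fitted model `encdr` and data `X` as in the Quick Start:

```python
import numpy as np

Z = encdr.transform(X)                    # encode into the latent space
X_decoded = encdr.inverse_transform(Z)    # decode back to the input space
X_predicted = encdr.predict(X)            # encode + decode in one call

# predict() is documented as the encode-then-decode round trip
print(np.allclose(X_predicted, X_decoded))

# score() is documented as the negative reconstruction MSE
print(np.isclose(encdr.score(X), -np.mean((X - X_predicted) ** 2)))
```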
## Model Persistence
ENCDR supports saving and loading trained models for later use. This includes all model parameters, weights, and preprocessing state (such as the fitted scaler).
### Saving Models

```python
# Train a model
encdr = ENCDR(hidden_dims=[64, 32], latent_dim=8, max_epochs=50)
encdr.fit(X_train)
# Save the trained model
encdr.save("my_model.pkl") # .pkl extension added automatically if not provided
encdr.save("/path/to/models/encdr_model")  # Directories created automatically
```
### Loading Models

```python
# Load a previously saved model
loaded_encdr = ENCDR.load("my_model.pkl")
# The loaded model retains all functionality
X_transformed = loaded_encdr.transform(X_test)
X_reconstructed = loaded_encdr.predict(X_test)
score = loaded_encdr.score(X_test)
# All original parameters are preserved
print(f"Hidden dimensions: {loaded_encdr.hidden_dims}")
print(f"Latent dimension: {loaded_encdr.latent_dim}")
print(f"Model is fitted: {loaded_encdr.is_fitted_}")
```
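As a quick sanity check, the loaded model's output can be compared against the original's; since the weights and the fitted scaler are both restored, the embeddings should match (a sketch, assuming deterministic inference):

```python
import numpy as np

original = encdr.transform(X_test)
restored = loaded_encdr.transform(X_test)
print(np.allclose(original, restored))  # Expected: True
```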
## File details

### Source distribution: encdr-1.1.0.tar.gz

- Download URL: encdr-1.1.0.tar.gz
- Size: 73.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.19

| Algorithm | Hash digest |
|---|---|
| SHA256 | d949e274506ce16f781549b962b2f7e147813b3301d868a6fa52f45f7b1b9e3a |
| MD5 | f623fa8d7dfba1ff50750bbd84270224 |
| BLAKE2b-256 | 2b71aede2a6d77dbb90f596deefa2d3f2f474fe36000b539e794d02cdef4f194 |
### Built distribution: encdr-1.1.0-py3-none-any.whl

- Download URL: encdr-1.1.0-py3-none-any.whl
- Size: 9.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.19

| Algorithm | Hash digest |
|---|---|
| SHA256 | 4349547578568f2503c40e1192d510c33e33824e87fb800442687cfae43b41fb |
| MD5 | 54216e9b873d408a4649a653bd5108be |
| BLAKE2b-256 | c301b4d1cfa23108f1fd21eeb1708e9731118520011f482a9221c6df36adb7cc |