
Project description

ENCDR - Neural Component Dimensionality Reduction

A Python library for autoencoder-based dimensionality reduction with a scikit-learn compatible interface.

Features

  • Scikit-learn Compatible: Implements fit(), transform(), predict(), and fit_transform() methods
  • PyTorch Lightning Backend: Deep learning framework with automatic GPU support
  • Configurable Architecture: Customizable encoder/decoder layers, activation functions, and training parameters
  • Automatic Standardization: Optional feature scaling for improved training stability
  • Validation Support: Built-in train/validation splits for monitoring training progress
  • Multiple Activation Functions: Support for ReLU, Tanh, Sigmoid, LeakyReLU, ELU, and GELU
  • Model Persistence: Save and load trained models with full state preservation

Installation

uv add encdr
# or
pip install encdr

Dependencies

  • Python ≥ 3.12
  • PyTorch Lightning ≥ 2.5.5
  • scikit-learn ≥ 1.7.2
  • torch ≥ 2.0.0
  • numpy ≥ 1.21.0

Quick Start

from encdr import ENCDR
from sklearn.datasets import make_classification
import numpy as np

# Generate sample data
X, _ = make_classification(n_samples=1000, n_features=50, n_informative=30, random_state=42)

# Create and train autoencoder
encdr = ENCDR(
    hidden_dims=[64, 32, 16],  # Encoder layer sizes
    latent_dim=8,              # Bottleneck dimension
    max_epochs=50,             # Training epochs
    random_state=42
)

# Fit and transform data
X_reduced = encdr.fit_transform(X)
print(f"Original shape: {X.shape}, Reduced shape: {X_reduced.shape}")

# Reconstruct original data
X_reconstructed = encdr.predict(X)
reconstruction_error = np.mean((X - X_reconstructed) ** 2)
print(f"Reconstruction MSE: {reconstruction_error:.4f}")

# Save model for later use
encdr.save("my_autoencoder.pkl")

# Load model (in a different session)
loaded_encdr = ENCDR.load("my_autoencoder.pkl")
X_reduced_loaded = loaded_encdr.transform(X[:10])  # Same results as original model

Advanced Usage

Custom Architecture

encdr = ENCDR(
    hidden_dims=[128, 64, 32, 16],     # Deep architecture
    latent_dim=10,                     # 10D latent space
    activation="tanh",                 # Tanh activation
    dropout_rate=0.2,                  # 20% dropout for regularization
    learning_rate=1e-3,                # Learning rate
    weight_decay=1e-4,                 # L2 regularization
    batch_size=64,                     # Batch size
    max_epochs=100,                    # Training epochs
    validation_split=0.2,              # 20% validation split
    standardize=True,                  # Feature standardization
    random_state=42,                   # Reproducibility
    trainer_kwargs={"accelerator": "gpu", "devices": 1}  # GPU training
)

Integration with Scikit-learn Pipelines

from encdr import ENCDR
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Create pipeline
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('encdr', ENCDR(latent_dim=5, max_epochs=50, standardize=False))
])

# Split data
X_train, X_test = train_test_split(X, test_size=0.2, random_state=42)

# Fit pipeline and transform
X_train_reduced = pipeline.fit_transform(X_train)
X_test_reduced = pipeline.transform(X_test)

Dimensionality Reduction Workflow

import matplotlib.pyplot as plt
import numpy as np
from encdr import ENCDR
from sklearn.datasets import load_iris

# Load iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Reduce to 2D for visualization
encdr = ENCDR(latent_dim=2, max_epochs=100, random_state=42)
X_2d = encdr.fit_transform(X)

# Plot results
plt.figure(figsize=(10, 4))

plt.subplot(1, 2, 1)
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y, cmap='viridis')
plt.title('ENCDR: Iris Dataset (2D)')
plt.xlabel('Component 1')
plt.ylabel('Component 2')

# Compare reconstruction quality
X_reconstructed = encdr.predict(X)
mse_per_feature = np.mean((X - X_reconstructed) ** 2, axis=0)

plt.subplot(1, 2, 2)
plt.bar(range(len(mse_per_feature)), mse_per_feature)
plt.title('Reconstruction Error by Feature')
plt.xlabel('Feature Index')
plt.ylabel('MSE')

plt.tight_layout()
plt.show()

API Reference

ENCDR Class

Parameters

  • hidden_dims (list of int, default=[64, 32]): Hidden layer dimensions for encoder
  • latent_dim (int, default=10): Dimension of latent space
  • learning_rate (float, default=1e-3): Learning rate for optimization
  • activation (str, default='relu'): Activation function ('relu', 'tanh', 'sigmoid', 'leaky_relu', 'elu', 'gelu')
  • dropout_rate (float, default=0.0): Dropout rate for regularization
  • weight_decay (float, default=0.0): L2 regularization weight decay
  • batch_size (int, default=32): Training batch size
  • max_epochs (int, default=100): Maximum training epochs
  • validation_split (float, default=0.2): Fraction of data for validation
  • standardize (bool, default=True): Whether to standardize features
  • random_state (int, optional): Random seed for reproducibility
  • trainer_kwargs (dict, optional): Additional PyTorch Lightning Trainer arguments
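The hidden_dims and latent_dim parameters together determine the layer sizes. Assuming the decoder mirrors the encoder (the library's internals may differ), the (input, output) shape of each layer can be sketched in plain Python; layer_shapes here is a hypothetical helper for illustration, not part of the ENCDR API:

```python
def layer_shapes(n_features, hidden_dims, latent_dim):
    """Chain encoder layer sizes and mirror them for the decoder."""
    dims = [n_features, *hidden_dims, latent_dim]
    encoder = list(zip(dims[:-1], dims[1:]))            # e.g. 50 -> 64 -> 32 -> 10
    decoder = [(o, i) for (i, o) in reversed(encoder)]  # mirrored: 10 -> 32 -> 64 -> 50
    return encoder, decoder

enc, dec = layer_shapes(50, [64, 32], 10)
print(enc)  # [(50, 64), (64, 32), (32, 10)]
print(dec)  # [(10, 32), (32, 64), (64, 50)]
```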

Methods

  • fit(X, y=None): Train the autoencoder on data X
  • transform(X): Transform data to latent representation
  • fit_transform(X, y=None): Fit and transform in one step
  • inverse_transform(X): Reconstruct data from latent representation
  • predict(X): Reconstruct input data (alias for encode→decode)
  • score(X, y=None): Return negative reconstruction MSE
  • save(filepath): Save fitted model to disk
  • load(filepath): Load saved model from disk (class method)
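score(X) is described above as the negative reconstruction MSE, so higher values (closer to zero) are better, matching scikit-learn's "greater is better" convention. A minimal numpy sketch of that computation, with X_hat standing in for the output of predict(X):

```python
import numpy as np

def neg_reconstruction_mse(X, X_hat):
    """Negative mean squared error between input and its reconstruction."""
    return -float(np.mean((X - X_hat) ** 2))

X = np.array([[0.0, 1.0], [2.0, 3.0]])
X_hat = np.array([[0.0, 1.0], [2.0, 2.0]])  # stand-in for encdr.predict(X)
print(neg_reconstruction_mse(X, X_hat))  # -0.25
```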

Model Persistence

ENCDR supports saving and loading trained models for later use. This includes all model parameters, weights, and preprocessing state (such as the fitted scaler).
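Since models are written as .pkl files, the round trip presumably works by pickling the full estimator state. The same idea can be sketched with just the preprocessing piece, a fitted StandardScaler, and the standard library's pickle module (an illustration of the mechanism, not the actual ENCDR internals):

```python
import pickle

import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
scaler = StandardScaler().fit(X)

# Round-trip the fitted preprocessing state through pickle
restored = pickle.loads(pickle.dumps(scaler))

# The restored object reproduces the original transform exactly
assert np.allclose(scaler.transform(X), restored.transform(X))
```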

Saving Models

# Train a model
encdr = ENCDR(hidden_dims=[64, 32], latent_dim=8, max_epochs=50)
encdr.fit(X_train)

# Save the trained model
encdr.save("my_model.pkl")  # .pkl extension added automatically if not provided
encdr.save("/path/to/models/encdr_model")  # Directories created automatically

Loading Models

# Load a previously saved model
loaded_encdr = ENCDR.load("my_model.pkl")

# The loaded model retains all functionality
X_transformed = loaded_encdr.transform(X_test)
X_reconstructed = loaded_encdr.predict(X_test)
score = loaded_encdr.score(X_test)

# All original parameters are preserved
print(f"Hidden dimensions: {loaded_encdr.hidden_dims}")
print(f"Latent dimension: {loaded_encdr.latent_dim}")
print(f"Model is fitted: {loaded_encdr.is_fitted_}")

Download files


Source Distribution

encdr-1.1.0.tar.gz (73.5 kB)

Built Distribution


encdr-1.1.0-py3-none-any.whl (9.8 kB)

File details

Details for the file encdr-1.1.0.tar.gz.

File metadata

  • Download URL: encdr-1.1.0.tar.gz
  • Size: 73.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.19

File hashes

Hashes for encdr-1.1.0.tar.gz

  • SHA256: d949e274506ce16f781549b962b2f7e147813b3301d868a6fa52f45f7b1b9e3a
  • MD5: f623fa8d7dfba1ff50750bbd84270224
  • BLAKE2b-256: 2b71aede2a6d77dbb90f596deefa2d3f2f474fe36000b539e794d02cdef4f194


File details

Details for the file encdr-1.1.0-py3-none-any.whl.

File metadata

  • Download URL: encdr-1.1.0-py3-none-any.whl
  • Size: 9.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.19

File hashes

Hashes for encdr-1.1.0-py3-none-any.whl

  • SHA256: 4349547578568f2503c40e1192d510c33e33824e87fb800442687cfae43b41fb
  • MD5: 54216e9b873d408a4649a653bd5108be
  • BLAKE2b-256: c301b4d1cfa23108f1fd21eeb1708e9731118520011f482a9221c6df36adb7cc

