GPU-accelerated Incremental PCA using PyTorch, with sklearn-compatible API

Incremental PCA for PyTorch

Incremental Principal Component Analysis (PCA) using PyTorch. This package provides a scikit-learn compatible API that fits PCA one batch at a time, so PCA can be performed on datasets that are too large to fit in memory.

Features

- GPU Acceleration: Perform PCA on GPUs for significant speedups on large datasets
- Memory Efficient: Process data in batches to handle datasets larger than available RAM/VRAM ("out of core")
- sklearn Compatible: Drop-in replacement with the familiar `fit`, `transform`, `fit_transform` API
- Streaming Support: Use `partial_fit` for online learning from data streams
- Lazy arrays / numpy memmap Support: Efficiently process arrays on disk and memory-mapped files

Installation

pip install incremental-pca-torch

From Source

git clone https://github.com/RichieHakim/incremental_pca.git
cd incremental_pca
pip install -e ".[dev]"

Quick Start

import numpy as np
from incremental_pca_torch import IncrementalPCA

# Create some data
X = np.random.randn(10000, 500).astype(np.float32)

# Fit incrementally using GPU
ipca = IncrementalPCA(
    n_components=50, 
    batch_size=256, 
    device='cuda'  # Use 'cpu' if no GPU available
)
ipca.fit(X)

# Project data onto the principal components (here, the training data itself)
X_transformed = ipca.transform(X)
print(f"Reduced shape: {X_transformed.shape}")  # (10000, 50)

# Reconstruct data
X_reconstructed = ipca.inverse_transform(X_transformed)

Streaming Data with `partial_fit`

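import numpy as np
from incremental_pca_torch import IncrementalPCA

def data_generator(n_chunks=20):
    # Placeholder stream; replace with your own source of (n_samples, n_features) arrays
    for _ in range(n_chunks):
        yield np.random.randn(256, 500).astype(np.float32)

# For streaming or very large datasets
ipca = IncrementalPCA(n_components=50, device='cuda')  # Use 'cpu' if no GPU available
# Process data in chunks
for chunk in data_generator():
    ipca.partial_fit(chunk)

# Use the fitted model (new_data is placeholder data with the same number of features)
new_data = np.random.randn(100, 500).astype(np.float32)
X_transformed = ipca.transform(new_data)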
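
Any iterable of 2-D arrays with a fixed number of features can drive the loop above. As a sketch, a hypothetical helper `iter_chunks` (a name invented here, not part of this package) could yield consecutive row slices of an array:

```python
import numpy as np

def iter_chunks(X, chunk_size):
    """Yield consecutive row blocks of X (hypothetical helper, not part of the package)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, X.shape[0], chunk_size):
        # Basic slicing of a numpy array or memmap returns a view; nothing is copied here
        yield X[start:start + chunk_size]

X = np.arange(20, dtype=np.float32).reshape(10, 2)
chunk_shapes = [chunk.shape for chunk in iter_chunks(X, chunk_size=4)]
print(chunk_shapes)  # [(4, 2), (4, 2), (2, 2)]
```

The final chunk can be shorter than the others. Whether `partial_fit` accepts a chunk with fewer rows than `n_components` is not stated here, so it may be safer to merge or drop a short trailing chunk.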

Using with Memory-Mapped Arrays

import numpy as np
from incremental_pca_torch import IncrementalPCA

# Memory-mapped files work seamlessly
X_mmap = np.load('large_data.npy', mmap_mode='r')

ipca = IncrementalPCA(n_components=50, batch_size=256, device='cuda')
ipca.fit(X_mmap)  # Loads only one batch at a time
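
If the data is not on disk yet, a `.npy` file can be written block by block and then opened as a memory map. The following is a minimal sketch using NumPy's `np.lib.format.open_memmap`; the file location, shape, and block size are placeholders chosen for illustration:

```python
import os
import tempfile
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "large_data.npy")  # placeholder location
n_samples, n_features, block = 1000, 20, 256

# Create the .npy file with its final shape, then fill it one block at a time
out = np.lib.format.open_memmap(
    path, mode="w+", dtype=np.float32, shape=(n_samples, n_features)
)
rng = np.random.default_rng(0)
for start in range(0, n_samples, block):
    stop = min(start + block, n_samples)
    out[start:stop] = rng.standard_normal((stop - start, n_features), dtype=np.float32)
out.flush()
del out  # release the writable map

# Re-open read-only; rows are read from disk only when they are sliced
X_mmap = np.load(path, mmap_mode="r")
print(type(X_mmap).__name__, X_mmap.shape)  # memmap (1000, 20)
```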

API Reference

`IncrementalPCA`

IncrementalPCA(
    n_components=None,     # Number of components (default: min(n_samples, n_features))
    whiten=False,          # Scale components to unit variance
    batch_size=128,        # Samples per batch for fit/transform
    device='cpu',          # 'cpu', 'cuda', 'cuda:0', 'mps', etc.
    dtype=torch.float32,   # torch.float32 or torch.float64
    whiten_eps=1e-7,       # Numerical stability for whitening
    verbose=False,         # Show progress bars
)
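
Since `device` is a plain string, it can be chosen at runtime. A small sketch, assuming only standard PyTorch availability checks; `pick_device` is a hypothetical helper, not part of this package:

```python
def pick_device():
    """Return 'cuda', 'mps', or 'cpu' depending on what PyTorch reports as available."""
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps only exists in newer PyTorch releases, hence the getattr
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"

device = pick_device()
print(device)
```

The result can be passed as `device=device` to the constructor. Note that PyTorch's MPS backend does not support `torch.float64`, so `dtype=torch.float32` is the safer choice there.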

Methods

| Method | Description |
| --- | --- |
| `fit(X)` | Fit the model to data X in batches |
| `partial_fit(X)` | Incrementally update model with a single batch |
| `transform(X)` | Project data onto principal components |
| `inverse_transform(X)` | Reconstruct data from components |
| `fit_transform(X)` | Fit and transform in one call |
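
For intuition, `transform` and `inverse_transform` can be written in a few lines of NumPy. This is a sketch under sklearn's conventions with `whiten=False`, fitted here with a plain full-batch SVD rather than with this package; it is not the package's own code:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))

# Full-batch PCA via SVD of the centered data
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
components = Vt[:3]  # shape (n_components, n_features)

def transform(X):
    # Center, then project onto the principal axes
    return (X - mean) @ components.T

def inverse_transform(Z):
    # Map back to feature space and restore the mean
    return Z @ components + mean

Z = transform(X)
X_rec = inverse_transform(Z)
print(Z.shape, X_rec.shape)  # (200, 3) (200, 6)
```

Keeping all 6 components would make the reconstruction exact; with 3, `X_rec` is the best rank-3 approximation of the centered data plus the mean.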

Attributes (after fitting)

| Attribute | Description |
| --- | --- |
| `components_` | Principal axes, shape `(n_components, n_features)` |
| `mean_` | Per-feature mean, shape `(n_features,)` |
| `explained_variance_` | Variance per component |
| `explained_variance_ratio_` | Fraction of total variance per component |
| `n_samples_seen_` | Total samples processed |
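
A common use of `explained_variance_ratio_` is choosing how many components to keep. The sketch below uses a hypothetical helper, `components_for_variance`, on a hand-written list of ratios. If the attribute is returned as a torch tensor rather than a NumPy array (not specified above), convert it first.

```python
import numpy as np

def components_for_variance(explained_variance_ratio, target=0.95):
    """Smallest k whose first k ratios sum to at least target (hypothetical helper)."""
    cumulative = np.cumsum(np.asarray(explained_variance_ratio, dtype=np.float64))
    if cumulative[-1] < target:
        raise ValueError("the given components explain less than the target fraction")
    # First index where the running sum reaches the target, converted to a count
    return int(np.searchsorted(cumulative, target) + 1)

ratios = [0.5, 0.25, 0.125, 0.0625, 0.0625]
print(components_for_variance(ratios, target=0.9))  # 4
```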

Benchmarks

The benchmarks below compare this package against `sklearn.decomposition.IncrementalPCA`, with both running on CPU.

Configuration: 10,000 samples × 500 features → 50 components

Fit Performance

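| Batch Size | Torch (s) | sklearn (s) | Speedup (sklearn / Torch) |
| --- | --- | --- | --- |
| 64 | 0.708 | 0.663 | 0.94x |
| 128 | 0.581 | 0.579 | 1.00x |
| 256 | 0.670 | 0.612 | 0.91x |
| 512 | 0.699 | 0.633 | 0.91x |
| 1024 | 0.585 | 0.548 | 0.94x |
| 2048 | 0.535 | 0.480 | 0.90x |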

Transform Performance

| Batch Size | Torch (s) | sklearn (s) | Speedup (sklearn / Torch) |
| --- | --- | --- | --- |
| 64 | 0.008 | 0.028 | 3.64x |
| 512 | 0.011 | 0.028 | 2.47x |
| 1024 | 0.007 | 0.028 | 3.72x |
| 2048 | 0.013 | 0.028 | 2.08x |

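Note: On CPU, fit time is comparable to sklearn (0.90x to 1.00x in these runs), while transform is roughly 2x to 3.7x faster. The main advantage of this package is GPU acceleration, which is intended to provide significant speedups for large datasets; no GPU timings are reported here.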
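
The speedup values appear to be the sklearn time divided by the Torch time, so values below 1 mean this package was slower. The timing method is not described. The sketch below shows one common approach (best of several `time.perf_counter` runs) using a hypothetical `best_time` helper, and recomputes the first row of the fit table:

```python
import time

def best_time(fn, repeats=5):
    """Best wall-clock time of fn() over several runs (hypothetical helper)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

def speedup(torch_seconds, sklearn_seconds):
    # Values above 1 mean the Torch implementation was faster
    return sklearn_seconds / torch_seconds

# First row of the fit table: batch size 64
print(f"{speedup(0.708, 0.663):.2f}x")  # 0.94x

# Stand-in workload; in practice fn would call fit or transform
elapsed = best_time(lambda: sum(range(100_000)))
```

When timing on a GPU, CUDA kernels run asynchronously, so `torch.cuda.synchronize()` should be called before each clock read for the measurement to be meaningful.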

Algorithm

This implementation uses the incremental SVD algorithm from Ross et al. (2008), which:

1. Updates running statistics using Welford's algorithm for numerically stable online mean and variance computation
2. Constructs an augmented matrix combining previous components with new centered data
3. Performs SVD on the augmented matrix to update components
4. Applies deterministic sign flipping for reproducibility
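
As an illustration of the running-statistics step, here is the textbook scalar form of Welford's algorithm in plain Python. The package presumably applies a batched, per-feature variant; this sketch is not taken from its source.

```python
def welford_update(count, mean, m2, x):
    """Fold one value into the running (count, mean, M2) triple."""
    count += 1
    delta = x - mean
    mean += delta / count
    m2 += delta * (x - mean)  # second factor uses the updated mean
    return count, mean, m2

count, mean, m2 = 0, 0.0, 0.0
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    count, mean, m2 = welford_update(count, mean, m2, x)

# M2 is the running sum of squared deviations; divide by count for the
# population variance or by (count - 1) for the sample variance
print(round(mean, 6), round(m2 / count, 6))  # 5.0 4.0
```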

The algorithm matches sklearn's `IncrementalPCA` implementation; the test suite verifies this by comparing results against sklearn.
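
The update described in this section can be sketched in NumPy. The code below follows the structure of sklearn's `partial_fit` as I understand it, tracks only the mean (not the running variance), and uses names of my own choosing (`incremental_update`, the `state` dict); it is not this package's torch code.

```python
import numpy as np

def incremental_update(state, X, n_components):
    """One incremental PCA step on batch X; state is None on the first call."""
    X = np.asarray(X, dtype=np.float64)
    n_batch = X.shape[0]
    batch_mean = X.mean(axis=0)
    if state is None:
        n_total, mean = n_batch, batch_mean
        A = X - batch_mean
    else:
        n_seen = state["n_samples_seen"]
        n_total = n_seen + n_batch
        mean = (n_seen * state["mean"] + n_batch * batch_mean) / n_total
        # Extra row that accounts for the shift between old and new batch means
        correction = np.sqrt(n_seen * n_batch / n_total) * (state["mean"] - batch_mean)
        # Augmented matrix: scaled previous components, centered batch, correction
        A = np.vstack([
            state["singular_values"][:, None] * state["components"],
            X - batch_mean,
            correction[None, :],
        ])
    _, S, Vt = np.linalg.svd(A, full_matrices=False)
    # Deterministic signs: make the largest-magnitude entry of each axis positive
    idx = np.argmax(np.abs(Vt), axis=1)
    signs = np.sign(Vt[np.arange(Vt.shape[0]), idx])
    signs[signs == 0] = 1
    Vt = Vt * signs[:, None]
    return {
        "n_samples_seen": n_total,
        "mean": mean,
        "components": Vt[:n_components],
        "singular_values": S[:n_components],
        "explained_variance": S[:n_components] ** 2 / (n_total - 1),
    }

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
state = None
for start in range(0, 200, 50):
    state = incremental_update(state, X[start:start + 50], n_components=5)

# With all 5 components kept, the result agrees with a full-batch eigendecomposition
full = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]
print(np.allclose(state["explained_variance"], full))  # True
```

When `n_components` is smaller than the number of features, each step discards the trailing directions, so the incremental result is an approximation of full-batch PCA rather than an exact match.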

Testing

Run the test suite:

pytest tests/ -v

The test suite includes:

- Comparison against sklearn PCA (full-batch mode)
- Comparison against sklearn IncrementalPCA (various batch sizes)
- Batch size sensitivity tests
- Whitening tests
- Numerical stability tests
- Edge case handling
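
Principal axes are only defined up to sign, so comparisons between PCA implementations are often made sign-invariant. Given the deterministic sign flipping described above, the package's own tests may not need this. The helper below is a hypothetical sketch, not taken from the test suite:

```python
import numpy as np

def assert_components_close(a, b, atol=1e-5):
    """Check two (n_components, n_features) arrays match row by row, up to sign."""
    a, b = np.asarray(a), np.asarray(b)
    assert a.shape == b.shape
    # Flip each row of b so that it points the same way as the matching row of a
    signs = np.sign(np.sum(a * b, axis=1, keepdims=True))
    signs[signs == 0] = 1
    np.testing.assert_allclose(a, signs * b, atol=atol)

a = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.array([[-1.0, 0.0], [0.0, 1.0]])
assert_components_close(a, b)  # passes: the rows differ only by sign
```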

License

MIT License - see LICENSE for details.

References

Ross, D. A., Lim, J., Lin, R.-S., & Yang, M.-H. (2008). Incremental learning for robust visual tracking. International Journal of Computer Vision, 77(1-3), 125-141.
scikit-learn IncrementalPCA documentation

Release history

0.1.0 (Jan 7, 2026)
