SVD, PCA, and matrix decomposition from first principles

pyspectral

SVD, PCA, and matrix decomposition — implemented from scratch in Python.

No np.linalg.svd. No sklearn.decomposition.PCA. Every algorithm is built from first principles using only NumPy and SciPy as a numerical substrate.


What it does

Starting from power iteration on a random vector, the library builds up a full stack:

power iteration
  → eigen decomposition (Hotelling deflation)
    → full SVD
      → truncated SVD (exact + randomized)
        → PCA
          → image compression
          → video compression
          → eigenfaces (face recognition)
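
The first two layers of that stack can be sketched in a few lines. This is a minimal illustration, not the library's code: `power_iteration` and `top_eigenpairs` are hypothetical names, and the sketch assumes a symmetric positive semidefinite input (for example a covariance matrix or A^T A), so that the dominant eigenvalue is also the largest one.

```python
import numpy as np

def power_iteration(A, n_iter=1000, tol=1e-12, seed=0):
    """Dominant eigenpair of a symmetric matrix A (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0])
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(n_iter):
        w = A @ v
        norm = np.linalg.norm(w)
        if norm == 0.0:              # A v = 0: v is an eigenvector for eigenvalue 0
            return 0.0, v
        v = w / norm
        lam_new = v @ A @ v          # Rayleigh quotient estimate of the eigenvalue
        converged = abs(lam_new - lam) < tol
        lam = lam_new
        if converged:
            break
    return lam, v

def top_eigenpairs(A, k):
    """Top-k eigenpairs of a symmetric PSD matrix via Hotelling deflation."""
    B = np.array(A, dtype=float)     # work on a copy
    vals, vecs = [], []
    for _ in range(k):
        lam, v = power_iteration(B)
        vals.append(lam)
        vecs.append(v)
        B -= lam * np.outer(v, v)    # subtract the component just found
    return np.array(vals), np.column_stack(vecs)
```

Deflation accumulates error from one eigenpair to the next, so this approach suits a handful of leading eigenpairs better than a full spectrum.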

It's a learning/research library. The goal is algorithmic clarity, not LAPACK performance.

For the full writeup on how it was built and why, read the blog.


Install

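pip install scipyspectral        # published distribution (1.0.0); the import name is pyspectral

pip install -e .                 # editable install, run from the repository root

pip install -e ".[video]"        # video compression (imageio)
pip install -e ".[notebook]"     # Jupyter support
pip install -e ".[dev]"          # pytest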

Structure

pyspectral/
  linalg_core/     power_iteration.py, eigen_decomp.py, svd_full.py, svd_truncated.py
  pca/             covariance.py, pca_model.py
  applications/    image_compression.py, eigenfaces.py, video_compression.py
  benchmarking/    time_complexity.py, memory_profile.py
  experiments/     large_matrix_tests.py
  utils/           matrix_checks.py
  tests/           test_power_iteration.py, test_svd.py, test_pca.py

API

Full SVD

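import numpy as np
from pyspectral.linalg_core.svd_full import compute_svd

A = np.random.randn(100, 80)
U, S, Vt = compute_svd(A)

# Rebuild A from the factors; the slices also handle full-size U and Vt
A_rec = U[:, :len(S)] @ np.diag(S) @ Vt[:len(S), :]
print(np.linalg.norm(A - A_rec) / np.linalg.norm(A))   # relative reconstruction error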
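
For intuition on how an SVD can be assembled from a symmetric eigen decomposition, here is a minimal sketch. It is an assumption about the general technique, not a description of `compute_svd`: `svd_via_gram` is a hypothetical name, and `np.linalg.eigh` stands in for a hand-written eigen solver.

```python
import numpy as np

def svd_via_gram(A, rtol=1e-10):
    """Thin SVD from the eigen decomposition of A^T A (illustrative only)."""
    # np.linalg.eigh is a stand-in for a from-scratch symmetric eigen solver.
    evals, V = np.linalg.eigh(A.T @ A)
    order = np.argsort(evals)[::-1]             # sort eigenvalues in descending order
    evals, V = evals[order], V[:, order]
    S = np.sqrt(np.clip(evals, 0.0, None))      # sigma_i = sqrt(lambda_i)
    keep = S > rtol * S[0]                      # drop numerically zero directions
    S, V = S[keep], V[:, keep]
    U = (A @ V) / S                             # u_i = A v_i / sigma_i
    return U, S, V.T
```

Forming A^T A squares the condition number, so small singular values lose accuracy with this route; that is acceptable for a learning sketch but worth keeping in mind for ill-conditioned matrices.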

Truncated SVD

from pyspectral.linalg_core.svd_truncated import truncated_svd

Uk, Sk, Vkt = truncated_svd(A, k=10)                        # exact
Uk, Sk, Vkt = truncated_svd(A, k=10, method="randomized")   # Halko et al. (2011); typically faster when k << min(m, n)

A_approx = Uk @ np.diag(Sk) @ Vkt
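
The randomized method follows the range-finder idea from Halko et al.: sketch the column space with a random matrix, then solve a much smaller SVD. The sketch below is a generic version of that scheme under stated assumptions. The function name and the `n_oversamples` and `n_power_iter` parameters are hypothetical, and `np.linalg.svd` is used only on the small projected matrix as a stand-in for a from-scratch solver.

```python
import numpy as np

def randomized_svd(A, k, n_oversamples=10, n_power_iter=2, seed=0):
    """Approximate rank-k SVD via a randomized range finder (illustrative)."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    ell = min(k + n_oversamples, n)             # sketch width
    Q, _ = np.linalg.qr(A @ rng.standard_normal((n, ell)))
    for _ in range(n_power_iter):               # power iterations sharpen the subspace
        Z, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Z)
    B = Q.T @ A                                 # small (ell x n) projected matrix
    Ub, S, Vt = np.linalg.svd(B, full_matrices=False)   # stand-in small SVD
    U = Q @ Ub                                  # lift back to the original space
    return U[:, :k], S[:k], Vt[:k, :]
```

The expensive steps are a few products with A and QR factorizations of tall, thin matrices, which is where the speedup over an exact decomposition comes from when k is small.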

PCA

from pyspectral.pca.pca_model import PCAEngine

pca = PCAEngine(n_components=10)
pca.fit(X)                          # X: (n_samples, n_features)

Z    = pca.transform(X)             # -> (n, 10)
Xhat = pca.inverse_transform(Z)     # -> (n, n_features)

print(pca.explained_variance_ratio_)

k, _ = pca.explained_variance(threshold=0.90)
print(f"{k} components explain 90% of the variance")

Use method="svd" when the number of features p is much larger than the number of samples (this avoids building the p×p covariance matrix):

pca = PCAEngine(n_components=50, method="svd")
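
Why the SVD route avoids the covariance matrix: the right singular vectors of the centered data are the principal axes, and the squared singular values divided by n - 1 are the covariance eigenvalues. A minimal sketch follows. `pca_via_svd` and `components_for_threshold` are hypothetical helpers, not `PCAEngine` internals, and `np.linalg.svd` is a stand-in for a from-scratch SVD.

```python
import numpy as np

def pca_via_svd(X, n_components):
    """PCA from the SVD of centered data, without forming the covariance."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)   # stand-in SVD
    variances = S**2 / (X.shape[0] - 1)         # eigenvalues of the covariance
    ratio = variances / variances.sum()         # explained variance ratio
    return mean, Vt[:n_components], ratio

def components_for_threshold(ratio, threshold=0.90):
    """Smallest k whose cumulative explained variance reaches the threshold."""
    cumulative = np.cumsum(ratio)
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(ratio))                   # guard against float round-off
```

Projection is then `(X - mean) @ components.T` and reconstruction is `Z @ components + mean`.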

Image compression

from pyspectral.applications.image_compression import ImageCompressor

c = ImageCompressor("photo.png")
c.compress_and_compare(ranks=[5, 20, 50, 100])
c.print_report()
c.save_results("output/")

Output:

Rank k    Ratio     PSNR (dB)
   5      55.2x     17.12
  20      13.8x     22.28
  50       5.5x     27.17
 100       2.8x     31.98

SVD compression is not competitive with JPEG at equal file size (~30 dB gap): the factors are stored as raw float32 with no entropy coding. It is useful for scientific/numerical matrices, not as a replacement for image codecs.
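
The two columns of the report can be reproduced with standard formulas. This is a sketch under assumptions: it counts k(m + n + 1) stored values for a rank-k factorization of an m×n image and uses the usual PSNR definition with a peak value of 255. Whether `ImageCompressor` counts storage exactly this way is not confirmed here, although the ratios in the table above do scale as 1/k, which is consistent with it. Both function names are hypothetical.

```python
import numpy as np

def rank_k_storage_ratio(height, width, k):
    """Original element count divided by the values kept for rank-k factors."""
    return (height * width) / (k * (height + width + 1))

def psnr(original, approx, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped arrays."""
    diff = np.asarray(original, dtype=float) - np.asarray(approx, dtype=float)
    mse = np.mean(diff**2)
    if mse == 0.0:
        return float("inf")                     # identical arrays
    return 10.0 * np.log10(max_value**2 / mse)
```

The ratio drops below 1 once k exceeds mn / (m + n + 1), at which point the factors take more space than the original matrix.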

Video compression

from pyspectral.applications.video_compression import VideoCompressor

vc = VideoCompressor()
vc.load_synthetic(kind="wave")          # "wave", "bouncing", "noise", "mixed"

vc.compress_frame_by_frame(k=5)        # rank-k SVD per frame
vc.compress_temporal(k=5)             # global SVD across all frames (T, H*W)

vc.save_results("output/video/")
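
The difference between the two modes comes down to which matrix gets factorized. A minimal sketch, assuming a grayscale clip stored as a (T, H, W) array. The helper names are hypothetical and `np.linalg.svd` stands in for the library's own SVD.

```python
import numpy as np

def low_rank(M, k):
    """Best rank-k approximation of a 2D matrix (stand-in SVD)."""
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :k] * S[:k]) @ Vt[:k]

def frame_by_frame(video, k):
    """Rank-k approximation of each (H, W) frame independently."""
    return np.stack([low_rank(frame, k) for frame in video])

def temporal(video, k):
    """One rank-k approximation of the (T, H*W) matrix of flattened frames."""
    T, H, W = video.shape
    return low_rank(video.reshape(T, H * W), k).reshape(T, H, W)
```

Per-frame compression keeps about T·k·(H + W + 1) values, while the temporal variant keeps about k·(T + H·W + 1), so the temporal form pays off when frames are strongly correlated over time.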

Eigenfaces

from pyspectral.applications.eigenfaces import EigenfacesModel

model = EigenfacesModel(n_components=30, image_size=(32, 32))
model.train_synthetic(n_subjects=10, n_images_per_subject=8)

label, dist, _ = model.recognize_face(query_image)
accuracy, _    = model.evaluate_accuracy()           # leave-one-out CV
model.visualize_eigenfaces(n_show=16, output_path="eigenfaces.png")

Real face dataset:

model = EigenfacesModel(n_components=50, image_size=(112, 92))
model.train_eigenfaces("path/to/dataset/")   # one subfolder per person
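
Recognition with eigenfaces amounts to nearest-neighbor search in PCA space. The sketch below illustrates that idea with flattened grayscale images. It is not the `EigenfacesModel` implementation, the function names are hypothetical, and `np.linalg.svd` is a stand-in for a from-scratch SVD.

```python
import numpy as np

def fit_eigenfaces(faces, n_components):
    """faces: (n_images, n_pixels). Returns the mean face and top eigenfaces."""
    mean_face = faces.mean(axis=0)
    _, _, Vt = np.linalg.svd(faces - mean_face, full_matrices=False)
    return mean_face, Vt[:n_components]

def project(faces, mean_face, eigenfaces):
    """Coordinates of one or more faces in eigenface space."""
    return (faces - mean_face) @ eigenfaces.T

def recognize(query, mean_face, eigenfaces, train_weights, train_labels):
    """Label and distance of the nearest training face in eigenface space."""
    w = project(query, mean_face, eigenfaces)
    dists = np.linalg.norm(train_weights - w, axis=1)
    i = int(np.argmin(dists))
    return train_labels[i], float(dists[i])
```

The distance returned alongside the label can be thresholded to reject queries that do not resemble any training subject.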

Matrix utilities

from pyspectral.utils.matrix_checks import matrix_info, condition_number, is_symmetric

matrix_info(A)       # shape, norms, rank, condition number
condition_number(A)  # sigma_max / sigma_min
is_symmetric(A)
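
For reference, checks like these are short to write. The sketch below shows one plausible way to express them, with `_sketch` suffixes to make clear they are not the functions in `matrix_checks.py`. The tolerance parameter is an assumption, and `np.linalg.svd` is used as a stand-in for the library's own SVD.

```python
import numpy as np

def is_symmetric_sketch(A, tol=1e-10):
    """True if A is square and equal to its transpose within tol."""
    A = np.asarray(A)
    if A.ndim != 2 or A.shape[0] != A.shape[1]:
        return False
    return bool(np.allclose(A, A.T, atol=tol))

def condition_number_sketch(A):
    """2-norm condition number: sigma_max / sigma_min."""
    s = np.linalg.svd(A, compute_uv=False)      # singular values, descending
    if s[-1] == 0.0:
        return float("inf")                     # exactly singular
    return float(s[0] / s[-1])
```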

Running demos and tests

python -m pyspectral.applications.image_compression
python -m pyspectral.applications.eigenfaces
python -m pyspectral.applications.video_compression
python -m pyspectral.benchmarking.time_complexity
python -m pyspectral.experiments.large_matrix_tests

python -m pytest pyspectral/tests/ -p no:asyncio -q

MIT License
