
Approximate Matrix Multiplication algorithms with unified interface


SAGE-AMMS - Approximate Matrix Multiplication Algorithms


Independent C++ implementation package for Approximate Matrix Multiplication (AMM) algorithms, extracted from the SAGE project.

Status: 🚀 Active Development | PyPI Package: isage-amms | Parent Project: SAGE - Unified ML System

Overview

SAGE-AMMS provides high-performance C++ implementations of various approximate matrix multiplication algorithms with Python bindings. This package was extracted from the main SAGE repository to:

  • ✅ Enable independent versioning and releases
  • ✅ Reduce main SAGE repository size
  • ✅ Allow optional installation
  • ✅ Simplify CI/CD for C++ builds
  • ✅ Make AMM algorithms reusable in other projects

Quick Start

Installation

From PyPI (Recommended)

pip install isage-amms

From Source

Prerequisites:

  • CMake >= 3.14
  • C++14 compatible compiler
  • Python >= 3.8
  • PyTorch >= 2.0.0

# Clone the repository
git clone https://github.com/intellistream/sage-amms.git
cd sage-amms

# Install in development mode
pip install -e .

# Or build wheel
python -m build --wheel
pip install dist/*.whl

Usage

This package provides the algorithm implementations. The unified interface is provided by the main SAGE package:

# Install both packages
# pip install sage isage-amms

# Import from SAGE (provides the interface)
from sage.libs.amms import create, registered

# Check available algorithms
print(registered())
# Output: ['countsketch', 'fastjlt', 'crs', 'bcrs', ...]

# Create an algorithm instance
amm = create("countsketch", sketch_size=1000)

# Perform approximate matrix multiplication
import numpy as np
A = np.random.randn(1000, 500)
B = np.random.randn(500, 800)
result = amm.multiply(A, B)  # Approximate A @ B
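A quick way to sanity-check any of these algorithms is to compare against the exact product. The helper below is a standard error metric, not part of the package API:

```python
import numpy as np

def relative_frobenius_error(exact: np.ndarray, approx: np.ndarray) -> float:
    """Relative Frobenius-norm error: ||exact - approx||_F / ||exact||_F."""
    return float(np.linalg.norm(exact - approx) / np.linalg.norm(exact))

# With the `amm` object above, you would compare amm.multiply(A, B) to A @ B.
```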

Architecture

See ARCHITECTURE.md for detailed architecture documentation.

AMMS provides a unified interface for various approximate matrix multiplication algorithms, similar to how ANNS provides a unified interface for approximate nearest neighbor search algorithms.

Structure

sage-amms/
├── sage/libs/amms/
│   ├── __init__.py              # Package initialization
│   ├── implementations/         # C++ source code
│   │   ├── include/             # C++ headers
│   │   │   ├── CPPAlgos/        # Core AMM algorithm implementations
│   │   │   ├── MatrixLoader/    # Matrix loading utilities
│   │   │   ├── Utils/           # Utility functions
│   │   │   └── ...
│   │   ├── src/                 # C++ implementation files
│   │   │   ├── CPPAlgos/        # Algorithm implementations
│   │   │   ├── PyAMM.cpp        # Python bindings
│   │   │   └── ...
│   │   └── CMakeLists.txt       # Build configuration
│   └── wrappers/                # Python wrappers
│       └── pyamm.py             # PyAMM wrapper
├── tests/                       # Unit tests
├── pyproject.toml               # Package metadata
└── setup.py                     # Build configuration

Algorithms Included

This package provides implementations of various AMM algorithms:

Sketching-based Algorithms

  • CountSketch: Count-Min Sketch based AMM
  • FastJLT: Fast Johnson-Lindenstrauss Transform
  • RIP: Random Index Projection
  • TugOfWar: Tug-of-war sketch
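The core CountSketch idea, compressing the shared inner dimension with a random signed hash matrix S so that E[SᵀS] = I, can be illustrated in NumPy. This is a didactic sketch of the technique, not the package's C++ implementation:

```python
import numpy as np

def countsketch_amm(A, B, sketch_size, seed=0):
    """Approximate A @ B by sketching the shared dimension with CountSketch."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    buckets = rng.integers(0, sketch_size, size=n)   # hash each inner index to a bucket
    signs = rng.choice([-1.0, 1.0], size=n)          # random sign per index
    S = np.zeros((sketch_size, n))
    S[buckets, np.arange(n)] = signs                 # one +/-1 entry per column
    return (A @ S.T) @ (S @ B)                       # unbiased since E[S.T @ S] = I

rng = np.random.default_rng(1)
A = rng.standard_normal((200, 400))
B = rng.standard_normal((400, 100))
approx = countsketch_amm(A, B, sketch_size=8000)
```

Larger sketch sizes reduce the variance of the estimate at the cost of more work in the sketched products.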

Sampling-based Algorithms

  • CRS: Coordinate-wise Random Sampling
  • CRSV2: Improved CRS
  • BCRS: Block-wise CRS
  • EWS: Entry-wise Sampling
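The sampling family can be illustrated with classic column-row sampling in the style of Drineas, Kannan, and Mahoney; this generic sketch is not necessarily how this package's CRS variants are implemented:

```python
import numpy as np

def cr_sample_amm(A, B, num_samples, seed=0):
    """Approximate A @ B as a sum of sampled, rescaled outer products.

    Index i is picked with probability proportional to ||A[:, i]|| * ||B[i, :]||;
    rescaling by 1 / (num_samples * p_i) keeps the estimate unbiased.
    """
    rng = np.random.default_rng(seed)
    p = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = p / p.sum()
    idx = rng.choice(A.shape[1], size=num_samples, p=p)
    scale = 1.0 / (num_samples * p[idx])
    return (A[:, idx] * scale) @ B[idx, :]

rng = np.random.default_rng(2)
A = rng.standard_normal((100, 300))
B = rng.standard_normal((300, 80))
approx = cr_sample_amm(A, B, num_samples=2000)
```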

Quantization-based Algorithms

  • ProductQuantization: Product quantization for AMM
  • VectorQuantization: Vector quantization
  • INT8: 8-bit integer quantization
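The int8 approach can be illustrated with symmetric per-tensor quantization: quantize both operands, multiply exactly in integers, then rescale. A minimal sketch, not the package's kernel:

```python
import numpy as np

def int8_amm(A, B):
    """Quantize both operands to int8, multiply in int32, then rescale."""
    scale_a = np.abs(A).max() / 127.0    # symmetric per-tensor scales
    scale_b = np.abs(B).max() / 127.0
    Aq = np.clip(np.round(A / scale_a), -127, 127).astype(np.int8)
    Bq = np.clip(np.round(B / scale_b), -127, 127).astype(np.int8)
    acc = Aq.astype(np.int32) @ Bq.astype(np.int32)   # exact integer product
    return acc * (scale_a * scale_b)                  # back to float

rng = np.random.default_rng(3)
A = rng.standard_normal((64, 128))
B = rng.standard_normal((128, 32))
approx = int8_amm(A, B)
```

Real kernels typically use per-row or per-channel scales for tighter error bounds; per-tensor scaling keeps the example short.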

Advanced Algorithms

  • CoOccurringFD: Co-occurring Frequent Directions
  • BetaCoOFD: Beta Co-occurring Frequent Directions
  • BlockLRA: Block Low-Rank Approximation
  • CLMM: Clustered Low-rank Matrix Multiplication
  • SMPCA: Symmetric Matrix PCA
  • WeightedCR: Weighted Cross-Ranking
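Low-rank approaches replace a factor with a compressed representation before multiplying. A generic (non-blocked) rank-k illustration via truncated SVD, not the package's BlockLRA implementation:

```python
import numpy as np

def low_rank_amm(A, B, rank):
    """Approximate A @ B by multiplying through the rank-k factors of A,
    which is cheaper than the full product once the factors are known."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ (Vt[:rank] @ B)

A = np.outer(np.arange(1.0, 5.0), np.ones(6))   # rank-1 matrix, shape (4, 6)
B = np.arange(24.0).reshape(6, 4)
approx = low_rank_amm(A, B, rank=1)             # exact here, since rank(A) == 1
```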

Installation

From PyPI (Recommended)

# Install CPU-only version
pip install isage-amms

This installs the CPU-only version with all core AMM algorithms.

From Source

Prerequisites:

  • CMake >= 3.14
  • C++14 compatible compiler (GCC 7+, Clang 5+, MSVC 2017+)
  • Python 3.8-3.12
  • PyTorch >= 2.0.0
  • 64GB+ RAM recommended for building

# Clone repository
git clone https://github.com/intellistream/sage-amms.git
cd sage-amms

# CPU-only build
pip install -e .

# CUDA-enabled build
AMMS_ENABLE_CUDA=1 pip install -e .

# Explicitly disable CUDA
AMMS_ENABLE_CUDA=0 pip install -e .

Build Options

CUDA Support

AMMS_ENABLE_CUDA is an explicit switch: set it to 1 to enable CUDA, or 0 to force a CPU-only build.

# Enable CUDA
AMMS_ENABLE_CUDA=1 pip install isage-amms --no-binary :all:

# Force CPU-only build
AMMS_ENABLE_CUDA=0 pip install isage-amms --no-binary :all:

# Specify CUDA path
CUDA_HOME=/usr/local/cuda AMMS_ENABLE_CUDA=1 pip install isage-amms --no-binary :all:

Low Memory Build

Build mode is selected automatically by a memory probe:

  • If available memory >= AMMS_FAST_BUILD_MEMORY_GB (default 48), fast build is enabled.
  • Otherwise low-memory mode is enabled to reduce OOM risk.
# Default behavior (auto memory probe)
pip install -e .
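The probe's decision rule can be sketched in Python (illustrative only; the real build script's logic may differ, but the env-var name and default come from the description above):

```python
import os

def choose_build_mode(available_gb, threshold_gb=None):
    """Fast build when available memory meets the threshold, low-memory otherwise."""
    if threshold_gb is None:
        threshold_gb = float(os.environ.get("AMMS_FAST_BUILD_MEMORY_GB", "48"))
    return "fast" if available_gb >= threshold_gb else "low-memory"
```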

You can still set it explicitly:

AMMS_LOW_MEMORY_BUILD=1 pip install isage-amms --no-binary :all:

If you have enough RAM and want faster compilation:

AMMS_FAST_BUILD=1 AMMS_MAX_JOBS=4 pip install -e .

Adjust the auto fast-build threshold:

AMMS_FAST_BUILD_MEMORY_GB=64 pip install -e .

Building from the SAGE Monorepo

If you are developing inside the main SAGE repository, build from the amms directory:

# Navigate to amms directory
cd packages/sage-libs/src/sage/libs/amms

# Quick build
./quick_build.sh

# Or use the full build script with options
./publish_to_pypi.sh --build-only --low-memory

# Install locally
pip install dist/isage_amms-*.whl

Build Environment Variables

The build system supports various options:

# Force a low-memory build
export AMMS_LOW_MEMORY_BUILD=1

# Default parallelism is 1 job in low-memory mode
export AMMS_MAX_JOBS=1

# Enable CUDA support
export AMMS_ENABLE_CUDA=1
export CUDA_HOME=/usr/local/cuda

# Override parallel jobs when memory is sufficient
export AMMS_MAX_JOBS=4

Build and Publish to PyPI

For maintainers who want to build and publish to PyPI:

# Build only (dry-run, no upload)
./publish_to_pypi.sh

# Build and upload to TestPyPI
./publish_to_pypi.sh --test-pypi --no-dry-run

# Build and upload to PyPI (production)
./publish_to_pypi.sh --no-dry-run

# With CUDA and low-memory options
./publish_to_pypi.sh --cuda --low-memory --no-dry-run

See BUILD_PUBLISH.md for comprehensive build and publish documentation.

Usage

Using the Unified Interface

from sage.libs.amms import create_amm_index

# Create an AMM index using the factory
amm = create_amm_index("countsketch", config={
    "sketch_size": 1000,
    "hash_functions": 5
})

# Perform approximate matrix multiplication
result = amm.multiply(matrix_a, matrix_b)

Direct Algorithm Usage

from sage.libs.amms.wrappers.pyamm import PyAMM

# Create a specific AMM algorithm instance
amm = PyAMM.CountSketch(sketch_size=1000)

# Use the algorithm
result = amm.multiply(matrix_a, matrix_b)

Benchmarking

For benchmarking AMM algorithms, see the sage-benchmark package:

# Run AMM benchmarks
sage-dev benchmark amm --algorithms countsketch,fastjlt --datasets dataset1

See packages/sage-benchmark/src/sage/benchmark/benchmark_libamm/README.md for details.

Issue regression checks (#5 and #6)

# Build-path matrix + perf baseline regression
pytest -q tests/test_issue6_build_matrix_and_perf_baseline.py

# CUDA/CPU switch cleanup regression
pytest -q tests/test_issue5_cuda_cpu_switch_cleanup.py

Migration from libamm

This module is refactored from the original libamm submodule:

  • Algorithm implementations: Moved from libamm/include/CPPAlgos and libamm/src/CPPAlgos to amms/implementations/
  • Benchmarking code: Moved from libamm/benchmark/ to sage-benchmark/benchmark_libamm/
  • Python bindings: Refactored into amms/wrappers/pyamm/
  • Interface layer: New unified interface similar to ANNS

Architecture Alignment

AMMS follows SAGE's architecture principles:

  • Layer 3 (L3-libs): Algorithm implementations and interfaces
  • Separation of concerns: Core algorithms (amms/) vs benchmarking (benchmark_libamm/)
  • Unified interfaces: Factory pattern for algorithm creation
  • Modular design: Independent wrappers for different algorithm families

References

  • Original LibAMM paper and documentation
  • PyTorch integration guide
  • AMM algorithm theory and applications

Contributing

When adding new AMM algorithms:

  1. Add C++ implementation to implementations/include/CPPAlgos/ and implementations/src/CPPAlgos/
  2. Create Python wrapper in wrappers/
  3. Register algorithm in interface/registry.py
  4. Add tests in sage-libs/tests/
  5. Add benchmark configuration in sage-benchmark/benchmark_libamm/

See CONTRIBUTING.md at project root for detailed guidelines.
