High-Performance Random Forests with GPU Acceleration and QLORA Compression

RFX: Random Forests X

License: MIT | Python 3.7+

RFX (Random Forests X) is a high-performance Python implementation of Breiman and Cutler's original Random Forest methodology with GPU acceleration and QLORA compression.

Key Features

  • Complete classification: Out-of-bag error, confusion matrices, class probabilities
  • Local importance: Per-sample feature importance (similar to SHAP, built-in)
  • Proximity matrices: Pairwise sample similarities for outlier detection and visualization
  • QLORA compression: 12,500× memory reduction (80GB → 6.4MB) for large-scale proximity analysis
  • Full GPU acceleration: CUDA for trees, importance, and proximity matrices
  • Interactive visualization: Python-native rfviz with 3D MDS and parallel coordinates

Result: Proximity-based workflows now scale to 200K–1M+ samples.
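
For readers new to proximity matrices, the sketch below shows the textbook Breiman and Cutler definition: the proximity of two samples is the fraction of trees in which they land in the same terminal node. It is a NumPy illustration only, not RFX's implementation. The function name proximity_from_leaves and the leaves array (one row per sample, one column per tree, holding leaf indices) are hypothetical.

```python
import numpy as np

def proximity_from_leaves(leaves):
    """Fraction of trees in which each pair of samples shares a leaf.

    leaves: array of shape (n_samples, n_trees) with leaf indices
            (a hypothetical input, not an RFX data structure).
    """
    leaves = np.asarray(leaves)
    # same_leaf[i, j, t] is True when samples i and j share a leaf in tree t
    same_leaf = leaves[:, None, :] == leaves[None, :, :]
    return same_leaf.mean(axis=2)

# Three samples, three trees
leaves = np.array([
    [0, 1, 2],
    [0, 1, 3],
    [4, 5, 3],
])
P = proximity_from_leaves(leaves)
print(P)
```

This dense version needs O(n²) memory, which is the cost the low-rank compression described in this README is meant to avoid.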

Installation

# Basic installation
pip install rfx-ml

# With visualization dependencies
# (quotes keep shells such as zsh from treating the brackets as a glob pattern)
pip install "rfx-ml[viz]"

# With all optional dependencies
pip install "rfx-ml[viz,examples]"

Prerequisites: CMake 3.12+, Python 3.7+, CUDA toolkit 11.0+ (required for building; GPU usage optional at runtime), C++ compiler with C++17 support.

pip install builds RFX from source, so install the prerequisites above before running pip.
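
Because the build happens on your machine, it can help to confirm the toolchain is on PATH first. The following is a minimal sketch using only the Python standard library. The helper name check_build_prerequisites and the list of compiler executables are assumptions, and the check only tests for presence, not for the minimum versions listed above.

```python
import shutil
import sys

def check_build_prerequisites(min_python=(3, 7)):
    """Return a dict mapping each prerequisite to True (found) or False."""
    return {
        "python": sys.version_info >= min_python,
        "cmake": shutil.which("cmake") is not None,
        # nvcc ships with the CUDA toolkit
        "nvcc": shutil.which("nvcc") is not None,
        # Common C++ compiler names (an assumed, non-exhaustive list)
        "c++ compiler": any(
            shutil.which(name) is not None
            for name in ("c++", "g++", "clang++", "cl")
        ),
    }

status = check_build_prerequisites()
for name, found in status.items():
    print(f"{name:>13}: {'found' if found else 'MISSING'}")
```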

Quick Start

import numpy as np
import RFX as rf

# Load sample data
X, y = rf.load_wine()

# Train Random Forest
model = rf.RandomForestClassifier(
    ntree=100,
    compute_importance=True,
    compute_local_importance=True,
    compute_proximity=True,
    use_gpu=False  # Set to True for GPU acceleration
)

model.fit(X, y)

# Get predictions and metrics
oob_error = model.get_oob_error()
print(f"OOB Error: {oob_error:.4f}")

predictions = model.predict(X)
importance = model.feature_importances_()
local_imp = model.get_local_importance()

# Interactive visualization
rf.rfviz(
    rf_model=model,
    X=X,
    y=y,
    output_file="rfviz_example.html"
)
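
Note that model.predict(X) above scores the training data, so accuracy computed from it is a resubstitution estimate and will usually look better than the OOB error. If you want a confusion matrix from any pair of label arrays, the NumPy sketch below is one way to build it. It assumes integer class labels starting at 0. The confusion_matrix helper here is hypothetical and is not part of RFX.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=None):
    """Rows are true classes, columns are predicted classes."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    if n_classes is None:
        n_classes = int(max(y_true.max(), y_pred.max())) + 1
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # Unbuffered add so repeated (true, pred) pairs are all counted
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
cm = confusion_matrix(y_true, y_pred)
error_rate = 1.0 - np.trace(cm) / cm.sum()
print(cm)
print(f"Error rate: {error_rate:.4f}")
```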

GPU Acceleration & QLORA

For large datasets, enable GPU acceleration and QLORA compression:

# Large-scale proximity analysis with QLORA
model = rf.RandomForestClassifier(
    ntree=500,
    use_gpu=True,
    compute_proximity=True,
    use_qlora=True,
    rank=32,  # Low-rank approximation
    quant_mode="int8"
)

model.fit(X, y)

# Get low-rank factors (memory efficient)
A, B, rank = model.get_lowrank_factors()

# Compute MDS directly from factors (no reconstruction!)
mds_coords = model.compute_mds_from_factors(k=3)
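
To see why MDS does not require the full n × n matrix, here is one possible approach in NumPy. It is a sketch under assumptions, not RFX's algorithm: it assumes the proximity matrix is approximated as A @ B.T with float factors of shape (n, rank), symmetrizes that product, and applies classical MDS through a small eigenproblem. The function name mds_from_factors is hypothetical.

```python
import numpy as np

def mds_from_factors(A, B, k=3):
    """Classical MDS coordinates for P ~ A @ B.T without forming P."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    r = A.shape[1]
    # Double-centering J P J equals (J A)(J B)^T, so center the factor columns
    Ac = A - A.mean(axis=0)
    Bc = B - B.mean(axis=0)
    # Symmetrized centered matrix: M = 0.5 * (Ac Bc^T + Bc Ac^T) = U S U^T
    U = np.hstack([Ac, Bc])  # shape (n, 2r)
    S = 0.5 * np.block([
        [np.zeros((r, r)), np.eye(r)],
        [np.eye(r), np.zeros((r, r))],
    ])
    Q, R = np.linalg.qr(U)  # Q: (n, 2r), R: (2r, 2r)
    # M = Q (R S R^T) Q^T, so only a 2r x 2r eigenproblem is solved
    w, V = np.linalg.eigh(R @ S @ R.T)
    top = np.argsort(w)[::-1][:k]
    # Scale eigenvectors by sqrt of the (non-negative) eigenvalues
    return (Q @ V[:, top]) * np.sqrt(np.clip(w[top], 0.0, None))

rng = np.random.default_rng(0)
A_demo = rng.normal(size=(50, 4))
B_demo = rng.normal(size=(50, 4))
coords = mds_from_factors(A_demo, B_demo, k=3)
print(coords.shape)
```

Memory and time scale with n × rank rather than n², which is the point of working from the factors.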

Memory savings at 100K samples: 74.5 GB (full proximity matrix) → 19 MB (QLORA, rank 100), roughly 4,000× compression.
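
The memory figures in this README can be reproduced with simple arithmetic, under two assumptions inferred from the numbers rather than confirmed by the project: the full matrix stores 8-byte floats, and the compressed form stores two n × rank factors at one byte per entry (int8). The function names below are illustrative.

```python
def full_matrix_bytes(n_samples, bytes_per_entry=8):
    """Dense n x n proximity matrix (assumed float64)."""
    return n_samples * n_samples * bytes_per_entry

def lowrank_bytes(n_samples, rank, bytes_per_entry=1):
    """Two n x rank factors (assumed int8)."""
    return 2 * n_samples * rank * bytes_per_entry

n = 100_000
full = full_matrix_bytes(n)

for rank in (32, 100):
    compressed = lowrank_bytes(n, rank)
    ratio = full // compressed
    print(f"rank {rank}: {full / 1e9:.0f} GB -> {compressed / 1e6:.1f} MB ({ratio}x)")
# rank 32: 80 GB -> 6.4 MB (12500x)
# rank 100: 80 GB -> 20.0 MB (4000x)
```

Under these assumptions, 80 GB and 20 MB in decimal units are about 74.5 GiB and 19 MiB in binary units, which matches the rank-100 figures quoted above. The 12,500× figure in Key Features corresponds to rank 32.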

Documentation

For complete documentation, examples, and advanced usage, visit:

License

MIT License - see LICENSE file for details.

Download files

Source Distribution

rfx_ml-1.0.1.tar.gz (303.1 kB)

File details

Details for the file rfx_ml-1.0.1.tar.gz.

File metadata

  • Download URL: rfx_ml-1.0.1.tar.gz
  • Size: 303.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for rfx_ml-1.0.1.tar.gz

  • SHA256: 328d0c7062e404982bc7d83dca6fe6c06dc1d0e2bc4515801f2fe8096e969b54
  • MD5: 4f2f26b33a0912ac1ad8734238c524d4
  • BLAKE2b-256: d9bfda4e91446acde5e87550baae897e07d30453509e7f4101df87cb70a14370
