
High-Performance Random Forests with GPU Acceleration and QLORA Compression


RFX: Random Forests X

Python 3.8+ · License: MIT · C++ · CUDA

RFX (Random Forests X) is a high-performance Python implementation of Breiman and Cutler's original Random Forest methodology with GPU acceleration and QLORA compression.

Key Features

  • Complete classification: Out-of-bag error, confusion matrices, class probabilities
  • Local importance: Per-sample feature importance (similar to SHAP, built-in)
  • Proximity matrices: Pairwise sample similarities for outlier detection and visualization
  • QLORA compression: 12,500× memory reduction (80 GB → 6.4 MB) for large-scale proximity analysis
  • Full GPU acceleration: CUDA for trees, importance, and proximity matrices
  • Interactive visualization: Python-native rfviz with 3D MDS and parallel coordinates
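Proximity follows Breiman and Cutler's definition: the proximity of two samples is the fraction of trees in which they land in the same terminal node. The NumPy sketch below computes it from a leaf-index array. That input layout and the function name `proximity_from_leaves` are illustrative assumptions, not part of the rfx API, and RFX's CUDA implementation will differ. Note that the result is a dense n × n matrix, which is the memory cost QLORA compression targets.

```python
import numpy as np


def proximity_from_leaves(leaves: np.ndarray) -> np.ndarray:
    """Fraction of trees in which each pair of samples shares a terminal node.

    leaves: int array, shape (n_samples, n_trees); leaves[i, t] is the leaf id
    sample i reaches in tree t (hypothetical input, not an rfx function).
    """
    n_samples, n_trees = leaves.shape
    prox = np.zeros((n_samples, n_samples))
    for t in range(n_trees):
        col = leaves[:, t]
        prox += col[:, None] == col[None, :]  # True where i and j share a leaf
    return prox / n_trees


# 3 samples, 2 trees: samples 0 and 1 share a leaf only in tree 0,
# samples 1 and 2 share a leaf only in tree 1.
leaves = np.array([[0, 3],
                   [0, 4],
                   [1, 4]])
P = proximity_from_leaves(leaves)
dissimilarity = 1.0 - P  # a common dissimilarity for MDS plots
print(P)
```

The diagonal is always 1, since every sample shares a leaf with itself in every tree.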

Result: With GPU acceleration and QLORA compression, proximity-based workflows scale to 200K–1M+ samples.

Installation

GPU-Enabled Version (runs on the GPU, with CPU fallback):

pip install rfx-ml

CPU-Only Version (lightweight, no CUDA dependencies):

pip install rfx-ml-cpu

Note: These packages are mutually exclusive because both provide the rfx module, so install only one. Choose based on your hardware:

  • Have a GPU and want acceleration? → rfx-ml
  • CPU-only system or want minimal dependencies? → rfx-ml-cpu

Prerequisites: CMake 3.12+, Python 3.7+, a C++ compiler with C++17 support, and CUDA toolkit 11.0+ (required to build rfx-ml; a GPU is optional at runtime).

pip install builds the package from source automatically, so install the prerequisites before running it.
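Because pip compiles the package, a quick pre-flight check can save a failed build. The sketch below only looks for the tools on PATH; it does not verify the CMake 3.12+ or CUDA 11.0+ minimums, or C++17 support. Its rule of suggesting rfx-ml only when `nvcc` (the CUDA compiler) is found is an assumption, not official guidance, and both function names are hypothetical.

```python
import shutil
import sys


def check_build_prerequisites() -> dict:
    """Report whether rfx-ml build tools are visible on PATH (presence only).

    Hypothetical helper: versions (CMake 3.12+, CUDA 11.0+) and C++17
    support are NOT checked.
    """
    return {
        "python": sys.version_info >= (3, 7),
        "cmake": shutil.which("cmake") is not None,
        # nvcc is the CUDA compiler shipped with the CUDA toolkit.
        "nvcc": shutil.which("nvcc") is not None,
        # Any common C++ compiler driver counts.
        "c++ compiler": any(shutil.which(cxx) for cxx in ("c++", "g++", "clang++", "cl")),
    }


def suggest_package(report: dict) -> str:
    """Assumed rule: rfx-ml if the CUDA toolkit is found, else rfx-ml-cpu."""
    return "rfx-ml" if report["nvcc"] else "rfx-ml-cpu"


if __name__ == "__main__":
    report = check_build_prerequisites()
    for tool, ok in report.items():
        print(f"{tool:14s} {'ok' if ok else 'MISSING'}")
    print("Suggested: pip install", suggest_package(report))
```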

Quick Start

import numpy as np
import rfx as rf

# Load sample data
X, y = rf.load_wine()

# Train Random Forest
model = rf.RandomForestClassifier(
    ntree=100,
    compute_importance=True,
    compute_local_importance=True,
    compute_proximity=True,
    use_gpu=False  # Set to True for GPU acceleration
)

model.fit(X, y)

# Get predictions and metrics
oob_error = model.get_oob_error()
print(f"OOB Error: {oob_error:.4f}")

predictions = model.predict(X)
importance = model.feature_importances_()
local_imp = model.get_local_importance()

# Interactive visualization
rf.rfviz(
    rf_model=model,
    X=X,
    y=y,
    output_file="rfviz_example.html"
)
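`model.predict(X)` above scores the training data, which gives an optimistic picture for a random forest. The OOB error from `get_oob_error()` is the honest alternative: each sample is classified only by the trees whose bootstrap sample left it out. A minimal NumPy sketch of that computation, assuming per-tree predictions and an in-bag mask are available (the input layout and the name `oob_error` are hypothetical; RFX computes this internally):

```python
import numpy as np


def oob_error(inbag: np.ndarray, tree_preds: np.ndarray, y: np.ndarray,
              n_classes: int) -> float:
    """OOB misclassification rate (hypothetical helper, not the rfx API).

    inbag:      bool, shape (n_samples, n_trees); True if sample i was in
                tree t's bootstrap sample.
    tree_preds: int, shape (n_samples, n_trees); class tree t predicts for i.
    y:          int, shape (n_samples,); true labels in 0..n_classes-1.
    """
    n_samples, n_trees = tree_preds.shape
    votes = np.zeros((n_samples, n_classes), dtype=np.int64)
    rows = np.arange(n_samples)
    for t in range(n_trees):
        oob = ~inbag[:, t]  # only trees that never saw sample i may vote on it
        np.add.at(votes, (rows[oob], tree_preds[oob, t]), 1)
    has_vote = votes.sum(axis=1) > 0  # skip samples that were always in-bag
    oob_pred = votes.argmax(axis=1)   # majority vote; ties -> lowest class id
    return float(np.mean(oob_pred[has_vote] != y[has_vote]))


# Toy example: 4 samples, 3 trees, 2 classes.
inbag = np.array([[True, False, False],
                  [False, True, False],
                  [False, False, False],
                  [True, True, True]])
tree_preds = np.array([[0, 0, 1],
                       [1, 1, 1],
                       [0, 1, 0],
                       [1, 1, 1]])
y = np.array([0, 1, 1, 0])
print(oob_error(inbag, tree_preds, y, n_classes=2))
```

Samples that were in-bag for every tree receive no OOB vote and are excluded from the rate. With a realistic number of trees this is rare, since each bootstrap sample leaves out roughly 37% (about 1/e) of the data.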

GPU Acceleration & QLORA

For large datasets, enable GPU acceleration and QLORA compression:

# Large-scale proximity analysis with QLORA
model = rf.RandomForestClassifier(
    ntree=500,
    use_gpu=True,
    compute_proximity=True,
    use_qlora=True,
    rank=32,  # Low-rank approximation
    quant_mode="int8"  # 8-bit quantization
)

model.fit(X, y)

# Get low-rank factors (memory efficient)
A, B, rank = model.get_lowrank_factors()

# Compute 3-D MDS coordinates directly from the factors, without rebuilding the full proximity matrix
mds_coords = model.compute_mds_from_factors(k=3)
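The source does not describe how QLORA builds or quantizes its factors, so the sketch below swaps in a standard technique: a truncated SVD P ≈ A Bᵀ followed by symmetric per-column int8 quantization. It forms the full matrix, which RFX is designed to avoid, so it only illustrates the storage format. The MDS step shows why no reconstruction is needed: treating P as a kernel, with dissimilarity d_ij^2 = 2(1 - P_ij) (an assumption that may differ from RFX's choice), classical MDS needs the top eigenvectors of the double-centered matrix JPJ ≈ (JA)(JB)ᵀ, and a QR factorization reduces that to a rank × rank eigenproblem costing O(n · rank²). All function names are hypothetical.

```python
import numpy as np


def quantize_int8(M: np.ndarray):
    """Symmetric per-column int8 quantization: M ≈ q * scale."""
    scale = np.abs(M).max(axis=0) / 127.0
    scale[scale == 0] = 1.0  # all-zero columns: any scale works
    q = np.clip(np.round(M / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float64) * scale


def lowrank_int8(P: np.ndarray, rank: int):
    """P ≈ A @ B.T via truncated SVD, then int8 factors (illustrative only:
    this SVD needs the full matrix, which RFX is designed to avoid)."""
    U, s, Vt = np.linalg.svd(P)
    root = np.sqrt(s[:rank])
    A = U[:, :rank] * root
    B = Vt[:rank].T * root
    return quantize_int8(A), quantize_int8(B)


def mds_from_factors(A: np.ndarray, B: np.ndarray, k: int) -> np.ndarray:
    """k-D classical MDS of d_ij^2 = 2(1 - P_ij) using only P ≈ A @ B.T.

    Double-centring turns that distance into J P J ≈ (J A)(J B)^T,
    so no n x n matrix is ever formed.
    """
    Ac = A - A.mean(axis=0)          # J @ A: centre each column
    Bc = B - B.mean(axis=0)          # J @ B
    Q, R = np.linalg.qr(Ac)          # Ac = Q R, Q has orthonormal columns
    S = R @ (Bc.T @ Q)               # S = Q^T (J P J) Q, a rank x rank matrix
    S = (S + S.T) / 2                # P is symmetric, so S should be too
    w, W = np.linalg.eigh(S)
    top = np.argsort(w)[::-1][:k]    # largest eigenvalues first
    return (Q @ W[:, top]) * np.sqrt(np.clip(w[top], 0.0, None))


# Usage on a toy proximity matrix (30 samples, 3 random "trees" with 4 leaves).
rng = np.random.default_rng(0)
leaves = rng.integers(0, 4, size=(30, 3))
P = np.mean([leaves[:, [t]] == leaves[:, t] for t in range(3)], axis=0)
(Aq, a_scale), (Bq, b_scale) = lowrank_int8(P, rank=12)
coords = mds_from_factors(dequantize(Aq, a_scale), dequantize(Bq, b_scale), k=3)
print(coords.shape)
```

The factors are dequantized before the MDS step here for simplicity; only the int8 arrays and one float scale per column would need to be stored.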

Memory savings: for 100K samples, the full proximity matrix needs 74.5 GB, while QLORA rank-100 factors need 19 MB, a 4,000× compression.
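These figures follow from simple arithmetic if one assumes the full matrix stores float64 entries (8 bytes) and QLORA stores two n × rank int8 factors (1 byte per entry), ignoring the small per-column scales. The ratio is then 8n² / (2 · n · rank) = 4n / rank. Under those assumptions, 100K samples at rank 100 gives 8 × 10^10 bytes (74.5 GiB) versus 2 × 10^7 bytes (about 19 MiB), and rank 32 matches the 80 GB → 6.4 MB (12,500×) figure in decimal units. A quick check, with hypothetical helper names:

```python
def full_matrix_bytes(n: int, bytes_per_entry: int = 8) -> int:
    """Dense n x n proximity matrix; float64 (8 bytes) assumed."""
    return n * n * bytes_per_entry


def lowrank_bytes(n: int, rank: int, bytes_per_entry: int = 1) -> int:
    """Two n x rank factors A and B; int8 (1 byte) assumed, scales ignored."""
    return 2 * n * rank * bytes_per_entry


for n, rank in [(100_000, 100), (100_000, 32), (1_000_000, 100)]:
    full, low = full_matrix_bytes(n), lowrank_bytes(n, rank)
    print(f"n={n:>9,} rank={rank:>3}: "
          f"{full / 1e9:,.1f} GB ({full / 2**30:,.1f} GiB) -> "
          f"{low / 1e6:,.1f} MB ({low / 2**20:,.1f} MiB), "
          f"{full // low:,}x smaller")
```

The 1M-sample row shows why a dense matrix is impractical at that scale (8 TB), while rank-100 int8 factors stay around 200 MB.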

Documentation

For complete documentation, examples, and advanced usage, visit:

License

MIT License - see LICENSE file for details.



Download files


Source Distribution

rfx_ml-1.0.2.tar.gz (303.6 kB)


File details

Details for the file rfx_ml-1.0.2.tar.gz.

File metadata

  • Download URL: rfx_ml-1.0.2.tar.gz
  • Size: 303.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for rfx_ml-1.0.2.tar.gz
  • SHA256: 57b05ae188b5c4faaced80eae3925024858ecbc0adea8ffd8d0f886eaf3d2f48
  • MD5: 10786b3c5cf00f694bcebf2920e699dc
  • BLAKE2b-256: 36a000c51b9d4fd190e4fd653b9b421c772b962941e52da395ecfdd3ec24d6e7

