High-Performance Random Forests with GPU Acceleration and QLORA Compression
RFX: Random Forests X
RFX (Random Forests X) is a high-performance Python implementation of Breiman and Cutler's original Random Forest methodology with GPU acceleration and QLORA compression.
Key Features
- Complete classification: Out-of-bag error, confusion matrices, class probabilities
- Local importance: Per-sample feature importance (similar to SHAP, built-in)
- Proximity matrices: Pairwise sample similarities for outlier detection and visualization
- QLORA compression: 12,500× memory reduction for large-scale proximity analysis (80 GB → 6.4 MB for a 100K-sample proximity matrix)
- Full GPU acceleration: CUDA for trees, importance, and proximity matrices
- Interactive visualization: Python-native rfviz with 3D MDS and parallel coordinates
Result: Proximity-based workflows now scale to 200K–1M+ samples.
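In Breiman and Cutler's basic formulation, the proximity of two samples is the fraction of trees in which both land in the same terminal node. A sample whose summed squared proximity to the rest of its own class is small is a candidate outlier. The pure-Python sketch below illustrates those two definitions on hand-made leaf assignments. It is not RFX's implementation: the input layout (`leaves[t][i]` = leaf id of sample i in tree t) and the function names are hypothetical, and Breiman's full outlier measure additionally normalizes the raw score within each class by its median and median absolute deviation, which is omitted here.

```python
from typing import List, Sequence


def proximity_matrix(leaves: Sequence[Sequence[int]]) -> List[List[float]]:
    """Fraction of trees in which samples i and j share a terminal node.

    leaves[t][i] is the (hypothetical) leaf id of sample i in tree t.
    """
    ntree = len(leaves)
    n = len(leaves[0])
    # Dense n x n storage: this quadratic footprint is what makes proximities costly
    prox = [[0.0] * n for _ in range(n)]
    for tree_leaves in leaves:
        for i in range(n):
            for j in range(n):
                if tree_leaves[i] == tree_leaves[j]:
                    prox[i][j] += 1.0
    return [[count / ntree for count in row] for row in prox]


def raw_outlier_scores(prox: List[List[float]], y: Sequence[int]) -> List[float]:
    """Raw outlyingness n / sum_k prox(i, k)^2 over other samples k of the same class."""
    n = len(y)
    scores = []
    for i in range(n):
        total = sum(prox[i][k] ** 2 for k in range(n) if k != i and y[k] == y[i])
        scores.append(n / total if total > 0 else float("inf"))
    return scores


# Three toy trees over four samples (two per class)
leaves = [[0, 0, 1, 1], [0, 0, 0, 1], [2, 2, 1, 1]]
y = [0, 0, 1, 1]
P = proximity_matrix(leaves)
scores = raw_outlier_scores(P, y)
print(P[0][1], P[0][2], P[2][3])
print(scores)
```

Samples 2 and 3 get higher raw scores than samples 0 and 1 because they share a leaf in only two of the three trees, while samples 0 and 1 share a leaf in every tree.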
Installation
```bash
# Basic installation
pip install rfx-ml

# With visualization dependencies (quoted so shells such as zsh don't expand the brackets)
pip install "rfx-ml[viz]"

# With all optional dependencies
pip install "rfx-ml[viz,examples]"
```
Prerequisites (pip builds RFX from source during installation, so install these before running pip):
- CMake 3.12+
- Python 3.7+
- CUDA toolkit 11.0+ (required to build; using the GPU at runtime is optional)
- A C++ compiler with C++17 support
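Because a missing tool only surfaces partway through the source build, a quick pre-flight check can save a failed pip run. The sketch below only tests whether each tool is present on `PATH`; it does not verify the minimum versions listed above. The executable names it looks for (`cmake`, `nvcc`, and `c++`/`g++`/`clang++`/`cl`) are common defaults, assumed here rather than taken from RFX's build scripts.

```python
import shutil
import sys
from typing import Dict


def check_build_prereqs() -> Dict[str, bool]:
    """Presence check only; minimum versions are not verified."""
    return {
        "python>=3.7": sys.version_info >= (3, 7),
        "cmake": shutil.which("cmake") is not None,
        # nvcc ships with the CUDA toolkit
        "nvcc (CUDA toolkit)": shutil.which("nvcc") is not None,
        # Any one common C++ compiler driver is enough for this check
        "c++ compiler": any(shutil.which(cc) for cc in ("c++", "g++", "clang++", "cl")),
    }


if __name__ == "__main__":
    for name, ok in check_build_prereqs().items():
        print(f"{name:<22} {'found' if ok else 'MISSING'}")
```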
Quick Start
```python
import RFX as rf

# Load sample data
X, y = rf.load_wine()

# Train a Random Forest
model = rf.RandomForestClassifier(
    ntree=100,
    compute_importance=True,
    compute_local_importance=True,
    compute_proximity=True,
    use_gpu=False,  # Set to True for GPU acceleration
)
model.fit(X, y)

# Get predictions and metrics
oob_error = model.get_oob_error()
print(f"OOB Error: {oob_error:.4f}")

# Predictions on the training data (the OOB error above is the unbiased estimate)
predictions = model.predict(X)
importance = model.feature_importances_()
local_imp = model.get_local_importance()

# Interactive visualization
rf.rfviz(
    rf_model=model,
    X=X,
    y=y,
    output_file="rfviz_example.html",
)
```
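`get_oob_error()` reports the out-of-bag error. Conceptually, each sample is scored only by the trees whose bootstrap sample left it out, and the OOB error is the fraction of samples whose majority OOB vote is wrong. The sketch below computes that quantity from a hypothetical list of per-sample OOB votes; it illustrates the definition and is not how RFX computes or stores votes internally.

```python
from collections import Counter
from typing import List, Sequence


def oob_error(oob_votes: Sequence[List[int]], y: Sequence[int]) -> float:
    """oob_votes[i] holds the class predicted for sample i by each tree
    for which sample i was out-of-bag (hypothetical layout)."""
    wrong = 0
    counted = 0
    for votes, label in zip(oob_votes, y):
        if not votes:  # in-bag for every tree: no OOB prediction, so skip it
            continue
        # Majority vote; ties go to the class voted first
        majority = Counter(votes).most_common(1)[0][0]
        wrong += majority != label
        counted += 1
    return wrong / counted if counted else float("nan")


votes = [[0, 0, 1], [1, 1], [0, 2, 2], []]
labels = [0, 1, 0, 2]
err = oob_error(votes, labels)
print(f"OOB error: {err:.4f}")
```

Here the third sample's OOB majority (class 2) disagrees with its label, and the fourth sample has no OOB votes, so the error is one misclassified sample out of three scored.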
GPU Acceleration & QLORA
For large datasets, enable GPU acceleration and QLORA compression:
```python
# Large-scale proximity analysis with QLORA
model = rf.RandomForestClassifier(
    ntree=500,
    use_gpu=True,
    compute_proximity=True,
    use_qlora=True,
    rank=32,  # Low-rank approximation
    quant_mode="int8",
)
model.fit(X, y)

# Get low-rank factors (memory efficient)
A, B, rank = model.get_lowrank_factors()

# Compute MDS directly from factors (no reconstruction!)
mds_coords = model.compute_mds_from_factors(k=3)
```
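`get_lowrank_factors()` returns factors `A` and `B`, which suggests the proximity matrix is held approximately as P ≈ A Bᵀ, with each n × rank factor quantized according to `quant_mode`. The exact QLORA quantization scheme is not described on this page, so the sketch below assumes a generic symmetric, per-column int8 scheme (scale = max|x| / 127). It shows how a single proximity can be read off two factor rows in O(rank) work without forming the n × n matrix. The function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def quantize_int8(M: Matrix) -> Tuple[List[List[int]], List[float]]:
    """Symmetric per-column int8 quantization (an assumed scheme, not RFX's documented one)."""
    ncols = len(M[0])
    scales = []
    for k in range(ncols):
        col_max = max(abs(row[k]) for row in M)
        scales.append(col_max / 127.0 if col_max > 0 else 1.0)
    Q = [[max(-127, min(127, round(row[k] / scales[k]))) for k in range(ncols)] for row in M]
    return Q, scales


def dequantize(Q: List[List[int]], scales: List[float]) -> Matrix:
    return [[q * s for q, s in zip(row, scales)] for row in Q]


def proximity_entry(A: Matrix, B: Matrix, i: int, j: int) -> float:
    """P[i][j] ~= dot(A[i], B[j]): O(rank) work, no n x n matrix needed."""
    return sum(a * b for a, b in zip(A[i], B[j]))


# Toy rank-2 factors for 3 samples (symmetric case: B = A)
A = [[0.5, -0.25], [0.1, 0.4], [0.9, 0.05]]
QA, scales = quantize_int8(A)
A_hat = dequantize(QA, scales)

exact = proximity_entry(A, A, 0, 2)
approx = proximity_entry(A_hat, A_hat, 0, 2)
print(QA)
print(f"exact={exact:.4f} approx={approx:.4f}")
```

Round-to-nearest keeps each dequantized entry within half a quantization step of the original, and each stored factor costs one byte per entry plus one float scale per column.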
Memory savings: 100K samples: 74.5 GB (full matrix) → 19 MB (QLORA rank-100) = 4000× compression.
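The memory figures can be reproduced with simple arithmetic under two assumptions not stated on this page: the full proximity matrix is dense float64 (8 bytes per entry), and QLORA stores two n × rank int8 factors (1 byte per entry, ignoring the small per-column scale overhead). Under those assumptions, 74.5 and 19 correspond to binary units (GiB, MiB), while the 80 GB → 6.4 MB (12,500×) figure in the feature list corresponds to decimal units at rank 32.

```python
def dense_bytes(n: int, bytes_per_entry: int = 8) -> int:
    """Full n x n proximity matrix, float64 by default (assumption)."""
    return n * n * bytes_per_entry


def lowrank_bytes(n: int, rank: int, bytes_per_entry: int = 1) -> int:
    """Two n x rank factors (A and B), int8 by default; scale overhead ignored (assumption)."""
    return 2 * n * rank * bytes_per_entry


n = 100_000
full = dense_bytes(n)
for rank in (32, 100):
    small = lowrank_bytes(n, rank)
    print(
        f"rank {rank:>3}: {full / 2**30:.1f} GiB ({full / 1e9:.0f} GB) -> "
        f"{small / 2**20:.1f} MiB ({small / 1e6:.1f} MB), {full / small:,.0f}x smaller"
    )
```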
Documentation
For complete documentation, examples, and advanced usage, visit:
- GitHub: https://github.com/chriskuchar/RFX
- Full README: https://github.com/chriskuchar/RFX/blob/main/README.md
License
MIT License - see LICENSE file for details.
Links
- Source Code: https://github.com/chriskuchar/RFX
- Bug Reports: https://github.com/chriskuchar/RFX/issues
- Documentation: https://github.com/chriskuchar/RFX/blob/main/README.md
Download files
Source Distribution
File details
Details for the file rfx_ml-1.0.1.tar.gz.
File metadata
- Download URL: rfx_ml-1.0.1.tar.gz
- Upload date:
- Size: 303.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 328d0c7062e404982bc7d83dca6fe6c06dc1d0e2bc4515801f2fe8096e969b54 |
| MD5 | 4f2f26b33a0912ac1ad8734238c524d4 |
| BLAKE2b-256 | d9bfda4e91446acde5e87550baae897e07d30453509e7f4101df87cb70a14370 |
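To check that a downloaded `rfx_ml-1.0.1.tar.gz` matches the SHA256 digest listed above, hash the file in chunks with Python's standard `hashlib` and compare hex digests. A minimal sketch (the function names are illustrative):

```python
import hashlib
import sys
from typing import BinaryIO

# SHA256 digest for rfx_ml-1.0.1.tar.gz, copied from the table above
EXPECTED_SHA256 = "328d0c7062e404982bc7d83dca6fe6c06dc1d0e2bc4515801f2fe8096e969b54"


def sha256_hex(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in fixed-size chunks so large files never sit fully in memory."""
    h = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


def verify_file(path: str, expected: str = EXPECTED_SHA256) -> bool:
    with open(path, "rb") as f:
        return sha256_hex(f) == expected.lower()


if __name__ == "__main__" and len(sys.argv) > 1:
    print("OK" if verify_file(sys.argv[1]) else "MISMATCH")
```

pip can enforce the same digest in hash-checking mode: put `rfx-ml==1.0.1 --hash=sha256:<digest>` in a requirements file and run `pip install --require-hashes -r requirements.txt` (pip then expects every requirement, including dependencies, to be pinned with a hash). This digest covers only the source distribution listed here.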