
High-Performance Random Forests with C++ backend (CPU-only, no CUDA required)


RFX: Random Forests X (CPU-Only Edition)


RFX (Random Forests X) is a high-performance Python implementation of Breiman and Cutler's original Random Forest methodology with an optimized C++ back-end.

This is the CPU-only version (rfx-ml-cpu), built without CUDA dependencies, making it ideal for systems without GPUs or lightweight installations.

Note: For GPU acceleration and QLORA compression with large datasets, use the rfx-ml package instead.

Key Features

  • Complete classification: Out-of-bag error, confusion matrices, class probabilities
  • Local importance: Built-in per-sample feature importance (comparable in purpose to SHAP values)
  • Proximity matrices: Pairwise sample similarities for outlier detection and visualization
  • CPU-optimized: Fast multi-threaded C++ implementation
  • No CUDA required: Works on any system without GPU dependencies
  • Interactive visualization: Python-native rfviz with 3D MDS and parallel coordinates
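To make the proximity feature concrete, here is a minimal sketch of the textbook Breiman-Cutler definition: the proximity of two samples is the fraction of trees in which they end up in the same terminal node. The function name `proximity_from_leaves` and the `leaf_ids` input are hypothetical and are not part of the rfx API. The C++ back-end may compute proximities differently, for example by restricting to out-of-bag samples.

```python
def proximity_from_leaves(leaf_ids):
    """Return an n x n proximity matrix as nested lists.

    leaf_ids[i][t] is the terminal-node index that sample i reaches in tree t
    (a hypothetical input used for illustration only).
    """
    n_samples = len(leaf_ids)
    n_trees = len(leaf_ids[0])
    prox = [[0.0] * n_samples for _ in range(n_samples)]
    for i in range(n_samples):
        for j in range(n_samples):
            # Count the trees in which samples i and j share a terminal node.
            shared = sum(
                1 for t in range(n_trees) if leaf_ids[i][t] == leaf_ids[j][t]
            )
            prox[i][j] = shared / n_trees
    return prox


# Four samples, two trees.
leaf_ids = [
    [0, 1],
    [0, 1],
    [1, 1],
    [2, 0],
]
prox = proximity_from_leaves(leaf_ids)
print(prox[0][1], prox[0][2], prox[0][3])
```

Samples 0 and 1 share a leaf in both trees, so their proximity is 1.0. Sample 3 never shares a leaf with sample 0, so its proximity to it is 0.0, which is the kind of pattern an outlier score would pick up.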

Installation

CPU-Only Version (this package):

pip install rfx-ml-cpu

GPU-Enabled Version (if you have CUDA):

pip install rfx-ml

Note: These packages are mutually exclusive. Both provide the rfx module. Choose based on your hardware:

  • CPU-only system or want minimal dependencies? → rfx-ml-cpu
  • Have a GPU and want acceleration? → rfx-ml

Prerequisites: CMake 3.12+, Python 3.8+, C++ compiler with C++17 support. No CUDA required!

The pip install command builds the package from source, so install the prerequisites above before running pip.
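Because the build runs on your machine, it can help to check for the prerequisites before installing. The sketch below uses only the Python standard library. The helper name `check_build_prerequisites` and the list of compiler executables are assumptions for illustration. It only checks that the tools are on `PATH`, and does not verify the CMake version or C++17 support.

```python
import shutil
import sys


def check_build_prerequisites(compilers=("c++", "g++", "clang++", "cl")):
    """Report which build prerequisites appear to be available.

    The compiler names are common defaults (an assumption); adjust them
    for your platform.
    """
    return {
        "python_3_8_plus": sys.version_info >= (3, 8),
        "cmake_on_path": shutil.which("cmake") is not None,
        "cxx_compiler_on_path": any(
            shutil.which(name) is not None for name in compilers
        ),
    }


if __name__ == "__main__":
    for name, ok in check_build_prerequisites().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

If any entry is reported as missing, install that tool first. Otherwise the source build is likely to fail partway through.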

Quick Start

import numpy as np
import rfx as rf

# Load sample data
X, y = rf.load_wine()

# Train Random Forest (CPU-only)
model = rf.RandomForestClassifier(
    ntree=100,
    use_gpu=False,  # Required: CPU-only package
    compute_importance=True,
    compute_local_importance=True,
    compute_proximity=True
)

model.fit(X, y)

# Get predictions and metrics
oob_error = model.get_oob_error()
print(f"OOB Error: {oob_error:.4f}")

predictions = model.predict(X)
importance = model.feature_importances_()
local_imp = model.get_local_importance()

# Interactive visualization
rf.rfviz(
    rf_model=model,
    X=X,
    y=y,
    output_file="rfviz_example.html"
)

CPU Performance

RFX-CPU trains in multi-threaded C++ and uses all available CPU cores by default:

# Automatic multi-threading (uses all CPU cores by default)
model = rf.RandomForestClassifier(
    ntree=500,
    use_gpu=False,
    n_threads_cpu=0,  # Auto-detect CPU cores
    compute_proximity=True
)

model.fit(X, y)

When to use this package:

  • No GPU available
  • Lightweight installation needed
  • Small to medium datasets (<50K samples)
  • CPU-only deployment environments

For large datasets (>50K samples) with proximity matrices, consider rfx-ml with GPU acceleration for significantly faster performance.
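A rough calculation shows why proximity matrices get expensive as the sample count grows. A full proximity matrix has one entry per pair of samples, so memory grows with the square of the sample count. The sketch below assumes a dense matrix of 8-byte floats. That storage layout is an assumption for illustration and is not a statement about how rfx stores proximities internally.

```python
def dense_proximity_bytes(n_samples, bytes_per_value=8):
    """Bytes needed for a dense n x n matrix (assumes 8-byte floats)."""
    return n_samples * n_samples * bytes_per_value


for n in (10_000, 50_000, 100_000):
    gib = dense_proximity_bytes(n) / 2**30
    print(f"{n:>7,} samples -> {gib:,.1f} GiB")
```

Under these assumptions, 50,000 samples already need about 18.6 GiB for the matrix alone, and doubling the sample count quadruples that figure.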

Switching Between Versions

Both packages use the same import rfx statement. To switch:

# Switch to GPU version
pip uninstall rfx-ml-cpu
pip install rfx-ml
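Since both distributions install the same rfx module, it may be useful to check which one is present before or after switching. The sketch below relies on the standard-library `importlib.metadata`, available from Python 3.8. The helper name `installed_rfx_distributions` is hypothetical.

```python
from importlib import metadata


def installed_rfx_distributions(names=("rfx-ml-cpu", "rfx-ml")):
    """Return {distribution name: version} for each name that is installed."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # This distribution is not installed in the current environment.
            pass
    return found


if __name__ == "__main__":
    found = installed_rfx_distributions()
    if len(found) > 1:
        print("Both distributions are installed; uninstall one:", found)
    else:
        print(found or "No rfx distribution found")
```

Finding both names installed at once means the two packages overlap, in which case uninstalling both and reinstalling only the one you want is the safer fix.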

Documentation

For complete documentation, examples, and advanced usage, visit:

License

MIT License - see LICENSE file for details.

Links


