Geometry-aware random forest with HVRT-powered generative diversity
GeoRF — Geometry-Aware Random Forest
GeoRF replaces bootstrap resampling with HVRT-powered generative diversity. Each tree trains on a completely unique synthetic dataset drawn from learned per-partition kernel density estimates. No tree ever sees a real sample. No two trees share a single training point.
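To make the idea of per-partition kernel density sampling concrete, here is a minimal standard-library sketch. It is not the HVRT algorithm or GeoRF's implementation: the partitions are hard-coded hypothetical lists, the bandwidth uses a plain normal-reference rule of thumb as a stand-in for HVRT's auto-selection, and only features are sampled (how GeoRF assigns targets to synthetic points is not covered here). All function names are illustrative.

```python
import random
import statistics


def rule_of_thumb_bandwidth(values):
    """Normal-reference bandwidth for a 1-D Gaussian KDE (an assumed stand-in)."""
    n = len(values)
    sigma = statistics.stdev(values) if n > 1 else 0.0
    return 1.06 * sigma * n ** (-0.2) if sigma > 0 else 1e-3


def fit_partition(points):
    """One bandwidth per feature for the points in a single partition."""
    dims = range(len(points[0]))
    return [rule_of_thumb_bandwidth([p[d] for p in points]) for d in dims]


def synthetic_dataset(partitions, n_samples, seed):
    """Draw n_samples synthetic points from per-partition Gaussian KDEs."""
    rng = random.Random(seed)
    bandwidths = [fit_partition(part) for part in partitions]
    sizes = [len(part) for part in partitions]
    samples = []
    for _ in range(n_samples):
        # Pick a partition in proportion to its size, then a kernel centre in it.
        k = rng.choices(range(len(partitions)), weights=sizes)[0]
        centre = rng.choice(partitions[k])
        # Sampling a Gaussian KDE = kernel centre plus Gaussian noise.
        samples.append(tuple(rng.gauss(c, h) for c, h in zip(centre, bandwidths[k])))
    return samples


# Hypothetical partitions of a tiny 2-feature dataset.
partitions = [
    [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3)],
    [(5.0, 5.2), (5.3, 4.9), (4.8, 5.1), (5.1, 5.0)],
]
tree_a = synthetic_dataset(partitions, n_samples=500, seed=0)
tree_b = synthetic_dataset(partitions, n_samples=500, seed=1)
real = {p for part in partitions for p in part}
```

Because every draw adds continuous noise to a kernel centre, the two synthetic sets share no points with each other or with the real data, which is the property described above.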
Why?
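Bootstrap bagging has a diversity ceiling. A bootstrap draw of size n contains on average n(1 − (1 − 1/n)^n) ≈ 0.632n distinct samples, so with n = 250 each draw contains ≈158 unique samples, and every tree draws from the same 250 points. GeoRF removes this ceiling: 100 trees × 500 samples = 50 000 unique synthetic training points.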
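The ≈158 figure can be checked directly. The sketch below computes the expected number of distinct samples in a bootstrap draw and confirms it with a small simulation; the function names are illustrative and not part of GeoRF.

```python
import random


def expected_unique(n):
    """Expected number of distinct items in a bootstrap draw of size n."""
    return n * (1 - (1 - 1 / n) ** n)


def simulated_unique(n, trials, seed):
    """Average distinct-item count over repeated bootstrap draws."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += len({rng.randrange(n) for _ in range(n)})
    return total / trials


n = 250
analytic = expected_unique(n)                        # about 158.2
empirical = simulated_unique(n, trials=2000, seed=0)
synthetic_total = 100 * 500                          # trees x samples per tree
print(round(analytic, 1), round(empirical, 1), synthetic_total)
```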
See benchmark/results/ for full results comparing GeoRF against Random Forest, Gradient Boosting, XGBoost, LightGBM, and MLP (sklearn + PyTorch) on standard datasets. Run the benchmarks yourself:
cd benchmark
pip install -r requirements.txt
python run_classification.py
python run_regression.py
Install
pip install -e . # editable
# or
pip install georf
Requirements: Python >= 3.10, hvrt >= 2.3.0, scikit-learn, numpy, joblib
Quick start
from georf import GeoRFClassifier, GeoRFRegressor
# Classification
clf = GeoRFClassifier(n_estimators=100, n_samples_per_tree=500, n_jobs=-1)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
y_proba = clf.predict_proba(X_test) # (n_samples, n_classes)
# Regression
reg = GeoRFRegressor(n_estimators=100, n_samples_per_tree=500, n_jobs=-1)
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)
# Interpretability
clf.feature_importances(feature_names=cols) # dict, sorted descending
clf.tree_quality_scores() # per-tree AUC array
clf.diversity_score() # float: pairwise disagreement rate
clf.provenance() # dataset / expansion metadata
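For intuition about what a pairwise disagreement rate measures, here is a small standalone sketch. It assumes the score is the mean, over all pairs of trees, of the fraction of samples on which the two trees predict different labels; GeoRF's exact definition may differ, and `pairwise_disagreement` is a hypothetical helper, not part of the library.

```python
from itertools import combinations


def pairwise_disagreement(predictions):
    """Mean fraction of samples on which two trees disagree, over all tree pairs."""
    pairs = list(combinations(predictions, 2))
    if not pairs:
        return 0.0
    rates = [sum(a != b for a, b in zip(p, q)) / len(p) for p, q in pairs]
    return sum(rates) / len(rates)


# Hypothetical per-tree predictions on four samples.
tree_preds = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
]
score = pairwise_disagreement(tree_preds)  # pair rates 1/4, 1/4, 2/4 -> mean 1/3
```

A score of 0 means all trees predict identically; higher values mean the trees disagree more often.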
Parameters
| Parameter | Default | Description |
|---|---|---|
| `n_estimators` | 100 | Number of trees |
| `n_samples_per_tree` | 500 | Synthetic samples per tree |
| `max_depth` | 6 | Max tree depth |
| `min_samples_leaf` | 5 | Min samples per leaf |
| `max_features` | None | Feature subsampling (None = all) |
| `bandwidth` | 'auto' | HVRT KDE bandwidth ('auto' = per-partition auto-selection) |
| `n_jobs` | None | Workers (-1 = all cores) |
| `random_state` | 42 | Reproducibility seed |
Running tests
pip install pytest
pytest tests/
# exclude slow timing test:
pytest tests/ -m "not slow"
License
AGPL-3.0-or-later
Download files
File details
Details for the file georf-0.1.0.tar.gz.
File metadata
- Download URL: georf-0.1.0.tar.gz
- Upload date:
- Size: 37.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d88015df22eabb6db1b837d2cea0a67e1a052acaf4bd4db4d57c27d2498a6e06 |
| MD5 | 1ad2e1998b056cf6d5524c979006fe1b |
| BLAKE2b-256 | e7781424952f3c81ba1bd3b773c82952b6098873215a0bbff203209d3a632e42 |
File details
Details for the file georf-0.1.0-py3-none-any.whl.
File metadata
- Download URL: georf-0.1.0-py3-none-any.whl
- Upload date:
- Size: 37.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | abd2ba5bf5ef0281e8980186b0ae55357314a9819f36f50769387b417900c329 |
| MD5 | 8a5dc085f450f5e88e124a280c1b12d6 |
| BLAKE2b-256 | ef5673f23acaad3a66c7a6f0c51436a2c8fcac0eba8e41a9fa6354afe30ed743 |