PyTorch-Inspired Optimum-Path Forest Classifier
OPForch: A PyTorch-Powered Optimum-Path Forest Classifier
Welcome to OPForch.
Note that this implementation relies purely on the standard LibOPF. Therefore, if you use our package, please also cite the original LibOPF authors.
OPForch is a PyTorch-based implementation of the Optimum-Path Forest (OPF) classifier, migrated from the original OPFython package. It replaces per-node Python objects with dense tensors and scalar Numba loops with batched tensor operations. As a result, OPForch runs substantially faster (see the figures below) and produces zero prediction mismatches against the reference implementation.
Key Highlights
| Metric | Result |
|---|---|
| Accuracy Parity | 0 prediction mismatches across all 4 classifiers |
| Predict Speedup | Up to 484× faster at N=10,000 |
| Fit Speedup | Up to 19× faster at N=10,000 |
| Distance Matrix | Up to 413× faster (batched tensor vs N² scalar loop) |
| GPU Acceleration | 12.7× additional speedup on RTX 4070 for distance computation |
| Device Support | CPU, CUDA, and Multi-GPU via DeviceManager |
Use OPForch if you need:
- Graph-based classification without hyperparameter tuning
- Deterministic training with competitive accuracy
- GPU-accelerated distance computation and prediction
- A drop-in replacement for OPFython with orders-of-magnitude speedups
OPForch is compatible with: Python 3.8+ and PyTorch 2.0+.
Package Structure
opforch/
├── core/
│ ├── heap.py # Tensor-backed binary heap
│ ├── subgraph.py # Dense tensor columns (13 state tensors)
│ └── opf.py # Abstract base (torch.save/load, device)
├── math/
│ ├── distance.py # 47 batched (N,D)×(M,D)→(N,M) distance metrics
│ ├── general.py # Accuracy, confusion matrix, normalize, purity
│ └── random.py # Tensor-based random generators
├── models/
│ ├── supervised.py # MST + competition + batched predict
│ ├── knn_supervised.py # KNN density clustering + k-selection
│ ├── semi_supervised.py # Labeled + unlabeled propagation
│ └── unsupervised.py # Density clustering + normalized cut
├── stream/
│ ├── loader.py # CSV/TXT/JSON → torch.Tensor
│ ├── parser.py # Extract features + labels
│ └── splitter.py # Train/test split
├── subgraphs/
│ └── knn.py # KNNSubgraph (torch.topk, vectorized PDF)
├── utils/
│ ├── constants.py # EPSILON, FLOAT_MAX, status codes
│ ├── converter.py # Binary OPF format converters
│ ├── device.py # DeviceManager (CPU/GPU/multi-GPU)
│ ├── exception.py # Custom exception hierarchy
│ └── logging.py # Timed rotating file logger
├── report/ # Migration report, benchmarks, and plots
└── examples/ # Usage scripts for all 4 classifiers
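The distance module is described as a set of batched (N,D)×(M,D)→(N,M) metrics. The sketch below shows what one such metric can look like under two assumptions. It uses squared Euclidean distance, plus a log variant scaled by 100000, which we have modelled on LibOPF's log-Euclidean distance. The function names are hypothetical and are not the `opforch.math.distance` API.

```python
import torch


def squared_euclidean(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """(N, D) x (M, D) -> (N, M) matrix of squared Euclidean distances."""
    # Broadcast to (N, M, D), then reduce over the feature axis.
    diff = x[:, None, :] - y[None, :, :]
    return (diff ** 2).sum(dim=-1)


def log_squared_euclidean(x: torch.Tensor, y: torch.Tensor,
                          scale: float = 100000.0) -> torch.Tensor:
    # Assumed form: scale * log(1 + squared distance); the scale is an assumption.
    return scale * torch.log1p(squared_euclidean(x, y))


x = torch.tensor([[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]])  # N = 3 samples
y = torch.tensor([[0.0, 0.0], [3.0, 4.0]])              # M = 2 samples
d = squared_euclidean(x, y)  # shape (3, 2)
print(d)
```

Broadcasting creates an (N, M, D) intermediate tensor. For large inputs, `torch.cdist` or processing `y` in chunks keeps memory bounded.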
Installation
Install from source:
git clone https://github.com/gugarosa/opforch.git
cd opforch
pip install -e .
For GPU support, install PyTorch with CUDA:
pip install torch --index-url https://download.pytorch.org/whl/cu124
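Before passing `device="cuda:0"`, it can help to confirm that the installed PyTorch build actually sees a GPU. The quick check below uses only standard PyTorch calls:

```python
import torch

# Report the installed PyTorch build and whether a CUDA device is visible.
print("PyTorch", torch.__version__)
print("CUDA build:", torch.version.cuda)  # None on CPU-only builds
cuda_ok = torch.cuda.is_available()
print("CUDA available:", cuda_ok)
if cuda_ok:
    # List every visible GPU with its index, e.g. "cuda:0".
    for i in range(torch.cuda.device_count()):
        print(f"cuda:{i}", torch.cuda.get_device_name(i))
```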
Quick Start
Supervised Classification
import torch
from opforch.models import SupervisedOPF
from opforch.stream import loader, parser, splitter
# Load data
data = loader.load_txt("data/boat.txt")
X, Y = parser.parse_loader(data)
X_train, X_test, Y_train, Y_test = splitter.split(X, Y, percentage=0.5)
# Train and predict (CPU)
opf = SupervisedOPF(distance="log_squared_euclidean")
opf.fit(X_train, Y_train)
predictions = opf.predict(X_test)
# GPU: pass a device and move the tensors to it (skipped when no CUDA device is present)
if torch.cuda.is_available():
    opf_gpu = SupervisedOPF(distance="euclidean", device="cuda:0")
    opf_gpu.fit(X_train.cuda(), Y_train.cuda())
    predictions = opf_gpu.predict(X_test.cuda())
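The call to `splitter.split` above divides the data into two parts. As an illustration only, a seeded random split can be written with `torch.randperm`. The `split` helper and its `seed` argument are hypothetical, and the sketch may not match how `opforch.stream.splitter` shuffles or rounds.

```python
import torch


def split(X, Y, percentage=0.5, seed=0):
    """Hypothetical split: shuffle once with a seeded generator, cut at `percentage`."""
    gen = torch.Generator().manual_seed(seed)
    perm = torch.randperm(X.shape[0], generator=gen)
    cut = int(percentage * X.shape[0])
    first, second = perm[:cut], perm[cut:]
    # Index features and labels with the same permutation so pairs stay aligned.
    return X[first], X[second], Y[first], Y[second]


X = torch.arange(20, dtype=torch.float32).reshape(10, 2)  # row i = [2i, 2i + 1]
Y = torch.arange(10)
X_train, X_test, Y_train, Y_test = split(X, Y, percentage=0.5)
print(X_train.shape, X_test.shape)
```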
Available Classifiers
| Classifier | Description |
|---|---|
| SupervisedOPF | MST-based prototype detection + cost competition |
| KNNSupervisedOPF | k-NN density clustering with validation-driven k |
| SemiSupervisedOPF | Extends supervised with unlabeled data propagation |
| UnsupervisedOPF | Density-based clustering with normalized cut |
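The SupervisedOPF entry combines two stages: MST-based prototype detection, then a cost competition. The compact sketch below shows both stages on a dense distance matrix, assuming squared Euclidean distance. It replaces the tensor-backed heap listed under core/heap.py with an O(N²) argmin loop. The names `find_prototypes`, `compete`, and `fit` are hypothetical, and the single-class fallback is our own assumption.

```python
import torch


def pairwise_sq_euclidean(a, b):
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)


def find_prototypes(D, Y):
    """Prim's MST on a dense (N, N) matrix; endpoints of MST edges that join
    samples with different labels become prototypes."""
    n = D.shape[0]
    in_tree = torch.zeros(n, dtype=torch.bool)
    key = torch.full((n,), float("inf"))
    parent = torch.full((n,), -1, dtype=torch.long)
    proto = torch.zeros(n, dtype=torch.bool)
    key[0] = 0.0
    for _ in range(n):
        u = int(key.masked_fill(in_tree, float("inf")).argmin())
        in_tree[u] = True
        p = int(parent[u])
        if p >= 0 and Y[p] != Y[u]:
            proto[p] = proto[u] = True
        # Relax keys of nodes not yet in the tree through the new node u.
        closer = (~in_tree) & (D[u] < key)
        key[closer] = D[u][closer]
        parent[closer] = u
    if not proto.any():  # single-class data: seed from node 0 (assumption)
        proto[0] = True
    return proto


def compete(D, Y, proto):
    """Prototypes start at cost 0 and offer f_max path costs (the largest arc on
    the path); each node keeps the cheapest offer and its conqueror's label."""
    n = D.shape[0]
    cost = torch.full((n,), float("inf"))
    cost[proto] = 0.0
    label = Y.clone()
    done = torch.zeros(n, dtype=torch.bool)
    for _ in range(n):
        s = int(cost.masked_fill(done, float("inf")).argmin())
        done[s] = True
        offer = torch.maximum(cost[s], D[s])
        better = (~done) & (offer < cost)
        cost[better] = offer[better]
        label[better] = label[s]
    return cost, label


def fit(X, Y):
    D = pairwise_sq_euclidean(X, X)
    proto = find_prototypes(D, Y)
    cost, label = compete(D, Y, proto)
    return cost, label, proto


X = torch.tensor([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
Y = torch.tensor([0, 0, 1, 1])
cost, label, proto = fit(X, Y)
print(proto.tolist(), cost.tolist(), label.tolist())
```

The resulting per-node costs and labels are exactly the `train_costs` and `train_labels` that the batched prediction formula in the Architecture section consumes.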
All classifiers support fit(), predict(), save(), and load(), and accept a device parameter for CPU/GPU execution.
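The abstract base is listed as using `torch.save`/`torch.load` together with a device setting. The minimal sketch below shows that persistence pattern, assuming the model state is a handful of tensors. `TinyOPFState` and its fields are hypothetical, not the opforch classes.

```python
import os
import tempfile

import torch


class TinyOPFState:
    """Hypothetical container: a few dense state tensors plus a target device."""

    def __init__(self, device="cpu"):
        self.device = torch.device(device)
        self.costs = torch.empty(0, device=self.device)
        self.labels = torch.empty(0, dtype=torch.long, device=self.device)

    def save(self, path):
        # Store CPU copies so the file can be loaded on machines without a GPU.
        torch.save({"costs": self.costs.cpu(), "labels": self.labels.cpu()}, path)

    @classmethod
    def load(cls, path, device="cpu"):
        # weights_only=True restricts unpickling to tensors and plain containers.
        state = torch.load(path, map_location="cpu", weights_only=True)
        obj = cls(device)
        obj.costs = state["costs"].to(obj.device)
        obj.labels = state["labels"].to(obj.device)
        return obj


m = TinyOPFState()
m.costs = torch.tensor([0.0, 1.5, 2.0])
m.labels = torch.tensor([0, 1, 1])
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "opf.pt")
    m.save(path)
    restored = TinyOPFState.load(path)
print(restored.costs, restored.labels)
```

Saving CPU copies and then choosing the device at load time means a model fitted on `cuda:0` can be restored on a CPU-only machine, and the reverse also works.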
Benchmarks
Run the benchmark suite to compare performance on your hardware:
# Baseline benchmarks (47 metrics, 4 models, scaling)
python report/benchmark.py
# Extended benchmarks (up to N=10K, GPU, dimensionality)
python report/benchmark_extended.py
# Generate plots
python report/plot_benchmarks.py
python report/plot_extended.py
For the full migration report with detailed analysis, see report/REPORT.md.
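For a quick sanity check outside the benchmark suite, the sketch below times a Python double loop against the batched form on random data and checks that both agree. The sizes are arbitrary. This is not the project's benchmark code, and its timings will differ from the figures reported above.

```python
import time

import torch


def loop_distances(x, y):
    # Reference: one Python-level computation per (i, j) pair.
    out = torch.empty(x.shape[0], y.shape[0])
    for i in range(x.shape[0]):
        for j in range(y.shape[0]):
            out[i, j] = float(((x[i] - y[j]) ** 2).sum())
    return out


def batched_distances(x, y):
    # Same squared Euclidean distances as a single broadcast reduction.
    return ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)


torch.manual_seed(0)
x, y = torch.rand(64, 8), torch.rand(48, 8)

t0 = time.perf_counter()
ref = loop_distances(x, y)
t1 = time.perf_counter()
fast = batched_distances(x, y)
t2 = time.perf_counter()

assert torch.allclose(ref, fast, atol=1e-6)
print(f"loop {t1 - t0:.4f}s, batched {t2 - t1:.6f}s")
```

CUDA kernels run asynchronously. When timing on a GPU, call `torch.cuda.synchronize()` before each clock read.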
Architecture
The key architectural change from OPFython is the elimination of per-node Python objects in favor of dense tensor columns:
OPFython: subgraph.nodes[i].cost = 5.0 # Python object attribute
OPForch: subgraph.costs[i] = 5.0 # Tensor element (GPU-ready)
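The small sketch below contrasts the two layouts. `Node` and `DenseSubgraph` are hypothetical stand-ins. Only the `costs` column comes from the example above, whereas the actual subgraph keeps 13 state tensors.

```python
from dataclasses import dataclass

import torch


@dataclass
class Node:  # OPFython-style: one Python object per sample
    cost: float = 0.0
    label: int = 0


class DenseSubgraph:  # OPForch-style: one tensor ("column") per attribute
    def __init__(self, n, device="cpu"):
        self.costs = torch.zeros(n, device=device)
        self.labels = torch.zeros(n, dtype=torch.long, device=device)


n = 5
nodes = [Node() for _ in range(n)]
sub = DenseSubgraph(n)

# Per-node update: a Python loop that touches n separate objects.
for i, node in enumerate(nodes):
    node.cost = float(i) * 0.5

# Column update: one vectorized assignment that also works on GPU tensors.
sub.costs[:] = torch.arange(n) * 0.5
print(sub.costs)
```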
Prediction is fully batched: three tensor operations replace the O(N×M) Python loop over N training nodes and M test samples.
dist_matrix = distance_fn(train_features, test_features) # (N, M)
path_costs = torch.maximum(train_costs[:, None], dist_matrix) # (N, M)
predictions = train_labels[path_costs.argmin(dim=0)] # (M,)
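The three lines above assume a `distance_fn` and already-trained tensors. Below is a self-contained toy version. It assumes squared Euclidean distance, and the training costs and labels are picked by hand for illustration.

```python
import torch


def sq_euclidean(a, b):
    # (N, D) x (M, D) -> (N, M)
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)


# Hypothetical trained state for N = 4 training nodes.
train_features = torch.tensor([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
train_costs = torch.tensor([1.0, 0.0, 0.0, 1.0])
train_labels = torch.tensor([0, 0, 1, 1])
# M = 3 test samples.
test_features = torch.tensor([[0.5, 0.5], [4.0, 5.0], [2.0, 2.0]])

dist_matrix = sq_euclidean(train_features, test_features)      # (N, M)
path_costs = torch.maximum(train_costs[:, None], dist_matrix)  # (N, M)
winners = path_costs.argmin(dim=0)                             # (M,)
predictions = train_labels[winners]
print(winners.tolist(), predictions.tolist())  # [1, 2, 1] [0, 1, 0]
```

`argmin` returns the first minimal index, so ties resolve by training-node order. For large M, the same computation can be run over chunks of test samples to cap the size of the (N, M) matrix.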
For the complete architecture documentation, see ARCHITECTURE.md.
Citation
If you use OPForch in your work, please cite:
J. P. Papa, A. X. Falcão and C. T. N. Suzuki.
Supervised Pattern Classification based on Optimum-Path Forest.
International Journal of Imaging Systems and Technology (2009).
Datasets
Looking for datasets? A few are included in OPF file format in the data/ directory, and more are available at recogna.tech.
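The utils/converter.py module handles the binary OPF format. The round-trip sketch below assumes the layout used by LibOPF's binary files as we understand it: three int32 header fields (samples, labels, features), followed by, for each sample, an int32 id, an int32 label, and the float32 features. The layout and the helper names are assumptions to verify against converter.py.

```python
import os
import struct
import tempfile

import torch


def write_opf_binary(path, ids, labels, features):
    """Assumed layout: <n_samples, n_labels, n_features> then <id, label, feats...>."""
    n, d = len(features), len(features[0])
    with open(path, "wb") as f:
        f.write(struct.pack("<3i", n, len(set(labels)), d))
        for i, y, row in zip(ids, labels, features):
            f.write(struct.pack(f"<2i{d}f", i, y, *row))


def read_opf_binary(path):
    with open(path, "rb") as f:
        n, n_labels, d = struct.unpack("<3i", f.read(12))
        record = struct.Struct(f"<2i{d}f")  # little-endian, no padding
        rows = [record.unpack(f.read(record.size)) for _ in range(n)]
    ids = torch.tensor([r[0] for r in rows])
    labels = torch.tensor([r[1] for r in rows])
    X = torch.tensor([r[2:] for r in rows], dtype=torch.float32)
    return ids, labels, X, n_labels


features = [[0.5, 1.0], [2.0, -1.25], [3.0, 0.0]]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.dat")
    write_opf_binary(path, [0, 1, 2], [1, 2, 1], features)
    ids_r, labels_r, X_r, n_labels = read_opf_binary(path)
print(ids_r, labels_r, X_r, n_labels)
```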
Support
If you ever need to report a bug, talk to us, or suggest improvements, please open an issue. We will do our best to help.
File details
Details for the file opforch-2.0.0.tar.gz.
File metadata
- Download URL: opforch-2.0.0.tar.gz
- Upload date:
- Size: 35.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ac5a2fcb019190dfd8457cecc97ceff07c11c1571b84aebc3d2b2710040e1fda |
| MD5 | d55b629acb0637e4f900f6492a077bf5 |
| BLAKE2b-256 | 691edb089c26d023edb064aede3062fa86e124bf6458a5a9fef941827a1e157c |
File details
Details for the file opforch-2.0.0-py3-none-any.whl.
File metadata
- Download URL: opforch-2.0.0-py3-none-any.whl
- Upload date:
- Size: 40.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ef0ded72a18f5e4781a0cb3aaea2db316f3a446305a2d37009f5fa7274f83df2 |
| MD5 | ff9e9403cb72928a086099343d8a9766 |
| BLAKE2b-256 | c958fb9b03841a3b02b363039a13116136d29d9e5138ddd5e3f15843e103195e |