
PyTorch-Inspired Optimum-Path Forest Classifier


OPForch: A PyTorch-Powered Optimum-Path Forest Classifier


Welcome to OPForch.

Note that this implementation is based purely on the standard LibOPF. If you use this package, please also cite the original LibOPF authors.

OPForch is a PyTorch-based implementation of the Optimum-Path Forest (OPF) classifier, migrated from the original OPFython package. It replaces per-node Python objects with dense tensors and scalar Numba loops with batched tensor operations. In the project's benchmarks this yields up to 484× faster prediction and up to 19× faster fitting at N=10,000, with zero prediction mismatches against the reference implementation.

Key Highlights

Metric             Result
Accuracy Parity    0 prediction mismatches across all 4 classifiers
Predict Speedup    Up to 484× faster at N=10,000
Fit Speedup        Up to 19× faster at N=10,000
Distance Matrix    Up to 413× faster (batched tensor vs N² scalar loop)
GPU Acceleration   12.7× additional speedup on RTX 4070 for distance computation
Device Support     CPU, CUDA, and Multi-GPU via DeviceManager
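
The "batched tensor vs N² scalar loop" row can be illustrated with a small sketch. It assumes squared Euclidean distance, and the two function names are hypothetical; they are not OPForch's API, only a picture of the two styles being compared.

```python
import torch


def pairwise_sq_euclidean_loop(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Scalar reference: one Python-level iteration per (i, j) pair."""
    n, m = a.shape[0], b.shape[0]
    out = torch.empty(n, m, dtype=a.dtype)
    for i in range(n):
        for j in range(m):
            diff = a[i] - b[j]
            out[i, j] = (diff * diff).sum()
    return out


def pairwise_sq_euclidean_batched(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Batched: broadcast (N, 1, D) against (1, M, D), then reduce over D."""
    diff = a[:, None, :] - b[None, :, :]   # (N, M, D)
    return (diff * diff).sum(dim=-1)       # (N, M)


torch.manual_seed(0)
a = torch.rand(5, 3)   # N=5 samples, D=3 features
b = torch.rand(4, 3)   # M=4 samples, D=3 features

loop_result = pairwise_sq_euclidean_loop(a, b)
batched_result = pairwise_sq_euclidean_batched(a, b)
```

Both functions return the same (N, M) matrix. The broadcast version materializes an (N, M, D) intermediate, so for large inputs a chunked or matmul-based formulation may be preferable.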

Use OPForch if you need:

  • Graph-based classification without hyperparameter tuning
  • Deterministic training with competitive accuracy
  • GPU-accelerated distance computation and prediction
  • A drop-in replacement for OPFython with orders-of-magnitude speedups

OPForch is compatible with: Python 3.8+ and PyTorch 2.0+.


Package Structure

opforch/
├── core/
│   ├── heap.py          # Tensor-backed binary heap
│   ├── subgraph.py      # Dense tensor columns (13 state tensors)
│   └── opf.py           # Abstract base (torch.save/load, device)
├── math/
│   ├── distance.py      # 47 batched (N,D)×(M,D)→(N,M) distance metrics
│   ├── general.py       # Accuracy, confusion matrix, normalize, purity
│   └── random.py        # Tensor-based random generators
├── models/
│   ├── supervised.py        # MST + competition + batched predict
│   ├── knn_supervised.py    # KNN density clustering + k-selection
│   ├── semi_supervised.py   # Labeled + unlabeled propagation
│   └── unsupervised.py      # Density clustering + normalized cut
├── stream/
│   ├── loader.py        # CSV/TXT/JSON → torch.Tensor
│   ├── parser.py        # Extract features + labels
│   └── splitter.py      # Train/test split
├── subgraphs/
│   └── knn.py           # KNNSubgraph (torch.topk, vectorized PDF)
├── utils/
│   ├── constants.py     # EPSILON, FLOAT_MAX, status codes
│   ├── converter.py     # Binary OPF format converters
│   ├── device.py        # DeviceManager (CPU/GPU/multi-GPU)
│   ├── exception.py     # Custom exception hierarchy
│   └── logging.py       # Timed rotating file logger
├── report/              # Migration report, benchmarks, and plots
└── examples/            # Usage scripts for all 4 classifiers
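
The tree mentions that subgraphs/knn.py builds on torch.topk. As a rough illustration of that idea, the sketch below finds each sample's k nearest neighbors from a dense distance matrix. The function name knn_indices and the use of torch.cdist are assumptions made for this example, not the package's actual code.

```python
import torch


def knn_indices(features: torch.Tensor, k: int):
    """Return (distances, indices) of the k nearest neighbors of every row."""
    dist = torch.cdist(features, features)    # (N, N) Euclidean distances
    dist.fill_diagonal_(float("inf"))         # a sample is not its own neighbor
    # largest=False selects the k smallest distances, sorted ascending.
    return torch.topk(dist, k, dim=1, largest=False)


features = torch.tensor([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0]])
nn_dist, nn_idx = knn_indices(features, k=2)
```

Here nn_idx[i] lists the two closest samples to sample i, nearest first, and nn_dist holds the matching distances.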

Installation

Install from source:

git clone https://github.com/gugarosa/opforch.git
cd opforch
pip install -e .

For GPU support, install PyTorch with CUDA:

pip install torch --index-url https://download.pytorch.org/whl/cu124

Quick Start

Supervised Classification

import torch
from opforch.models import SupervisedOPF
from opforch.stream import loader, parser, splitter

# Load data
data = loader.load_txt("data/boat.txt")
X, Y = parser.parse_loader(data)
X_train, X_test, Y_train, Y_test = splitter.split(X, Y, percentage=0.5)

# Train and predict (CPU)
opf = SupervisedOPF(distance="log_squared_euclidean")
opf.fit(X_train, Y_train)
predictions = opf.predict(X_test)

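# GPU: pass a device and move the tensors to it (skipped when CUDA is unavailable)
if torch.cuda.is_available():
    opf_gpu = SupervisedOPF(distance="euclidean", device="cuda:0")
    opf_gpu.fit(X_train.to("cuda:0"), Y_train.to("cuda:0"))
    predictions = opf_gpu.predict(X_test.to("cuda:0"))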

Available Classifiers

Classifier          Description
SupervisedOPF       MST-based prototype detection + cost competition
KNNSupervisedOPF    k-NN density clustering with validation-driven k
SemiSupervisedOPF   Extends supervised with unlabeled data propagation
UnsupervisedOPF     Density-based clustering with normalized cut

All classifiers support fit(), predict(), save(), and load(), and accept a device parameter for CPU/GPU execution.
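
To make "MST-based prototype detection + cost competition" concrete, here is a minimal plain-Python sketch of the textbook supervised OPF training procedure. It is a scalar reference written for readability, not OPForch's tensor code; the function names (euclidean, find_prototypes, fit_opf) are hypothetical, and it assumes Euclidean distance and at least two classes in the training set.

```python
import heapq
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_prototypes(X, y):
    """Prim's MST on the complete graph; endpoints of MST edges that join
    samples with different labels become prototypes."""
    n = len(X)
    in_tree = [False] * n
    best = [math.inf] * n    # cheapest known edge linking each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    prototypes = set()
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1 and y[u] != y[parent[u]]:
            prototypes.update((u, parent[u]))
        for v in range(n):
            if not in_tree[v]:
                d = euclidean(X[u], X[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return prototypes


def fit_opf(X, y):
    """Cost competition: prototypes start at cost 0 and conquer the other
    samples along paths whose cost is the largest edge on the path."""
    n = len(X)
    prototypes = find_prototypes(X, y)
    cost = [0.0 if i in prototypes else math.inf for i in range(n)]
    label = list(y)
    pred = [-1] * n
    done = [False] * n
    heap = [(0.0, i) for i in sorted(prototypes)]
    heapq.heapify(heap)
    while heap:
        c, p = heapq.heappop(heap)
        if done[p] or c > cost[p]:
            continue                      # stale heap entry
        done[p] = True
        for q in range(n):
            if done[q]:
                continue
            new_cost = max(cost[p], euclidean(X[p], X[q]))
            if new_cost < cost[q]:
                cost[q], pred[q], label[q] = new_cost, p, label[p]
                heapq.heappush(heap, (new_cost, q))
    return cost, label, pred


# Toy 1-D data: three samples of class 0, two of class 1.
X = [[0.0], [1.0], [2.0], [5.0], [6.0]]
y = [0, 0, 0, 1, 1]
prototypes = find_prototypes(X, y)
cost, label, pred = fit_opf(X, y)
```

On this toy data the only MST edge that crosses classes joins samples 2 and 3, so those two become the prototypes and every other sample is conquered by the prototype of its own class.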


Benchmarks

Run the benchmark suite to compare performance on your hardware:

# Baseline benchmarks (47 metrics, 4 models, scaling)
python report/benchmark.py

# Extended benchmarks (up to N=10K, GPU, dimensionality)
python report/benchmark_extended.py

# Generate plots
python report/plot_benchmarks.py
python report/plot_extended.py

For the full migration report with detailed analysis, see report/REPORT.md.


Architecture

The key architectural change from OPFython is the elimination of per-node Python objects in favor of dense tensor columns:

OPFython:  subgraph.nodes[i].cost = 5.0        # Python object attribute
OPForch:   subgraph.costs[i] = 5.0             # Tensor element (GPU-ready)
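
The difference between the two layouts can be sketched as follows. Node and TensorSubgraph are hypothetical stand-ins written for this example; only the costs attribute name is taken from the snippet above, and the real classes carry more state.

```python
from dataclasses import dataclass

import torch


@dataclass
class Node:
    """Object layout: one Python object per sample."""
    cost: float
    label: int


class TensorSubgraph:
    """Column layout: one tensor per attribute, indexed by node id."""

    def __init__(self, n_nodes: int, device: str = "cpu"):
        self.costs = torch.full((n_nodes,), float("inf"), device=device)
        self.labels = torch.zeros(n_nodes, dtype=torch.long, device=device)


# Object layout: updating one node touches one Python object.
nodes = [Node(cost=float("inf"), label=0) for _ in range(4)]
nodes[2].cost = 5.0

# Column layout: the same update is a write into one tensor element.
subgraph = TensorSubgraph(4)
subgraph.costs[2] = 5.0

# A whole-column update is a single vectorized call instead of a Python loop.
subgraph.costs.clamp_(max=10.0)
```

With the column layout, moving all node state to another device is a matter of moving a handful of tensors, which is much harder to do with a list of Python objects.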

Prediction is fully batched: three tensor operations replace the O(N×M) Python loop:

dist_matrix = distance_fn(train_features, test_features)      # (N, M)
path_costs = torch.maximum(train_costs[:, None], dist_matrix)  # (N, M)
predictions = train_labels[path_costs.argmin(dim=0)]           # (M,)
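
A worked toy example of the three lines above, as a sketch only: torch.cdist stands in for distance_fn, and the training costs are assumed values of the kind a fitted model would hold (0 for prototypes, positive elsewhere).

```python
import torch

# Five 1-D training samples: three of class 0, two of class 1.
train_features = torch.tensor([[0.0], [1.0], [2.0], [5.0], [6.0]])
train_labels = torch.tensor([0, 0, 0, 1, 1])
train_costs = torch.tensor([1.0, 1.0, 0.0, 0.0, 1.0])   # assumed fitted costs

test_features = torch.tensor([[1.8], [4.0], [7.0]])

# (N, M) distances between every training and every test sample.
dist_matrix = torch.cdist(train_features, test_features)

# Cost of reaching each test sample through each training sample:
# the larger of the training sample's own cost and the connecting distance.
path_costs = torch.maximum(train_costs[:, None], dist_matrix)   # (N, M)

# Each test sample takes the label of the training sample offering the
# cheapest path.
winners = path_costs.argmin(dim=0)        # (M,)
predictions = train_labels[winners]       # (M,)
```

For these inputs the winning training samples are 2, 3, and 4, so the predicted labels are 0, 1, and 1.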

For the complete architecture documentation, see ARCHITECTURE.md.


Citation

If you use OPForch to fulfill any of your needs, please cite us:

J. P. Papa, A. X. Falcão and C. T. N. Suzuki.
Supervised Pattern Classification based on Optimum-Path Forest.
International Journal of Imaging Systems and Technology (2009).

Datasets

Looking for datasets? Some are already converted to the OPF file format in the data/ directory. More are available at recogna.tech.


Support

If you ever need to report a bug, talk to us, or suggest improvements, please open an issue. We will do our best to help.


