
High-performance machine learning library powered by Rust


Ferrolearn - High-performance machine learning library

Ferrolearn brings Rust's performance to Python's machine learning ecosystem. By implementing compute-intensive algorithms in Rust, we achieve significant speedups while maintaining the familiar scikit-learn API.

Key Features

  • 🚀 2-10x faster than pure Python implementations
  • 🔧 Scikit-learn compatible API - drop-in replacement
  • 🦀 Rust-powered - memory safe and blazingly fast
  • 📊 Zero-copy operations - efficient NumPy integration
  • ⚡ Automatic parallelization - scales with your CPU cores

Installation

Prerequisites

  • Python 3.8+
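  • Rust 1.70+ (needed only when pip has to build from the source distribution)
  • pip

Install from PyPI:

pip install ferrolearn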

Quick Start

from ferrolearn import KMeans
import numpy as np

# Generate sample data
X = np.random.rand(10000, 50)

# Create and fit model - same API as scikit-learn
kmeans = KMeans(n_clusters=5, random_state=42)
kmeans.fit(X)

# Get predictions
labels = kmeans.predict(X)
print(f"Cluster centers shape: {kmeans.cluster_centers_.shape}")
print(f"Iterations: {kmeans.n_iter_}")

API Reference

KMeans

class KMeans(n_clusters=8, max_iters=300, tol=1e-4, random_state=None)

Parameters:

  • n_clusters: Number of clusters (default: 8)
  • max_iters: Maximum iterations (default: 300); note that scikit-learn's KMeans names this parameter max_iter
  • tol: Convergence tolerance (default: 1e-4)
  • random_state: Random seed for reproducibility (default: None)
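To make the parameters concrete, here is a plain NumPy sketch of Lloyd's algorithm, the standard K-Means iteration, showing where n_clusters, max_iters, tol and random_state come into play. It is not ferrolearn's Rust implementation: the random-row initialization and the stopping rule (largest centroid shift at or below tol) are assumptions, the same random_state will not reproduce ferrolearn's results, and the function names are hypothetical.

```python
import numpy as np


def _nearest(X, centers):
    # Squared Euclidean distance from every sample to every center.
    sq_dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return sq_dists.argmin(axis=1)


def lloyd_kmeans(X, n_clusters=8, max_iters=300, tol=1e-4, random_state=None):
    """Illustrative Lloyd's algorithm in NumPy (not ferrolearn's Rust code)."""
    X = np.asarray(X, dtype=np.float64)
    rng = np.random.default_rng(random_state)
    # Assumed initialization: n_clusters distinct samples picked at random.
    # ferrolearn may use another scheme, such as k-means++.
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)]

    n_iter = 0
    for n_iter in range(1, max_iters + 1):
        # Assignment step: each sample joins its nearest center.
        labels = _nearest(X, centers)
        # Update step: each center moves to the mean of its samples;
        # a center with no samples stays where it is.
        new_centers = centers.copy()
        for k in range(n_clusters):
            members = X[labels == k]
            if len(members) > 0:
                new_centers[k] = members.mean(axis=0)
        # Assumed stopping rule: largest center movement at or below tol.
        shift = np.linalg.norm(new_centers - centers, axis=1).max()
        centers = new_centers
        if shift <= tol:
            break

    # Final assignment so the labels agree with the returned centers.
    return centers, _nearest(X, centers), n_iter
```

The broadcasting in _nearest builds an n_samples × n_clusters × n_features temporary array, which is fine at Quick Start sizes but grows quickly with more clusters or features.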

Methods:

  • fit(X): Fit the model to X, an array of shape (n_samples, n_features)
  • predict(X): Predict the cluster label (index of the nearest center) for each sample
  • fit_predict(X): Fit and predict in one call

Attributes:

  • cluster_centers_: Cluster centroids, shape (n_clusters, n_features)
  • n_iter_: Number of iterations run
  • inertia_: Sum of squared distances from each sample to its nearest cluster center
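The attributes follow the usual K-Means definitions. The sketch below, in plain NumPy, shows how nearest-center labels and an inertia value can be derived from a set of centers such as cluster_centers_. It assumes squared Euclidean distance, the standard K-Means metric; the function name assign_and_inertia is hypothetical and not part of ferrolearn.

```python
import numpy as np


def assign_and_inertia(X, centers):
    """Return (labels, inertia) for samples X and fixed cluster centers."""
    X = np.asarray(X, dtype=np.float64)
    centers = np.asarray(centers, dtype=np.float64)
    # Squared Euclidean distances via ||x||^2 - 2 x.c + ||c||^2, which turns
    # most of the work into one matrix product of shape (n_samples, n_clusters).
    sq_dists = (
        (X ** 2).sum(axis=1)[:, None]
        - 2.0 * X @ centers.T
        + (centers ** 2).sum(axis=1)[None, :]
    )
    # Rounding can make tiny distances slightly negative; clip them to zero.
    np.maximum(sq_dists, 0.0, out=sq_dists)
    labels = sq_dists.argmin(axis=1)
    # Inertia: sum of squared distances from each sample to its nearest center.
    inertia = sq_dists[np.arange(len(X)), labels].sum()
    return labels, float(inertia)
```

Called with X and kmeans.cluster_centers_ from the Quick Start, it should give labels matching kmeans.predict(X) and a value close to kmeans.inertia_, assuming ferrolearn computes inertia_ against its final centers.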

Architecture

ferrolearn leverages Rust's strengths where they matter most:

Python (API Layer)          Rust (Compute Layer)
    │                              │
    ├─ KMeans.fit() ─────────────► │ Parallel distance computation
    │                              │ SIMD-ready operations
    ├─ NumPy arrays ◄────────────► │ Zero-copy array views
    │                              │ Cache-efficient algorithms
    └─ Results ◄───────────────────┘
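The compute-layer ideas in the diagram, splitting the distance computation into independent pieces and handing arrays over without copying, can be imitated in pure Python. In this sketch, threads stand in for ferrolearn's Rust-side parallelism, and row slices of a C-contiguous float64 array are NumPy views rather than copies. The function names and the chunking scheme are illustrative assumptions, not ferrolearn internals, and whether threads speed up this Python version depends on how much of the work NumPy runs with the GIL released.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def nearest_center_chunk(X_chunk, centers):
    # Independent unit of work: nearest-center index for one block of rows.
    sq = ((X_chunk[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return sq.argmin(axis=1)


def parallel_assign(X, centers, n_chunks=4, max_workers=None):
    # np.ascontiguousarray copies only when X is not already a C-ordered
    # float64 array, so well-formed inputs pass through unchanged.
    X = np.ascontiguousarray(X, dtype=np.float64)
    centers = np.asarray(centers, dtype=np.float64)
    bounds = np.linspace(0, len(X), n_chunks + 1).astype(int)
    # Basic row slices are views into X: no data is copied per chunk.
    chunks = [X[start:stop] for start, stop in zip(bounds[:-1], bounds[1:])]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        parts = pool.map(nearest_center_chunk, chunks, [centers] * len(chunks))
    return np.concatenate(list(parts))
```

By the same reasoning, passing C-contiguous float64 arrays is the layout most likely to avoid a conversion; whether ferrolearn copies other dtypes or memory orders is an assumption this README does not settle.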

Development

Set Up Development Environment

# Clone and set up
git clone https://github.com/Rafa-Gu98/ferrolearn.git
cd ferrolearn

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install in development mode
make dev-install

Running Tests

# All tests
make test

# Only Rust tests
cargo test

# Only Python tests
pytest tests/
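Beyond exact-output tests, K-Means results can be checked against invariants that hold for any correct implementation: each label points to a nearest center, and inertia equals the sum of squared distances to those centers. The helper below is a hypothetical sketch, not part of ferrolearn's tests/ suite.

```python
import numpy as np


def check_kmeans_invariants(X, labels, centers, inertia=None, atol=1e-6):
    """Assert properties any K-Means result should satisfy (hypothetical helper)."""
    X = np.asarray(X, dtype=np.float64)
    centers = np.asarray(centers, dtype=np.float64)
    labels = np.asarray(labels)
    assert labels.shape == (X.shape[0],), "one label per sample"
    assert centers.shape[1] == X.shape[1], "centers live in feature space"
    assert labels.min() >= 0 and labels.max() < len(centers), "labels index centers"

    sq_dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Every sample must sit at minimal distance from its assigned center;
    # comparing distances rather than indices keeps ties from failing.
    assigned = sq_dists[np.arange(len(X)), labels]
    assert np.all(assigned <= sq_dists.min(axis=1) + atol), "labels are nearest centers"

    if inertia is not None:
        # Inertia is the sum of squared distances to the assigned centers.
        assert np.isclose(inertia, assigned.sum(), atol=atol), "inertia matches"
```

Inside a pytest test it could be called as check_kmeans_invariants(X, kmeans.predict(X), kmeans.cluster_centers_, kmeans.inertia_) after kmeans.fit(X). If ferrolearn computes inertia_ before a final reassignment, drop the inertia argument or loosen atol.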

Project Structure

ferrolearn/
├── src/                # Rust source code
│   ├── lib.rs          # PyO3 bindings
│   └── kmeans.rs       # K-Means implementation
├── python/             # Python package
├── tests/              # Test suite
├── Cargo.toml          # Rust dependencies
└── pyproject.toml      # Python packaging

Roadmap

Current (v0.1.0)

  • ✅ K-Means clustering
  • ✅ Scikit-learn compatible API
  • ✅ Comprehensive benchmarks

Upcoming

  • DBSCAN clustering
  • Mini-batch K-Means
  • Random Forest
  • Gradient Boosting

Future

  • GPU acceleration
  • Distributed computing
  • More algorithms based on user feedback

Contributing

We welcome contributions! Rust implementations in ferrolearn have the most impact for:

  • Algorithms with many iterations
  • Embarrassingly parallel computations
  • Memory-intensive operations

Performance Notes

When ferrolearn shines:

  • Medium to large datasets (>10k samples)
  • Moderate dimensionality (20-100 features)
  • Multiple iterations or clusters

Current limitations:

  • Small datasets may not see a significant speedup, because fixed call overhead outweighs the gains
  • Not all algorithms benefit equally from Rust implementation
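Where the crossover between overhead and speedup lies depends on hardware and data shape, so it is worth measuring directly. The sketch below times fit() on random data of increasing size; time_fit is a hypothetical helper, and it assumes only that the model factory returns an object with a fit(X) method.

```python
import time

import numpy as np


def time_fit(make_model, sizes, n_features=50, repeats=3, seed=0):
    """Best-of-`repeats` wall-clock seconds for make_model().fit(X), per size."""
    rng = np.random.default_rng(seed)
    results = {}
    for n_samples in sizes:
        X = rng.random((n_samples, n_features))
        best = float("inf")
        for _ in range(repeats):
            model = make_model()  # fresh, unfitted model for every run
            start = time.perf_counter()
            model.fit(X)
            best = min(best, time.perf_counter() - start)
        results[n_samples] = best
    return results
```

For example, compare time_fit(lambda: KMeans(n_clusters=5, random_state=42), [1_000, 10_000, 100_000]) using ferrolearn's KMeans against the same call with sklearn.cluster.KMeans. Taking the best of several runs reduces timing noise but does not remove differences in initialization strategy between the two libraries.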

License

MIT License - see LICENSE file for details.

Author

Rafa_PyRs.dev

Acknowledgments


ferrolearn: Where Python meets Rust for machine learning performance

Made with 🐍 and 🦀

Download files


Source Distribution

ferrolearn-0.1.0.tar.gz (20.1 kB)

Built Distribution


ferrolearn-0.1.0-cp312-cp312-manylinux_2_34_x86_64.whl (276.8 kB; CPython 3.12, manylinux: glibc 2.34+, x86-64)
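Only this single wheel is published for 0.1.0, so on other Python versions, operating systems or CPU architectures pip falls back to the source distribution, which is when the Rust 1.70+ prerequisite applies. The sketch below checks the wheel's tags (cp312, manylinux_2_34 meaning glibc 2.34 or newer, x86_64) by hand. The function names are hypothetical, and pip's own tag matching (implemented in the packaging library) covers more cases, so treat the result as a rough guide.

```python
import platform
import sys


def parse_version(text):
    # "2.35" -> (2, 35); an empty string (unknown libc) -> ().
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def matches_published_wheel(impl, py_version, system, machine, libc, libc_version):
    """True if an environment fits the cp312 manylinux_2_34 x86_64 wheel."""
    return (
        impl == "cpython"
        and tuple(py_version[:2]) == (3, 12)
        and system == "Linux"
        and machine == "x86_64"
        and libc == "glibc"
        and parse_version(libc_version) >= (2, 34)
    )


def current_environment_matches():
    # platform.libc_ver() returns ("", "") when the C library cannot be detected.
    libc, libc_version = platform.libc_ver()
    return matches_published_wheel(
        sys.implementation.name,
        sys.version_info,
        platform.system(),
        platform.machine(),
        libc,
        libc_version,
    )
```

If current_environment_matches() returns False, expect pip install ferrolearn to compile the Rust extension locally.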

File details

Details for the file ferrolearn-0.1.0.tar.gz.

File metadata

  • Download URL: ferrolearn-0.1.0.tar.gz
  • Size: 20.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.9.2

File hashes

Hashes for ferrolearn-0.1.0.tar.gz
Algorithm Hash digest
SHA256 7e47cce249b81272b9d42a7c8fa24d09ddc56cbe865772dc1a82e223fa80bb5d
MD5 33c0e3c00cb937bc969468e0d6adcd74
BLAKE2b-256 2325a9daa7d8413fabc40070c2b9f6b8e010494babe636997cb2bbb9d7678d06


File details

Details for the file ferrolearn-0.1.0-cp312-cp312-manylinux_2_34_x86_64.whl.

File hashes

Hashes for ferrolearn-0.1.0-cp312-cp312-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 4457ee0e1240738757da8ec8659e971f24158b6e6c1525637260eb40256cedc9
MD5 e2e7fefa19f75ab7d0eb393e1e5dcdf5
BLAKE2b-256 1d00bc266f068556ac3866cc6764f1dd817b491312b5516834148552c632c3ce

