Classical ML algorithms and a neural network with custom autograd, implemented from scratch in C++ with a Python API. No NumPy or ML library dependencies.
bare-metal-ml
A machine learning library built from mathematical foundations — classical algorithms and a fully-connected neural network with a custom autograd engine, implemented from scratch in C++ with a clean Python API. No NumPy, no PyTorch, no scikit-learn in any algorithm code.
Every Python call runs C++ under the hood via a compiled pybind11 extension, with BLAS-accelerated matrix multiplication on Apple Silicon and x86.
Table of Contents
- Installation
- Neural Network
- Autograd Engine
- Classical Algorithms
- Linear Algebra Utilities
- Project Structure
- Benchmarks
Installation
Requirements: Python 3.10+, a C++17 compiler, and pybind11.
git clone https://github.com/arora-abhinav/bare-metal-ml.git
cd bare-metal-ml
pip install -e .
The build step compiles the C++ extension automatically. Verify the installation:
import bare_metal_ml as bml
print(bml.Network) # <class 'bare_metal_ml._cpp.Network'>
Any class whose repr shows bare_metal_ml._cpp.* is implemented in the compiled C++ extension.
Neural Network
A fully-connected feedforward network with:
- Mini-batch training with per-epoch shuffling
- He initialization (std = sqrt(2 / fan_in)) for stable ReLU gradients
- Inverted dropout
- Softmax output with cross-entropy loss
- Adam and SGD optimizers
- Topo-sort cached autograd graph for efficient backpropagation
- Weight persistence (save / load JSON)
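The He initialization and inverted dropout items above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's C++ code: the helper names `he_std`, `he_init`, and `inverted_dropout` are hypothetical, and the weights are assumed to be drawn from a normal distribution.

```python
import math
import random

def he_std(fan_in: int) -> float:
    """Standard deviation used by He initialization: sqrt(2 / fan_in)."""
    return math.sqrt(2.0 / fan_in)

def he_init(fan_out: int, fan_in: int, rng: random.Random) -> list[list[float]]:
    """Draw a (fan_out x fan_in) weight matrix from N(0, 2 / fan_in)."""
    std = he_std(fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]

def inverted_dropout(activations: list[float], rate: float, rng: random.Random) -> list[float]:
    """Zero each activation with probability `rate` and scale survivors by
    1 / (1 - rate), so the expected activation is unchanged and inference
    needs no extra rescaling."""
    if rate <= 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)                      # fixed seed for reproducibility
W = he_init(fan_out=128, fan_in=784, rng=rng)
dropped = inverted_dropout([1.0] * 10, rate=0.2, rng=rng)
```

With `rate=0.2`, every surviving activation of 1.0 becomes 1.25, which is why nothing has to change at inference time.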
Data Format
This library expects one sample per column, which it calls column-major layout. Data must be shaped (features × samples), not the conventional (samples × features).
import numpy as np
# x_train shape: (samples, features) — standard layout
# Transpose before passing to bare_metal_ml
x_train_col = x_train.T.tolist() # shape becomes (features, samples)
# Labels must be one-hot encoded, shape (classes, samples)
def one_hot(labels, n_classes=10):
    result = [[0.0] * len(labels) for _ in range(n_classes)]
    for i, label in enumerate(labels):
        result[label][i] = 1.0
    return result
y_train_oh = one_hot(y_train)
For inference, predict() and accuracy() also expect column-major input:
x_test_col = x_test.T.tolist()
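If the data is already a plain list of samples, the same transposition can be done without NumPy. The helper below is a hypothetical sketch (`to_features_by_samples` is not part of the library) that also rejects ragged input.

```python
def to_features_by_samples(rows):
    """Transpose a (samples x features) list-of-lists into (features x samples)."""
    if not rows:
        return []
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("all samples must have the same number of features")
    return [list(col) for col in zip(*rows)]

x_small = [[0.1, 0.2, 0.3],   # sample 0
           [0.4, 0.5, 0.6]]   # sample 1
x_small_col = to_features_by_samples(x_small)
print(x_small_col)  # [[0.1, 0.4], [0.2, 0.5], [0.3, 0.6]]
```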
Optimizers
Two optimizers are available. Pass one instance to Network at construction time.
Adam (recommended)
Adaptive moment estimation. Maintains per-parameter first and second moment estimates with bias correction.
from bare_metal_ml import Adam
optimizer = Adam(learning_rate=0.001) # default: 0.001
optimizer = Adam(0.01)
Hyperparameters β₁=0.9, β₂=0.999, ε=1e-8 are fixed at their standard values.
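For reference, the textbook Adam update for a single scalar parameter looks roughly like this. It is a plain-Python sketch of the standard algorithm with the hyperparameters listed above, not the library's C++ implementation; `adam_step` is a hypothetical name.

```python
import math

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)                # bias-corrected second moment
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    w, m, v = adam_step(w, 2.0 * (w - 3.0), m, v, t, lr=0.01)
```

Because of the bias correction, the very first step has magnitude close to `lr` regardless of the gradient's scale.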
SGD
Vanilla stochastic gradient descent.
from bare_metal_ml import SGD
optimizer = SGD(learning_rate=0.01) # default: 0.01
Built-in Activation Functions
Three activation functions are available as FunctionType enum values.
from bare_metal_ml import FunctionType
FunctionType.RELU # max(0, x) — default, recommended for deep networks
FunctionType.SIGMOID # 1 / (1 + e^-x)
FunctionType.TANH # tanh(x)
Pass to Network via the function_type keyword argument. All hidden layers use the chosen activation; the output layer always uses softmax.
Custom Activation Functions
You can inject any element-wise activation function by subclassing ActivationFunction and implementing two methods: forward(x) for the forward pass and derivative(x) for the local derivative used during backpropagation. Both operate on a single scalar x.
import math
from bare_metal_ml import ActivationFunction, Network, Adam

class LeakyReLU(ActivationFunction):
    def __init__(self, alpha=0.01):
        super().__init__()
        self.alpha = alpha

    def forward(self, x: float) -> float:
        return x if x > 0 else self.alpha * x

    def derivative(self, x: float) -> float:
        return 1.0 if x > 0 else self.alpha

class Swish(ActivationFunction):
    """x * sigmoid(x)"""
    def __init__(self):
        super().__init__()

    def forward(self, x: float) -> float:
        s = 1.0 / (1.0 + math.exp(-x))
        return x * s

    def derivative(self, x: float) -> float:
        s = 1.0 / (1.0 + math.exp(-x))
        return s + x * s * (1.0 - s)

# Pass via the `activation` argument, which overrides `function_type`
my_act = LeakyReLU(alpha=0.1)
net = Network(
    layer_num         = 3,
    neurons_in_layers = [128, 64, 10],
    initial_input     = x_train_col,
    optimizer         = Adam(0.001),
    dropout_rate      = 0.2,
    activation        = my_act,  # custom activation takes priority
)
The C++ training loop calls back into your Python forward() and derivative() methods transparently via a pybind11 virtual dispatch trampoline, so any Python-level logic (math, conditional branches) works as expected.
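Since backpropagation relies on derivative(x) matching forward(x), it may be worth checking a custom activation numerically before training. The sketch below uses a plain-Python stand-in class so it runs without the compiled extension; `LeakyReLUSketch` and `check_derivative` are hypothetical names, not library API.

```python
class LeakyReLUSketch:
    """Plain-Python stand-in with the same forward/derivative interface."""
    def __init__(self, alpha=0.01):
        self.alpha = alpha

    def forward(self, x):
        return x if x > 0 else self.alpha * x

    def derivative(self, x):
        return 1.0 if x > 0 else self.alpha

def check_derivative(act, points, h=1e-6, tol=1e-4):
    """Compare act.derivative(x) with a central finite difference of act.forward."""
    for x in points:
        numeric = (act.forward(x + h) - act.forward(x - h)) / (2.0 * h)
        if abs(numeric - act.derivative(x)) > tol:
            return False
    return True

# Skip x = 0: LeakyReLU has a kink there, so the finite difference is not meaningful.
ok = check_derivative(LeakyReLUSketch(alpha=0.1), [-2.0, -0.5, 0.5, 2.0])
print(ok)  # True
```

The same check should work on an instance of a real ActivationFunction subclass, assuming its methods can be called directly from Python.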
Building and Training
from bare_metal_ml import Network, Adam, FunctionType
adam = Adam(0.001)
net = Network(
    layer_num         = 3,                 # number of layers (including output)
    neurons_in_layers = [128, 64, 10],     # neurons per layer
    initial_input     = x_train_col,       # (features × samples) list-of-lists
    optimizer         = adam,
    dropout_rate      = 0.2,               # fraction of neurons to drop (0.0 = no dropout)
    function_type     = FunctionType.RELU,
)
net.train_loop(
epochs = 20,
train_labels = y_train_oh, # one-hot (classes × samples)
batch_size = 64,
)
dropout_rate is applied during training only. Inference automatically disables dropout.
Evaluation and Prediction
# accuracy() returns a float in [0, 1]
acc = net.accuracy(x_test_col, y_test_labels)
print(f"Test accuracy: {acc * 100:.2f}%")
# predict() returns a flat list of integer class indices
predictions = net.predict(x_test_col)
y_test_labels passed to accuracy() is a flat list of integer class indices (not one-hot).
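Because training labels are one-hot while accuracy() takes flat class indices, a small conversion helper can be handy. The following is a hypothetical plain-Python sketch (`argmax_columns` and `accuracy_from_predictions` are not library functions) that also shows how an accuracy figure can be recomputed from predict() output.

```python
def argmax_columns(scores):
    """Return the row index of the largest value in each column of a
    (classes x samples) matrix, e.g. to turn one-hot labels into class indices."""
    n_samples = len(scores[0])
    return [max(range(len(scores)), key=lambda c: scores[c][s])
            for s in range(n_samples)]

def accuracy_from_predictions(predictions, labels):
    """Fraction of positions where predictions and labels agree."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty label list")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

y_oh = [[1.0, 0.0, 0.0],   # class 0
        [0.0, 1.0, 1.0]]   # class 1
y_flat = argmax_columns(y_oh)                        # [0, 1, 1]
acc = accuracy_from_predictions([0, 1, 0], y_flat)   # 2 of 3 correct
```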
Saving and Loading Weights
net.save_weights("weights.json") # saves W and b for every layer
net.load_weights("weights.json") # restores weights in-place
Weights are serialised as JSON arrays. The file path defaults to "weights.json" if omitted.
Recommended Configurations
Based on benchmarks against PyTorch and Keras on MNIST (48 000 train / 12 000 test):
| Task | Architecture | Optimizer | Dropout | Notes |
|---|---|---|---|---|
| Image classification (MNIST-scale) | [256, 128, n_classes] | Adam 0.001 | 0.2 | Strong baseline |
| Tabular data, small dataset | [64, 32, n_classes] | Adam 0.001 | 0.0–0.1 | Avoid heavy dropout on small data |
| Tabular data, large dataset | [256, 128, 64, n_classes] | Adam 0.001 | 0.2–0.3 | He init handles depth well |
| Binary classification | [64, 32, 2] | Adam 0.001 | 0.1 | Or use LogisticRegression for linear problems |
| Fast prototyping | [128, n_classes] | SGD 0.01 | 0.0 | Fewer parameters, faster iteration |
General rules:
- Adam over SGD for most tasks: faster convergence, less sensitive to learning rate.
- ReLU over Sigmoid/Tanh for hidden layers: He init is matched to ReLU, and vanishing gradients are less of an issue.
- Dropout 0.1–0.3 for larger networks on image data; reduce or remove for tabular data with fewer features.
- Batch size 64–256: smaller batches often generalise better but take longer per epoch.
Autograd Engine
Scalar and Matrix are first-class computation graph nodes. Every arithmetic operation creates a new node that records its children and a backward closure. Calling topo_sort() then backprop() propagates gradients through the graph.
Scalar
Operates on single floating-point values.
from bare_metal_ml import Scalar
a = Scalar(2.0)
b = Scalar(3.0)
# Forward pass — builds the computation graph
c = a * b # 6.0
d = c + Scalar(1.0) # 7.0
# Seed the root gradient and backpropagate
d.gradient = 1.0
graph = d.topo_sort()
d.backprop(graph)
print(a.gradient) # 3.0 (d(d)/d(a) = b = 3)
print(b.gradient) # 2.0 (d(d)/d(b) = a = 2)
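To make the mechanics concrete, here is a minimal pure-Python sketch of the same idea: nodes that record their children and a backward closure, a topological sort, and a reverse sweep. The `Node` class and the free functions are hypothetical illustrations, not the library's Scalar, and this `backprop` seeds the root gradient itself.

```python
class Node:
    """A scalar graph node holding a value, a gradient, and a backward closure."""
    def __init__(self, value, children=()):
        self.value = value
        self.gradient = 0.0
        self.children = children
        self._backward = lambda: None   # leaves have nothing to propagate

    def __mul__(self, other):
        out = Node(self.value * other.value, (self, other))
        def backward():
            self.gradient += other.value * out.gradient
            other.gradient += self.value * out.gradient
        out._backward = backward
        return out

    def __add__(self, other):
        out = Node(self.value + other.value, (self, other))
        def backward():
            self.gradient += out.gradient
            other.gradient += out.gradient
        out._backward = backward
        return out

def topo_sort(root):
    """Return nodes so that every node appears after all of its children."""
    order, seen = [], set()
    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for child in node.children:
            visit(child)
        order.append(node)
    visit(root)
    return order

def backprop(root):
    """Seed the root gradient with 1.0 and apply closures from root to leaves."""
    root.gradient = 1.0
    for node in reversed(topo_sort(root)):
        node._backward()

a, b = Node(2.0), Node(3.0)
d = a * b + Node(1.0)
backprop(d)
print(a.gradient, b.gradient)  # 3.0 2.0
```

Gradients are accumulated with `+=`, so a node used more than once in the graph receives the sum of the contributions from every path.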
Available operations:
| Python syntax | Method | Notes |
|---|---|---|
| `a + b` | `__add__` | |
| `a * b` | `__mul__` | |
| `a - b` | `__sub__` | |
| `a / b` | `__truediv__` | |
| `-a` | `__neg__` | |
| `a.pow_op(b)` | `pow_op` | aᵇ |
| `a.relu()` | `relu` | max(0, x) |
| `a.sigmoid()` | `sigmoid` | 1/(1+e⁻ˣ) |
| `a.tanh_op()` | `tanh_op` | tanh(x) |
| `a.exp_op()` | `exp_op` | eˣ |
| `a.log_op()` | `log_op` | ln(x) |
| `3.0 + a` | `__radd__` | scalar on left |
| `3.0 * a` | `__rmul__` | scalar on left |
Attributes:
- a.digit: the scalar value (read/write)
- a.gradient: accumulated gradient (read/write, initialised to 0.0)
- a.operation: string name of the op that created this node (read-only)
Matrix
Operates on 2-D matrices (list-of-lists). Gradients are matrices of the same shape.
from bare_metal_ml import Matrix
A = Matrix([[1.0, 2.0],
[3.0, 4.0]])
B = Matrix([[5.0, 6.0],
[7.0, 8.0]])
# Matrix multiplication (not element-wise)
C = A * B
# Seed and backpropagate
C.gradient = [[1.0, 1.0], [1.0, 1.0]]
graph = C.topo_sort()
C.backprop(graph)
print(A.gradient) # dL/dA = dL/dC @ B^T
print(B.gradient) # dL/dB = A^T @ dL/dC
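The two gradient formulas in the comments above can be evaluated by hand for this example. The plain-Python sketch below does so with hypothetical helpers (`matmul`, `transpose`); it shows what those formulas yield for an all-ones seed, not output captured from the library.

```python
def matmul(X, Y):
    """Naive matrix product of two list-of-lists matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    """Swap rows and columns."""
    return [list(col) for col in zip(*X)]

A_vals = [[1.0, 2.0], [3.0, 4.0]]
B_vals = [[5.0, 6.0], [7.0, 8.0]]
G = [[1.0, 1.0], [1.0, 1.0]]            # seed gradient dL/dC

dA = matmul(G, transpose(B_vals))       # dL/dA = dL/dC @ B^T
dB = matmul(transpose(A_vals), G)       # dL/dB = A^T @ dL/dC
print(dA)  # [[11.0, 15.0], [11.0, 15.0]]
print(dB)  # [[4.0, 4.0], [6.0, 6.0]]
```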
Available operations:
| Python syntax / method | Behaviour |
|---|---|
| `A + B` | Element-wise addition |
| `A * B` | Matrix multiplication (not Hadamard) |
| `A - B` | Element-wise subtraction |
| `A / B` | Element-wise division |
| `-A` | Negate all elements |
| `A.element_wise_mult(B)` | Hadamard (element-wise) product |
| `A.scalar_multiply(s)` | Multiply every element by scalar s |
| `A.transpose_op()` | Transpose |
| `A.sum_cols()` | Sum across columns → (rows × 1) vector |
| `A.relu()` | Element-wise ReLU |
| `A.sigmoid()` | Element-wise sigmoid |
| `A.tanh_op()` | Element-wise tanh |
| `A.exp_op()` | Element-wise eˣ |
| `A.log_op()` | Element-wise ln(x) |
Attributes:
- A.matrix: the 2-D list of values (read/write)
- A.gradient: 2-D list of gradients, same shape (read/write)
- A.operation: string op name (read-only)
Classical Algorithms
Gaussian Discriminant Analysis
Generative classifier. Fits a multivariate Gaussian per class and classifies by maximum likelihood.
from bare_metal_ml import GDA
gda = GDA(positive_class="M") # label of the positive class (binary classification)
gda.fit(x_train, y_train)
prediction = gda.predict_one(x_sample)
predictions = gda.predict(x_test)
acc = gda.accuracy(x_test, y_test)
x_train is a list of feature vectors; y_train is a list of string labels.
K-Nearest Neighbours and KD-Tree
from bare_metal_ml import KNN, KDTree, euclidean, manhattan, cosine
# KNN — brute-force, O(n) per query
knn = KNN(k=5, metric="euclidean") # metric: "euclidean" | "manhattan" | "cosine"
knn.fit(x_train, y_train)
label = knn.predict_one(x_sample)
predictions = knn.predict(x_test)
acc = knn.accuracy(x_test, y_test)
# KD-Tree — O(log n) average per query
kdt = KDTree()
kdt.fit(x_train, y_train)
label = kdt.predict_one(x_sample, k=5)
predictions = kdt.predict(x_test, k=5)
acc = kdt.accuracy(x_test, y_test, k=5)
# Distance functions are also available standalone
d = euclidean([1.0, 2.0], [4.0, 6.0]) # 5.0
d = manhattan([1.0, 2.0], [4.0, 6.0]) # 7.0
d = cosine([1.0, 0.0], [0.0, 1.0]) # 1.0 (orthogonal vectors)
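For readers who want to see what these metrics compute, here are plain-Python equivalents. They are a sketch, not the library's code: the `_sketch` function names are hypothetical, and cosine is assumed to mean cosine distance (1 minus cosine similarity), which is consistent with the 1.0 result above.

```python
import math

def euclidean_sketch(u, v):
    """Straight-line (L2) distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def manhattan_sketch(u, v):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(a - b) for a, b in zip(u, v))

def cosine_sketch(u, v):
    """Cosine distance: 1 - (u . v) / (|u| |v|). Undefined for zero vectors."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (norm_u * norm_v)

print(euclidean_sketch([1.0, 2.0], [4.0, 6.0]))  # 5.0
print(manhattan_sketch([1.0, 2.0], [4.0, 6.0]))  # 7.0
print(cosine_sketch([1.0, 0.0], [0.0, 1.0]))     # 1.0
```

Under this definition the distance ranges from 0.0 for vectors pointing the same way to 2.0 for vectors pointing in opposite directions.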
Linear Regression
Trained via gradient descent on mean squared error.
from bare_metal_ml import LinearRegression
lr = LinearRegression()
lr.fit(x_train, y_train, learning_rate=0.01, iterations=1000)
predictions = lr.predict(x_test)
mse = lr.mse(x_test, y_test)
x_train is a list of feature vectors; y_train is a list of scalar targets.
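Gradient descent on mean squared error can be sketched in a few lines of plain Python. This is an illustrative batch version with a bias term; `fit_linear` is a hypothetical name and the library's C++ implementation may differ in details such as initialisation or gradient scaling.

```python
def fit_linear(xs, ys, learning_rate=0.01, iterations=1000):
    """Fit y ≈ w . x + b by batch gradient descent on mean squared error."""
    n, n_features = len(xs), len(xs[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(iterations):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(xs, ys):
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            for j in range(n_features):
                grad_w[j] += 2.0 * err * x[j] / n   # d(MSE)/d(w_j)
            grad_b += 2.0 * err / n                 # d(MSE)/d(b)
        w = [wi - learning_rate * g for wi, g in zip(w, grad_w)]
        b -= learning_rate * grad_b
    return w, b

# Points on the line y = 2x + 1
xs = [[0.0], [1.0], [2.0], [3.0]]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = fit_linear(xs, ys, learning_rate=0.05, iterations=2000)
```

On this toy data the fit should recover a slope close to 2 and an intercept close to 1. With unscaled features a smaller learning rate may be needed for the iteration to converge.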
Logistic Regression
Binary classifier trained via gradient descent on binary cross-entropy.
from bare_metal_ml import LogisticRegression
logr = LogisticRegression(positive_class="spam")
logr.fit(x_train, y_train, learning_rate=0.001, iterations=1000)
probabilities = logr.predict_proba(x_test)
predictions = logr.predict(x_test, threshold=0.5)
acc = logr.accuracy(x_test, y_test, threshold=0.5)
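The relationship between probabilities, the threshold, and the loss can be sketched as follows. All three helpers are hypothetical; in particular, whether the library treats a probability exactly equal to the threshold as positive is an assumption here.

```python
import math

def sigmoid_stable(z):
    """Logistic function written to avoid overflow for large |z|."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def apply_threshold(probabilities, positive, negative, threshold=0.5):
    """Map P(positive) to a label; p >= threshold counts as positive (assumed)."""
    return [positive if p >= threshold else negative for p in probabilities]

def binary_cross_entropy(probabilities, targets, eps=1e-12):
    """Mean of -[y ln p + (1 - y) ln(1 - p)], with p clamped away from 0 and 1."""
    total = 0.0
    for p, y in zip(probabilities, targets):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(targets)

labels = apply_threshold([0.2, 0.5, 0.9], "spam", "ham")
loss = binary_cross_entropy([0.5, 0.5], [1.0, 0.0])   # equals ln 2
```

Raising the threshold trades recall for precision on the positive class, which is why predict() and accuracy() both accept it.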
Naive Bayes
Three variants for different data types.
from bare_metal_ml import GaussianNaiveBayes, BernoulliNaiveBayes, MultinomialNaiveBayes
# Gaussian — continuous features (e.g., measurements)
gnb = GaussianNaiveBayes()
gnb.fit(x_train, y_train)
acc = gnb.accuracy(x_test, y_test)
# Bernoulli — binary bag-of-words features (text classification)
bnb = BernoulliNaiveBayes(vocab_size=1000)
bnb.fit(x_train, y_train) # x_train: list of raw text strings
acc = bnb.accuracy(x_test, y_test)
# Multinomial — word count features (text classification)
mnb = MultinomialNaiveBayes(vocab_size=1000)
mnb.fit(x_train, y_train)
acc = mnb.accuracy(x_test, y_test)
All three share the same interface: fit, predict_one, predict, accuracy.
Linear Algebra Utilities
All functions are C++ and available under the bare_metal_ml.linalg namespace.
from bare_metal_ml import linalg
# Core matrix operations
C = linalg.matrix_with_matrix_multiplication(A, B)
S = linalg.matrix_addition_and_sub(A, B, "add") # "add" or "sub"
S = linalg.scalar_multiply_matrix(A, 3.0)
H = linalg.element_wise_multiplication(A, B)
D = linalg.element_wise_division_two_matrices(A, B)
R = linalg.element_wise_roots(A, 2.0) # element-wise sqrt
T = linalg.transpose_matrix(A)
M = linalg.ReLU_derivative(A) # 1 where A > 0, else 0
v = linalg.sum_across_column(A) # row-wise sum → vector
# Utility functions
outer = linalg.matrix_product_from_vector_and_transpose(n, v) # outer product v @ v^T
diff = linalg.calculate_vector(v1, v2) # v1 - v2
dot = linalg.scalar_product_from_transpose_and_vector(v1, v2) # dot product
mv = linalg.matrix_product_with_matrix_and_vector(A, v, rows, cols)
# Matrix decomposition and inverse
L, U = linalg.LU_decomposition(A, n) # Doolittle LU factorisation
det = linalg.calculate_determinant(U, n) # determinant from upper triangular
A_inv = linalg.matrix_inverse(L, U, n) # inverse via forward/back substitution
A_reg = linalg.regularize(A, n, epsilon=1e-6) # add ε to diagonal for numerical stability
All inputs and outputs are Python list[list[float]] for matrices and list[float] for vectors.
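As an illustration of the decomposition step, a Doolittle LU factorisation (unit diagonal on L) and the determinant read off U can be written in plain Python as below. This sketch does no pivoting, and `lu_doolittle` and `det_from_upper` are hypothetical names; the library functions additionally take the dimension n as an argument.

```python
def lu_doolittle(A):
    """Factor a square matrix as A = L @ U with ones on the diagonal of L."""
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Row i of U
        for j in range(i, n):
            U[i][j] = A[i][j] - sum(L[i][k] * U[k][j] for k in range(i))
        if U[i][i] == 0.0:
            raise ValueError("zero pivot: matrix is singular or needs pivoting")
        # Column i of L, below the diagonal
        for j in range(i + 1, n):
            L[j][i] = (A[j][i] - sum(L[j][k] * U[k][i] for k in range(i))) / U[i][i]
    return L, U

def det_from_upper(U):
    """Determinant of A when L has a unit diagonal: product of U's diagonal."""
    det = 1.0
    for i in range(len(U)):
        det *= U[i][i]
    return det

A_demo = [[4.0, 3.0],
          [6.0, 3.0]]
L_demo, U_demo = lu_doolittle(A_demo)
print(L_demo)                  # [[1.0, 0.0], [1.5, 1.0]]
print(U_demo)                  # [[4.0, 3.0], [0.0, -1.5]]
print(det_from_upper(U_demo))  # -6.0
```

A zero pivot is where adding a small ε to the diagonal, as regularize does, can help keep the factorisation well defined.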
Project Structure
bare-metal-ml/
├── bare_metal_ml/
│ ├── __init__.py # public API — imports everything from _cpp
│ ├── _cpp.*.so # compiled C++ extension (built on install)
│ └── cpp/
│ ├── autograd.hpp # Scalar, Matrix, TopologicalSort
│ ├── linalg.hpp # all math operations (BLAS matmul)
│ ├── neural_network.hpp
│ ├── gda.hpp
│ ├── knn.hpp
│ ├── linear_regression.hpp
│ ├── logistic_regression.hpp
│ ├── naive_bayes.hpp
│ └── bindings.cpp # pybind11 module definition
├── notebooks/
│ ├── neural_network/ # reference implementation + MNIST data
│ ├── gda/
│ ├── knn/
│ ├── linear_regression/
│ ├── logistic_regression/
│ └── naive_bayes/
├── benchmarks/
│ ├── benchmark_neural_network.py
│ ├── benchmark_classifiers.py
│ ├── benchmark_linear_regression.py
│ └── benchmark_naive_bayes.py
├── pyproject.toml
└── setup.py
The notebooks/ directory contains the original Python reference implementations. They are not used by the library but document the mathematical derivations behind each algorithm.
Benchmarks
Benchmarked on MNIST (48 000 train / 12 000 test), architecture 784 → 128 → 64 → 10, Adam lr=0.01, dropout=0.2, 10 epochs, batch size 64:
| Model | Accuracy | Time |
|---|---|---|
| bare-metal-ml | ~96% | ~43s |
| PyTorch | ~96% | ~7s |
| Keras (PyTorch backend) | ~96% | ~49s |
Accuracy is on par with PyTorch and Keras. The speed gap comes from the Python↔C++ boundary: each matrix operation in the autograd graph is a separate pybind11 dispatch. The flexibility of the autograd design (arbitrary activation functions, custom graph topologies) is the deliberate trade-off.
Author
Abhinav Arora
University of Maryland — Computer Science