slope AD
Project description
Slope
Slope is a small automatic differentation (AD) engine, focused on machine learning (ML) Tensor semantics are similar to Pytorch, functional API is similar to JAX, tensor operators code is heavily derived from tinygrad.
Example:
import slope
def f(x):
y = x * 2.0
return y.sum()
x = slope.tensor([1.,2.,3.])
gf_x = slope.grad(f)(x)
print(f"{gf_x=}")
gf_x=<Tensor: val=
[2. 2. 2.]
shape=(3,), dtype=float32, device='cpu:0'>
Install
git clone https://github.com/radenmuaz/slope
cd slope
pip install -e .
Or you can just copy src/slope
to your projects.
Features
-
Forward-mode, reverse-mode, and higher-order AD.
-
Just-in-time compilation, with interchangeable backends, supporting CPU, CUDA and Metal:
- ONNX Runtime (ONNX graph); this is the default backend
- IREE (StableHLO MLIR)
- NumPy (Python code)
-
Training and inference, examples:
-
Small (?)
- <3000 lines of core code slope/core.py, after run with
black src --line-length 140
- <3000 lines of core code slope/core.py, after run with
-
Operators and procedures system
- 33 core operators defined in slope/operators.py
- Unary:
exp log sin sqrt invert cast stop_gradient
- Binary:
add mul sub div pow equal less greater maximum
- Reduce:
sum max
- Shape:
reshape expand permute slice pad flip cat
- Init:
full arange random_normal random_uniform
- GeneralReduce:
matmul conv gather_nd scatter_nd
- Unary:
- Composite operators system with "procedures" slope/procedures.py
- Procedures are functions containing calls to operators, exposed with
Tensor.procedure_name(*args)
syntax. - Useful for definitions like:
x.cos()
, wheredef cos(x): return (math.pi/2 - x).sin()
x.conv_transpose(w)
: wheredef conv_transpose(x, w): ...
is a very long function.
- Procedures are functions containing calls to operators, exposed with
- An operator can be directly implemented as code translation to backend, or fallback to a procedure, e.g. there is
conv
procedure in case the backend has no implementation for it.
- 33 core operators defined in slope/operators.py
-
Extensible
- Add new backend by defining implementation translations slope/backends
- NN module slope/nn.py
Quickstart
There are many examples in examples/ folder.
We start by running MNIST classifier training, examples/nn/mnist_mlp.py
python examples/nn/mnist_mlp.py
Starting training...
...
Train epoch: 2, batch: 299/300, loss: 12.51: 100%|██████████████████████████████████████████████████████████████████████| 300/300 [00:02<00:00, 139.36it/s]
Epoch 2 in 2.15 sec
Test set accuracy 0.97
By setting the SLOPE_BACKEND
flag, we change the backend to either iree
(default), onnxruntime
and numpy
.
We can also set LOG_JIT=1
to verbose print the backend output.
LOG_JIT=1 SLOPE_BACKEND=onnxruntime python examples/nn/mnist_mlp.py
...
---- train_step codegen:
<ir_version: 7, opset_import: ["" : 18, "slope":1]>
main (float[100, 784] x0, float[10, 100] x1, float[200, 28, 28] x2, float[200, 10] x3, int32[] x4, float[100, 784] x5, float[10, 100] x6, float[] x7) => (float[] y0, float[100, 784] y2, float[10, 100] y4, int32[] y5, float[100, 784] y1, float[10, 100] y3, float[] x7)
{
z0_shape = Constant <value = int64[2] { 200, 784 } >()
z0 = Reshape(x2, z0_shape)
...
y4 = Sub(x1, z164)
z165_fill_value = Constant < value = int32[1] { 1 }>()
z165_squeeze_dim = Constant <value = int64[1] {0}> ()
z165 = Squeeze (z165_fill_value, z165_squeeze_dim)
y5 = Add(x4, z165)
}
...
===============
Train epoch: 0, batch: 58/300, loss: 71.23: 20%|██████████████▎ | 59/300 [00:01<00:04, 55.45it/s]
Environment flags
put this before the command to set
# prints the jitted code
LOG_JIT=1
# set device
DEFAULT_DEVICE=cpu
DEFAULT_DEVICE=cuda
DEFAULT_DEVICE=metal
# set backend
SLOPE_BACKEND=iree # iree backend (default)
SLOPE_BACKEND=onnxruntime # onnxruntime backend
SLOPE_BACKEND=numpy # numpy backend (extremely SLOW)
Slope internals tutorial
Slope has familiar Pytorch-like syntax
Most of the things familiar in Pytorch works in Slope, probably.
import slope
x = slope.ones(2, 5)
w = slope.arange(15, dtype=slope.float32).reshape(5,3)
b = slope.tensor([1., 2., 3.], dtype=slope.float32)
y = x @ w + b
print(y)
Every operations are compiled with slope.jit
Operation calls are jitted as individual programs eagerly. Try running this on terminal:
LOG_JIT=1 python -c "import slope; print(slope.ones(3)*2)"
...
---- full_shape__lp__rp__fill_value_2_dt_0_dtype_<DType:float32>_device_<Device:'cpu:0'>_ codegen:
func.func @main () -> (tensor<f32>)
{
%y0 = "stablehlo.constant"() { value = dense<2.0> : tensor<f32> } : () -> (tensor<f32>)
"func.return"(%y0): (tensor<f32>) -> ()
}
... # plenty of outputs
....
To prevent eager jit, write code in function and use slope.jit. Then call the function
@slope.jit
def f(x):
y = x * x
return y
# Alternative way to jit
# f = slope.jit(f)
x = slope.full((1,), 2.)
y = f(x)
print(y)
To see the actual code:
jit_object = f.lower(x)
# slope Program intermediate representation
print(jit_object.program)
# backend code
print(jit_object.code)
def f(x0): # [1, f32] -> [1, f32]
y0 = slope.mul(x0, x0) # ([1, f32], [1, f32]) -> [1, f32]
return y0
func.func @main (%x0: tensor<1xf32>) -> (tensor<1xf32>)
{
%y0 = "stablehlo.multiply"(%x0, %x0) : (tensor<1xf32>,tensor<1xf32>) -> (tensor<1xf32>)
"func.return"(%y0): (tensor<1xf32>) -> ()
}
Derivatives and gradients
Slope has several AD functions, like slope.jvp
slope.vjp
and slope.grad
To do the usual backprop things:
def f(x, w):
y = x @ w
return y
def loss_fn(x, w, y):
y_hat = f(x,w)
return ((y_hat - y)**2).sum()
gloss_fn = slope.value_and_grad(loss_fn, argnums=(1,))
@slope.jit
def train_step(x, w, y, lr):
loss, gw = gloss_fn(x, w, y)
w = w - lr * gw
return loss, w
N = 50
x = slope.randn(N, 2)
y = slope.randn(N, 1)
w = slope.randn(2, 1)
lr = slope.tensor([0.001])
for i in range(10):
loss, w = train_step(x, w, y, lr)
print(i, loss.numpy())
0 102.412125
1 88.60157
2 78.322815
3 70.644066
4 64.883766
5 60.54294
6 57.25553
7 54.75257
8 52.83598
9 51.359528
Contributing
Fork this repo and hack, and maybe do a PR, too many things need to be done (see Roadmap) Idk everything is flaky and I am still experimenting and doing many API changes, maybe later I will open a new github repo.
Roadmap
- Docs
- Symbolic shape inference
- Dynamic shape jit
- Optimizer filter frozen params
- vmap vjp and jvp to compute jacobian and hessian
- iree backend currently has fixed seed random, implement threefry and JAX-like random
- make things fast
- llama (gpt) training
- whisper inference
- core tests, operators tests on all Trace types
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.