Skip to main content

Tensor computation library with automatic differentiation

Project description

Polygrad Python

tinygrad-compatible Tensor API for Python. Thin ctypes wrapper around the C core — each method is one FFI call.

Project home and docs: https://polygrad.org Source code and issue tracker: https://github.com/polygrad/polygrad

Installation

pip install polygrad

Build requirements: C compiler (gcc or clang) and Python development headers (python3-dev).

Runtime requirement: clang must be on PATH. polygrad compiles compute kernels at runtime via clang.

Install requirements: Python >= 3.9, numpy. Linux only (uses POSIX fork/dlopen).

For development:

# Editable install (compiles C sources, auto-syncs from repo)
pip install -e py/

# Or build the shared library manually and point to it
make
export POLYGRAD_LIB=/path/to/build/libpolygrad.so

Quick Start

from polygrad import Tensor

# Create tensors
a = Tensor.rand(3, 4)
b = Tensor.rand(4, 5)

# Matrix multiply + softmax
c = (a @ b).softmax(-1)
print(c.numpy())

# Autograd
x = Tensor([1.0, 2.0, 3.0])
x.requires_grad = True
loss = (x * x).sum()
loss.backward()
print(x.grad.numpy())  # [2.0, 4.0, 6.0]

Training Example

from polygrad import Tensor
from polygrad.nn import Linear, SGD, get_parameters

Tensor.manual_seed(42)
model = Linear(2, 1)
opt = SGD(get_parameters(model), lr=0.01)

for i in range(100):
    opt.zero_grad()
    x = Tensor([[1.0, 2.0], [3.0, 4.0]])
    target = Tensor([[5.0], [11.0]])
    loss = (model(x) - target).square().mean()
    loss.backward()
    opt.step()

print(f"loss: {loss.item():.4f}")

Tensor API

Construction

Method Description
Tensor(data) From list, numpy array, or scalar
Tensor.zeros(*shape) Tensor of zeros
Tensor.ones(*shape) Tensor of ones
Tensor.full(shape, val) Tensor filled with value
Tensor.rand(*shape) Uniform random [0, 1)
Tensor.randn(*shape) Standard normal
Tensor.randint(low, high, shape) Random integers [low, high)
Tensor.arange(stop, start=0, step=1) Arithmetic progression
Tensor.linspace(start, stop, steps) Evenly spaced values
Tensor.eye(n) Identity matrix
Tensor.empty(*shape) Uninitialized tensor
Tensor.manual_seed(seed) Set random seed

Properties

Property Type Description
shape tuple Dimension sizes
ndim int Number of dimensions
dtype str Always 'float32'
device str Always 'CPU'
T Tensor Transpose of last two dims
requires_grad bool Settable; enables autograd
grad Tensor/None Gradient after .backward()

Realization & Conversion

Method Returns Description
realize() Tensor Execute lazy graph, return self
numpy() ndarray Realize and return numpy array
item() float Scalar value
tolist() list Nested Python list
numel() int Total elements
size(dim=None) tuple/int Shape or dimension size
detach() Tensor Copy without graph
clone() Tensor Copy preserving requires_grad

Arithmetic

a + b, a - b, a * b, a / b, -a, a ** b

All support broadcasting and scalar operands.

Comparisons

a < b, a == b, a != b, a > b, a >= b, a <= b

Returns float tensor (1.0 = true, 0.0 = false).

Element-wise Math

Method Description
exp() e^x
log() ln(x)
sqrt() Square root
square() x^2
abs() Absolute value
sign() Sign (-1, 0, +1)
reciprocal() 1/x
rsqrt() 1/sqrt(x)
sin(), cos(), tan() Trigonometric
ceil(), floor(), round(), trunc() Rounding
isnan(), isinf() NaN/Inf detection
exp2(), log2() Base-2 functions
where(x, y) Conditional: self ? x : y
maximum(other) Element-wise max
minimum(other) Element-wise min
clamp(min_=None, max_=None) Clamp to range

Activations

Method Description
relu() max(0, x)
relu6() clamp(relu(x), 0, 6)
leaky_relu(neg_slope=0.01) Leaky ReLU
sigmoid() 1 / (1 + e^-x)
tanh() Hyperbolic tangent
gelu() Gaussian Error Linear Unit
quick_gelu() Fast GELU approximation
silu() / swish() x * sigmoid(x)
elu(alpha=1.0) Exponential Linear Unit
softplus(beta=1.0) log(1 + e^(beta*x)) / beta
mish() x * tanh(softplus(x))
hardtanh(min_val=-1, max_val=1) Clamped linear
hardswish() Hard swish
hardsigmoid() Hard sigmoid

Reductions

Method Description
sum(axis=None, keepdim=False) Sum along axes
max(axis=None, keepdim=False) Maximum along axes
min(axis=None, keepdim=False) Minimum along axes
mean(axis=None, keepdim=False) Mean along axes
var(axis=None, keepdim=False, correction=1) Variance
std(axis=None, keepdim=False, correction=1) Standard deviation

Movement / Shape

Method Description
reshape(*shape) / view(*shape) Reshape (supports -1)
permute(*order) Permute dimensions
transpose(dim0=-2, dim1=-1) Swap two dimensions
expand(*shape) Broadcast to shape
squeeze(dim=None) Remove size-1 dims
unsqueeze(dim) Add size-1 dim
flatten(start_dim=0, end_dim=-1) Flatten dim range
unflatten(dim, sizes) Split dim into multiple
shrink(arg) Slice: [(start, end), ...]
pad(arg) Pad: [(before, after), ...]
flip(axis) Reverse along axes
repeat(*repeats) Tile tensor

Linear Algebra

Method Description
matmul(other) / dot(other) / @ Matrix multiplication
linear(weight, bias=None) x @ weight.T + bias

Normalization & Loss

Method Description
softmax(axis=-1) Softmax normalization
log_softmax(axis=-1) Log-softmax
layernorm(axis=-1, eps=1e-5) Layer normalization
cross_entropy(target, axis=-1) Cross-entropy loss
binary_crossentropy(target) Binary cross-entropy

Advanced Operations

Method Description
Tensor.einsum(formula, *operands) Einstein summation
rearrange(formula, **kwargs) einops-style rearrange
Tensor.cat(*tensors, dim=0) Concatenate along dim
Tensor.stack(*tensors, dim=0) Stack along new dim
split(sizes, dim=0) Split into chunks
chunk(n, dim=0) Split into n chunks
__getitem__ Indexing: int, slice, None, Ellipsis

Autograd

x = Tensor([1.0, 2.0])
x.requires_grad = True
loss = (x * x).sum()
loss.backward()
print(x.grad.numpy())  # [2.0, 4.0]

Call backward() on a scalar loss before calling item() or numpy() on the loss.

nn Module

Layers

from polygrad.nn import Linear, LayerNorm, RMSNorm, Embedding, Dropout
Class Signature Description
Linear(in_f, out_f, bias=True) y = x @ W.T + b Fully connected layer
LayerNorm(shape, eps=1e-5) (x - mean) / sqrt(var + eps) * w + b Layer normalization
RMSNorm(dim, eps=1e-5) x / rms(x) * w Root mean square normalization
Embedding(vocab, dim) Lookup table Token embedding
Dropout(p=0.5) Random zeroing Training-only (controlled by Tensor.training)
GroupNorm(groups, channels) Group normalization Per-group normalization

Optimizers

from polygrad.nn import SGD, Adam, AdamW, get_parameters
Class Signature
SGD(params, lr=0.01, momentum=0.0, weight_decay=0.0)
Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.0)
AdamW(params, lr=0.001, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.01)

All optimizers have step() and zero_grad() methods.

State Dict

from polygrad.nn import get_parameters, get_state_dict, load_state_dict

params = get_parameters(model)       # List of Tensor
sd = get_state_dict(model)           # {'weight': Tensor, 'bias': Tensor, ...}
load_state_dict(model2, sd)          # Load params into another model

Compiled Training Steps

Compile a training step into a reusable C program. The first call traces the computation graph; subsequent calls execute with zero scheduling overhead.

from polygrad import Tensor
from polygrad.nn import Linear, SGD, get_parameters, compile_step

Tensor.manual_seed(42)
model = Linear(4, 1)
opt = SGD(get_parameters(model), lr=0.01)

# Sample inputs (shapes must match at runtime)
x = Tensor.rand(8, 4)
y = Tensor.rand(8, 1)

def train_step(model, opt, x, y):
    loss = (model(x) - y).square().mean()
    loss.backward()
    opt.step()
    opt.zero_grad()
    return loss

# Compile: traces forward + backward + optimizer into one PolyStep
step = compile_step(train_step, model, opt, x, y)

# Run: executes compiled kernels with current buffer data
for i in range(100):
    x._data[:] = ...  # update input data in-place
    y._data[:] = ...
    step.run()
    print(f"step {i}: loss = {step.loss_value():.4f}")

compile_step returns a CompiledTrainingStep with:

  • run() -- execute all compiled kernels (forward + backward + optimizer)
  • loss_value() -- read the loss scalar from the output buffer
  • n_kernels -- number of compiled kernels
  • n_intermediates -- number of pre-allocated intermediate buffers

HuggingFace Model Loading

Load pre-trained models directly from HuggingFace format (config.json + safetensors).

from polygrad.hf import load_hf, download_hf, generate
import numpy as np

# Download a model from HuggingFace Hub
model_path = download_hf('gpt2')

# Load into a PolyInstance
inst = load_hf(model_path, max_batch=1, max_seq_len=128)

# Run forward pass
outputs = inst.forward(
    x=np.array([[50256, 464, 3616, 286, 1204, 318]], dtype=np.float32),
    positions=np.arange(6, dtype=np.float32).reshape(1, -1),
    arange=np.arange(128, dtype=np.float32)
)
logits = outputs['output']  # (1, max_seq_len, vocab_size)

# Autoregressive generation
tokens = np.array([[50256, 464, 3616, 286, 1204, 318]], dtype=np.float32)
result = generate(inst, tokens, max_new_tokens=20, temperature=0.8, top_k=40)

HF API

Function Description
load_hf(path, max_batch=1, max_seq_len=0) Load model from local directory
load_hf_bytes(config, weights, ...) Load from raw bytes (no filesystem)
download_hf(repo_id, cache_dir=None) Download from HuggingFace Hub
generate(inst, tokens, max_new_tokens, ...) Autoregressive text generation

Supported model types: GPT-2. Weight formats: F32, F16, BF16 safetensors (single or sharded).

How It Works

  1. Lazy evaluation: Operations build a UOp graph in the C core. No computation happens until realize(), numpy(), item(), or backward().
  2. One FFI call per op: Each Tensor method calls one C function via ctypes. The C core handles all op composition (e.g., gelu = 0.5*x*(1+tanh(sqrt(2/pi)*(x+0.044715*x^3)))).
  3. Realize boundaries: Some ops (softmax, layernorm, var) insert implicit .realize() calls to create kernel boundaries for the scheduler.
  4. Autograd: backward() calls C's poly_grad() for each parameter, then realizes the gradient tensors.

Limitations

  • float32 only
  • CPU only (GPU backends planned)
  • Conv2d and BatchNorm are stubs (forward raises NotImplementedError)

Tests

python -m pytest py/tests/ -v   # 130 tests (tensor + nn + compiled step + GPT-2 + HF loading + instance)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

polygrad-0.2.0.tar.gz (283.4 kB view details)

Uploaded Source

File details

Details for the file polygrad-0.2.0.tar.gz.

File metadata

  • Download URL: polygrad-0.2.0.tar.gz
  • Upload date:
  • Size: 283.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.16

File hashes

Hashes for polygrad-0.2.0.tar.gz
Algorithm Hash digest
SHA256 1e45ce7e39d287bf5553fd04242d22f9ae476fee92db29368404c5df14ed8b48
MD5 8fe8d1ed646d6e5282c054ece9ef6eab
BLAKE2b-256 9f072eb3f82d223b5d907e8868462ac3c80cec5e55e639540d9640dcce0ba830

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page