Project description

GradNerve: A Minimalist Autograd Engine for Deep Learning Education

Introduction

Automatic differentiation is a fundamental technique in deep learning, enabling the efficient computation of gradients for optimizing model parameters. GradNerve is a lightweight, NumPy-based autograd engine designed for educational purposes. It provides a clear and concise implementation of reverse-mode automatic differentiation, making it suitable for learning and experimentation.

Key Features

Tensor Class: The core Tensor class is the building block of GradNerve. It stores data, gradients, and tracks the operations performed on it.
- .data: A NumPy array storing the tensor's values.
- .grad: A NumPy array storing the accumulated gradient.
- .requires_grad: A boolean flag indicating whether gradients should be tracked for this tensor.
- ._prev: A set containing the parent Tensor objects in the computation graph.
- ._backward: A function that computes the local gradient and propagates it to the parents.
Automatic Differentiation: GradNerve implements reverse-mode automatic differentiation, a technique for efficiently computing gradients of a scalar function with respect to its inputs.
Computation Graph: The computation graph is a directed acyclic graph that represents the sequence of operations performed on the tensors. GradNerve builds this graph dynamically as operations are performed.
Operator Overloading: GradNerve overloads common operators (e.g., __add__, __mul__, __matmul__) to automatically build the computation graph.
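To make the features above concrete, here is a minimal scalar sketch of how the attributes, the dynamically built graph, and operator overloading could fit together. It is not GradNerve's source code: the `Value` class name, the `_wrap` helper, and the use of plain floats instead of NumPy arrays are assumptions made purely for illustration.

```python
class Value:
    """Hypothetical scalar stand-in for a Tensor (illustration only)."""

    def __init__(self, data, requires_grad=False, _prev=()):
        self.data = float(data)
        self.grad = 0.0                   # accumulated gradient
        self.requires_grad = requires_grad
        self._prev = set(_prev)           # parent nodes in the graph
        self._backward = lambda: None     # leaves have nothing to propagate

    @staticmethod
    def _wrap(other):
        # Promote plain numbers to constant nodes that do not track gradients.
        return other if isinstance(other, Value) else Value(other)

    def __add__(self, other):
        other = Value._wrap(other)
        out = Value(self.data + other.data,
                    requires_grad=self.requires_grad or other.requires_grad,
                    _prev=(self, other))

        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            if self.requires_grad:
                self.grad += out.grad
            if other.requires_grad:
                other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        other = Value._wrap(other)
        out = Value(self.data * other.data,
                    requires_grad=self.requires_grad or other.requires_grad,
                    _prev=(self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            if self.requires_grad:
                self.grad += other.data * out.grad
            if other.requires_grad:
                other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Depth-first post-order gives a topological order of the graph;
        # walking it in reverse visits each node before its parents.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._prev:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0  # dL/dL = 1 seeds the backward pass
        for node in reversed(order):
            node._backward()


x = Value(2.0, requires_grad=True)
y = Value(3.0, requires_grad=True)
z = x * y + x          # the graph is built as the operators run
z.backward()
print(x.grad, y.grad)  # dz/dx = y + 1 = 4.0, dz/dy = x = 2.0
```

Note that `x` feeds into two operations here, so its gradient is the sum of two contributions. That is why the sketch adds to `.grad` with `+=` rather than overwriting it.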

Mathematical Foundations

GradNerve implements reverse-mode automatic differentiation, the technique behind backpropagation. It computes the gradient of a scalar function with respect to all of its inputs in a single backward pass, which is exactly what gradient-based optimization algorithms need to train deep learning models.

Let's consider a simple computation graph where a scalar output $L$ (the loss) is a function of several intermediate variables, which are in turn functions of the input variables. For example:

$$L = f(y_1, y_2), \qquad y_1 = g_1(x_1, x_2), \qquad y_2 = g_2(x_2, x_3)$$

where $L$ is the loss, $y_i$ are intermediate variables, and $x_i$ are the input variables (parameters of the model).

The goal is to compute the gradients of $L$ with respect to each $x_i$, i.e., $\frac{\partial L}{\partial x_i}$.

Reverse-mode automatic differentiation computes these gradients in two phases:

Forward Pass: The input values $x_i$ are fed forward through the computation graph to compute the values of the intermediate variables $y_i$ and the final loss $L$.
Backward Pass: Starting from the output $L$, the gradients are computed recursively using the chain rule. The chain rule states that if $z = f(y)$ and $y = g(x)$, then:

$$\frac{\partial z}{\partial x} = \frac{\partial z}{\partial y} \cdot \frac{\partial y}{\partial x}$$

In our example, the backward pass would proceed as follows:

Compute $\frac{\partial L}{\partial y_1}$ and $\frac{\partial L}{\partial y_2}$ (the local gradients at the output).
Compute $\frac{\partial L}{\partial x_1} = \frac{\partial L}{\partial y_1} \cdot \frac{\partial y_1}{\partial x_1}$ and $\frac{\partial L}{\partial x_2} = \frac{\partial L}{\partial y_1} \cdot \frac{\partial y_1}{\partial x_2} + \frac{\partial L}{\partial y_2} \cdot \frac{\partial y_2}{\partial x_2}$.
Compute $\frac{\partial L}{\partial x_3} = \frac{\partial L}{\partial y_2} \cdot \frac{\partial y_2}{\partial x_3}$.
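The steps above are all instances of the multivariable chain rule, which sums over every intermediate variable that depends on a given input. Written with the same symbols as the example:

$$\frac{\partial L}{\partial x_i} = \sum_{j} \frac{\partial L}{\partial y_j} \cdot \frac{\partial y_j}{\partial x_i}$$

For $x_2$, both $y_1$ and $y_2$ depend on it, so the sum has two terms. For $x_1$ and $x_3$, only one $\frac{\partial y_j}{\partial x_i}$ is nonzero, so a single term remains.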

Each _backward function in GradNerve implements the computation of the local gradient (e.g., $\frac{\partial y}{\partial x}$) and multiplies it by the incoming gradient (e.g., $\frac{\partial L}{\partial y}$) to compute the gradient with respect to its inputs.
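As a worked instance of the forward and backward passes described in this section, the sketch below picks concrete functions for $f$, $g_1$, and $g_2$ and checks the hand-derived gradients against central finite differences. The specific choices ($g_1 = x_1 x_2$, $g_2 = x_2 + x_3$, $f = y_1 y_2$) and all function names are assumptions for illustration; they do not come from GradNerve.

```python
# Hypothetical concrete choices for the functions in the example graph.
def g1(x1, x2):
    return x1 * x2

def g2(x2, x3):
    return x2 + x3

def f(y1, y2):
    return y1 * y2

def loss(x1, x2, x3):
    return f(g1(x1, x2), g2(x2, x3))

def reverse_mode_grads(x1, x2, x3):
    # Forward pass: compute the intermediate values.
    y1 = g1(x1, x2)
    y2 = g2(x2, x3)
    # Backward pass: start at the output and apply the chain rule.
    dL_dy1 = y2                             # d(y1 * y2)/dy1
    dL_dy2 = y1                             # d(y1 * y2)/dy2
    dL_dx1 = dL_dy1 * x2                    # dy1/dx1 = x2
    dL_dx2 = dL_dy1 * x1 + dL_dy2 * 1.0     # x2 feeds both y1 and y2
    dL_dx3 = dL_dy2 * 1.0                   # dy2/dx3 = 1
    return dL_dx1, dL_dx2, dL_dx3

def numeric_grads(x1, x2, x3, eps=1e-6):
    # Central differences: perturb one input at a time.
    point = [x1, x2, x3]
    grads = []
    for i in range(3):
        hi = list(point)
        lo = list(point)
        hi[i] += eps
        lo[i] -= eps
        grads.append((loss(*hi) - loss(*lo)) / (2 * eps))
    return tuple(grads)

analytic = reverse_mode_grads(2.0, 3.0, 4.0)
numeric = numeric_grads(2.0, 3.0, 4.0)
print(analytic)  # (21.0, 20.0, 6.0)
```

A finite-difference comparison like this is a common way to sanity-check a hand-written `_backward` function before trusting it in a larger graph.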

Code Examples

import gradnerve.tensor as gn
import numpy as np

# Create two tensors with requires_grad=True
x = gn.Tensor([2.0], requires_grad=True)
y = gn.Tensor([3.0], requires_grad=True)

# Perform an operation
z = x * y

# Compute the gradients
z.backward()

# Print the gradients: dz/dx = y = 3.0 and dz/dy = x = 2.0
print(x.grad)
print(y.grad)

Linear Regression Example

import gradnerve.tensor as gn
import numpy as np

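# Generate some sample data (y is a column vector so it matches the (3, 1) predictions;
# a flat (3,) target would broadcast against (3, 1) into a (3, 3) difference)
X = np.array([[1, 2], [2, 3], [3, 4]], dtype=np.float64)
y = np.array([[6], [8], [10]], dtype=np.float64)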

# Define the model
class LinearRegression:
    def __init__(self, n_features):
        self.weights = gn.Tensor(np.zeros((n_features, 1), dtype=np.float64), requires_grad=True)
        self.bias = gn.Tensor(np.zeros(1, dtype=np.float64), requires_grad=True)

    def forward(self, X):
        return X @ self.weights + self.bias

# Initialize the model
model = LinearRegression(n_features=2)

# Define the loss function
def mse_loss(y_pred, y_true):
    return ((y_pred - y_true)**2).mean()

# Train the model
learning_rate = 0.01
num_epochs = 100

for epoch in range(num_epochs):
    # Forward pass
    X_tensor = gn.Tensor(X, requires_grad=False)
    y_pred = model.forward(X_tensor)
    y_true = gn.Tensor(y, requires_grad=False)

    # Calculate the loss
    loss = mse_loss(y_pred, y_true)

    # Backward pass
    loss.backward()

    # Update the parameters
    model.weights.data -= learning_rate * model.weights.grad
    model.bias.data -= learning_rate * model.bias.grad

    # Zero the gradients
    model.weights.grad = np.zeros_like(model.weights.data)
    model.bias.grad = np.zeros_like(model.bias.data)

    if (epoch+1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.data:.4f}')

# Print the learned parameters
print(f'Learned weights: {model.weights.data}')
print(f'Learned bias: {model.bias.data}')
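To see what `loss.backward()` has to compute in the example above, the following sketch runs the same training loop with the gradients written out by hand in plain Python, without GradNerve or NumPy. It uses the same data, learning rate, and epoch count; the variable names (`grad_w`, `grad_b`, `losses`) are assumptions for illustration, and the printed values from GradNerve itself may differ in formatting.

```python
# Same data as the example above, as plain Python lists.
X = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
y = [6.0, 8.0, 10.0]

weights = [0.0, 0.0]
bias = 0.0
learning_rate = 0.01
num_epochs = 100
n = len(X)
losses = []

for epoch in range(num_epochs):
    # Forward pass: one prediction per sample.
    preds = [sum(w * xi for w, xi in zip(weights, row)) + bias for row in X]
    errors = [p - t for p, t in zip(preds, y)]
    losses.append(sum(e * e for e in errors) / n)   # mean squared error

    # Backward pass, written out by hand:
    #   dL/dw_j = (2/n) * sum_i error_i * X[i][j]
    #   dL/db   = (2/n) * sum_i error_i
    grad_w = [2.0 / n * sum(e * row[j] for e, row in zip(errors, X))
              for j in range(len(weights))]
    grad_b = 2.0 / n * sum(errors)

    # Gradient descent step.
    weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
    bias -= learning_rate * grad_b

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
print(f"weights: {weights}, bias: {bias}")
```

Because the gradients are recomputed from scratch each epoch, this version has no "zero the gradients" step. An autograd engine that accumulates into `.grad` needs that step so one epoch's gradients do not leak into the next.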

Design Choices

GradNerve prioritizes simplicity and clarity over performance and features. This design choice was made to create an engine that is easy to understand, modify, and extend for educational purposes. While this approach leads to a more accessible codebase, it also results in certain trade-offs:

Performance: GradNerve is not optimized for large-scale computations. Every operation is a NumPy call on the CPU, which is efficient for vectorized array math but not as highly optimized as lower-level libraries or hardware-specific implementations.
Features: GradNerve implements a limited set of features compared to more comprehensive autograd engines like PyTorch or TensorFlow. This allows for a smaller and more focused codebase, but it also means that some advanced techniques may not be directly supported.
Memory Usage: The computation graph in GradNerve is stored explicitly, which can lead to higher memory usage compared to techniques like tape-based autograd.

These trade-offs were carefully considered to create an engine that is well-suited for its primary goal: to provide a clear and concise implementation of automatic differentiation for educational purposes.

Limitations

GradNerve is a work in progress, and the current implementation has several limitations:

Limited Operator Support: Only a basic set of operators is currently supported. More advanced operations, such as convolutions and recurrent layers, are not yet implemented.
No GPU Support: GradNerve relies on NumPy, which primarily uses the CPU. GPU acceleration is not currently supported.
Lack of Optimization: No performance tuning has been done beyond what NumPy itself provides (see the Performance trade-off under Design Choices).
No Automatic Memory Management: The user is responsible for managing memory and avoiding memory leaks.
Limited Testing: The test suite is not yet comprehensive, and there may be undiscovered bugs.

These limitations are known and may be addressed in future versions of GradNerve.

Comparison to Other Autograd Engines

GradNerve is a minimalist autograd engine, while libraries like PyTorch and TensorFlow offer a wide range of features and optimizations. GradNerve is intended for educational purposes, while PyTorch and TensorFlow are designed for production use.

Installation Instructions

pip install numpy
pip install gradnerve

Usage Instructions

import gradnerve.tensor as gn
import numpy as np

# Create a tensor
x = gn.Tensor([1.0, 2.0, 3.0])

# Perform an operation
y = x + 1

# Print the result
print(y.data)

Contribution Guidelines

Contributions to GradNerve are welcome! Please submit bug reports, feature requests, and pull requests through the GitHub repository.

License

MIT License

References

Automatic Differentiation in Machine Learning: a Survey

Project details

Release history

0.2 (this version): May 20, 2025

0.1: May 20, 2025

Download files

Source Distribution

gradnerve-0.2.tar.gz (10.3 kB)

Uploaded May 20, 2025 Source

Built Distribution

gradnerve-0.2-py3-none-any.whl (11.9 kB)

Uploaded May 20, 2025 Python 3

File details

Details for the file gradnerve-0.2.tar.gz.

File metadata

Download URL: gradnerve-0.2.tar.gz
Upload date: May 20, 2025
Size: 10.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for gradnerve-0.2.tar.gz
Algorithm	Hash digest
SHA256	`02683b666f1bd3c87efdff5779c50675ac8a2e0f21503d52b8279406c2f14c72`
MD5	`5b5a7f59bff9d4d6e9c6fce41f21ea56`
BLAKE2b-256	`f45936cac5d44498feafa6baac3e826b1beb4edddf7034edb2649f687b1ec5c1`

File details

Details for the file gradnerve-0.2-py3-none-any.whl.

File metadata

Download URL: gradnerve-0.2-py3-none-any.whl
Upload date: May 20, 2025
Size: 11.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for gradnerve-0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9f802e57e5b4774c0996855701689fe13212032522cf0abb559c32684aeaca22`
MD5	`c929d7ccb8b890ecd6601e46d3b82e3f`
BLAKE2b-256	`6c9248c8f93262555d42e683dfc414e73b757b521286a0bb00506d1445aad504`
