Modern neural networks in pure NumPy - Transformers, ResNet, and more
ForgeNN
Modern neural network framework built from scratch with NumPy
I got tired of the bloated ML frameworks that hide everything behind abstractions, so I built this. It's a fully-functional deep learning library that implements modern architectures (Transformers, ResNet, attention mechanisms) using just NumPy.
Why? Because sometimes you need to actually understand what's happening under the hood. And because I can.
What's in here?
The good stuff
- Transformer encoder - yeah, the attention mechanism everyone's talking about
- ResNet blocks - because deep networks are cool
- Modern activations - GELU (GPT uses this), Swish (EfficientNet), Mish, and the classics
- Smart initialization - Xavier, He, LeCun, Orthogonal (actually matters)
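
Curious what those initializers actually do? Here's a minimal NumPy sketch of the Xavier and He schemes (illustrative only, not ForgeNN's internal code):

```python
import numpy as np

def xavier_init(fan_in, fan_out):
    # Xavier/Glorot: scale by both fan-in and fan-out (good for tanh/sigmoid layers)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return np.random.uniform(-limit, limit, size=(fan_in, fan_out))

def he_init(fan_in, fan_out):
    # He: scale by fan-in only (the usual pick for ReLU-family activations)
    return np.random.randn(fan_in, fan_out) * np.sqrt(2.0 / fan_in)
```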
Features I actually use
- Multi-head self-attention
- Layer normalization
- Dropout (because overfitting is real)
- Early stopping (saves time)
- Adam optimizer (because it just works; see the sketch after this list)
- Model save/load (obviously)
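
Since Adam gets name-dropped above, here's what a single Adam update step looks like, stripped down to the math (a sketch with made-up parameter names, not ForgeNN's optimizer API):

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # update biased first and second moment estimates
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # bias correction for the early steps
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # parameter update
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```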
What makes this different
No TensorFlow. No PyTorch. Just NumPy and math.
You can actually read the code and understand what's happening. Try doing that with PyTorch's C++ backend.
Setup
Install from PyPI (recommended)
```bash
pip install forgenn
```
Or install from source
```bash
git clone https://github.com/Cobkgukgg/forgenn.git
cd forgenn
pip install -e .
```
That's it. Seriously, just NumPy.
```
numpy>=1.19.0
```
Quick example
Build a network in like 10 lines:
```python
from forgenn import NeuralNetwork, Dense, TrainingConfig
import numpy as np

# some random data
X = np.random.randn(1000, 10)
y = np.random.randint(0, 2, (1000, 1))

# build it
model = NeuralNetwork("MyFirstModel")
model.add(Dense(10, 64, activation="relu"))
model.add(Dense(64, 32, activation="gelu"))  # gelu because why not
model.add(Dense(32, 1, activation="sigmoid"))

# train it
model.compile(loss="binary_crossentropy", optimizer="adam")
model.fit(X, y, TrainingConfig(epochs=100, batch_size=32))

# use it
predictions = model.predict(X[:5])
```
Or use pre-built stuff
I already made some common architectures:
```python
from forgenn import Architectures

# ResNet for when you need to go deep
model = Architectures.resnet(
    input_dim=784,
    num_blocks=3,
    hidden_dim=128,
    output_dim=10
)

# Transformer because transformers are everywhere now
model = Architectures.transformer_encoder(
    input_dim=512,
    num_heads=8,
    ff_dim=2048,
    num_layers=6
)
```
Config stuff
You can tweak things:
```python
from forgenn import TrainingConfig

config = TrainingConfig(
    learning_rate=0.001,   # standard
    batch_size=64,         # bigger = faster, but needs more RAM
    epochs=200,            # or until early stopping kicks in
    dropout_rate=0.3,      # helps with overfitting
    early_stopping=True,   # stop when val loss stops improving
    patience=15,           # how many epochs to wait without improvement
    validation_split=0.2   # hold out 20% of the data for validation
)

history = model.fit(X_train, y_train, config)
```
How it works
Layers you can use
```python
Dense(input_size, output_size,
      activation="relu",
      dropout_rate=0.0)

MultiHeadAttention(embed_dim, num_heads, dropout=0.1)

LayerNormalization(normalized_shape)

Conv2D(in_channels, out_channels,
       kernel_size=3, stride=1, padding=0)

ResidualBlock(dim, activation="relu")
```
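
If MultiHeadAttention sounds like magic, it mostly isn't. Here's single-head scaled dot-product attention in plain NumPy (a conceptual sketch, not the library's actual implementation); multi-head attention just runs this on several projected slices and concatenates the results:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (seq_len, d_k) arrays
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # how much each query attends to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # weighted sum of the values
```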
Activations
| Activation | When to use |
|---|---|
| `relu` | Default choice, works most of the time |
| `gelu` | Transformers (GPT, BERT use this) |
| `swish` | Good for mobile/efficient networks |
| `mish` | Newer, slightly better than ReLU |
| `leaky_relu` | When you get dead neurons |
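
For reference, most of these are one-liners in NumPy (standard textbook definitions; GELU gets its own section further down):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def swish(x):
    return x / (1.0 + np.exp(-x))             # x * sigmoid(x)

def mish(x):
    return x * np.tanh(np.log1p(np.exp(x)))   # x * tanh(softplus(x))
```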
Loss functions
- `mse` - regression
- `mae` - regression (robust to outliers)
- `binary_crossentropy` - binary classification
- `categorical_crossentropy` - multi-class classification
- `huber` - regression with outliers
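
If you want to see the math, the two most common ones boil down to a couple of lines each (a sketch of the standard formulas, with clipping added for numerical stability; not necessarily how ForgeNN spells them internally):

```python
import numpy as np

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
```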
Model Methods
```python
# Add a layer
model.add(layer)

# Compile
model.compile(loss="mse", optimizer="adam")

# Train
model.fit(X_train, y_train, config=config)

# Predict
predictions = model.predict(X_test)

# Evaluate
results = model.evaluate(X_test, y_test)

# Save / load
model.save("model.pkl")
model.load("model.pkl")

# Print a summary
model.summary()
```
Examples
MNIST-style classification
```python
import numpy as np
from forgenn import Architectures, TrainingConfig

# flatten those images (assumes train_images/train_labels are already loaded;
# X_test/y_test below need the same reshape, normalize, and one-hot treatment)
X_train = train_images.reshape(-1, 784) / 255.0
y_train = np.eye(10)[train_labels]

# build something that works
model = Architectures.mlp(
    input_dim=784,
    hidden_dims=[256, 128],
    output_dim=10,
    activation="gelu"
)

model.compile(loss="categorical_crossentropy", optimizer="adam")

config = TrainingConfig(
    learning_rate=0.001,
    batch_size=128,
    epochs=50,
    early_stopping=True
)

model.fit(X_train, y_train, config)

# check how we did
results = model.evaluate(X_test, y_test)
print(f"Accuracy: {results['accuracy']:.4f}")
```
Custom architecture
Mix and match whatever you want:
```python
from forgenn import NeuralNetwork, Dense, ResidualBlock, LayerNormalization

model = NeuralNetwork("MyCustomNet")
model.add(Dense(100, 256, activation="gelu", dropout_rate=0.3))
model.add(LayerNormalization(256))

# throw in some residual blocks
for _ in range(3):
    model.add(ResidualBlock(256, activation="mish"))

model.add(Dense(256, 10, activation="softmax"))

model.compile(loss="categorical_crossentropy", optimizer="adam")
model.summary()
```
Some notes
Why GELU?
Used in GPT and BERT. Smoother than ReLU, works better for NLP stuff. The math is kinda cool:
```
GELU(x) = 0.5 * x * (1 + tanh(sqrt(2/π) * (x + 0.044715 * x³)))
```
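
In NumPy that tanh approximation is a one-liner (this is how GPT/BERT-style code usually writes it; not necessarily ForgeNN's exact internals):

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))
```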
Residual connections
The thing that made deep networks actually work:
```
output = F(x) + x
```
Gradients can flow backward more easily. Without this, training deep networks is a pain.
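
In code, the forward pass of a residual block is literally the transform plus the input (a conceptual sketch; `transform` here stands in for whatever layers the block wraps):

```python
def residual_forward(x, transform):
    # transform: any function that maps x to an array of the same shape
    return transform(x) + x   # the "+ x" skip path lets gradients flow straight through
```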
Performance
Tested on my laptop (i7, 16GB RAM, no GPU):
| Dataset | Model | Accuracy | Time |
|---|---|---|---|
| MNIST | MLP | ~98% | 2 min |
| MNIST | ResNet | ~99% | 4 min |
| CIFAR-10 | ResNet | ~75% | 15 min |
Not bad for pure Python/NumPy.
Todo
Things I might add:
- Batch normalization
- LSTM/GRU layers
- Better conv layers
- GPU support (CuPy?)
- Model visualization
- Data loaders
- More optimizers
Pull requests welcome.
Contributing
Found a bug? Want to add something? PRs are open.
Just keep it clean and add tests.
License
MIT - do whatever you want with it
Made this because I was bored and wanted to actually understand how transformers work. Turned out pretty decent.
If you use this for something cool, let me know!