A tiny NEF2-owned neural network and LLM training framework with a pure-Python CPU core and a CUDA driver backend.
NEF2
NEF2 is a tiny neural network and LLM training framework with a pure-Python CPU core and a NEF2-owned CUDA backend:
- Tensor values with reverse-mode autograd
- Module, Parameter, Linear, Embedding, LayerNorm, Dropout
- SGD, AdamW, and CUDA-backed CudaSGD
- character tokenization and batch creation
- byte tokenization for large web text datasets
- a compact GPT-style CPU model that runs with only the Python standard library
- CUDA tensor kernels loaded directly through NVIDIA's driver API
- multi-GPU discovery and explicit device placement for CUDA devices
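Reverse-mode autograd, as listed above, records each operation as it runs and then walks the recorded graph backward, applying the chain rule. The following is a minimal scalar sketch of that technique in standard-library Python; the Value class and its methods are hypothetical illustrations, not NEF2's Tensor API.

```python
import math


class Value:
    """A scalar that remembers the operation that produced it (hypothetical sketch)."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # leaf nodes have nothing to propagate

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    __radd__ = __add__
    __rmul__ = __mul__

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))

        def _backward():
            self.grad += (1.0 - t * t) * out.grad  # d tanh(x)/dx = 1 - tanh(x)^2

        out._backward = _backward
        return out

    def backward(self):
        # Order nodes so each one appears after everything it depends on...
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        # ...then seed d(self)/d(self) = 1 and apply the chain rule from output to inputs.
        self.grad = 1.0
        for node in reversed(order):
            node._backward()
```

For z = x * y + x with x = 2 and y = 3, calling z.backward() gives dz/dx = y + 1 = 4 and dz/dy = x = 2. Gradients accumulate with += because a node used twice (x here) receives one contribution per use.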
The CPU core has no third-party runtime dependencies. GPU support uses Python's
standard-library ctypes with nvcuda.dll; it does not use external ML
frameworks.
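Loading the CUDA driver through ctypes needs no compiled extension. Below is a minimal sketch of device discovery, assuming a 64-bit Python. It calls the real driver entry points cuInit, cuDeviceGetCount, cuDeviceGet, and cuDeviceGetName, but the helper names list_cuda_devices and _load_driver are hypothetical, and the Linux library name libcuda.so.1 is an assumption beyond the README, which mentions only nvcuda.dll.

```python
import ctypes
import sys

CUDA_SUCCESS = 0  # every driver API call returns a CUresult; 0 means success


def _load_driver():
    # nvcuda.dll on Windows (as the README states); libcuda.so.1 on Linux is an assumption.
    name = "nvcuda.dll" if sys.platform == "win32" else "libcuda.so.1"
    try:
        return ctypes.CDLL(name)
    except OSError:
        return None  # no NVIDIA driver installed on this machine


def list_cuda_devices():
    """Return the names of visible CUDA devices, or [] if the driver is unavailable."""
    lib = _load_driver()
    if lib is None or lib.cuInit(0) != CUDA_SUCCESS:
        return []
    count = ctypes.c_int(0)
    if lib.cuDeviceGetCount(ctypes.byref(count)) != CUDA_SUCCESS:
        return []
    names = []
    for ordinal in range(count.value):
        device = ctypes.c_int(0)  # CUdevice is a plain int handle
        if lib.cuDeviceGet(ctypes.byref(device), ordinal) != CUDA_SUCCESS:
            continue
        buf = ctypes.create_string_buffer(256)
        if lib.cuDeviceGetName(buf, len(buf), device) != CUDA_SUCCESS:
            continue
        names.append(buf.value.decode(errors="replace"))
    return names
```

Returning an empty list instead of raising keeps discovery safe on machines without an NVIDIA driver; code that actually needs a GPU can then raise its own clear error.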
Quick Start
from nef2 import Tensor
from nef2.models import GPT, GPTConfig
model = GPT(GPTConfig(vocab_size=16, block_size=8, n_embd=8, n_layer=1, n_head=2))
logits = model(Tensor([[1, 2, 3, 4]]))
print(logits.shape)
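The GPT model in the Quick Start is a causal Transformer, so each position can attend only to itself and earlier positions. The sketch below shows single-head causal scaled dot-product attention on nested Python lists. It illustrates the technique and is an assumption about what such a model computes internally, not code taken from nef2/models/gpt.py; causal_self_attention is a hypothetical name.

```python
import math


def causal_self_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k: T vectors of dimension d; v: T vectors of dimension d_v (lists of floats).
    Output row t mixes only v[0..t], so no position sees the future.
    """
    d = len(q[0])
    d_v = len(v[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for t, q_t in enumerate(q):
        # Dot-product scores against visible keys only (j <= t): this is the causal mask.
        scores = [scale * sum(a * b for a, b in zip(q_t, k[j])) for j in range(t + 1)]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the visible value vectors.
        out.append([sum(w * v[j][i] for j, w in enumerate(weights)) for i in range(d_v)])
    return out
```

With q = k = [[1.0, 0.0], [1.0, 0.0]] and v = [[1.0, 2.0], [3.0, 4.0]], the result is [[1.0, 2.0], [2.0, 3.0]]: position 0 can only see itself, and position 1 has equal scores for both positions, so it averages v[0] and v[1]. In a full GPT block, q, k, and v are learned projections of the embeddings and several heads run side by side.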
CUDA Backend
from nef2 import gpu
print(gpu.device_name())
print(gpu.list_devices())
a = gpu.tensor([1, 2, 3])
b = gpu.tensor([4, 5, 6])
print((a + b).tolist())
Choose a CUDA device:
from nef2 import gpu
with gpu.use_device(0):
    x = gpu.tensor([1, 2, 3])
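A device-selection context manager such as gpu.use_device typically swaps in a "current device" value and restores the previous one on exit, even if the block raises. The sketch below is a hypothetical implementation using contextvars; current_device and _current_device are illustrative names, and NEF2's gpu module may track the active device differently.

```python
import contextlib
import contextvars

# Hypothetical "current device" slot; defaults to device 0.
_current_device = contextvars.ContextVar("current_device", default=0)


def current_device():
    """Ordinal of the device that new tensors should be placed on."""
    return _current_device.get()


@contextlib.contextmanager
def use_device(ordinal):
    """Make `ordinal` the current device for the duration of the with-block."""
    if not isinstance(ordinal, int) or ordinal < 0:
        raise ValueError(f"invalid device ordinal: {ordinal!r}")
    token = _current_device.set(ordinal)
    try:
        yield ordinal
    finally:
        _current_device.reset(token)  # restore whatever was active before
```

Using a ContextVar rather than a plain global means nested blocks unwind correctly and each thread or asyncio task sees its own active device.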
Use the CUDA optimizer in training:
from nef2 import CudaSGD, Linear, Tensor
x = Tensor([[1.0, 2.0]], requires_grad=True)
layer = Linear(2, 1)
loss = layer(x).sum()
loss.backward()
CudaSGD(layer.parameters(), lr=0.01).step()
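CudaSGD.step() applies the SGD update on the GPU. The sketch below shows the per-element arithmetic of plain SGD, and of AdamW from the feature list, on flat Python lists. The function names, the absence of momentum in SGD, and the hyperparameter defaults are assumptions, not NEF2's documented behavior.

```python
import math


def sgd_step(params, grads, lr):
    """Plain SGD: p <- p - lr * g, element by element (no momentum, no weight decay)."""
    return [p - lr * g for p, g in zip(params, grads)]


def adamw_step(params, grads, m, v, t, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
               weight_decay=0.01):
    """One AdamW step with decoupled weight decay; t is the 1-based step count.

    Returns new (params, m, v) lists; the inputs are not modified.
    """
    b1, b2 = betas
    new_p, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        p = p * (1.0 - lr * weight_decay)      # decay the weight itself, not the gradient
        m_i = b1 * m_i + (1.0 - b1) * g        # running mean of gradients
        v_i = b2 * v_i + (1.0 - b2) * g * g    # running mean of squared gradients
        m_hat = m_i / (1.0 - b1 ** t)          # bias correction for zero-initialized m
        v_hat = v_i / (1.0 - b2 ** t)          # bias correction for zero-initialized v
        p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
        new_p.append(p)
        new_m.append(m_i)
        new_v.append(v_i)
    return new_p, new_m, new_v
```

On the first AdamW step with no weight decay, m_hat equals g and sqrt(v_hat) equals |g|, so each parameter moves by about lr in the direction opposite its gradient, whatever the gradient's magnitude.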
Keep the GPU busy long enough to confirm in nvidia-smi that it is in use:
nef2-gpu-stress --seconds 60 --hold-seconds 10 --elements 50000000
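The --elements flag sets how much data the stress run pushes through the GPU. The helper below estimates the device memory that flag implies. It assumes 4-byte float32 elements and an elementwise kernel that reads two input buffers and writes one output buffer; the README states neither the element type nor the buffer count, and stress_footprint_bytes is a hypothetical name.

```python
def stress_footprint_bytes(elements, buffers=3, bytes_per_element=4):
    """Estimated device memory for `buffers` arrays of `elements` values each."""
    return elements * buffers * bytes_per_element


MIB = 1024 * 1024

one_buffer = stress_footprint_bytes(50_000_000, buffers=1)
total = stress_footprint_bytes(50_000_000)
print(f"{one_buffer:,} bytes per buffer (~{one_buffer / MIB:.1f} MiB)")  # 200,000,000 bytes per buffer (~190.7 MiB)
print(f"{total:,} bytes for three buffers (~{total / MIB:.1f} MiB)")     # 600,000,000 bytes for three buffers (~572.2 MiB)
```

If the run fails with an out-of-memory error on a smaller card, lowering --elements reduces the footprint proportionally.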
Wikipedia 200M Preset
NEF2 includes a Hugging Face Wikipedia loader that queries the public dataset-server API using only the Python standard library:
nef2-wikipedia-200m --preset 200m --articles 8
This creates the 200M configuration without allocating the full pure-Python parameter set. For a small end-to-end smoke-test training run on Wikipedia text:
nef2-wikipedia-200m --preset tiny --articles 4 --steps 2
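The 200M preset is built without allocating its parameters because a pure-Python parameter costs far more than 4 bytes. The sketch below estimates two things: the parameter count of a GPT-2-style model, and the minimum memory those parameters would occupy as individual Python floats in lists. The model layout is an assumption (learned position embeddings, biases on every linear layer, LayerNorm with weight and bias, MLP width 4 * n_embd, optionally tied output head); NEF2's GPT may differ, and the function names are hypothetical.

```python
import struct
import sys


def gpt2_style_param_count(vocab_size, block_size, n_embd, n_layer, tied_head=True):
    """Parameter count of a GPT-2-style decoder (assumed layout, see lead-in)."""
    c = n_embd
    embeddings = vocab_size * c + block_size * c  # token table + position table
    # Per block: LayerNorm (2c) + QKV projection (3c*c + 3c) + attention output (c*c + c)
    #            + LayerNorm (2c) + MLP up (c*4c + 4c) + MLP down (4c*c + c) = 12c^2 + 13c
    per_layer = 12 * c * c + 13 * c
    final_norm = 2 * c
    head = 0 if tied_head else vocab_size * c  # untied output projection without bias
    return embeddings + n_layer * per_layer + final_norm + head


def pure_python_float_bytes(n_params):
    """Lower bound on memory for n_params distinct float objects held in lists.

    Counts one float object (sys.getsizeof(0.0)) plus one pointer-sized list slot each;
    ignores the list objects themselves, gradients, and any autograd bookkeeping.
    """
    return n_params * (sys.getsizeof(0.0) + struct.calcsize("P"))


print(gpt2_style_param_count(16, 8, 8, 1))   # 1080 for the Quick Start config with a tied head
print(pure_python_float_bytes(200_000_000))  # 6400000000 (about 6.4 GB) on 64-bit CPython
```

On 64-bit CPython each float object takes 24 bytes and each list slot 8, so 200M parameters need at least 6.4 GB before gradients or optimizer state, versus 0.8 GB as packed float32.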
Save trained weights:
nef2-wikipedia-200m --preset tiny --articles 4 --steps 50 --save model.nef
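The Wikipedia loader described at the start of this section reads from the public Hugging Face dataset-server API. Below is a minimal standard-library sketch of paging through its /rows endpoint. The dataset name wikimedia/wikipedia, the config 20231101.en, and the text field name are assumptions about which Wikipedia dump is used, and rows_url and fetch_texts are hypothetical helpers, not NEF2 functions.

```python
import json
import urllib.parse
import urllib.request

ROWS_ENDPOINT = "https://datasets-server.huggingface.co/rows"


def rows_url(dataset, config, split, offset=0, length=8):
    """Build a /rows query URL; the server caps how many rows one request may return."""
    query = urllib.parse.urlencode({
        "dataset": dataset,
        "config": config,
        "split": split,
        "offset": offset,
        "length": length,
    })
    return f"{ROWS_ENDPOINT}?{query}"


def fetch_texts(dataset="wikimedia/wikipedia", config="20231101.en", split="train",
                offset=0, length=8, field="text", timeout=30):
    """Download one page of rows and return the chosen column from each row."""
    request = urllib.request.Request(
        rows_url(dataset, config, split, offset, length),
        headers={"User-Agent": "nef2-sketch"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # Each entry looks like {"row_idx": ..., "row": {<column>: <value>, ...}, ...}.
    return [entry["row"][field] for entry in payload.get("rows", [])]
```

Advancing offset by length on each call pages through the split without downloading the full dataset, which fits small article counts such as --articles 4.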
Project Layout
- nef2/tensor.py - scalar/list tensor storage and autograd
- nef2/nn.py - neural-network module primitives
- nef2/optim.py - CPU optimizers and CUDA SGD bridge
- nef2/tokenizer.py - character tokenizer
- nef2/byte_tokenizer.py - byte tokenizer
- nef2/data.py - sequence batching
- nef2/datasets/huggingface.py - Hugging Face Wikipedia loader
- nef2/gpu.py - NEF2 CUDA driver backend
- nef2/serialization.py - .nef model save/load helpers
- nef2/models/gpt.py - causal Transformer language model
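The byte tokenizer and sequence batching listed above can each be sketched in a few lines. This sketch shows the general technique (UTF-8 bytes as token ids, random fixed-length windows with next-token targets). It is an assumption about that technique, not the contents of nef2/byte_tokenizer.py or nef2/data.py, and the function names are hypothetical.

```python
import random


def encode_bytes(text):
    """Every UTF-8 byte becomes one token id in [0, 255], so the vocabulary size is 256."""
    return list(text.encode("utf-8"))


def decode_bytes(ids):
    """Invert encode_bytes; malformed byte sequences become U+FFFD instead of raising."""
    return bytes(ids).decode("utf-8", errors="replace")


def get_batch(ids, block_size, batch_size, rng=random):
    """Sample random windows; targets are the inputs shifted left by one token."""
    if len(ids) <= block_size:
        raise ValueError("need more than block_size tokens to form one (input, target) pair")
    starts = [rng.randrange(len(ids) - block_size) for _ in range(batch_size)]
    xs = [ids[s:s + block_size] for s in starts]
    ys = [ids[s + 1:s + block_size + 1] for s in starts]
    return xs, ys
```

Byte tokenization needs no vocabulary file and never hits an unknown token, which suits large, messy web text. The trade-off is longer sequences: a non-ASCII character such as "é" becomes two tokens (195, 169).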
Current Scope
This is an educational reference implementation, not a fast production runtime. The CUDA backend currently covers low-level float tensor kernels and optimizer updates; broader model execution can be moved onto these kernels incrementally. NVIDIA CUDA is implemented. AMD, Intel, Apple, Vulkan, OpenCL, HIP/ROCm, Metal, and Level Zero backends need separate native backend implementations; NEF2 exposes backend errors clearly instead of pretending unsupported GPUs are active.
Download files
File details
Details for the file nef2-0.1.0.tar.gz.
File metadata
- Download URL: nef2-0.1.0.tar.gz
- Upload date:
- Size: 17.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3e61316fc5e53759d91abe1eebfe1094e896820a67cac70ac35516db5a28529b |
| MD5 | 0c4ba81602014e2e415620fe2e6695a6 |
| BLAKE2b-256 | 29c5d11dc5d0b8061646b02f8d9dfe0718103aa798a2b65a53afacc4079e4ec2 |
File details
Details for the file nef2-0.1.0-py3-none-any.whl.
File metadata
- Download URL: nef2-0.1.0-py3-none-any.whl
- Upload date:
- Size: 21.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5fc20d1565ba4468c0800d2713e8ff256e1d6ea3d1e92e0a965abeb99737a64f |
| MD5 | d197b5e6e0ff434f5b89a6fbd9bb2778 |
| BLAKE2b-256 | 93b29ab7bca11528010c9d950f8c76fd8132b666b1e5a10d809938736f2c7630 |
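The digests above let you check a downloaded file before installing it. Below is a minimal sketch using hashlib; file_digests is a hypothetical helper name. BLAKE2b-256 is computed as BLAKE2b with a 32-byte digest, which matches the 64-hex-character values in the tables.

```python
import hashlib


def file_digests(path, chunk_size=1 << 20):
    """Compute SHA256, MD5, and BLAKE2b-256 hex digests of a file in one pass."""
    sha256 = hashlib.sha256()
    md5 = hashlib.md5()
    blake = hashlib.blake2b(digest_size=32)  # "BLAKE2b-256" = BLAKE2b with a 32-byte digest
    with open(path, "rb") as f:
        # Read in chunks so large files never need to fit in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in (sha256, md5, blake):
                h.update(chunk)
    return {
        "SHA256": sha256.hexdigest(),
        "MD5": md5.hexdigest(),
        "BLAKE2b-256": blake.hexdigest(),
    }
```

For example, file_digests("nef2-0.1.0-py3-none-any.whl")["SHA256"] should equal the wheel's SHA256 value listed above. Prefer the SHA256 or BLAKE2b-256 comparison, because MD5 is not collision-resistant.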