NEF2

A small neural-network and LLM framework with a pure-Python CPU core and its own CUDA driver backend.

Overview

NEF2 is an experimental framework for learning and building neural-network systems from first principles. It includes a readable CPU autograd engine, a small PyTorch-shaped neural-network API, a compact GPT-style model, Wikipedia dataset tooling, .nef model serialization, and a CUDA backend that talks directly to NVIDIA's driver API through Python's standard library.

The project intentionally avoids external ML frameworks. The CUDA path does not use PyTorch, TensorFlow, CuPy, JAX, or the Hugging Face datasets package.
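
To make the "readable CPU autograd engine" idea concrete, here is a minimal sketch of reverse-mode autodiff on scalars in plain Python. The `Value` class and its methods are hypothetical names chosen for illustration; they are not NEF2's `Tensor` API, which may be organized quite differently.

```python
class Value:
    """Scalar node in a computation graph (illustrative, not NEF2's Tensor)."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # pushes self.grad into the parents

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Topologically order the graph, then apply the chain rule in reverse.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


x, w, b = Value(2.0), Value(3.0), Value(1.0)
y = x * w + b
y.backward()
print(y.data, x.grad, w.grad, b.grad)  # 7.0 3.0 2.0 1.0
```

Each operation records its inputs and a closure that applies the local derivative, so `backward()` only needs a topological sort and one reverse pass.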

Install

pip install nef2

To browse the source, clone the repository:

git clone https://github.com/Hexa08/NEF2.git
cd NEF2

Quick Start

from nef2 import Tensor
from nef2.models import GPT, GPTConfig

model = GPT(GPTConfig(vocab_size=16, block_size=8, n_embd=8, n_layer=1, n_head=2))
logits = model(Tensor([[1, 2, 3, 4]]))

print(logits.shape)

Feature Matrix

Area | Status | Notes
--- | --- | ---
CPU tensors | Implemented | Python-list tensor storage with scalar/list shapes
Autograd | Implemented | Reverse-mode graph execution
Neural layers | Implemented | Linear, Embedding, LayerNorm, Dropout, Sequential
Optimizers | Implemented | SGD, AdamW, CUDA-backed CudaSGD with GPU caching
GPT model | Implemented on CPU | Compact causal Transformer with KV cache and matmul attention
KV cache | Implemented | Caches K/V across generation steps (O(n) per token)
.nef model files | Implemented | Compact binary format with integrity checks
NVIDIA CUDA backend | Implemented | Vector kernels + matmul + layernorm + cross-entropy, Linux + Windows
GPU layernorm | Implemented | PTX kernel, integrated into nn.LayerNorm
GPU embedding | Implemented | PTX kernel, integrated into nn.Embedding
GPU cross-entropy | Implemented | PTX kernel with stable log-softmax
CudaTensor.data | Implemented | Property for CPU/GPU tensor compatibility
Vendor backend detection | Implemented | Auto-detects CUDA, HIP, Level Zero, Metal
AMD, Intel, Apple backends | Planned | Native kernels for HIP, Level Zero, Metal
Full GPU backward kernels | Planned | Dedicated backward kernels for matmul, layernorm
Context window extension | Planned | RoPE / ALiBi for >1024 tokens
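
The cross-entropy row above mentions a "stable log-softmax". As a rough CPU-side illustration of what that usually means, the sketch below subtracts the row maximum before exponentiating so that large logits cannot overflow. The function name `cross_entropy_row` is an assumption made for this example; it is not taken from NEF2, and the PTX kernel may be structured differently.

```python
import math


def cross_entropy_row(logits, target):
    """Cross-entropy for one row of logits and an integer class index."""
    m = max(logits)  # shift by the max so every exponent is <= 0
    log_sum = math.log(sum(math.exp(v - m) for v in logits))
    # -log_softmax(logits)[target] == m + log_sum - logits[target]
    return m + log_sum - logits[target]


loss_uniform = cross_entropy_row([0.0, 0.0, 0.0, 0.0], 2)
loss_large = cross_entropy_row([1000.0, 0.0], 0)
print(loss_uniform, loss_large)
```

With four equal logits the loss is log(4). The second call would raise `OverflowError` if it evaluated `math.exp(1000.0)` directly; with the shift it returns a loss of zero.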

Architecture

%%{init: {"theme": "base", "themeVariables": {"primaryColor": "#0f766e", "primaryTextColor": "#ffffff", "primaryBorderColor": "#0f172a", "lineColor": "#334155", "secondaryColor": "#dbeafe", "tertiaryColor": "#f8fafc", "fontFamily": "Inter, ui-sans-serif, system-ui"}}}%%
flowchart LR
    User["User code / CLI"] --> API["NEF2 public API"]
    API --> CPU["CPU Tensor + Autograd"]
    API --> NN["nn Modules"]
    API --> Data["Tokenizers + Datasets"]
    API --> Save[".nef Serialization"]

    NN --> GPT["GPT Model"]
    CPU --> GPT
    Data --> GPT
    GPT --> Train["Training Loop"]
    Save --> Files["model.nef"]

    API --> CUDA["NEF2 CUDA Backend"]
    CUDA --> Driver["NVIDIA Driver API"]
    Driver --> GPU["CUDA GPU"]

    Train --> Optim["SGD / AdamW / CudaSGD"]
    Optim --> CPU
    Optim --> CUDA

Project Mindmap

%%{init: {"theme": "base", "themeVariables": {"primaryColor": "#111827", "primaryTextColor": "#f9fafb", "lineColor": "#0f766e", "tertiaryColor": "#ecfeff"}}}%%
mindmap
  root((NEF2))
    Core
      Tensor
      Autograd
      Modules
      Optimizers
    Models
      GPT
      200M preset
      Byte tokenizer
    Data
      Character tokenizer
      Byte tokenizer
      Language model batches
    GPU
      CUDA driver API
      Multi device selection
      Vector kernels
      CudaSGD
    Artifacts
      PyPI package
      GitHub source
      model.nef

CUDA Backend

NEF2 includes a direct NVIDIA CUDA backend. It loads the CUDA driver (nvcuda.dll on Windows, libcuda.so.1 on Linux), creates a context, loads NEF2 PTX kernels, allocates device memory, launches kernels, and copies results back.

from nef2 import gpu

print(gpu.device_name())
print(gpu.list_devices())

a = gpu.tensor([1, 2, 3])
b = gpu.tensor([4, 5, 6])

print((a + b).tolist())
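
For readers curious what "talking to the driver API through the standard library" can look like, here is a minimal `ctypes` sketch that loads the driver library named above and lists device names. It is an illustration under assumptions, not NEF2's implementation: the helper names `load_cuda_driver` and `cuda_device_names` are hypothetical, while `cuInit`, `cuDeviceGetCount`, `cuDeviceGet`, and `cuDeviceGetName` are standard CUDA driver API entry points.

```python
import ctypes
import sys


def load_cuda_driver():
    """Return a ctypes handle to the CUDA driver library, or None if absent."""
    name = "nvcuda.dll" if sys.platform == "win32" else "libcuda.so.1"
    try:
        return ctypes.CDLL(name)
    except OSError:
        return None


def cuda_device_names():
    """List device names via the driver API; empty list without a driver or GPU."""
    drv = load_cuda_driver()
    if drv is None or drv.cuInit(0) != 0:  # 0 is CUDA_SUCCESS
        return []
    count = ctypes.c_int(0)
    if drv.cuDeviceGetCount(ctypes.byref(count)) != 0:
        return []
    names = []
    for ordinal in range(count.value):
        dev = ctypes.c_int(0)  # CUdevice is an integer handle
        buf = ctypes.create_string_buffer(256)
        if drv.cuDeviceGet(ctypes.byref(dev), ordinal) != 0:
            continue
        if drv.cuDeviceGetName(buf, len(buf), dev) == 0:
            names.append(buf.value.decode())
    return names


print(cuda_device_names())
```

On a machine without an NVIDIA driver this prints an empty list instead of raising, which matches the README's stated goal of reporting unsupported backends clearly.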

Choose a CUDA device:

from nef2 import gpu

with gpu.use_device(0):
    x = gpu.tensor([1, 2, 3])

GPU matrix multiplication:

from nef2 import gpu

a = gpu.tensor([[1.0, 2.0], [3.0, 4.0]])
b = gpu.tensor([[5.0, 6.0], [7.0, 8.0]])
c = a.matmul(b)
print(c.tolist())
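
To sanity-check a GPU result like the one above, a naive pure-Python reference is often enough for small matrices. The `matmul` helper below is a hypothetical stand-alone function written for this example, not part of NEF2.

```python
def matmul(a, b):
    """Naive matrix product of two nested lists with compatible shapes."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "shape mismatch"
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
        for row in a
    ]


expected = matmul([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]])
print(expected)  # [[19.0, 22.0], [43.0, 50.0]]
```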

Keep the GPU busy long enough to verify in nvidia-smi:

nef2-gpu-stress --seconds 60 --hold-seconds 10 --elements 50000000

Expected output (the device name will match your own GPU):

device=NVIDIA GeForce RTX 3050 Ti Laptop GPU
result=[3.0, 3.0, 3.0]

Training

NEF2 does not bundle datasets. Bring your own text, tokenize it, and train:

from nef2 import GPT, GPTConfig, Tensor, AdamW, cross_entropy, save_model
from nef2.data import make_lm_batch
from nef2.byte_tokenizer import ByteTokenizer

# Load any text you want
with open("my_text.txt", "rb") as f:
    text = f.read()

tokenizer = ByteTokenizer()
tokens = tokenizer.encode(text)

config = GPTConfig(
    vocab_size=tokenizer.vocab_size,
    block_size=256,
    n_embd=384,
    n_layer=6,
    n_head=6,
)
model = GPT(config)
opt = AdamW(model.parameters(), lr=3e-4)

for step in range(1000):
    xb, yb = make_lm_batch(tokens, batch_size=32, block_size=config.block_size)
    loss = cross_entropy(model(xb), yb)
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 100 == 0:
        print(f"step {step}: loss = {loss.item():.4f}")

save_model(model, "model.nef")
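
The loop above relies on `make_lm_batch` to produce input and target windows. Its internals are not shown here, so the following is only a sketch of the usual next-token batching scheme, in which targets are the inputs shifted by one position. `sample_lm_batch` is a hypothetical name, and it returns plain nested lists rather than NEF2 tensors.

```python
import random


def sample_lm_batch(tokens, batch_size, block_size, rng=random):
    """Sample (inputs, targets) windows; targets are inputs shifted by one."""
    if len(tokens) < block_size + 1:
        raise ValueError("need at least block_size + 1 tokens")
    xs, ys = [], []
    for _ in range(batch_size):
        start = rng.randrange(len(tokens) - block_size)
        xs.append(list(tokens[start:start + block_size]))
        ys.append(list(tokens[start + 1:start + block_size + 1]))
    return xs, ys


tokens = list(b"hello world, hello nef2")
xb, yb = sample_lm_batch(tokens, batch_size=2, block_size=8, rng=random.Random(0))
print(xb[0], yb[0])
```

Each target row overlaps its input row except for the last token, which is the one the model is asked to predict at the final position.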

Design Principles

  • Keep the CPU core dependency-free and readable.
  • Make tensor, module, optimizer, and model APIs familiar to users of modern ML frameworks without importing those frameworks.
  • Own the GPU path inside NEF2 instead of delegating training to PyTorch or CuPy.
  • Be explicit about scope: implemented features should run; planned features should be labeled as planned.

Package Layout

nef2/
  tensor.py                 # Tensor storage and reverse-mode autograd
  nn.py                     # Module, Parameter, layers, cross entropy
  optim.py                  # SGD, AdamW, CudaSGD
  gpu.py                    # CUDA driver backend
  serialization.py          # .nef save/load helpers
  tokenizer.py              # Character tokenizer
  byte_tokenizer.py         # Byte tokenizer
  data.py                   # Language-model batching
  models/gpt.py             # Causal Transformer model
  models/presets.py         # 200M preset and parameter estimator
  cli/                      # GPU stress test command-line tool
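
The `.nef` format is described only as a compact binary format with integrity checks, and its actual layout is not documented here. As a generic illustration of that idea, the sketch below writes a header with a magic tag, an element count, and a CRC32 of the payload, then verifies all three on load. The `DEMO` magic bytes, the header layout, and the function names are assumptions for this example and do not describe the real `.nef` files.

```python
import struct
import zlib

MAGIC = b"DEMO"  # hypothetical tag, not the real .nef header


def pack_floats(values):
    """Serialize floats as: magic, count, CRC32 of payload, float32 payload."""
    payload = struct.pack(f"<{len(values)}f", *values)
    header = MAGIC + struct.pack("<II", len(values), zlib.crc32(payload))
    return header + payload


def unpack_floats(blob):
    """Parse a blob from pack_floats, raising ValueError on corruption."""
    if blob[:4] != MAGIC:
        raise ValueError("bad magic")
    count, checksum = struct.unpack_from("<II", blob, 4)
    payload = blob[12:]
    if len(payload) != 4 * count or zlib.crc32(payload) != checksum:
        raise ValueError("integrity check failed")
    return list(struct.unpack(f"<{count}f", payload))


blob = pack_floats([0.5, -1.25, 3.0])
restored = unpack_floats(blob)
print(restored)  # [0.5, -1.25, 3.0]
```

Flipping any payload byte changes the CRC32, so `unpack_floats` rejects the file instead of silently loading damaged weights.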

Roadmap

%%{init: {"theme": "base", "themeVariables": {"primaryColor": "#7c3aed", "primaryTextColor": "#ffffff", "lineColor": "#64748b", "secondaryColor": "#f5f3ff"}}}%%
flowchart TB
    A["CPU GPT + CUDA vector kernels"] --> B["GPU matmul + layernorm + cross-entropy"]
    B --> C["GPU backward kernels"]
    C --> D["Full GPT CUDA training"]
    D --> E["KV cache compressor (INT4, delta, LZ)"]
    E --> F["Context window extension (RoPE / ALiBi)"]
    F --> G["Checkpointed 200M training"]
    G --> H["AMD HIP / Intel Level Zero / Apple Metal backends"]

Status

NEF2 is an alpha framework. It is suitable for experimentation, education, framework development, and small-model tests. It is not yet a fast production training stack for large LLMs.

The current low-level backend implements NVIDIA CUDA only. Other vendors (AMD, Intel, Apple) and other APIs (Vulkan, OpenCL, HIP/ROCm, Metal, Level Zero) each require a separate backend implementation. NEF2 reports unsupported backends clearly instead of pretending unsupported GPUs are active.

License

MIT. See LICENSE.

Download files

Source Distribution

nef2-0.2.2.tar.gz (25.3 kB)

Uploaded Source

Built Distribution

nef2-0.2.2-py3-none-any.whl (26.0 kB)

Uploaded Python 3

File details

Details for the file nef2-0.2.2.tar.gz.

File metadata

  • Download URL: nef2-0.2.2.tar.gz
  • Upload date:
  • Size: 25.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for nef2-0.2.2.tar.gz
Algorithm | Hash digest
--- | ---
SHA256 | 8552fa591d6eda48f80bfed37582b7958e7054e9174575ed8f75f1bfe39024ac
MD5 | 06ba842113fd9a3063c4bc637b03789f
BLAKE2b-256 | ac33d228deb8b5d70a7fefe23a196001ee6b60aaed4a430e23f8487cba4a8936

File details

Details for the file nef2-0.2.2-py3-none-any.whl.

File metadata

  • Download URL: nef2-0.2.2-py3-none-any.whl
  • Upload date:
  • Size: 26.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for nef2-0.2.2-py3-none-any.whl
Algorithm | Hash digest
--- | ---
SHA256 | 00bc9ddb343855e8a11ec81248676f8e2da68caaf2efd34d08d12c315561c9cd
MD5 | 86402dd3926ed98133dfabfd259fc47e
BLAKE2b-256 | b24b481ab0085de57091065d940d62a8a9752f0b5534c25748d787fd624b3b48
