A tiny NEF2-owned neural network and LLM training framework with a pure-Python CPU core and CUDA driver backend.

NEF2

A small neural-network and LLM framework with a pure-Python CPU core and a NEF2-owned CUDA driver backend.

pip install nef2

Overview

NEF2 is an experimental framework for learning and building neural-network systems from first principles. It includes a readable CPU autograd engine, a small PyTorch-shaped neural-network API, a compact GPT-style model, Wikipedia dataset tooling, .nef model serialization, and a CUDA backend that talks directly to NVIDIA's driver API through Python's standard library.

The project intentionally avoids external ML frameworks. The CUDA path does not use PyTorch, TensorFlow, CuPy, JAX, or the Hugging Face datasets package.
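The reverse-mode idea behind the autograd engine fits in a few dozen lines. The sketch below is a hypothetical scalar-only illustration, not NEF2's code: the `Value` class is not part of NEF2, whose `Tensor` stores Python lists. Each operation records its inputs and a local derivative rule. `backward()` then visits the graph in reverse topological order and accumulates gradients by the chain rule.

```python
class Value:
    """A scalar that records how it was computed so gradients can flow back."""

    def __init__(self, data, parents=(), backward=lambda: None):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = backward

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = backward
        return out

    def backward(self):
        # Order nodes so each one runs only after every node that consumes it.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0  # df/df = 1
        for node in reversed(order):
            node._backward()


a, b = Value(2.0), Value(3.0)
f = a * b + a          # f = ab + a
f.backward()
print(a.grad, b.grad)  # df/da = b + 1 = 4.0, df/db = a = 2.0
```

Gradients are accumulated with `+=` rather than assigned, because `a` feeds into `f` along two paths and both contributions must be summed.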

Install

pip install nef2

Browse the source:

git clone https://github.com/Hexa08/NEF2.git
cd NEF2

Quick Start

from nef2 import Tensor
from nef2.models import GPT, GPTConfig

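# A deliberately tiny config: 16-token vocabulary, 8-token context, 1 layer, 2 heads
model = GPT(GPTConfig(vocab_size=16, block_size=8, n_embd=8, n_layer=1, n_head=2))
# A batch containing one sequence of four token ids
logits = model(Tensor([[1, 2, 3, 4]]))

print(logits.shape)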

Feature Matrix

Area	Status	Notes
CPU tensors	Implemented	Python-list tensor storage with scalar/list shapes
Autograd	Implemented	Reverse-mode graph execution
Neural layers	Implemented	`Linear`, `Embedding`, `LayerNorm`, `Dropout`, `Sequential`
Optimizers	Implemented	`SGD`, `AdamW`, CUDA-backed `CudaSGD` with GPU caching
GPT model	Implemented on CPU	Compact causal Transformer with KV cache and matmul attention
KV cache	Implemented	Caches K/V across generation steps (O(n) per token)
`.nef` model files	Implemented	Compact binary format with integrity checks
NVIDIA CUDA backend	Implemented	Vector kernels + matmul + layernorm + cross-entropy, Linux + Windows
GPU layernorm	Implemented	PTX kernel, integrated into `nn.LayerNorm`
GPU embedding	Implemented	PTX kernel, integrated into `nn.Embedding`
GPU cross-entropy	Implemented	PTX kernel with stable log-softmax
`CudaTensor.data`	Implemented	Property for CPU/GPU tensor compatibility
Vendor backend detection	Implemented	Auto-detects CUDA, HIP, Level Zero, Metal
AMD, Intel, Apple backends	Planned	Native kernels for HIP, Level Zero, Metal
Full GPU backward kernels	Planned	Dedicated backward kernels for matmul, layernorm
Context window extension	Planned	RoPE / ALiBi for >1024 tokens
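The KV-cache row can be illustrated with a single attention head in plain Python. In the hypothetical sketch below, `KVCache` and `attend` are illustrative names, not NEF2's classes. Each generation step appends one key/value pair and attends over everything cached so far, so step t does O(t) dot products instead of re-projecting every earlier token. The loop checks that cached and recomputed outputs agree.

```python
import math


def attend(q, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]


class KVCache:
    """Stores every past key/value so each new token only projects itself."""

    def __init__(self):
        self.keys, self.values = [], []

    def step(self, q, k, v):
        # Append this token's key/value, then attend over all cached positions.
        # Causality is automatic: the cache only holds past and current tokens.
        self.keys.append(k)
        self.values.append(v)
        return attend(q, self.keys, self.values)


# Toy per-token (q, k, v) projections; a real model gets these from Linear layers.
steps = [([1.0, 0.0], [1.0, 0.0], [1.0, 2.0]),
         ([0.0, 1.0], [0.0, 1.0], [3.0, 4.0]),
         ([1.0, 1.0], [1.0, 1.0], [5.0, 6.0])]

cache = KVCache()
for t, (q, k, v) in enumerate(steps):
    cached = cache.step(q, k, v)
    # Without a cache, step t would rebuild keys/values for positions 0..t.
    full = attend(q, [s[1] for s in steps[:t + 1]], [s[2] for s in steps[:t + 1]])
    assert all(abs(x - y) < 1e-12 for x, y in zip(cached, full))
```

The cache trades memory for compute: it grows by one key and one value per generated token, which is what the roadmap's planned KV cache compressor would target.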

Architecture

%%{init: {"theme": "base", "themeVariables": {"primaryColor": "#0f766e", "primaryTextColor": "#ffffff", "primaryBorderColor": "#0f172a", "lineColor": "#334155", "secondaryColor": "#dbeafe", "tertiaryColor": "#f8fafc", "fontFamily": "Inter, ui-sans-serif, system-ui"}}}%%
flowchart LR
    User["User code / CLI"] --> API["NEF2 public API"]
    API --> CPU["CPU Tensor + Autograd"]
    API --> NN["nn Modules"]
    API --> Data["Tokenizers + Datasets"]
    API --> Save[".nef Serialization"]

    NN --> GPT["GPT Model"]
    CPU --> GPT
    Data --> GPT
    GPT --> Train["Training Loop"]
    Save --> Files["model.nef"]

    API --> CUDA["NEF2 CUDA Backend"]
    CUDA --> Driver["NVIDIA Driver API"]
    Driver --> GPU["CUDA GPU"]

    Train --> Optim["SGD / AdamW / CudaSGD"]
    Optim --> CPU
    Optim --> CUDA

Project Mindmap

%%{init: {"theme": "base", "themeVariables": {"primaryColor": "#111827", "primaryTextColor": "#f9fafb", "lineColor": "#0f766e", "tertiaryColor": "#ecfeff"}}}%%
mindmap
  root((NEF2))
    Core
      Tensor
      Autograd
      Modules
      Optimizers
    Models
      GPT
      200M preset
      Byte tokenizer
    Data
      Character tokenizer
      Byte tokenizer
      Language model batches
    GPU
      CUDA driver API
      Multi-device selection
      Vector kernels
      CudaSGD
    Artifacts
      PyPI package
      GitHub source
      model.nef

CUDA Backend

NEF2 includes a direct NVIDIA CUDA backend. It loads the CUDA driver (nvcuda.dll on Windows, libcuda.so.1 on Linux), creates a context, loads NEF2 PTX kernels, allocates device memory, launches kernels, and copies results back.

from nef2 import gpu

print(gpu.device_name())
print(gpu.list_devices())

a = gpu.tensor([1, 2, 3])
b = gpu.tensor([4, 5, 6])

print((a + b).tolist())
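The first steps described above can be reproduced with nothing but the standard library's `ctypes`. This is a minimal sketch of the technique, not NEF2's `gpu.py`. `list_cuda_devices` is a hypothetical name. `cuInit`, `cuDeviceGetCount`, `cuDeviceGet`, and `cuDeviceGetName` are real CUDA Driver API functions. The sketch stops before context creation and PTX loading, and it returns an empty list when no NVIDIA driver or device is available.

```python
import ctypes
import sys


def _load_driver():
    """Load the NVIDIA driver library, or return None if it is not installed."""
    try:
        if sys.platform == "win32":
            return ctypes.WinDLL("nvcuda.dll")
        return ctypes.CDLL("libcuda.so.1")
    except OSError:
        return None


def list_cuda_devices():
    """Return the names of visible CUDA devices via the driver API."""
    cuda = _load_driver()
    if cuda is None:
        return []
    if cuda.cuInit(0) != 0:  # every driver call returns CUresult; 0 is CUDA_SUCCESS
        return []
    count = ctypes.c_int()
    if cuda.cuDeviceGetCount(ctypes.byref(count)) != 0:
        return []
    names = []
    for ordinal in range(count.value):
        device = ctypes.c_int()  # CUdevice is an integer handle
        if cuda.cuDeviceGet(ctypes.byref(device), ordinal) != 0:
            continue
        buf = ctypes.create_string_buffer(256)
        cuda.cuDeviceGetName(buf, len(buf), device)
        names.append(buf.value.decode())
    return names


print(list_cuda_devices())
```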

Choose a CUDA device:

from nef2 import gpu

with gpu.use_device(0):
    x = gpu.tensor([1, 2, 3])

GPU matrix multiplication:

from nef2 import gpu

a = gpu.tensor([[1.0, 2.0], [3.0, 4.0]])
b = gpu.tensor([[5.0, 6.0], [7.0, 8.0]])
c = a.matmul(b)
print(c.tolist())
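A quick way to check the GPU result is to compare it with a pure-Python reference. `matmul_reference` below is a hypothetical helper, not part of NEF2. If the GPU kernels compute in float32, compare the entries of `c.tolist()` against the reference with a small tolerance rather than exact equality.

```python
def matmul_reference(a, b):
    """Naive row-by-column product of two nested-list matrices."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    cols = len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(len(a))]


expected = matmul_reference([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]])
print(expected)  # [[19.0, 22.0], [43.0, 50.0]]
```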

Run a sustained workload so the GPU's activity is visible in nvidia-smi:

nef2-gpu-stress --seconds 60 --hold-seconds 10 --elements 50000000

Example output (the device name reflects the GPU in use):

device=NVIDIA GeForce RTX 3050 Ti Laptop GPU
result=[3.0, 3.0, 3.0]

Training

NEF2 does not bundle datasets. Bring your own text, tokenize it, and train:

from nef2 import GPT, GPTConfig, AdamW, cross_entropy, save_model
from nef2.data import make_lm_batch
from nef2.byte_tokenizer import ByteTokenizer

# Load any text you want
with open("my_text.txt", "rb") as f:
    text = f.read()

tokenizer = ByteTokenizer()
tokens = tokenizer.encode(text)

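# On the order of 10M parameters. The CPU core is pure Python, so training at this
# size is slow; shrink block_size, n_embd, and n_layer for quick experiments.
config = GPTConfig(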
    vocab_size=tokenizer.vocab_size,
    block_size=256,
    n_embd=384,
    n_layer=6,
    n_head=6,
)
model = GPT(config)
opt = AdamW(model.parameters(), lr=3e-4)

for step in range(1000):
    # xb: input token ids; yb: the next-token targets for each position
    xb, yb = make_lm_batch(tokens, batch_size=32, block_size=config.block_size)
    loss = cross_entropy(model(xb), yb)
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 100 == 0:
        print(f"step {step}: loss = {loss.item():.4f}")

save_model(model, "model.nef")
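The loop relies on two standard ideas. First, in next-token batches the targets are the inputs shifted by one position. Second, cross-entropy is computed through a numerically stable log-softmax, the technique the feature matrix lists for the GPU kernel. The plain-Python sketch below illustrates both. `sample_lm_batch` and `stable_cross_entropy` are hypothetical names, not NEF2's `make_lm_batch` or `cross_entropy`, whose exact behavior may differ.

```python
import math
import random


def sample_lm_batch(tokens, batch_size, block_size, rng=random):
    """Draw random windows; each target row is its input row shifted by one token."""
    if len(tokens) <= block_size:
        raise ValueError("need more than block_size tokens")
    xs, ys = [], []
    for _ in range(batch_size):
        start = rng.randrange(len(tokens) - block_size)
        window = tokens[start:start + block_size + 1]
        xs.append(window[:-1])  # what the model sees
        ys.append(window[1:])   # what it should predict at each position
    return xs, ys


def log_softmax(logits):
    """Stable log-softmax: subtracting the max keeps exp() from overflowing."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_total for v in logits]


def stable_cross_entropy(rows, targets):
    """Mean negative log-probability of each target under its row of logits."""
    return sum(-log_softmax(row)[t] for row, t in zip(rows, targets)) / len(rows)


xs, ys = sample_lm_batch(list(range(20)), batch_size=2, block_size=4, rng=random.Random(0))
print(xs, ys)
print(stable_cross_entropy([[1000.0, 0.0]], [0]))  # a naive exp(1000) would overflow
```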

Design Principles

Keep the CPU core dependency-free and readable.
Make tensor, module, optimizer, and model APIs familiar to users of modern ML frameworks without importing those frameworks.
Own the GPU path inside NEF2 instead of delegating training to PyTorch or CuPy.
Be explicit about scope: implemented features should run; planned features should be labeled as planned.

Package Layout

nef2/
  tensor.py                 # Tensor storage and reverse-mode autograd
  nn.py                     # Module, Parameter, layers, cross entropy
  optim.py                  # SGD, AdamW, CudaSGD
  gpu.py                    # CUDA driver backend
  serialization.py          # .nef save/load helpers
  tokenizer.py              # Character tokenizer
  byte_tokenizer.py         # Byte tokenizer
  data.py                   # Language-model batching
  models/gpt.py             # Causal Transformer model
  models/presets.py         # 200M preset and parameter estimator
  cli/                      # GPU stress test command-line tool
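The layout lists a parameter estimator next to the 200M preset. The function below is a hypothetical sketch of such an estimator, assuming a standard GPT-2-style block layout: fused QKV projection, 4x MLP expansion, learned position embeddings, biases, and LayerNorm affine terms. NEF2's GPT may differ, so treat the counts as approximate. The first call uses the Training example's config and assumes a 256-entry byte vocabulary.

```python
def estimate_gpt_params(vocab_size, block_size, n_embd, n_layer, tie_weights=False):
    """Count parameters of a GPT-2-style decoder (a layout assumption, not NEF2's)."""
    d = n_embd
    per_layer = (
        2 * d                  # ln_1 weight + bias
        + d * 3 * d + 3 * d    # fused q/k/v projection
        + d * d + d            # attention output projection
        + 2 * d                # ln_2 weight + bias
        + d * 4 * d + 4 * d    # MLP up-projection
        + 4 * d * d + d        # MLP down-projection
    )                          # = 12*d^2 + 13*d
    embeddings = vocab_size * d + block_size * d     # token + position tables
    final_norm = 2 * d
    lm_head = 0 if tie_weights else vocab_size * d   # output projection, no bias
    return n_layer * per_layer + embeddings + final_norm + lm_head


print(estimate_gpt_params(256, 256, 384, 6))                          # 10942464
print(estimate_gpt_params(50257, 1024, 768, 12, tie_weights=True))    # 124439808
```

The second call reproduces the well-known 124M count of GPT-2 small, which is a useful check that the per-layer formula is right.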

Roadmap

%%{init: {"theme": "base", "themeVariables": {"primaryColor": "#7c3aed", "primaryTextColor": "#ffffff", "lineColor": "#64748b", "secondaryColor": "#f5f3ff"}}}%%
flowchart TB
    A["CPU GPT + CUDA vector kernels"] --> B["GPU matmul + layernorm + cross-entropy"]
    B --> C["GPU backward kernels"]
    C --> D["Full GPT CUDA training"]
    D --> E["KV cache compressor (INT4, delta, LZ)"]
    E --> F["Context window extension (RoPE / ALiBi)"]
    F --> G["Checkpointed 200M training"]
    G --> H["AMD HIP / Intel Level Zero / Apple Metal backends"]

Status

NEF2 is an alpha framework. It is suitable for experimentation, education, framework development, and small-model tests. It is not yet a fast production training stack for large LLMs.

The current low-level GPU backend supports NVIDIA CUDA only. AMD, Intel, and Apple GPUs, along with Vulkan, OpenCL, HIP/ROCm, Metal, and Level Zero, each require a separate backend implementation. NEF2 reports unsupported backends clearly rather than presenting an unsupported GPU as active.
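One common way to detect vendor backends is to probe for their driver or runtime libraries. The sketch below illustrates that approach. Whether NEF2 detects backends this way is an assumption. The library names are typical defaults, not guarantees, and `detect_backends`, `describe`, and `SUPPORTED` are hypothetical names. Loading a library only shows that a vendor stack is installed, which is why the report keeps "detected" separate from "supported".

```python
import ctypes
import sys

# Typical library names per backend and platform (an assumption; installs vary).
CANDIDATES = {
    "cuda": {"win32": ["nvcuda.dll"], "linux": ["libcuda.so.1"]},
    "hip": {"win32": ["amdhip64.dll"], "linux": ["libamdhip64.so"]},
    "level_zero": {"win32": ["ze_loader.dll"], "linux": ["libze_loader.so.1"]},
    "metal": {"darwin": ["/System/Library/Frameworks/Metal.framework/Metal"]},
}

SUPPORTED = {"cuda"}  # only the NVIDIA backend has kernels today


def _loadable(name):
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        return False


def detect_backends(platform=sys.platform):
    """Map each backend to whether any of its libraries can be loaded."""
    key = "linux" if platform.startswith("linux") else platform
    return {backend: any(_loadable(n) for n in names.get(key, []))
            for backend, names in CANDIDATES.items()}


def describe(found):
    """Turn detection results into explicit status lines."""
    lines = [f"{backend}: {'available' if backend in SUPPORTED else 'detected but unsupported'}"
             for backend, present in found.items() if present]
    return lines or ["no GPU backend detected; using CPU"]


print("\n".join(describe(detect_backends())))
```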

License

MIT. See LICENSE.

Release history

0.2.3: May 13, 2026
0.2.2: May 11, 2026
0.2.1: May 11, 2026
0.2.0: May 11, 2026
0.1.0: May 10, 2026

Download files

Source distribution: nef2-0.2.2.tar.gz (25.3 kB), uploaded May 11, 2026

Built distribution: nef2-0.2.2-py3-none-any.whl (26.0 kB, Python 3), uploaded May 11, 2026

Both files were uploaded via twine/6.2.0 CPython/3.12.10, without Trusted Publishing.

Hashes for nef2-0.2.2.tar.gz
Algorithm	Hash digest
SHA256	`8552fa591d6eda48f80bfed37582b7958e7054e9174575ed8f75f1bfe39024ac`
MD5	`06ba842113fd9a3063c4bc637b03789f`
BLAKE2b-256	`ac33d228deb8b5d70a7fefe23a196001ee6b60aaed4a430e23f8487cba4a8936`

Hashes for nef2-0.2.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`00bc9ddb343855e8a11ec81248676f8e2da68caaf2efd34d08d12c315561c9cd`
MD5	`86402dd3926ed98133dfabfd259fc47e`
BLAKE2b-256	`b24b481ab0085de57091065d940d62a8a9752f0b5534c25748d787fd624b3b48`
