strands-microgpt

Karpathy's pure-Python microGPT as a Strands Agents model provider — zero dependencies, pure autograd, trained from scratch.

Based on @karpathy's atomic GPT gist: "The most atomic way to train and run inference for a GPT in pure, dependency-free Python. This file is the complete algorithm. Everything else is just efficiency."


What is this?

A complete GPT implementation — autograd engine, transformer architecture, tokenizer, Adam optimizer, training loop, and inference — in pure Python with zero dependencies. No PyTorch, no NumPy, no CUDA. Just math, random, and the algorithm.

Packaged as a proper Strands model provider so you can:

  1. Use it as a Model — drop-in Strands Model interface for character-level generation
  2. Use it as a Tool — train and generate from any Strands agent via tool calls
  3. Learn from it — the entire algorithm is readable, hackable, and documented

Install

pip install strands-microgpt

Requirements: Python ≥3.10, strands-agents. That's it. No GPU needed.


Quick Start

As a standalone engine (zero deps)

from strands_microgpt import MicroGPT

# Load dataset, build tokenizer, create model
model, tokenizer, docs = MicroGPT.from_dataset()

# Train (1000 steps on names.txt)
model.train_on_docs(docs, tokenizer, num_steps=1000)

# Generate new names
for name in model.generate(tokenizer, num_samples=10):
    print(name)

As a Strands Model provider

from strands import Agent
from strands_microgpt import MicroGPTModel

model = MicroGPTModel(num_steps=1000, temperature=0.5)
agent = Agent(model=model)
agent("Generate some names")

As a Tool (in any agent)

from strands import Agent
from strands_microgpt import microgpt_train, microgpt_generate

# Use with Bedrock, OpenAI, or any model
agent = Agent(tools=[microgpt_train, microgpt_generate])
agent("Train a GPT on the names dataset for 500 steps, then generate 10 names")

Architecture

The complete algorithm in ~300 lines:

strands_microgpt/
├── engine.py           # Value (autograd), Tokenizer, MicroGPT (transformer)
├── microgpt_model.py   # Strands Model interface
└── tools/
    ├── microgpt_train.py     # Training tool
    └── microgpt_generate.py  # Generation tool

The Autograd Engine

from strands_microgpt import Value

a = Value(2.0)
b = Value(3.0)
c = a * b + a  # builds computation graph
c.backward()   # backpropagate gradients
print(a.grad)  # 4.0 (dc/da = b + 1)

Supports: +, *, -, /, **, relu(), exp(), log(), backward()
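To make the mechanics concrete, here is a minimal sketch of how a scalar autograd node can work, restricted to + and *. It is an illustration only, not the code in engine.py: the class name TinyValue and its private attributes are hypothetical.

```python
class TinyValue:
    """Hypothetical scalar autograd node; NOT the package's Value class."""

    def __init__(self, data, children=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._children = children        # input nodes of the op that produced this node
        self._local_grads = local_grads  # d(self)/d(child), one entry per child

    def __add__(self, other):
        # d(a + b)/da = 1 and d(a + b)/db = 1
        return TinyValue(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        # d(a * b)/da = b and d(a * b)/db = a
        return TinyValue(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        # Topological order guarantees a node's grad is complete before it is propagated.
        topo, visited = [], set()

        def build(node):
            if id(node) not in visited:
                visited.add(id(node))
                for child in node._children:
                    build(child)
                topo.append(node)

        build(self)
        self.grad = 1.0
        for node in reversed(topo):
            for child, local in zip(node._children, node._local_grads):
                child.grad += local * node.grad  # chain rule, summed over all uses


a = TinyValue(2.0)
b = TinyValue(3.0)
c = a * b + a
c.backward()
print(a.grad, b.grad)
```

Gradients accumulate with += because a appears twice in the expression, so its two contributions (b from the product, 1 from the sum) must be added.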

The Transformer

GPT-2 architecture with:

  • RMSNorm (instead of LayerNorm)
  • No biases
  • ReLU (instead of GeLU)
  • Multi-head causal attention
  • Adam optimizer with linear LR decay
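The first and fourth items in the list above can be sketched on plain floats. This is a simplified illustration under stated assumptions, not the package's implementation: it uses a single head, no learned projections, and an RMSNorm without a learned gain; the function names are hypothetical.

```python
import math


def rmsnorm(x, eps=1e-5):
    """Rescale x so its root-mean-square is about 1 (no mean subtraction)."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale for v in x]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(queries, keys, values):
    """Single-head attention where position t only sees positions 0..t."""
    head_dim = len(queries[0])
    outputs = []
    for t, q in enumerate(queries):
        # Scores are computed for past and current positions only (the causal mask).
        scores = [
            sum(qi * ki for qi, ki in zip(q, keys[s])) / math.sqrt(head_dim)
            for s in range(t + 1)
        ]
        weights = softmax(scores)
        out = [
            sum(w * values[s][j] for s, w in enumerate(weights))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


normed = rmsnorm([3.0, 4.0])
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = causal_attention(seq, seq, seq)
print(normed)
print(attended)
```

A multi-head version would split each vector into n_head slices, run the same routine per slice, and concatenate the results.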

Parameters

Config          Default   Description
n_layer         1         Transformer depth
n_embd          16        Embedding dimension
block_size      16        Context window
n_head          4         Attention heads
num_steps       1000      Training steps
learning_rate   0.01      Initial LR (linear decay)
temperature     0.5       Generation temperature
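As a rough illustration of what learning_rate and temperature control, the sketch below shows one common form of linear decay and of temperature sampling. Both helper names are hypothetical, and the exact schedule is an assumption (here the rate reaches zero at num_steps); the package may differ in detail.

```python
import math
import random


def linear_decay_lr(initial_lr, step, num_steps):
    """Assumed schedule: initial_lr at step 0, falling linearly to 0 at num_steps."""
    return initial_lr * (1.0 - step / num_steps)


def sample_with_temperature(logits, temperature, rng):
    """Divide logits by temperature, apply softmax, then draw one index."""
    scaled = [v / temperature for v in logits]
    m = max(scaled)
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return index, probs


rng = random.Random(42)
logits = [2.0, 1.0, 0.0]
_, sharp = sample_with_temperature(logits, 0.5, rng)  # low temperature
_, flat = sample_with_temperature(logits, 2.0, rng)   # high temperature

print(linear_decay_lr(0.01, 0, 1000), linear_decay_lr(0.01, 500, 1000))
print(sharp, flat)
```

Lower temperatures concentrate probability on the highest logit, which gives more conservative samples; higher temperatures flatten the distribution and give more varied ones.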

Custom Datasets

Train on anything — names, poems, molecules, DNA, code:

from strands_microgpt import MicroGPT, Tokenizer

docs = ["the cat sat on the mat", "the dog sat on the log"] * 100
tokenizer = Tokenizer.from_docs(docs)
model = MicroGPT(vocab_size=tokenizer.vocab_size, n_embd=32, block_size=32)

model.train_on_docs(docs, tokenizer, num_steps=2000)
samples = model.generate(tokenizer, num_samples=10, temperature=0.7)
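Because generation is character-level, any text corpus can serve as training data. The following sketch shows one way a character-level tokenizer could be built from a list of documents. CharTokenizer is a hypothetical stand-in, not the package's Tokenizer, and the single boundary token is an assumption.

```python
class CharTokenizer:
    """Hypothetical character-level tokenizer; not the package's Tokenizer."""

    def __init__(self, text):
        self.chars = sorted(set(text))                # one id per distinct character
        self.stoi = {ch: i for i, ch in enumerate(self.chars)}
        self.bos = len(self.chars)                    # extra id marking document boundaries
        self.vocab_size = len(self.chars) + 1

    @classmethod
    def from_docs(cls, docs):
        return cls("".join(docs))

    def encode(self, text):
        # Wrap the document in boundary tokens so the model can learn where it starts and ends.
        return [self.bos] + [self.stoi[ch] for ch in text] + [self.bos]

    def decode(self, ids):
        return "".join(self.chars[i] for i in ids if i != self.bos)


docs = ["the cat sat on the mat", "the dog sat on the log"]
tok = CharTokenizer.from_docs(docs)
ids = tok.encode("the cat")
print(tok.vocab_size, ids, tok.decode(ids))
```

The vocabulary grows with the number of distinct characters, not the number of documents, which is why repeating the two sentences 100 times adds training data without changing vocab_size.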

Checkpoints

# Save
model.save_checkpoint("model.json", tokenizer)

# Load
model, tokenizer, metadata = MicroGPT.load_checkpoint("model.json")
samples = model.generate(tokenizer, num_samples=10)
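The .json extension suggests a plain JSON file. As a sketch of the general idea only, the snippet below round-trips weights, a character list, and metadata through the standard json module. The payload layout and both function names are assumptions, not the package's actual checkpoint format.

```python
import json
import os
import tempfile


def save_json_checkpoint(path, weights, chars, metadata=None):
    """Write a hypothetical checkpoint layout as JSON."""
    payload = {"weights": weights, "chars": chars, "metadata": metadata or {}}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f)


def load_json_checkpoint(path):
    """Read the hypothetical layout back into Python objects."""
    with open(path, encoding="utf-8") as f:
        payload = json.load(f)
    return payload["weights"], payload["chars"], payload["metadata"]


weights = {"wte": [[0.1, -0.2], [0.3, 0.4]]}  # nested lists of plain floats
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.json")
    save_json_checkpoint(path, weights, ["a", "b"], {"num_steps": 1000})
    restored_weights, restored_chars, restored_meta = load_json_checkpoint(path)

print(restored_weights == weights, restored_chars, restored_meta)
```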

Examples

Example                      Description
01_basic_training.py         Train on names, generate new ones
02_strands_agent.py          Use as a Strands Model provider
03_tool_usage.py             Train/generate via tool calls
04_custom_dataset.py         Train on custom text data
05_autograd_exploration.py   Explore the autograd engine

Why?

"Everything else is just efficiency." — @karpathy

This package exists to show that:

  1. A GPT is just math. No magic, no black boxes. The entire algorithm fits in your head.
  2. The Strands Model interface is universal. If it can generate tokens, it can be a Strands model.
  3. Understanding > Using. Train a transformer from scratch to truly grok what LLMs do.

The model is tiny and slow (pure Python, no vectorization). For production, use Bedrock, OpenAI, or any real provider. For learning, this is the best code to read.


API Reference

MicroGPT

MicroGPT(vocab_size, n_layer=1, n_embd=16, block_size=16, n_head=4, seed=42)
  • .train_on_docs(docs, tokenizer, num_steps, learning_rate, log_every, callback) → List[float]
  • .generate(tokenizer, num_samples, temperature, max_length) → List[str]
  • .save_checkpoint(path, tokenizer, metadata) → None
  • .load_checkpoint(path) → (MicroGPT, Tokenizer, Dict) (classmethod)
  • .from_dataset(dataset_url, dataset_path, **kwargs) → (MicroGPT, Tokenizer, docs) (classmethod)

MicroGPTModel (Strands Model)

MicroGPTModel(dataset_url, num_steps=1000, temperature=0.5, ...)

Drop-in replacement for any Strands model. Trains on first use, then generates.
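"Trains on first use" describes a lazy-initialization pattern. The sketch below shows that pattern in isolation with a toy stand-in for training. LazyTrainedGenerator and its attributes are hypothetical and say nothing about how MicroGPTModel is actually written.

```python
class LazyTrainedGenerator:
    """Hypothetical wrapper: runs train_fn once, on the first generate() call."""

    def __init__(self, train_fn):
        self._train_fn = train_fn  # callable that returns a trained "model"
        self._model = None         # nothing is trained at construction time
        self.train_calls = 0

    def generate(self, prompt):
        if self._model is None:
            # The first call pays the training cost; later calls reuse the result.
            self._model = self._train_fn()
            self.train_calls += 1
        return self._model(prompt)


# Toy stand-in for training: the "model" just upper-cases its prompt.
gen = LazyTrainedGenerator(lambda: (lambda prompt: prompt.upper()))
first = gen.generate("ab")
second = gen.generate("cd")
print(first, second, gen.train_calls)
```

The practical consequence of this pattern is that the first call is much slower than later ones, since it includes the full training run.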

Value (Autograd)

Value(data)  # scalar autograd node

Supports: +, *, -, /, **, .relu(), .exp(), .log(), .backward()

Tools

  • microgpt_train(dataset_url, num_steps, n_layer, ...) — Train a model
  • microgpt_generate(checkpoint_path, num_samples, temperature) — Generate from checkpoint

License

MIT | Based on @karpathy's work | Built with Strands Agents
