strands-microgpt
Karpathy's pure-Python microGPT as a Strands Agents model provider — zero dependencies, pure autograd, trained from scratch.
Based on @karpathy's atomic GPT gist: "The most atomic way to train and run inference for a GPT in pure, dependency-free Python. This file is the complete algorithm. Everything else is just efficiency."
What is this?
A complete GPT implementation — autograd engine, transformer architecture, tokenizer, Adam optimizer, training loop, and inference — in pure Python with zero dependencies. No PyTorch, no NumPy, no CUDA. Just math, random, and the algorithm.
Packaged as a proper Strands model provider so you can:
- Use it as a Model: a drop-in Strands `Model` interface for character-level generation
- Use it as a Tool: train and generate from any Strands agent via tool calls
- Learn from it: the entire algorithm is readable, hackable, and documented
Install
pip install strands-microgpt
Requirements: Python ≥3.10 and `strands-agents`. That's it. No GPU needed.
Quick Start
As a standalone engine (zero deps)
from strands_microgpt import MicroGPT
# Load dataset, build tokenizer, create model
model, tokenizer, docs = MicroGPT.from_dataset()
# Train (1000 steps on names.txt)
model.train_on_docs(docs, tokenizer, num_steps=1000)
# Generate new names
for name in model.generate(tokenizer, num_samples=10):
    print(name)
As a Strands Model provider
from strands import Agent
from strands_microgpt import MicroGPTModel
model = MicroGPTModel(num_steps=1000, temperature=0.5)
agent = Agent(model=model)
agent("Generate some names")
As a Tool (in any agent)
from strands import Agent
from strands_microgpt import microgpt_train, microgpt_generate
# Use with Bedrock, OpenAI, or any model
agent = Agent(tools=[microgpt_train, microgpt_generate])
agent("Train a GPT on the names dataset for 500 steps, then generate 10 names")
Architecture
The complete algorithm in ~300 lines:
strands_microgpt/
├── engine.py                 # Value (autograd), Tokenizer, MicroGPT (transformer)
├── microgpt_model.py         # Strands Model interface
└── tools/
    ├── microgpt_train.py     # Training tool
    └── microgpt_generate.py  # Generation tool
The Autograd Engine
from strands_microgpt import Value
a = Value(2.0)
b = Value(3.0)
c = a * b + a # builds computation graph
c.backward() # backpropagate gradients
print(a.grad) # 4.0 (dc/da = b + 1)
Supports: +, *, -, /, **, relu(), exp(), log(), backward()
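To make the computation-graph idea above concrete, here is a minimal sketch of a scalar autograd node in the spirit of micrograd. It is not the package's actual `Value` class: the constructor arguments and the `_children` and `_local_grads` attributes are assumptions made for illustration, and only `+`, `*`, and `exp()` are implemented.

```python
import math


class Value:
    """Hypothetical minimal scalar autograd node (illustration only)."""

    def __init__(self, data, children=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._children = children        # input nodes of this operation
        self._local_grads = local_grads  # d(self)/d(child), one per child

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def exp(self):
        e = math.exp(self.data)
        return Value(e, (self,), (e,))

    def backward(self):
        # Topologically sort the graph so that each node is processed
        # only after every node that consumes it.
        topo, visited = [], set()

        def build(node):
            if node not in visited:
                visited.add(node)
                for child in node._children:
                    build(child)
                topo.append(node)

        build(self)
        self.grad = 1.0
        for node in reversed(topo):
            for child, local in zip(node._children, node._local_grads):
                child.grad += local * node.grad  # chain rule, accumulated


a = Value(2.0)
b = Value(3.0)
c = a * b + a
c.backward()
print(c.data, a.grad, b.grad)  # 8.0 4.0 2.0
```

Gradients are accumulated with `+=` because `a` feeds into `c` along two paths (through the product and through the sum), which is why `a.grad` ends up as `b + 1`.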
The Transformer
A GPT-2-style architecture and training setup with:
- RMSNorm (instead of LayerNorm)
- No biases
- ReLU (instead of GeLU)
- Multi-head causal attention
- Adam optimizer with linear LR decay
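Two of the components listed above, RMSNorm and causal attention, can be sketched on plain Python floats. This is an illustrative sketch only: it uses a single head rather than multiple heads, omits the learned projection matrices, and the function names `rmsnorm`, `softmax`, and `causal_attention` are assumptions rather than the package's API.

```python
import math


def rmsnorm(x, eps=1e-5):
    """Rescale a vector so its root-mean-square is about 1 (no mean subtraction)."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale for v in x]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(queries, keys, values):
    """Single-head attention where position t attends only to positions 0..t."""
    head_dim = len(queries[0])
    outputs = []
    for t, q in enumerate(queries):
        # Scaled dot-product scores against the current and earlier positions only.
        scores = [
            sum(qi * ki for qi, ki in zip(q, keys[s])) / math.sqrt(head_dim)
            for s in range(t + 1)
        ]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out = [
            sum(w * values[s][j] for s, w in enumerate(weights))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
normed = [rmsnorm(row) for row in x]
attended = causal_attention(normed, normed, normed)
```

Because of the causal mask, the first position can only attend to itself, so `attended[0]` equals `normed[0]`.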
Parameters
| Config | Default | Description |
|---|---|---|
| `n_layer` | 1 | Transformer depth |
| `n_embd` | 16 | Embedding dimension |
| `block_size` | 16 | Context window |
| `n_head` | 4 | Attention heads |
| `num_steps` | 1000 | Training steps |
| `learning_rate` | 0.01 | Initial LR (linear decay) |
| `temperature` | 0.5 | Generation temperature |
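The `learning_rate` and `temperature` settings can be illustrated with a short sketch. The exact schedule and sampling code in the package may differ: the decay-to-zero formula and the helper names `linear_lr`, `temperature_probs`, and `sample` are assumptions made for illustration.

```python
import math
import random


def linear_lr(step, num_steps=1000, learning_rate=0.01):
    """Assumed schedule: decay linearly from learning_rate at step 0 toward 0."""
    return learning_rate * (1 - step / num_steps)


def temperature_probs(logits, temperature=0.5):
    """Softmax of logits / temperature; lower temperature sharpens the distribution."""
    scaled = [v / temperature for v in logits]
    m = max(scaled)
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample(logits, temperature=0.5, rng=random):
    """Draw one token index according to the temperature-scaled probabilities."""
    probs = temperature_probs(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.0]
sharp = temperature_probs(logits, temperature=0.5)
flat = temperature_probs(logits, temperature=2.0)
token = sample(logits, temperature=0.5, rng=random.Random(42))
```

At temperature 0.5 the most likely token gets a larger share of the probability mass than at temperature 2.0, which is why lower temperatures produce more conservative samples.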
Custom Datasets
Train on anything — names, poems, molecules, DNA, code:
from strands_microgpt import MicroGPT, Tokenizer
docs = ["the cat sat on the mat", "the dog sat on the log"] * 100
tokenizer = Tokenizer.from_docs(docs)
model = MicroGPT(vocab_size=tokenizer.vocab_size, n_embd=32, block_size=32)
model.train_on_docs(docs, tokenizer, num_steps=2000)
samples = model.generate(tokenizer, num_samples=10, temperature=0.7)
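Character-level tokenization is what lets the same model train on any text. The sketch below shows one plausible way to build such a vocabulary from documents. `CharTokenizer` is a hypothetical stand-in, not the package's `Tokenizer` class, and the single BOS token placed after the character ids is an assumption.

```python
class CharTokenizer:
    """Hypothetical character-level tokenizer with one special BOS token."""

    def __init__(self, chars):
        self.chars = sorted(set(chars))
        self.bos = len(self.chars)             # BOS id comes after all character ids
        self.vocab_size = len(self.chars) + 1  # characters plus BOS
        self._stoi = {ch: i for i, ch in enumerate(self.chars)}

    @classmethod
    def from_docs(cls, docs):
        """Build the vocabulary from every character seen in the documents."""
        return cls("".join(docs))

    def encode(self, doc):
        # Wrap the document in BOS markers so a model can learn where text starts and ends.
        return [self.bos] + [self._stoi[ch] for ch in doc] + [self.bos]

    def decode(self, ids):
        """Map ids back to text, dropping BOS markers."""
        return "".join(self.chars[i] for i in ids if i != self.bos)


docs = ["the cat sat on the mat", "the dog sat on the log"]
tok = CharTokenizer.from_docs(docs)
ids = tok.encode(docs[0])
print(tok.vocab_size, tok.decode(ids))
```

With this scheme the vocabulary size depends only on the distinct characters in the data, so small datasets yield very small embedding tables.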
Checkpoints
# Save
model.save_checkpoint("model.json", tokenizer)
# Load
model, tokenizer, metadata = MicroGPT.load_checkpoint("model.json")
samples = model.generate(tokenizer, num_samples=10)
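Since checkpoints are written to a `.json` file, the save and load round trip can be pictured as follows. The on-disk layout used by the package is not documented here, so the keys `weights`, `chars`, and `metadata` and both helper functions are assumptions made purely for illustration.

```python
import json
import os
import tempfile


def save_checkpoint(path, weights, chars, metadata=None):
    """Write weights, vocabulary, and metadata to one JSON file (assumed layout)."""
    payload = {"weights": weights, "chars": chars, "metadata": metadata or {}}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f)


def load_checkpoint(path):
    """Read back the three parts written by save_checkpoint."""
    with open(path, encoding="utf-8") as f:
        payload = json.load(f)
    return payload["weights"], payload["chars"], payload["metadata"]


# Toy data standing in for real model parameters.
weights = {"wte": [[0.1, -0.2], [0.3, 0.4]]}
chars = ["a", "b"]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.json")
    save_checkpoint(path, weights, chars, {"num_steps": 1000})
    loaded_weights, loaded_chars, loaded_meta = load_checkpoint(path)
```

Plain JSON keeps checkpoints human-readable and dependency-free, at the cost of larger files than a binary format.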
Examples
| Example | Description |
|---|---|
| 01_basic_training.py | Train on names, generate new ones |
| 02_strands_agent.py | Use as a Strands Model provider |
| 03_tool_usage.py | Train/generate via tool calls |
| 04_custom_dataset.py | Train on custom text data |
| 05_autograd_exploration.py | Explore the autograd engine |
Why?
"Everything else is just efficiency." — @karpathy
This package exists to show that:
- A GPT is just math. No magic, no black boxes. The entire algorithm fits in your head.
- The Strands Model interface is universal. If it can generate tokens, it can be a Strands model.
- Understanding > Using. Train a transformer from scratch to truly grok what LLMs do.
The model is tiny and slow (pure Python, no vectorization). For production, use Bedrock, OpenAI, or any real provider. For learning, this is the best code to read.
API Reference
MicroGPT
MicroGPT(vocab_size, n_layer=1, n_embd=16, block_size=16, n_head=4, seed=42)
- `.train_on_docs(docs, tokenizer, num_steps, learning_rate, log_every, callback)` → `List[float]`
- `.generate(tokenizer, num_samples, temperature, max_length)` → `List[str]`
- `.save_checkpoint(path, tokenizer, metadata)` → `None`
- `.load_checkpoint(path)` → `(MicroGPT, Tokenizer, Dict)` (classmethod)
- `.from_dataset(dataset_url, dataset_path, **kwargs)` → `(MicroGPT, Tokenizer, docs)` (classmethod)
MicroGPTModel (Strands Model)
MicroGPTModel(dataset_url, num_steps=1000, temperature=0.5, ...)
Drop-in replacement for any Strands model. Trains on first use, then generates.
Value (Autograd)
Value(data) # scalar autograd node
Supports: +, *, -, /, **, .relu(), .exp(), .log(), .backward()
Tools
- `microgpt_train(dataset_url, num_steps, n_layer, ...)`: Train a model
- `microgpt_generate(checkpoint_path, num_samples, temperature)`: Generate from checkpoint
Resources
- Karpathy's GPT gist — The original
- micrograd — Karpathy's autograd engine
- makemore — Character-level language modeling
- Strands Agents — The agent framework
- strands-cosmos — NVIDIA Cosmos VLM provider
License
MIT | Based on @karpathy's work | Built with Strands Agents