gradientlab
A lab where gradients flow and models go to prod.
Guiding principles
- Experiments as first-class citizens
  - Full replicability: data prep, modeling, configs, and training/eval code are self-contained
- Architecture copy-paste is allowed; no premature optimization when doing applied AI
  - Still, we're not savages: if you're reusing the exact same nn.Module N times, modularize it.
  - For me, N=3 means the thing works => refactor.
  - Crystallize a stable architecture or nn.Module under neuralblocks/
- Avoid model overparametrization and huge configs
- Basic HuggingFace compatibility
  - We don't write whitepapers; we push to prod ASAP
- Notebooks as a clean demo interface
  - Do dirty & temporary stuff under notebooks/trash
- ...
Install
Prereqs
- A Linux box with CUDA, or Apple Silicon.
- uv:
curl -LsSf https://astral.sh/uv/install.sh | sh
As your own personal lab: fork this repo, then clone it
git clone https://github.com/<your-github-user>/gradientlab.git
cd gradientlab/
uv sync
As a library
uv add gradientlab
Experiments
An example lives under /experiments: a custom 22-layer, yet only 20M-parameter, GPT whose architecture you can find under /modeling:
- PolyReLU FFN activation (works better than SwiGLU)
- Parallel attention (from the PaLM paper & Moondream)
- Squeeze-and-excite narrow transformer backbone (an idea of mine for small language models, preferring depth over width, inspired by computer vision)
- Sigmoid gating post-SDPA (paper by the Qwen team)
- Attention value-head expansion
- Absolute position embeddings (I know)
- KV-cache support
- Trained on 3B Italian tokens from FineWeb2 in ~8 hours on an RTX A4000.
- byte_level_tokenizer; couldn't use the Qwen3 tokenizer due to memory constraints (GPU poor) and weird torch.compile errors
- Slim notebook to demo model loading and generation.
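For intuition, here is a minimal sketch of a PolyReLU-style activation, under the assumption that PolyReLU means clamping at zero and raising to an integer power p; the exact formulation used in the FFN lives in the experiment's modeling code, and the names here are illustrative:

```python
def polyrelu(x: float, p: int = 2) -> float:
    """PolyReLU: ReLU followed by an integer power, i.e. max(x, 0) ** p.

    Assumed formulation for illustration only; see the experiment's
    modeling code for the exact variant used in the FFN.
    """
    return max(x, 0.0) ** p

# Negative inputs are zeroed like ReLU; positive inputs grow polynomially.
print([polyrelu(v, p=2) for v in (-1.5, 0.0, 0.5, 2.0)])
# → [0.0, 0.0, 0.25, 4.0]
```

Unlike SwiGLU, this needs no extra gating projection, which keeps the FFN parameter count down in a small model.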
Publish
If you want to publish your own gradientlab-* project as a library, just create a PyPI token and follow the official uv guide. It's generally as simple as:
uv build
UV_PUBLISH_TOKEN=pypi-your-token uv publish
Project details
Release history
Download files
Download the file for your platform.
Source Distribution
Built Distribution
File details
Details for the file gradientlab-0.1.1.tar.gz.
File metadata
- Download URL: gradientlab-0.1.1.tar.gz
- Upload date:
- Size: 13.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cca2b68c65a5a3cfffccd716886dafa193d712ce42b8f297a450343064fa039f |
| MD5 | b6a5327f7405bcab4505e697f660adc4 |
| BLAKE2b-256 | 055f6c14c1c73d25b4b4cbf8422a4811a894beb1af5dfc5daa1357ef79679263 |
File details
Details for the file gradientlab-0.1.1-py3-none-any.whl.
File metadata
- Download URL: gradientlab-0.1.1-py3-none-any.whl
- Upload date:
- Size: 26.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0366ccb10b3e2eb97866e570a069d28cf53083d9510cadc487046ac25dcebe75 |
| MD5 | f425319e185646d3847a711a11ce1861 |
| BLAKE2b-256 | cbd8f4e20d80da8a43b3c5c16933304d80f205501985bbc2600976f0d7ac2d41 |
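To check a downloaded artifact against the digests above, you can compute its SHA256 locally and compare (sha256sum ships with GNU coreutils on Linux; macOS has shasum instead):

```shell
# Compute the SHA256 of the downloaded sdist and compare it to the
# digest published above.
sha256sum gradientlab-0.1.1.tar.gz
# On macOS: shasum -a 256 gradientlab-0.1.1.tar.gz
```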