
gradientlab

A lab where gradients flow and models go to prod.

This repo is a tidy home for my own small-scale PyTorch-based deep learning experiments.

Guiding principles

  • Experiments as first-class citizens
    • full replicability: data prep, modeling, configs, training, and eval code are self-contained
  • Architecture copy-paste is allowed; no premature optimization when doing applied AI
    • Still, we're not savages: if you're reusing the exact same nn.Module N times, go modularize it.
    • For me, N=3 means the thing works => refactor.
  • Crystallize stable architectures and nn.Modules under neuralblocks/
    • Avoid model over-parametrization and huge configs
  • Basic HuggingFace compatibility
    • we don't do whitepapers, we push to prod ASAP
  • Notebooks as a clean demo interface
    • do dirty & temporary stuff under notebooks/trash
  • ...
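As a concrete picture of the N=3 rule, here is what promoting a repeated block into neuralblocks/ might look like. The module name and contents below are purely illustrative, not an actual block from this repo:

```python
import torch
from torch import nn

# Hypothetical example: a block that got copy-pasted into three
# experiments, so it graduates to neuralblocks/.
class GatedFFN(nn.Module):
    """A small reusable feed-forward block with a learned sigmoid gate."""

    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.up = nn.Linear(dim, hidden, bias=False)
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # gate the up-projection, then project back down
        return self.down(torch.sigmoid(self.gate(x)) * self.up(x))
```

Once a block lives here, every experiment imports it instead of carrying its own copy.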

If you want to fork the repo or install it, keep reading.

Install

prereqs

  • A Linux box with CUDA, or Apple Silicon (no flash linear attention support on the latter).
    • ROCm may work as well, but it's untested.
  • uv:
curl -LsSf https://astral.sh/uv/install.sh | sh

As your own personal lab -> fork this repo and clone it:

git clone https://github.com/<your-github-user>/gradientlab.git
cd gradientlab/
uv sync

As a library

uv add gradientlab

Experiments

An example lives under /experiments: a custom 22-layer, yet only ~20M-parameter GPT, whose architecture you can find under /modeling:

  • PolyReLU FFN activation (works better than SwiGLU in my runs)
  • parallel attention (from the PaLM paper & Moondream)
  • squeeze-and-excite narrow transformer backbone (an idea of mine for small language models: preferring depth over width, inspired by computer vision)
  • sigmoid gating post-SDPA (paper by the Qwen team)
  • attention value-head expansion
  • absolute position embeddings (I know)
  • KV-cache support
  • embed_dim != hidden_dim
  • Trained on 3B Italian tokens from FineWeb2 in ~8 hours on an RTX A4000.
    • byte_level_tokenizer: couldn't use the Qwen3 tokenizer due to memory constraints (GPU poor) and weird torch.compile errors
  • Slim notebook to demo model loading and generation.
  • single-GPU trainer with trackio to track metrics
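For orientation, here is a minimal sketch of how a few of those pieces could fit together in one block: PaLM-style parallel residual branches, a sigmoid gate on the SDPA output, and a PolyReLU-style FFN, assuming PolyReLU behaves like a squared ReLU. The repo's actual modules, names, and shapes will differ, and this sketch omits the KV-cache, squeeze-and-excite, and value-head expansion:

```python
import torch
from torch import nn
import torch.nn.functional as F

class ParallelBlock(nn.Module):
    """Illustrative transformer block, not the repo's implementation."""

    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.norm = nn.LayerNorm(dim)
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.gate = nn.Linear(dim, dim, bias=False)      # sigmoid gate post-SDPA
        self.attn_out = nn.Linear(dim, dim, bias=False)
        self.ffn_in = nn.Linear(dim, 4 * dim, bias=False)
        self.ffn_out = nn.Linear(4 * dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)                       # one norm shared by both branches
        b, t, d = h.shape
        q, k, v = self.qkv(h).chunk(3, dim=-1)
        shape = (b, t, self.n_heads, d // self.n_heads)
        q, k, v = (z.view(shape).transpose(1, 2) for z in (q, k, v))
        a = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        a = a.transpose(1, 2).reshape(b, t, d)
        a = torch.sigmoid(self.gate(h)) * a    # sigmoid gating post-SDPA
        ffn = self.ffn_out(F.relu(self.ffn_in(h)) ** 2)  # "PolyReLU" as squared ReLU
        return x + self.attn_out(a) + ffn      # parallel residual branches
```

The parallel layout lets the attention and FFN branches read the same normalized input, which is what makes the block cheaper to fuse than the sequential variant.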

Each experiment's entrypoint lives in its __main__.py, so you can run an experiment like this:

uv run -m gradientlab.experiments.exp20251016_0_lm_20m_polyrelu_lm_vanilla_fineweb_ita
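uv run -m executes a package's __main__.py, so each experiment package only needs one. A hypothetical minimal skeleton (the real entrypoints wire up data prep, training, and eval):

```python
# Hypothetical __main__.py skeleton for an experiment package;
# the actual entrypoints in the repo do real work at each step.
def main() -> int:
    # 1. build the tokenizer and dataloaders (data prep)
    # 2. construct the model from its factory
    # 3. train, evaluate, save checkpoints
    return 0

if __name__ == "__main__":
    raise SystemExit(main())
```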

The modeling/ folder under an experiment contains all the modules your model is made of. Some notes:

  • factory.py -> model factory; where you construct the models with specific parameters
  • model_cfg.py -> model config class
  • model.py -> your high-level model class, extending some HF class or mixins
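That split might look like the sketch below; the class name, field names, and values are made up for illustration and will differ from the repo's actual code:

```python
from dataclasses import dataclass

# model_cfg.py: a small, explicit config class -- no huge configs
@dataclass
class ModelCfg:
    n_layers: int
    embed_dim: int
    hidden_dim: int  # embed_dim != hidden_dim is supported

# factory.py: one named constructor per model variant, so every
# experiment pins its exact parameters in one place
def build_lm_20m() -> ModelCfg:
    return ModelCfg(n_layers=22, embed_dim=384, hidden_dim=512)
```

Keeping the config a plain dataclass makes it trivial to serialize alongside checkpoints, which helps the full-replicability principle above.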

Feel free to adapt the repo as you wish and share your learnings in the discussion section.

Publish

If you want to publish your own gradientlab-* project as a library, just create a PyPI token and follow the official uv guide.

It's generally as simple as:

uv build
UV_PUBLISH_TOKEN=pypi-your-token uv publish
