
gradientlab

A lab where gradients flow and models go to prod.

Guiding principles

  • Experiment as a first-class citizen
    • full replicability: data prep, modeling, configs, training, and eval code are self-contained
  • Architecture copy-paste is allowed; no premature optimization when doing applied AI
    • Still, we're not savages: if you're reusing the exact same nn.Module N times, go modularize it.
    • For me, N = 3 means the thing works => refactor.
  • Crystallize a stable architecture or nn.Module under neuralblocks/
    • Avoid model overparametrization and huge configs
  • HuggingFace basic compatibility
    • we don't do whitepapers, we push to prod ASAP
  • Notebooks as a clean demo interface
    • do dirty & temporary stuff under notebooks/trash
  • ...

Install

Prerequisites

  • A Linux box with CUDA, or Apple silicon.
  • uv:
curl -LsSf https://astral.sh/uv/install.sh | sh

As your own personal lab: fork this repo and clone it

git clone https://github.com/<your-github-user>/gradientlab.git
cd gradientlab/
uv sync

As a library

uv add gradientlab

Experiments

An example is under /experiments: a custom 22-layer yet only 20M-parameter GPT, whose model code you can find under /modeling:

  • PolyReLU FFN activation (works better than SwiGLU in this setup)
  • parallel attention (from the PaLM paper & Moondream)
  • squeeze-and-excite narrow transformer backbone (an idea of mine for small language models, preferring depth over width, inspired by computer vision)
  • sigmoid gating post-SDPA (from a paper by the Qwen team)
  • attention value-head expansion
  • absolute position embeddings (I know)
  • KV-cache support
  • Trained on 3B Italian tokens from FineWeb2 in ~8 hours on an RTX A4000.
    • byte_level_tokenizer; couldn't use the Qwen3 tokenizer due to memory constraints (GPU poor) and weird torch.compile errors
  • Slim notebook to demo model loading and generation.
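To make the first bullet concrete, here is a hypothetical pure-Python sketch of a PolyReLU-style activation (ReLU output raised to a power p). This is an illustration of the idea only; the exact formulation in gradientlab's modeling code may differ.

```python
def poly_relu(x: float, p: float = 2.0) -> float:
    """PolyReLU-style activation sketch: ReLU raised to a power p.

    Illustrative only; the actual nn.Module in gradientlab
    may use a different polynomial form.
    """
    return max(x, 0.0) ** p

# Identical to ReLU for p = 1; grows polynomially for p > 1,
# while staying exactly zero for negative inputs.
```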

Publish

If you want to publish your own gradientlab-* project as a library, just create a PyPI token and follow the official uv guide.

It's generally as simple as:

uv build
UV_PUBLISH_TOKEN=pypi-your-token uv publish
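For reference, a minimal pyproject.toml fragment for such a project might look like the sketch below. The project name, version, and build backend are all placeholders, not taken from this repo; follow the uv guide for the authoritative setup.

```toml
[project]
name = "gradientlab-mylab"  # hypothetical: pick your own gradientlab-* name
version = "0.1.0"
description = "My personal gradient lab"

[build-system]
# hatchling is one common backend; uv's default may differ
requires = ["hatchling"]
build-backend = "hatchling.build"
```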

Download files

Source Distribution

  • gradientlab-0.1.1.tar.gz (13.9 kB)

Built Distribution

  • gradientlab-0.1.1-py3-none-any.whl (26.0 kB)

File details

Details for the file gradientlab-0.1.1.tar.gz.

File metadata

  • Download URL: gradientlab-0.1.1.tar.gz
  • Size: 13.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.0

File hashes

  • SHA256: cca2b68c65a5a3cfffccd716886dafa193d712ce42b8f297a450343064fa039f
  • MD5: b6a5327f7405bcab4505e697f660adc4
  • BLAKE2b-256: 055f6c14c1c73d25b4b4cbf8422a4811a894beb1af5dfc5daa1357ef79679263
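The published digests can be checked locally with Python's standard hashlib module; the commented filename below is just the sdist named in this listing.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest, in the same format as the hash tables above."""
    return hashlib.sha256(data).hexdigest()

# Example: verify a downloaded sdist against the published digest.
# with open("gradientlab-0.1.1.tar.gz", "rb") as f:
#     ok = sha256_hex(f.read()) == (
#         "cca2b68c65a5a3cfffccd716886dafa193d712ce42b8f297a450343064fa039f"
#     )
```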

File details

Details for the file gradientlab-0.1.1-py3-none-any.whl.

File hashes

  • SHA256: 0366ccb10b3e2eb97866e570a069d28cf53083d9510cadc487046ac25dcebe75
  • MD5: f425319e185646d3847a711a11ce1861
  • BLAKE2b-256: cbd8f4e20d80da8a43b3c5c16933304d80f205501985bbc2600976f0d7ac2d41
