Minimalistic Large Language Model Training and Finetuning

nanotron

This library provides easy-to-use distributed primitives for training a variety of models efficiently with 3D parallelism. For more information about the internal design of the library, or about 3D parallelism in general, please check out [docs.md] and [3d_parallelism.md].

Philosophy

  • Make it fast. At least as fast as other open source versions.
  • Make it minimal. We don't actually need to support all techniques and all versions of 3D parallelism. What matters is that we can efficiently use the "best" ones.
  • Make everything explicit instead of transparent. Transparent abstractions work well when they work, but they make for a horrible debugging experience when one doesn't understand the implications of the techniques being used. To mitigate this, we choose to be explicit about what the library does.

Core Features

We support the following:

  • 3D parallelism, including a one-forward-one-backward (1F1B) pipeline engine
  • ZeRO-1 optimizer
  • FP32 gradient accumulation
  • Parameter tying/sharding
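
Training runs are driven by a YAML configuration file (see the Quick examples below). As a rough illustration of how these features compose, a config might contain something like the following sketch; the field names here are illustrative assumptions, so check the files under examples/ for the actual schema:

# Hypothetical excerpt of a nanotron-style config; field names are
# illustrative, see examples/ for the real schema.
parallelism:
  dp: 2   # data-parallel replicas
  tp: 2   # tensor-parallel shards within each replica
  pp: 2   # pipeline stages
# dp * tp * pp = 8 processes in total, matching torchrun --nproc_per_node=8
optimizer:
  zero_stage: 1   # ZeRO-1: shard optimizer states across data-parallel ranks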

Installation

Requirements:

  • Python >= 3.10
  • PyTorch >= 2.0.0
  • Flash-Attention >= 2.5.0

To install (in a new env):

pip install torch
pip install packaging; pip install "flash-attn>=2.5.0" --no-build-isolation
git clone git@github.com:huggingface/nanotron.git
cd nanotron
pip install -e .

It is also nice to have transformers, datasets, python-etcd and tensorboardX installed:

pip install transformers datasets python-etcd tensorboardX

We also support a set of flavors that you can install using pip install -e ".[$FLAVOR]":

  • dev: Used if you are developing in nanotron. In particular, it installs our linter mechanism. On top of that you have to run pre-commit install afterwards.
  • test: We use pytest to run our testing suite. To run tests in parallel, this flavor installs pytest-xdist, which you can leverage by running pytest -n 12 tests (where 12 is the number of parallel workers), as shown below.
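
For example, to set up a development environment with both flavors and run the test suite in parallel:

pip install -e ".[dev,test]"
pre-commit install
pytest -n 12 tests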

Quick examples

In the /examples directory, you can find a few example configuration files and a script to run them.

You can run a sample training using:

torchrun --nproc_per_node=8 run_train.py --config-file examples/debug_run_train.yaml

And run a sample generation using:

torchrun --nproc_per_node=8 run_generation.py --ckpt-path checkpoints/text/4

Development guidelines

If you plan on developing nanotron, we suggest you install the dev flavor: pip install -e ".[dev]"

We use pre-commit to run a set of hooks on each commit, mostly code normalization, so that the codebase stays consistent. Please do run pre-commit install.

For the linting:

pre-commit install
pre-commit run --config .pre-commit-config.yaml --all-files

Features we would like to add:

  • Support torch.compile
  • Support torch.distributed.rpc
  • More optimized kernels
  • Support Zero3
  • Other PP schedules (such as Interleaved 1f1b...)
  • Ring attention / Sequence Parallelism
  • 3D Parallel MoEs
  • Supporting more architectures (Mamba..)
  • ...

Credits

We would like to thank everyone working on LLMs, especially those who share their work openly and from whom we took great inspiration: Nvidia for Megatron-LM/apex, Microsoft for DeepSpeed, and HazyResearch for flash-attn.
