Minimalistic Large Language Model Training and Finetuning
⚡️ Nanotron
Philosophy • Core Features • Installation • Usage • Contributions
The objective of this library is to provide easy-to-use distributed primitives for training a variety of models efficiently with 3D parallelism. For more information about the internal design of the library, or about 3D parallelism in general, please check out [docs.md] and [3d_parallelism.md].
Philosophy
- Make it fast. At least as fast as other open source versions.
- Make it minimal. We don't actually need to support all techniques and all versions of 3D parallelism. What matters is that we can efficiently use the "best" ones.
- Make everything explicit instead of transparent. Transparent behaviour works well when it works, but it is a horrible debugging experience when one doesn't understand the implications of the techniques being used. To mitigate this, we choose to be explicit about what the library does.
Core Features
We support the following:
- 3D parallelism, including a one-forward-one-backward (1F1B) pipeline engine
- ZeRO-1 optimizer
- FP32 gradient accumulation (a generic sketch of the idea follows this list)
- Parameter tying/sharding
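To make the FP32 gradient accumulation feature concrete, here is a minimal, generic PyTorch sketch of the idea (an illustration only, not nanotron's actual implementation): gradients are produced in bf16 during the backward pass and accumulated into fp32 buffers across micro-batches before the optimizer step.

import torch

# Generic sketch of fp32 gradient accumulation (not nanotron's API):
# the model runs in bf16, but gradients are accumulated in full precision.
model = torch.nn.Linear(16, 16, dtype=torch.bfloat16)
fp32_grads = [torch.zeros_like(p, dtype=torch.float32) for p in model.parameters()]

for _ in range(4):  # micro-batches
    x = torch.randn(8, 16, dtype=torch.bfloat16)
    loss = model(x).float().pow(2).mean()
    loss.backward()
    for p, buf in zip(model.parameters(), fp32_grads):
        buf += p.grad.float()  # accumulate in fp32
        p.grad = None          # drop the low-precision gradient

# The fp32 buffers now hold the summed gradients for the optimizer step.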
Installation
Requirements:
- Python >= 3.10
- PyTorch >= 2.0.0
- Flash-Attention >= 2.5.0
To install (in a new env):
pip install torch
pip install packaging; pip install "flash-attn>=2.5.0" --no-build-isolation
pip install nanotron
Also nice to have: pip install transformers datasets python-etcd tensorboardX
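Once installed, a quick way to sanity-check the environment is to import the main dependencies and print their versions (flash_attn exposes __version__; importing nanotron simply confirms the install):

python -c "import torch, flash_attn, nanotron; print(torch.__version__, flash_attn.__version__)"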
We also support a set of flavors that you can install using pip install -e ".[$FLAVOR]":
- dev: use this if you are developing in nanotron. In particular, it installs our linter mechanism; on top of that, you have to run pre-commit install afterwards.
- test: we use pytest to run our testing suite. To run tests in parallel, this flavor installs pytest-xdist, which you can leverage by running pytest -n 12 tests (12 is the number of parallel workers).
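For example, assuming you are working from a clone of the repository, you can install both flavors at once and run the suite on 8 workers:

pip install -e ".[dev,test]"
pre-commit install
pytest -n 8 tests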
Quick examples
In the /examples directory, you can find a few example configuration files and a script to run them.
You can run a sample training using:
torchrun --nproc_per_node=8 run_train.py --config-file examples/debug_run_train.yaml
And run a sample generation using:
torchrun --nproc_per_node=8 run_generation.py --ckpt-path checkpoints/text/4
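Note that --nproc_per_node must match the total degree of parallelism, i.e. the product of the data-, tensor- and pipeline-parallel sizes defined in the config file. Here is a small sketch that reads those values and prints a matching launch command; the parallelism/dp/tp/pp field names are assumptions, so check the example configs for the exact schema:

import yaml  # requires PyYAML

with open("examples/debug_run_train.yaml") as f:
    cfg = yaml.safe_load(f)

# Assumed field names; the example configs are the source of truth.
p = cfg["parallelism"]
world_size = p["dp"] * p["tp"] * p["pp"]
print(f"torchrun --nproc_per_node={world_size} run_train.py --config-file examples/debug_run_train.yaml")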
Development guidelines
If you plan on developing on nanotron, we suggest you install the dev flavor:
pip install -e ".[dev]"
We use pre-commit to run a bunch of callbacks on each commit, mostly code normalization so that the codebase stays consistent. Please do run pre-commit install.
For the linting:
pre-commit install
pre-commit run --config .pre-commit-config.yaml --all-files
Features we would like to add:
- Support torch.compile
- More optimized kernels
- Support ZeRO-3
- Other PP schedules (such as Interleaved 1f1b...)
- Ring attention / Sequence Parallelism
- 3D Parallel MoEs
- Supporting more architectures (Mamba..)
- ...
Credits
We would like to thank everyone working on LLMs, especially those sharing their work openly, from which we took great inspiration: Nvidia for Megatron-LM/apex, Microsoft for DeepSpeed, and HazyResearch for flash-attn.
File details
Details for the file nanotron-0.4.tar.gz.
File metadata
- Download URL: nanotron-0.4.tar.gz
- Upload date:
- Size: 160.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7ea378eb1b6b16c93a3021fcfd71dd73bd14d826126eae215e2b02e05cd6a120
MD5 | cf210507a30096bc53ad63655022eccd
BLAKE2b-256 | 4c2507e627d9432d503f58af6e1eda61e1c7d2a1da9ef107d92f04919004c142
File details
Details for the file nanotron-0.4-py3-none-any.whl.
File metadata
- Download URL: nanotron-0.4-py3-none-any.whl
- Upload date:
- Size: 163.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0c1834e91c17c651f430d46a1e779bc49991c33bda26cc237beae7d12b383ad9
MD5 | dc58efb584292e80525e4c2f1bbdec60
BLAKE2b-256 | 4854895f2bb2121ff5dd8dddc68491ccb67840d34cd45c30b1f7a5887cdbc311