Training small language models for SimpleStories
Project description
simple_stories_train
Project for training small LMs. Designed for training on SimpleStories, an extension of TinyStories.
- Training script is based on the efficient train_gpt2.py in llm.c (licensed under MIT ((c) 2024 Andrej Karpathy))
- Some model architecture implementations are based on TransformerLens (licensed under MIT ((c) 2022 TransformerLensOrg)).
Installation
From the root of the repository, run one of:
make install-dev # To install the package, dev requirements and pre-commit hooks
make install # To just install the package (runs `pip install -e .`)
Development
Suggested extensions and settings for VSCode are provided in .vscode/. To use the suggested
settings, copy .vscode/settings-example.json to .vscode/settings.json.
The following make commands may be helpful:
make check # Run pre-commit on all files (i.e. pyright, ruff linter, and ruff formatter)
make type # Run pyright on all files
make format # Run ruff linter and formatter on all files
make test # Run tests that aren't marked `slow`
make test-all # Run all tests
Usage
Training a model
python train_llama.py [PATH/TO/CONFIG.yaml] [--key1 value1 --key2 value2 ...]
where
- `PATH/TO/CONFIG.yaml` contains the training config. If no path is provided, a default config will be used.
- `--key1 value1 --key2 value2 ...` override values in the config. Note that if you wish to update a nested value, you must use dotted notation (e.g. `--train_dataset_config.name my_dataset`).
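To illustrate what dotted notation means for nested values, here is a minimal sketch of applying such overrides to a nested config. It is not the project's actual parsing code: the helper `apply_overrides` and the example config values are hypothetical, and only the key `train_dataset_config.name` comes from the text above.

```python
import copy
from typing import Any


def apply_overrides(config: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `config` with dotted-key overrides applied.

    Hypothetical helper for illustration; the real script may parse
    overrides differently.
    """
    result = copy.deepcopy(config)
    for dotted_key, value in overrides.items():
        # "a.b.c" -> walk into sections "a" and "b", then set "c".
        *parents, leaf = dotted_key.split(".")
        node = result
        for part in parents:
            if not isinstance(node.get(part), dict):
                raise KeyError(f"{dotted_key!r}: {part!r} is not a nested section")
            node = node[part]
        node[leaf] = value
    return result


# Hypothetical config values, for illustration only.
config = {"compile": True, "train_dataset_config": {"name": "default_dataset"}}
updated = apply_overrides(
    config,
    {"train_dataset_config.name": "my_dataset", "compile": False},
)
print(updated)
```

A top-level key such as `compile` needs no dots, while `train_dataset_config.name` reaches one level into the nested section.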
If running on CPU, you may need to set --compile=False.
To run on multiple GPUs, use
torchrun --standalone --nproc_per_node=N train_llama.py ...
where N is the number of GPUs to use.
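For context, `torchrun` starts N worker processes and gives each one the environment variables `RANK`, `LOCAL_RANK`, and `WORLD_SIZE`. The sketch below shows one way a script could read them, falling back to single-process values when they are absent. How `train_llama.py` itself handles this is not shown here; `DistributedInfo` and `read_distributed_info` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DistributedInfo:
    rank: int        # global index of this process
    local_rank: int  # index of this process on its node (often the GPU index)
    world_size: int  # total number of processes

    @property
    def is_main_process(self) -> bool:
        # By convention, rank 0 usually does logging and checkpointing.
        return self.rank == 0


def read_distributed_info(env: Optional[Mapping[str, str]] = None) -> DistributedInfo:
    """Read the variables torchrun sets, defaulting to a single process."""
    env = os.environ if env is None else env
    return DistributedInfo(
        rank=int(env.get("RANK", "0")),
        local_rank=int(env.get("LOCAL_RANK", "0")),
        world_size=int(env.get("WORLD_SIZE", "1")),
    )


# Simulated environment of the second worker in a two-GPU run.
info = read_distributed_info({"RANK": "1", "LOCAL_RANK": "1", "WORLD_SIZE": "2"})
print(info, info.is_main_process)
```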
Logging with Weights & Biases
To track training with Weights & Biases, you can set the WANDB_PROJECT and WANDB_API_KEY variables in
.env. API keys can be obtained from your Weights & Biases account settings.
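As a quick sanity check before launching a run, the following sketch parses a `.env` file and reports which of the two variables are missing. It is an assumption-laden illustration: the functions `load_env_file` and `missing_wandb_settings` are hypothetical, the parser handles only simple `KEY=VALUE` lines, and the project may load `.env` by other means.

```python
from pathlib import Path
from typing import Mapping


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_wandb_settings(values: Mapping[str, str]) -> list[str]:
    """Return the names of the required variables that are unset or empty."""
    required = ("WANDB_PROJECT", "WANDB_API_KEY")
    return [name for name in required if not values.get(name)]


settings = load_env_file()  # reads ./.env if it exists
print("Missing settings:", missing_wandb_settings(settings))
```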
File details
Details for the file simple_stories_train-0.0.1.tar.gz.
File metadata
- Download URL: simple_stories_train-0.0.1.tar.gz
- Upload date:
- Size: 39.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4dfc23f9ba0e2c8049cb58cc4d79f64ea1ae661bf3167a048daf91f2e1867c4f |
| MD5 | c42c1925c50470c9418b2e2bff24b95f |
| BLAKE2b-256 | 9a765bf775baa1df73095dee807d1313f4033a5670af4ea3e6c304c9fb8851ee |
File details
Details for the file simple_stories_train-0.0.1-py3-none-any.whl.
File metadata
- Download URL: simple_stories_train-0.0.1-py3-none-any.whl
- Upload date:
- Size: 42.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 567d1e1e58f49f77d5ca2fda4f908e5d0669fabd7816ff9f3861a678e18be202 |
| MD5 | 595c72a0bc334c8be1bbd630db1c5d82 |
| BLAKE2b-256 | 5d7aa1181b7da53a91c479dd2b0b1b04150060bdf721c0e8857db8021482cb31 |