Training small language models for SimpleStories

simple_stories_train

Project for training small LMs. Designed for training on SimpleStories, an extension of TinyStories.

  • Training script is based on the efficient train_gpt2.py in llm.c (licensed under MIT ((c) 2024 Andrej Karpathy)).
  • Some model architecture implementations are based on TransformerLens (licensed under MIT ((c) 2022 TransformerLensOrg)).

Installation

From the root of the repository, run one of

make install-dev  # To install the package, dev requirements and pre-commit hooks
make install  # To just install the package (runs `pip install -e .`)

Development

Suggested extensions and settings for VSCode are provided in .vscode/. To use the suggested settings, copy .vscode/settings-example.json to .vscode/settings.json.

There are various make commands that may be helpful

make check  # Run pre-commit on all files (i.e. pyright, ruff linter, and ruff formatter)
make type  # Run pyright on all files
make format  # Run ruff linter and formatter on all files
make test  # Run tests that aren't marked `slow`
make test-all  # Run all tests

Usage

Training a model

python train_llama.py [PATH/TO/CONFIG.yaml] [--key1 value1 --key2 value2 ...]

where

  • PATH/TO/CONFIG.yaml contains the training config. If no path is provided, a default config will be used.
  • --key1 value1 --key2 value2 ... override values in the config. Note that if you wish to update a nested value, you must use dotted notation (e.g. --train_dataset_config.name my_dataset).
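
To illustrate what dotted notation means for nested values, here is a minimal sketch of how such overrides could be merged into a nested config. It is an illustration only, not the project's actual implementation: the function name `apply_overrides` and the config keys other than `train_dataset_config.name` are hypothetical.

```python
import copy
from typing import Any


def apply_overrides(config: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `config` with dotted-key overrides applied.

    Hypothetical helper for illustration; train_llama.py may handle this differently.
    """
    result = copy.deepcopy(config)  # leave the caller's config untouched
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = result
        for part in parents:
            # Walk down one level per dotted segment, creating it if missing.
            node = node.setdefault(part, {})
        node[leaf] = value
    return result


# Hypothetical config: only `train_dataset_config.name` appears in this README.
config = {
    "train_dataset_config": {"name": "default_dataset", "split": "train"},
    "compile": True,
}
updated = apply_overrides(
    config,
    {"train_dataset_config.name": "my_dataset", "compile": False},
)
print(updated)
```

Only the addressed leaf changes; sibling keys under `train_dataset_config` keep their values from the config file.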

If running on CPU, you may need to set --compile=False.

To run on multiple GPUs, use

torchrun --standalone --nproc_per_node=N train_llama.py ...

where N is the number of GPUs to use.

Logging with Weights & Biases

To track training with Weights & Biases, you can set the WANDB_PROJECT and WANDB_API_KEY variables in .env. API keys can be obtained from your Weights & Biases account settings.
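
As a rough sketch of what the .env file holds, the example below parses `KEY=value` lines into a dictionary using only the standard library. The parser `parse_env` and the placeholder values are hypothetical; the project may load .env through a dedicated library instead.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments.

    Hypothetical helper for illustration; it does not handle multi-line values.
    """
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


# Placeholder contents for .env; substitute your own project name and API key.
example_env = """
# Weights & Biases settings
WANDB_PROJECT=my-project
WANDB_API_KEY="your-api-key"
"""
settings = parse_env(example_env)
print(sorted(settings))
```

Since .env contains an API key, it is generally best kept out of version control.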
