

Project description


TeraGPT

Zeta presents TeraGPT, the simplest implementation for training large language models with tens or hundreds of billions of parameters. This work was inspired by Andrej Karpathy's nanoGPT. However, while nanoGPT is designed to train medium-sized models up to around the 1B-parameter range, TeraGPT leverages the Zeta framework to scale a single, simple model definition and training loop to GPT-3-sized models running across zetascale clusters.

As in nanoGPT, the main training logic is split between train.py and model.py, totaling roughly 350 lines of simple, readable PyTorch code. While nanoGPT can replicate GPT-2, TeraGPT is built to replicate something at the scale of GPT-4 (albeit possibly with a dataset upgrade compared to what nanoGPT supports). We have tested models of up to 175B parameters and found that they run functionally correctly at high throughput, and we have no reason to suspect that you can't scale significantly larger.

The combination of the scale of the hardware, the weight-streaming execution mode, and data-parallel scale-out across machines is what makes it straightforward to scale to larger models and larger clusters.

Install

pip3 install teragpt

Usage

import torch
from teragpt.main import TeraGPT

# Instantiate the model: hidden dimension, number of transformer
# layers, attention heads, and vocabulary size.
model = TeraGPT(
    dim=4096,
    depth=6,
    heads=8,
    num_tokens=20000,
)

# A random batch of token IDs with shape (batch_size=1, sequence_length=4096).
x = torch.randint(0, 20000, (1, 4096))

# Forward pass; prints the shape of the model's per-token output.
out = model(x)
print(out.shape)

Tokenizer

from teragpt import Tokenizer

# Load a Hugging Face tokenizer by name (here, a LLaMA tokenizer).
tokenizer_name = "hf-internal-testing/llama-tokenizer"
tokenizer = Tokenizer(tokenizer_name=tokenizer_name)

# Round-trip a sample string: text -> token IDs -> text.
encoded_text = tokenizer.encode("This is a sample text")
decoded_text = tokenizer.decode(encoded_text)
print("Encoded text:", encoded_text)
print("Decoded text:", decoded_text)

Train

train.py sets up the environment for distributed training and then initializes a Trainer object to start the training process.

Environment Variables

The script uses the following environment variables; a sketch of how they are typically set and consumed follows the list:

  • MASTER_ADDR: The address of the master node. This is typically 'localhost'.
  • MASTER_PORT: The port that the master node is listening on. This is typically '9994'.
  • RANK: The rank of the current node in the distributed training setup. This is typically '0' for the master node.
  • WORLD_SIZE: The total number of nodes participating in the distributed training. This is typically the number of GPUs available.
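
For a single-process test run, these variables can be set directly in Python before initializing the process group. The sketch below is illustrative only and relies on standard PyTorch distributed conventions; it assumes the training script initializes the default process group with init_method="env://", which reads exactly these four variables.

import os
import torch.distributed as dist

# [CRITICAL] Pay attention to this when scaling to multiple GPUs and clusters.
os.environ.setdefault("MASTER_ADDR", "localhost")  # address of the master node
os.environ.setdefault("MASTER_PORT", "9994")       # port the master node listens on
os.environ.setdefault("RANK", "0")                 # global rank of this process
os.environ.setdefault("WORLD_SIZE", "1")           # total number of participating processes

# init_method="env://" makes PyTorch read MASTER_ADDR, MASTER_PORT, RANK,
# and WORLD_SIZE from the environment. Use backend="gloo" on CPU-only machines.
dist.init_process_group(backend="nccl", init_method="env://")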

How to Train the Model

  1. Set the environment variables MASTER_ADDR, MASTER_PORT, RANK, and WORLD_SIZE appropriately for your distributed training setup.

  2. Run the script with any additional arguments required by the Trainer object.

python train.py

Please note that the exact arguments required by the Trainer object will depend on your specific training setup and the model you are training.

Note

The comment "[CRITICAL] Pay attention to this when scaling to multiple GPUs and clusters" in the training script indicates that the settings for RANK and WORLD_SIZE are particularly important when scaling training to multiple GPUs and clusters. Make sure to set these variables correctly to ensure efficient distributed training.
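
As a concrete illustration of how RANK and WORLD_SIZE relate under standard PyTorch distributed conventions (not specific to this package): with 2 nodes of 4 GPUs each, WORLD_SIZE is 8 and every process receives a unique global RANK.

# Worked example: 2 nodes x 4 GPUs each.
gpus_per_node = 4
num_nodes = 2
world_size = num_nodes * gpus_per_node  # WORLD_SIZE = 8

for node_rank in range(num_nodes):
    for local_rank in range(gpus_per_node):
        rank = node_rank * gpus_per_node + local_rank  # unique global RANK in [0, 8)
        print(f"node {node_rank}, GPU {local_rank} -> RANK={rank}, WORLD_SIZE={world_size}")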


Codebase comparison

The standard way to train a GPT-3-sized model is to use a framework such as NVIDIA Megatron. Megatron, however, is a large and complex framework that is challenging to work with, which is what motivated the creation of nanoGPT: a light, readable, hackable framework. To quantify the complexity of these frameworks, we counted the lines of code in each repo. Megatron has 20,507 lines of code, while nanoGPT and TeraGPT have 639 and 350 lines respectively. This supports our primary claim that TeraGPT trains GPT-3-sized models while retaining the simplicity of nanoGPT.

Megatron-LM

Language        files    blank    comment     code
Python             99     4710       4407    18395
C/C++ Header        4      146         90     1118
C++                 4      137        117      649
CUDA                3       41         20      220
HTML                1       15          2      107
Bourne Shell        1        1          0        9
make                1        2          0        7
SUM:              115     5052       4636    20507

nanoGPT

Language        files    blank    comment     code
Python              5       90        187      639
SUM:                5       90        187      639

TeraGPT

Language        files    blank    comment     code
Python              3      109          1      350
SUM:                3      109          1      350

License

Apache


Download files


Source Distribution

teragpt-0.0.3.tar.gz (8.9 kB)

Uploaded Source

Built Distribution

teragpt-0.0.3-py3-none-any.whl (8.9 kB)

Uploaded Python 3

File details

Details for the file teragpt-0.0.3.tar.gz.

File metadata

  • Download URL: teragpt-0.0.3.tar.gz
  • Upload date:
  • Size: 8.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.3.2 CPython/3.11.0 Darwin/22.4.0

File hashes

Hashes for teragpt-0.0.3.tar.gz
Algorithm Hash digest
SHA256 9897198ce681a8546f1986192853db6bb4b63df7115ee0bfaed4b496703ecb7d
MD5 755b89d0f0d969e756860e598e1495b1
BLAKE2b-256 12557e479a90a14b98ce08b9a140d1a09114aa94f02e6c99266dd878e929fc85


File details

Details for the file teragpt-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: teragpt-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 8.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.3.2 CPython/3.11.0 Darwin/22.4.0

File hashes

Hashes for teragpt-0.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 632994f7a36631de5a93d927b442ac9ebf8b5fd7b787bc557272b9b004bcabc6
MD5 53314accd1a4c140e8799a757e1aa782
BLAKE2b-256 31780c6a8476e170160931df665963d15ba3b77f8f3c7da561617bf69e3c8838

