TeraGPT

Zeta presents TeraGPT – the simplest implementation for training large language models with tens or hundreds of billions of parameters. This work was inspired by Andrej Karpathy's nanoGPT. However, while nanoGPT is designed to train medium-sized models up to around the 1B parameter range, TeraGPT leverages the over-powered Zeta framework to use a single, simple model definition and training loop that scales to GPT-3-sized models running across zetascale clusters.

As in nanoGPT, the main training logic is split between train.py and model.py, with a total of 350 lines of simple, readable PyTorch code combined. While nanoGPT can replicate GPT-2, TeraGPT is built to replicate something at the scale of GPT-4 (albeit likely with a dataset upgrade compared to what nanoGPT supports). We have tested that models up to 175B parameters run functionally correctly at high throughput, and we have no reason to suspect that you can't scale significantly larger.

The combination of the scale of the hardware, the weight streaming execution mode, and the data parallel scale-out across machines is what provides the magic required for easy scale-out to larger models and larger clusters.

Install

pip3 install teragpt

Usage

import torch
from teragpt.main import TeraGPT

# Instantiate a model: model dimension 4096, 6 transformer layers,
# 8 attention heads, and a 20,000-token vocabulary.
model = TeraGPT(
    dim=4096,
    depth=6,
    heads=8,
    num_tokens=20000,
)

# Dummy batch of token ids: batch size 1, sequence length 4096.
x = torch.randint(0, 20000, (1, 4096))

out = model(x)
print(out.shape)

Tokenizer

from teragpt import Tokenizer

# Wrap a Hugging Face tokenizer (here, the LLaMA test tokenizer).
tokenizer_name = "hf-internal-testing/llama-tokenizer"
tokenizer = Tokenizer(tokenizer_name=tokenizer_name)

# Round-trip a sample string through encode/decode.
encoded_text = tokenizer.encode("This is a sample text")
decoded_text = tokenizer.decode(encoded_text)
print("Encoded text:", encoded_text)
print("Decoded text:", decoded_text)

Train

train.py sets up the environment for distributed training and then initializes a Trainer object to start the training process.

Environment Variables

The script uses the following environment variables:

  • MASTER_ADDR: The address of the master node. This is typically 'localhost'.
  • MASTER_PORT: The port that the master node is listening on. This is typically '9994'.
  • RANK: The rank of the current node in the distributed training setup. This is typically '0' for the master node.
  • WORLD_SIZE: The total number of nodes participating in the distributed training. This is typically the number of GPUs available.
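
For reference, here is a minimal sketch of how these variables are typically consumed, using the standard torch.distributed API; the actual setup inside train.py may differ:

import os
import torch
import torch.distributed as dist

# Fall back to the defaults described above if the variables are unset.
os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "9994")
os.environ.setdefault("RANK", "0")
os.environ.setdefault("WORLD_SIZE", "1")

# Initialize the process group: NCCL for GPU clusters, Gloo otherwise.
dist.init_process_group(
    backend="nccl" if torch.cuda.is_available() else "gloo",
    rank=int(os.environ["RANK"]),
    world_size=int(os.environ["WORLD_SIZE"]),
)

print(f"Initialized rank {dist.get_rank()} of {dist.get_world_size()}")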

How to Train the Model

  1. Set the environment variables MASTER_ADDR, MASTER_PORT, RANK, and WORLD_SIZE appropriately for your distributed training setup.

  2. Run the script with any additional arguments required by the Trainer object.

python train.py

Please note that the exact arguments required by the Trainer object will depend on your specific training setup and the model you are training.
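
For reference, a single training step with plain PyTorch, bypassing the Trainer object entirely, might look like the sketch below. It assumes the model returns next-token logits of shape (batch, seq_len, num_tokens), which the Usage snippet suggests but does not confirm:

import torch
import torch.nn.functional as F
from teragpt.main import TeraGPT

# Assumption: the model maps (batch, seq_len) token ids to
# (batch, seq_len, num_tokens) logits.
model = TeraGPT(dim=4096, depth=6, heads=8, num_tokens=20000)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

tokens = torch.randint(0, 20000, (1, 1024))  # dummy batch of token ids

logits = model(tokens[:, :-1])               # predict the next token
loss = F.cross_entropy(
    logits.reshape(-1, logits.size(-1)),     # (batch * seq, vocab)
    tokens[:, 1:].reshape(-1),               # shifted targets
)

loss.backward()
optimizer.step()
optimizer.zero_grad()
print(loss.item())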

Note

The comment "[CRITICAL] Pay attention to this when scaling to multiple GPUs and clusters" in the script indicates that the settings for RANK and WORLD_SIZE are particularly important when scaling the training process to multiple GPUs and clusters. Make sure to set these variables correctly to ensure efficient distributed training.


Codebase comparison

The standard way to train a GPT-3 sized model is to use a framework such as Nvidia Megatron. Megatron, however, is a large and complex framework that is challenging to use. This is what motivated the creation of nanoGPT – a light, readable, hackable framework. To quantify the complexity of these frameworks, we counted the lines of code in each repo. Megatron has 20,507 lines of code, while nanoGPT and TeraGPT have 639 and 350 lines of code respectively. This supports our primary claim that TeraGPT trains GPT-3 sized models while retaining the simplicity of nanoGPT.

Megatron-LM

Language       files    blank    comment     code
Python            99     4710       4407    18395
C/C++ Header       4      146         90     1118
C++                4      137        117      649
CUDA               3       41         20      220
HTML               1       15          2      107
Bourne Shell       1        1          0        9
make               1        2          0        7
SUM:             115     5052       4636    20507

nanoGPT

Language       files    blank    comment     code
Python             5       90        187      639
SUM:               5       90        187      639

TeraGPT

Language       files    blank    comment     code
Python             3      109          1      350
SUM:               3      109          1      350

License

Apache
