
Project description

anyGPT


anyGPT is a general-purpose library for training any type of GPT model, with support for GPT-1, GPT-2, and GPT-3 style models. Inspired by nanoGPT by Andrej Karpathy, the goal of this project is to provide tools for training and using GPT-style large language models. The aim is to provide a tool that is:

  • production ready
  • easily configurable
  • scalable
  • free and open-source
  • accessible by general software engineers and enthusiasts
  • easily reproducible and deployable

You don't need a Ph.D. in Machine Learning or Natural Language Processing to use anyGPT.

Installation

NOTE: It is recommended that you set up a Python virtual environment using mamba, conda, or poetry. To install anyGPT:

$ pip install anyGPT
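For example, a minimal environment using Python's built-in venv module (one option alongside mamba, conda, or poetry; the environment name here is arbitrary):

$ python -m venv anygpt-env
$ source anygpt-env/bin/activate
$ pip install anyGPT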

Using Docker

The Docker image supports GPU passthrough for training and inference. To enable GPU passthrough, follow the guide for installing the NVIDIA Container Toolkit for your OS.

NOTE: On Windows, you need to follow the guide to set up the NVIDIA Container Toolkit on WSL2. The Docker WSL2 backend is required.

Once the NVIDIA Container Toolkit and Docker are set up correctly, build the Docker image:

$ docker build -t anygpt .

Use the following command to log in to the container interactively and use anyGPT as if it were installed on your local host:

$ docker run --gpus all -it anygpt

Mounting Volumes

It is recommended to mount a local directory into your container to share data between your local host and the container. This lets you save trained checkpoints, reuse datasets between runs, and more.

$ docker run --gpus all -v /path/to/local/dir:/data -it anygpt

The above example mounts /path/to/local/dir to the /data directory in the container; all data and changes are shared between them dynamically.

Non-interactive Docker

The above documentation explains how to run a Docker container with an interactive anyGPT session. You can also run anyGPT commands to completion by overriding the Docker entrypoint:

$ docker run --gpus=all -v /path/to/your/data:/data --entrypoint anygpt-run -it anygpt /data/test.pt "hello world"

The above command runs anygpt-run with the arguments /data/test.pt "hello world".
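The same pattern should work for the other anyGPT entrypoints. For example, a non-interactive training run (assuming your config file lives in the mounted directory):

$ docker run --gpus=all -v /path/to/your/data:/data --entrypoint anygpt-train -it anygpt /data/gpt-2-30M.yaml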

Dependencies

  • torch >= 2.0.0
  • numpy
  • transformers
  • datasets
  • tiktoken
  • wandb
  • tqdm
  • PyYAML
  • lightning
  • tensorboard

Features

Current

  • CLI and config file driven GPT training
  • Supports CPU, GPU, TPU, IPU, and HPU
  • Distributed training strategies for training at scale
  • Easy spin up using Docker
  • FastAPI end-points for containerized microservice deployment
  • HuggingFace integration
    • Load pre-trained gpt models

Roadmap

  • Documentation
  • HuggingFace integration
    • push to hub
  • Easy VM spin-up/getting started with:
    • Downloading of pre-trained models
    • Gradio ChatGPT-style interface for testing and experimentation
  • Fine-tuning of pre-trained models
  • Reinforcement Learning from Human Feedback and Rule-Based Reward Modeling for LLM alignment
  • More data format support beyond hosted text files

Usage

Data Preparation

$ anygpt-prepare-data -n shakespeare_complete -u https://www.gutenberg.org/cache/epub/100/pg100.txt
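The -n flag names the dataset (referenced later by the dataset field in io_config) and -u points at a hosted plain-text file. For example, to prepare a different Project Gutenberg text under a name of your choosing:

$ anygpt-prepare-data -n pride_and_prejudice -u https://www.gutenberg.org/cache/epub/1342/pg1342.txt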

Training

Create a config file. In this example, I'll call it gpt-2-30M.yaml. You can also check out the example configuration files.

model_config:
  name: 'gpt-2-30M'
  block_size: 256
  dropout: 0.2
  embedding_size: 384
  num_heads: 6
  num_layers: 6

training_config:
  learning_rate: 1.0e-3
  batch_size: 8
  accumulate_gradients: 8
  beta2: 0.99
  min_lr: 1.0e-4
  max_steps: 5000
  val_check_interval: 200
  limit_val_batches: 100

io_config:
  experiment_name: 'gpt-2'
  dataset: 'shakespeare_complete'

Then start training with your config file:

$ anygpt-train gpt-2-30M.yaml
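Note that with batch_size: 8 and accumulate_gradients: 8, each optimizer step effectively sees 8 × 8 = 64 sequences of block_size 256 tokens, assuming accumulate_gradients performs standard gradient accumulation.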

Inference

$ anygpt-run results/gpt-2-pretrain/version_0/checkpoints/last.pt \
"JAQUES.
All the world’s a stage,
And all the men and women merely players;
They have their exits and their entrances,
And one man in his time plays many parts,"

Inference Microservice

anyGPT supports running models as a hosted microservice with a single endpoint for inference. To launch the microservice, use the anygpt-serve entrypoint.

Commandline Options

$ anygpt-serve -h
usage: anyGPT inference service [-h] [--port PORT] [--host HOST] [--log-level LOG_LEVEL] model

Loads an anyGPT model and hosts it on a simple microservice that can run inference over the network.

positional arguments:
  model                 Path to the trained model checkpoint to load

options:
  -h, --help            show this help message and exit
  --port PORT           Port to start the microservice on (default: 5000)
  --host HOST           Host to bind microservice to (default: 127.0.0.1)
  --log-level LOG_LEVEL
                        uvicorn log level (default: info)

Example

$ anygpt-serve results/gpt-2-pretrain/version_0/checkpoints/last.pt --port 5000 --log-level info

Sending Requests

anygpt-serve uses FastAPI to serve the microservice. To see the available microservice API, go to the /docs endpoint in your browser once the microservice is started.
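As a client-side sketch, you can call the service from Python with the requests library. The route name and JSON field below are illustrative assumptions rather than the confirmed API; the /docs page of your running instance shows the actual endpoint and schema.

import requests

# NOTE: the "/generate" route and the "text" field are assumptions for
# illustration; consult http://127.0.0.1:5000/docs for the real schema.
response = requests.post(
    "http://127.0.0.1:5000/generate",
    json={"text": "All the world's a stage,"},
    timeout=60,
)
response.raise_for_status()
print(response.json())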

Spinning up Microservice in Docker

$ docker run --gpus=all -v /path/to/your/data:/data -p 5000:5000 --entrypoint anygpt-serve -it anygpt /data/test.pt --port 5000 --host 0.0.0.0 --log-level info
INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:5000 (Press CTRL+C to quit)
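With the container port published via -p 5000:5000, the service should be reachable from your local host; for example, you can confirm it is up by fetching the interactive API docs:

$ curl http://localhost:5000/docs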

Documentation

Full documentation is available via the project's Documentation link.

Limitations

TBD

License

The goal of this project is to enable organizations, both large and small, to train and use GPT-style large language models. I believe the future is open-source, with people and organizations being able to train from scratch or fine-tune models and deploy to production without relying on gatekeepers. So I'm releasing this under the MIT license for the benefit of all and in the hope that the community will find it useful.

Released under MIT by @any-LABS.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

anyGPT-0.1.0.tar.gz (26.8 kB)

Uploaded Source

Built Distribution

anyGPT-0.1.0-py3-none-any.whl (28.4 kB)

Uploaded Python 3

File details

Details for the file anyGPT-0.1.0.tar.gz.

File metadata

  • Download URL: anyGPT-0.1.0.tar.gz
  • Upload date:
  • Size: 26.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.11

File hashes

Hashes for anyGPT-0.1.0.tar.gz

  • SHA256: b9f218369cbe5cf007e176b84f02993a6f2e8306603d3b471d4c17b604e8a3df
  • MD5: 8012ab5ad8a24874dec324d1b69376c2
  • BLAKE2b-256: dd9a2ed68f40e5bfb9f580a6f9febe04daf873cd1af3f0a8a06109ae0203f76f

See more details on using hashes here.

File details

Details for the file anyGPT-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: anyGPT-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 28.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.11

File hashes

Hashes for anyGPT-0.1.0-py3-none-any.whl

  • SHA256: eea8dd8b52e4093e17f5b3d39b5dbfa5fb587bd03d4efd0e837c44b91c0e7fb8
  • MD5: 10d275ac4c9575252c3434821887a196
  • BLAKE2b-256: 9ea35eee68ec5d80481ebfb7975c0da805258033f23fb7471448232b8dacd6e6

See more details on using hashes here.
