felafax

Project description

Felafax -- tune LLaMa3.1 on Google Cloud TPUs for 30% lower cost and scale seamlessly!

Felafax is a framework for continued training and fine-tuning of open-source LLMs using the XLA runtime. We take care of the necessary runtime setup and provide a Jupyter notebook out of the box so you can get started right away.

- Easy to use.
- Easy to configure all aspects of training (designed for ML researchers and hackers).
- Easy to scale training from a single TPU VM with 8 cores to an entire TPU Pod containing 6000 TPU cores (roughly 750x)!

✨ Finetune for Free

Add your dataset, click "Run All", and you'll be running on free TPU resources on Google Colab!

Felafax supports	Free Notebooks
Llama 3.1 (8B)	▶️ Start for free on Google Colab TPU

Goal

Our goal at Felafax is to build infrastructure that makes it easier to run AI workloads on non-NVIDIA hardware (TPU, AWS Trainium, AMD GPUs, and Intel GPUs).

Currently supported models

LLaMa-3.1 JAX Implementation (New!)
- Converted from PyTorch to JAX for improved performance
- On TPU v4 and v5, runs 2-way data-parallel and 2-way model-parallel training (two data-parallel model copies, each sharded across two TPU chips).
- On TPU v2 and v3, runs one model copy sharded across 8 cores.
- Full-precision and LoRA training support
LLaMa-3/3.1 PyTorch XLA
- LoRA and full-precision training support
- codepointer
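
The two device layouts described for the JAX implementation can be pictured as a small grid with one axis for model copies and one for weight shards. The following is a minimal sketch in plain Python, not Felafax code: the helper names `build_mesh` and `describe` and the integer device ids are assumptions made for illustration.

```python
def build_mesh(device_ids, data_parallel, model_parallel):
    """Arrange a flat list of device ids into a (data, model) grid."""
    if data_parallel * model_parallel != len(device_ids):
        raise ValueError("mesh shape must use every device exactly once")
    return [
        device_ids[row * model_parallel:(row + 1) * model_parallel]
        for row in range(data_parallel)
    ]


def describe(mesh):
    """Map each device id to (model copy index, weight shard index)."""
    return {
        device: (copy_idx, shard_idx)
        for copy_idx, row in enumerate(mesh)
        for shard_idx, device in enumerate(row)
    }


# TPU v4/v5 layout: 2 model copies, each split across 2 chips.
mesh_v5 = build_mesh([0, 1, 2, 3], data_parallel=2, model_parallel=2)

# TPU v2/v3 layout: 1 model copy split across 8 cores.
mesh_v3 = build_mesh(list(range(8)), data_parallel=1, model_parallel=8)
```

Each row of the grid holds one full copy of the model and receives its own slice of the batch; each column position holds one shard of the weights. In JAX, a grid like this would typically be handed to `jax.sharding.Mesh` together with two axis names chosen by the caller.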

Setup

For a hosted version with a seamless workflow, please request access here. 🦊

If you prefer a self-hosted training version, follow the instructions below. These steps will guide you through launching a TPU VM on your Google Cloud account and starting a Jupyter notebook. With just 3 simple steps, you'll be up and running in under 10 minutes. 🚀

Install gcloud command-line tool and authenticate your account (SKIP this STEP if you already have gcloud installed and have used TPUs before! 😎)

 # Download gcloud CLI
 curl https://sdk.cloud.google.com | bash
 source ~/.bashrc

 # Authenticate gcloud CLI
 gcloud auth login

 # Create a new project (project IDs must be lowercase)
 gcloud projects create llama3-tunerx --set-as-default

 # Configure SSH access to your Compute Engine instances
 gcloud compute config-ssh --quiet

 # Set up default credentials
 gcloud auth application-default login

 # Enable Cloud TPU API access
 gcloud services enable compute.googleapis.com tpu.googleapis.com storage-component.googleapis.com aiplatform.googleapis.com
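
Google Cloud rejects project IDs that are not 6 to 30 characters of lowercase letters, digits, and hyphens, starting with a letter and not ending with a hyphen. As a small sketch, a name can be checked locally before running `gcloud projects create`; the function name `is_valid_project_id` is hypothetical and not part of gcloud or Felafax.

```python
import re

# 6-30 characters: a lowercase letter first, then lowercase letters, digits,
# or hyphens, and a letter or digit last (no trailing hyphen).
_PROJECT_ID = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def is_valid_project_id(project_id: str) -> bool:
    """Return True if project_id satisfies the format rules above."""
    return _PROJECT_ID.fullmatch(project_id) is not None
```

This only checks the format. Project IDs must also be globally unique, which only the `gcloud` call itself can confirm.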

Spin up a TPU v5-8 VM 🤠.
```
sh ./launch_tuner.sh
```
Keep an eye on the terminal -- you may be asked for your SSH key passphrase and for your Hugging Face token.

Clone the repo and install dependencies

git clone https://github.com/felafax/felafax.git
cd felafax
pip install -r requirements.txt

Open the Jupyter notebook at https://localhost:888 and start fine-tuning!

AMD 405B fine-tuning run:

We recently fine-tuned the LLaMa 3.1 405B model on 8x AMD MI300X GPUs using JAX instead of PyTorch. JAX's advanced sharding APIs allowed us to achieve great performance. Check out our blog post to learn about the setup and the sharding tricks we used.

We did LoRA fine-tuning with all model weights and LoRA parameters in bfloat16 precision, with a LoRA rank of 8 and a LoRA alpha of 16:

- Model Size: The LLaMa model weights occupy around 800 GB of VRAM.
- LoRA Weights + Optimizer State: Approximately 400 GB of VRAM.
- Total VRAM Usage: 77% of the total VRAM, around 1200 GB.
- Constraints: Due to the large size of the 405B model, there was limited room for batch size and sequence length. The batch size was 16 and the sequence length was 64.
- Training Speed: ~35 tokens/second
- Memory Efficiency: Consistently around 70%
- Scaling: With JAX, scaling was near-linear across 8 GPUs.
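
The figures above can be sanity-checked with a few lines of arithmetic. This is a rough sketch, not a measurement: the 192 GB of HBM per MI300X is an assumption taken from the public spec rather than from this README, the LoRA scaling of alpha / rank follows the standard LoRA formulation, and the time per step assumes that ~35 tokens/second is the aggregate rate across all GPUs.

```python
PARAMS = 405e9            # LLaMa 3.1 405B parameter count
BYTES_PER_PARAM = 2       # bfloat16 stores each parameter in 2 bytes
NUM_GPUS = 8
HBM_PER_GPU_GB = 192      # assumption: MI300X spec, not stated in this README

# Weights alone: parameters x bytes per parameter, in GB.
weights_gb = PARAMS * BYTES_PER_PARAM / 1e9

# Reported total usage as a share of all available VRAM.
total_hbm_gb = NUM_GPUS * HBM_PER_GPU_GB
reported_total_gb = 1200
utilization = reported_total_gb / total_hbm_gb

# Standard LoRA scales the low-rank update by alpha / rank.
lora_scaling = 16 / 8

# Tokens processed per optimizer step and the implied time per step.
tokens_per_step = 16 * 64
seconds_per_step = tokens_per_step / 35

print(f"weights: {weights_gb:.0f} GB, VRAM share: {utilization:.0%}")
print(f"tokens/step: {tokens_per_step}, s/step: {seconds_per_step:.1f}")
```

The weights come out at about 810 GB, in line with the "around 800 GB" reported, and 1200 GB of 1536 GB is close to the reported 77%.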

The GPU utilization and VRAM utilization graphs can be found below. However, we still need to calculate the Model FLOPs Utilization (MFU). Note: We couldn't run the JIT-compiled version of the 405B model due to infrastructure and VRAM constraints (we need to investigate this further). The entire training run was executed in JAX eager mode, so there is significant potential for performance improvements.
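
MFU is commonly estimated as the model FLOPs achieved per second divided by the hardware's peak FLOPs per second. The sketch below assumes the usual approximation of about 6 FLOPs per parameter per token for a forward and backward pass. The function name, the default factor, and the numbers in the usage line are illustrative assumptions, not values from this run. With LoRA, the frozen base weights need no weight gradients, so a factor of 6 likely overstates the work.

```python
def model_flops_utilization(num_params, tokens_per_second,
                            peak_flops_per_device, num_devices,
                            flops_per_param_per_token=6.0):
    """Estimate MFU as achieved model FLOPs/s over aggregate peak FLOPs/s.

    peak_flops_per_device should come from the vendor datasheet for the
    precision actually used (bfloat16 in the run described above).
    """
    achieved = flops_per_param_per_token * num_params * tokens_per_second
    peak = peak_flops_per_device * num_devices
    return achieved / peak


# Illustrative numbers only: an 8B-parameter model at 2000 tokens/second
# on 4 devices with a hypothetical peak of 1e15 FLOPs/s each.
example_mfu = model_flops_utilization(8e9, 2000, 1e15, 4)
```

Plugging in the measured tokens/second and the datasheet peak for the target hardware gives a first-order estimate; a precise figure would also account for attention FLOPs and the LoRA-specific backward pass.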

GPU utilization:
VRAM utilization:
rocm-smi data can be found here.

Credits:

Google DeepMind's Gemma repo.
EasyLM and EleutherAI for their great work on LLaMa models in JAX.
PyTorch XLA FSDP and SPMD testing done by HeegyuKim.
Examples from the PyTorch-XLA repo.

Contact

If you have any questions, please contact us at founders@felafax.ai.

Release history

Version	Release date
1.0.12	Nov 24, 2024
1.0.11 (this version)	Nov 24, 2024
1.0.10	Nov 18, 2024
1.0.9	Sep 17, 2024
1.0.8	Sep 10, 2024
1.0.7	Sep 5, 2024
1.0.6	Sep 4, 2024
1.0.5	Sep 3, 2024
1.0.4	Sep 3, 2024
1.0.3	Sep 3, 2024
1.0.2	Sep 3, 2024
1.0.1	Sep 1, 2024
1.0.0	Sep 1, 2024

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

felafax-1.0.11.tar.gz (25.5 kB)

Uploaded Nov 24, 2024 Source

Built Distribution

felafax-1.0.11-py3-none-any.whl (25.6 kB)

Uploaded Nov 24, 2024 Python 3

File details

Details for the file felafax-1.0.11.tar.gz.

File metadata

Download URL: felafax-1.0.11.tar.gz
Upload date: Nov 24, 2024
Size: 25.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.8.3 CPython/3.10.9 Darwin/24.1.0

File hashes

Hashes for felafax-1.0.11.tar.gz
Algorithm	Hash digest
SHA256	`dee243bd14ace346ce592654de4002b2893d5ac0f1520a81281a33cbddc1693e`
MD5	`bab27c30952da38fd941bbb9efb311fa`
BLAKE2b-256	`d952805d62bd49ba02ba78a24aa4d4702cb87394a29e51a63b19588ae9a59aba`

See more details on using hashes here.
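
A downloaded file can be checked against the published SHA256 digest with Python's standard `hashlib` module. This is a minimal sketch; the helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the path in the usage comment assumes the file sits in the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's digest with a published hex digest, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage, assuming the sdist was downloaded to the current directory:
# matches_published_hash(
#     "felafax-1.0.11.tar.gz",
#     "dee243bd14ace346ce592654de4002b2893d5ac0f1520a81281a33cbddc1693e",
# )
```

`pip` can perform the same check at install time when a requirements file pins digests with `--hash=sha256:...`.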

File details

Details for the file felafax-1.0.11-py3-none-any.whl.

File metadata

Download URL: felafax-1.0.11-py3-none-any.whl
Upload date: Nov 24, 2024
Size: 25.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.8.3 CPython/3.10.9 Darwin/24.1.0

File hashes

Hashes for felafax-1.0.11-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b25230bcc12aa77ce78e198fe29f9f62a6d12e2b2c99b9f63dbc9d115cb9dfbc`
MD5	`9aff040fa159eafaa3afb88c649f4ff6`
BLAKE2b-256	`5b098b3ae0fbc06b65a9bb5722d51da40e7c8df54eeefb5f24651566b1c344b5`

See more details on using hashes here.
