Flex AI client library
A lightweight library to fine-tune and deploy 60+ LLMs: no CUDA, no NVIDIA drivers, no OOMs, no multi-GPU setup, no prompt templates.
FlexAI
A platform that simplifies fine-tuning and inference for 60+ open-source LLMs through a single API. FlexAI enables serverless deployment, reducing setup time by up to 70%. You no longer have to handle installations, OOMs, GPU setup, prompt templates, integrating new models, or long waits to download huge models.
⭐ Key Features
- Serverless fine-tuning and inference
- Live time and cost estimations
- Checkpoint management
- LoRA and multi-LoRA support
- Target inference validations
- OpenAI-compatible Endpoints API
- Interactive Playground
✨ Get Started
- Sign up at app.getflex.ai; new accounts come with $5 of free credit to get you started :)
- Get your API key from Settings -> API Keys (you can keep it in an environment variable, as sketched after this list)
- Start with our documentation
- Everything can also be done without any code from the FlexAI Dashboard
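If you prefer not to hard-code the key, here is a minimal sketch of reading it from an environment variable (the variable name `FLEX_AI_API_KEY` is illustrative, not an official convention):

```python
import os

from flex_ai import FlexAI

# FLEX_AI_API_KEY is an illustrative variable name, not an official convention.
client = FlexAI(api_key=os.environ["FLEX_AI_API_KEY"])
```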
📚 Full Google Colab Example
One notebook to fine-tune all LLMs
💾 Installation
There is nothing to set up beyond the package itself: no CUDA, no NVIDIA drivers, no GPU configuration. The lightweight library is only an API wrapper around FlexAI serverless GPUs, so it works from any operating system, including Windows, macOS, and Linux.
```bash
pip install flex_ai openai
```
🦥 Quick Start
```python
from flex_ai import FlexAI

# Initialize the client with your API key
client = FlexAI(api_key="your-api-key")

# Create a dataset - supported dataset formats: https://docs.getflex.ai/quickstart#upload-your-first-dataset
dataset = client.create_dataset("Dataset Name", "train.jsonl", "eval.jsonl")

# Start fine-tuning
task = client.create_finetune(
    name="My Task",
    dataset_id=dataset["id"],
    # Choose from 60+ models - full list: https://docs.getflex.ai/core-concepts/models
    model="meta-llama/Llama-3.2-3B-Instruct",
    n_epochs=10,
    train_with_lora=True,
    lora_config={
        "lora_r": 64,
        "lora_alpha": 8,
        "lora_dropout": 0.1
    }
)

# Create an endpoint (checkpoint_id is the ID of a finished checkpoint
# from the fine-tune above - see the full example below)
endpoint = client.create_multi_lora_endpoint(
    name="My Endpoint",
    lora_checkpoints=[{"id": checkpoint_id, "name": "step_1"}],
    compute="A100-40GB"
)
```
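As a rough sketch of what the training files passed to `create_dataset` could contain, the snippet below writes an instruction-style JSON Lines file from Python. The exact field names depend on the dataset formats described in the docs linked above, so treat these keys as illustrative:

```python
import json

# Illustrative instruction-style rows; the authoritative field names are defined by the
# dataset formats at https://docs.getflex.ai/quickstart#upload-your-first-dataset
rows = [
    {"instruction": "Translate the following English text to French", "input": "Good morning", "output": "Bonjour"},
    {"instruction": "Summarize the text", "input": "FlexAI fine-tunes LLMs through one API.", "output": "One API for LLM fine-tuning."},
]

with open("train.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
```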
🥇 Using Your Fine-tuned Model
```python
from openai import OpenAI

# endpoint_url is the URL of the endpoint created above (endpoint["url"] in the full example)
client = OpenAI(
    api_key="your-api-key",
    base_url=f"{endpoint_url}/v1"
)

completion = client.completions.create(
    model="your-model",
    prompt="Your prompt",
    max_tokens=60
)
```
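Because the endpoints expose an OpenAI-compatible API, instruct models can also be queried through the chat interface. A minimal sketch, assuming the endpoint serves the standard /v1/chat/completions route:

```python
from openai import OpenAI

# Assumes the same OpenAI-compatible endpoint also serves the chat completions route.
client = OpenAI(api_key="your-api-key", base_url=f"{endpoint_url}/v1")

chat = client.chat.completions.create(
    model="your-model",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize LoRA fine-tuning in one sentence."},
    ],
    max_tokens=60,
)
print(chat.choices[0].message.content)
```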
🔗 Links and Resources
Type | Links |
---|---|
📚 Documentation & Wiki | Read Our Docs |
Twitter (aka X) | Follow us on X |
💾 Installation | getflex/README.md |
🌐 Supported Models | FlexAI Models |
🦥 Full Example
```python
from flex_ai import FlexAI
from openai import OpenAI
import time

# Initialize the Flex AI client
client = FlexAI(api_key="your_api_key_here")

# Create a dataset - supported dataset formats: https://docs.getflex.ai/quickstart#upload-your-first-dataset
dataset = client.create_dataset(
    "API Dataset New",
    "instruction/train.jsonl",
    "instruction/eval.jsonl"
)

# Start a fine-tuning task
task = client.create_finetune(
    name="My Task New",
    dataset_id=dataset["id"],
    model="meta-llama/Llama-3.2-1B-Instruct",
    n_epochs=5,
    train_with_lora=True,
    lora_config={
        "lora_r": 64,
        "lora_alpha": 8,
        "lora_dropout": 0.1
    },
    n_checkpoints_and_evaluations_per_epoch=1,
    batch_size=4,
    learning_rate=0.0001,
    save_only_best_checkpoint=True
)

# Wait for training completion
client.wait_for_task_completion(task_id=task["id"])

# Wait for the last checkpoint to finish uploading
while True:
    checkpoints = client.get_task_checkpoints(task_id=task["id"])
    if checkpoints and checkpoints[-1]["stage"] == "FINISHED":
        last_checkpoint = checkpoints[-1]
        checkpoint_list = [{
            "id": last_checkpoint["id"],
            "name": "step_" + str(last_checkpoint["step"])
        }]
        break
    time.sleep(10)  # Wait 10 seconds before checking again

# Create an endpoint that serves the LoRA checkpoint
endpoint_id = client.create_multi_lora_endpoint(
    name="My Endpoint New",
    lora_checkpoints=checkpoint_list,
    compute="A100-40GB"
)
endpoint = client.wait_for_endpoint_ready(endpoint_id=endpoint_id)

# Use the model through the OpenAI-compatible endpoint
openai_client = OpenAI(
    api_key="your_api_key_here",
    base_url=f"{endpoint['url']}/v1"
)

completion = openai_client.completions.create(
    model="meta-llama/Llama-3.2-1B-Instruct",
    prompt="Translate the following English text to French",
    max_tokens=60
)
print(completion.choices[0].text)
```
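The OpenAI-compatible endpoint should also work with token streaming through the standard client. A minimal sketch, assuming the deployed endpoint honors the stream parameter (not explicitly confirmed in the docs above):

```python
from openai import OpenAI

# Assumes the endpoint created above supports the standard OpenAI streaming protocol.
openai_client = OpenAI(api_key="your_api_key_here", base_url=f"{endpoint['url']}/v1")

stream = openai_client.completions.create(
    model="meta-llama/Llama-3.2-1B-Instruct",
    prompt="Write a haiku about fine-tuning",
    max_tokens=60,
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].text, end="", flush=True)
```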
LLM Models Available for Fine-tuning
This table provides an overview of the Large Language Models (LLMs) available for fine-tuning, ordered approximately from most to least well-known. It lists key details for each model, including its name, family, parameter count, context length, and vLLM/LoRA support.
Model Name | Family | Parameters (B) | Context Length | vLLM Support | LoRA Support |
---|---|---|---|---|---|
Nvidia-Llama-3.1-Nemotron-70B-Instruct-HF | llama3.1 | 70 | 131,072 | Yes | Yes |
Meta-Llama-3.2-3B-Instruct | llama3.2 | 3 | 131,072 | Yes | Yes |
Meta-Llama-3.2-1B-Instruct | llama3.2 | 1 | 131,072 | Yes | Yes |
Mistral-Small-Instruct-2409 | mistral | 7.2 | 128,000 | Yes | Yes |
Ministral-8B-Instruct-2410 | mistral | 8 | 128,000 | Yes | Yes |
Mathstral-7B-v0.1 | mistral | 7 | 32,000 | Yes | Yes |
Qwen2.5-Coder-7B-Instruct | qwen2.5 | 7 | 32,768 | Yes | Yes |
Aya-Expanse-32b | aya | 32 | 128,000 | Yes | No |
Aya-Expanse-8b | aya | 8 | 8,000 | Yes | No |
Nemotron-Mini-4B-Instruct | nemotron | 4 | 4,096 | Yes | No |
Gemma-2-2b-it | gemma2 | 2 | 8,192 | Yes | Yes |
Meta-Llama-3.1-70B-Instruct | llama3.1 | 70 | 131,072 | Yes | Yes |
Meta-Llama-3.1-70B | llama3.1 | 70 | 131,072 | Yes | Yes |
Meta-Llama-3.1-8B-Instruct | llama3.1 | 8 | 131,072 | Yes | Yes |
Meta-Llama-3.1-8B | llama3.1 | 8 | 131,072 | Yes | Yes |
Meta-Llama-3-70B-Instruct | llama3 | 70 | 8,192 | Yes | Yes |
Meta-Llama-3-70B | llama3 | 70 | 8,192 | Yes | Yes |
Meta-Llama-3-8B-Instruct | llama3 | 8 | 8,192 | Yes | Yes |
Meta-Llama-3-8B | llama3 | 8 | 8,192 | Yes | Yes |
Mixtral-8x7B-Instruct-v0.1 | mixtral | 46.7 | 32,768 | Yes | Yes |
Mistral-7B-Instruct-v0.3 | mistral | 7.2 | 32,768 | Yes | Yes |
Mistral-Nemo-Instruct-2407 | mistral | 12.2 | 128,000 | No | No |
Mistral-Nemo-Base-2407 | mistral | 12.2 | 128,000 | No | No |
Gemma-2-27b-it | gemma2 | 27 | 8,192 | Yes | Yes |
Gemma-2-27b | gemma2 | 27 | 8,192 | Yes | Yes |
Gemma-2-9b-it | gemma2 | 9 | 8,192 | Yes | Yes |
Gemma-2-9b | gemma2 | 9 | 8,192 | Yes | Yes |
Phi-3-medium-128k-instruct | phi3 | 14 | 128,000 | Yes | No |
Phi-3-medium-4k-instruct | phi3 | 14 | 4,000 | Yes | No |
Phi-3-small-128k-instruct | phi3 | 7.4 | 128,000 | Yes | No |
Phi-3-small-8k-instruct | phi3 | 7.4 | 8,000 | Yes | No |
Phi-3-mini-128k-instruct | phi3 | 3.8 | 128,000 | Yes | No |
Phi-3-mini-4k-instruct | phi3 | 3.8 | 4,096 | Yes | No |
Qwen2-72B-Instruct | qwen2 | 72 | 32,768 | Yes | Yes |
Qwen2-72B | qwen2 | 72 | 32,768 | Yes | Yes |
Qwen2-57B-A14B-Instruct | qwen2 | 57 | 32,768 | Yes | Yes |
Qwen2-57B-A14B | qwen2 | 57 | 32,768 | Yes | Yes |
Qwen2-7B-Instruct | qwen2 | 7 | 32,768 | Yes | Yes |
Qwen2-7B | qwen2 | 7 | 32,768 | Yes | Yes |
Qwen2-1.5B-Instruct | qwen2 | 1.5 | 32,768 | Yes | Yes |
Qwen2-1.5B | qwen2 | 1.5 | 32,768 | Yes | Yes |
Qwen2-0.5B-Instruct | qwen2 | 0.5 | 32,768 | Yes | Yes |
Qwen2-0.5B | qwen2 | 0.5 | 32,768 | Yes | Yes |
TinyLlama_v1.1 | tinyllama | 1.1 | 2,048 | No | No |
DeepSeek-Coder-V2-Lite-Base | deepseek-coder-v2 | 16 | 163,840 | No | No |
InternLM2_5-7B-Chat | internlm2.5 | 7.74 | 1,000,000 | Yes | No |
InternLM2_5-7B | internlm2.5 | 7.74 | 1,000,000 | Yes | No |
Jamba-v0.1 | jamba | 51.6 | 256,000 | Yes | Yes |
Yi-1.5-34B-Chat | yi-1.5 | 34.4 | 4,000 | Yes | Yes |
Yi-1.5-34B | yi-1.5 | 34.4 | 4,000 | Yes | Yes |
Yi-1.5-34B-32K | yi-1.5 | 34.4 | 32,000 | Yes | Yes |
Yi-1.5-34B-Chat-16K | yi-1.5 | 34.4 | 16,000 | Yes | Yes |
Yi-1.5-9B-Chat | yi-1.5 | 8.83 | 4,000 | Yes | Yes |
Yi-1.5-9B | yi-1.5 | 8.83 | 4,000 | Yes | Yes |
Yi-1.5-9B-32K | yi-1.5 | 8.83 | 32,000 | Yes | Yes |
Yi-1.5-9B-Chat-16K | yi-1.5 | 8.83 | 16,000 | Yes | Yes |
Yi-1.5-6B-Chat | yi-1.5 | 6 | 4,000 | Yes | Yes |
Yi-1.5-6B | yi-1.5 | 6 | 4,000 | Yes | Yes |
c4ai-command-r-v01 | command-r | 35 | 131,072 | Yes | No |
Notes:
- "vLLM Support" indicates whether the model is compatible with the vLLM (very Large Language Model) inference framework.
- "LoRA Support" indicates if the vLLM support inference the model with multiple LorA Adapters. Read more
- Context length is measured in tokens. (The model context can change by the target inference library)
- Parameter count is shown in billions (B).
- Links lead to the model's page on Hugging Face or the official website when available.
When choosing a model for fine-tuning, consider factors such as model size, context length, and whether the model supports vLLM inference and LoRA adapters.