A high-performance Vision-Language-Action (VLA) model fine-tuning library optimized for NVIDIA L4 and T4 hardware.
Project description
FASTVLA
I trained a 7B-parameter Robot to understand Arabic for $0.48/hr. Stop renting H100s.
🌍 The Gap: Arabic Physical AI
In 2026, 81% of Arabic AI research is still just text. Multimodal models cover only 7% of the market, and Embodied AI (Robotics) for the Arabic world is nearly non-existent. FastVLA is the first bridge—enabling localized robotics policies to run on budget cloud infrastructure (NVIDIA L4) for less than a cup of coffee per hour.
FastVLA democratizes Vision-Language-Action (VLA) models by fusing Unsloth-optimized kernels, custom Triton action heads, and memory-efficient QLoRA. Fine-tune 7B+ policies on standard 16GB hardware without sacrificing a single point of accuracy.
📊 PERFORMANCE & ACCURACY (ARABIC HERO / NVIDIA L4)
FastVLA preserves full model accuracy while delivering massive speedups. Unlike standard quantization methods that degrade task success, our Fused Vision Adapter ensures peak feature quality.
| Metric | OpenVLA (Base) | FastVLA (Fine) | Improvement |
|---|---|---|---|
| Inference Latency | 1420.0 ms | 198.2 ms | 7.16x faster |
| Peak VRAM Usage | 5.50 GB | 4.45 GB | 19.2% reduction |
| Action Error (L2) | 28.5 px | 12.4 px | 2.30x more accurate |
| Training Time/Step | ~14,000 ms | ~3,800 ms | 3.68x faster |
🚀 Real-time Ready: By dropping latency from ~1.4s to under 200ms, FastVLA enables 5Hz control loops on budget L4 GPUs. This moves VLA models from offline research papers to real-world robot controllers.
⚡ CORE FEATURES
- [V] SURGICAL VISION EXTRACTION: Intelligent loading that extracts raw vision encoders from complex wrappers, ensuring peak visual feature quality.
- [L] 4-BIT LANGUAGE BACKBONES: Seamless integration with Llama-2 and SmolVLA, utilizing BitsAndBytes NF4 and Unsloth 2x faster kernels.
- [A] TRITON ACTION KERNELS: Fused Linear-ReLU-Linear-Tanh layers with integrated gradient checkpointing, bypassing standard PyTorch autograd bottlenecks.
- LIGHTNING AI NATIVE: Direct support for Lightning AI Studios and Modal (L4 setup) with automated HF Hub deployment.
📥 INSTALLATION
1. Requirements
FastVLA requires Python 3.10+ and PyTorch 2.4+.
2. Hardware Compatibility
FastVLA is designed to be highly versatile across budget cloud hardware:
- NVIDIA L4 (Recommended): Primary target for latest production runs, fine-tuning, and translation. Performance benchmarks above are measured on L4.
- NVIDIA T4 / 2x T4: The original development and testing bed. Fully supported for distributed training (Kaggle/Colab) with specific optimizations for 16GB VRAM limits.
- Lightning AI / Modal: Native support for L4/T4 instances.
3. Using uv (Recommended)
git clone https://github.com/BouajilaHamza/fastvla.git
cd fastvla
uv sync
🚀 QUICKSTART
Loading a Quantized VLA
FastVLA integrates with the Transformers ecosystem to load models with PEFT adapters and BitsAndBytes 4-bit quantization.
from fastvla import FastVLAModel
# Load OpenVLA-7B with 4-bit quantization and LoRA
model = FastVLAModel.from_pretrained(
"openvla-7b",
load_in_4bit=True,
use_peft=True
)
Training on Modal
Launch a distributed training job on an L4 GPU with a single command:
modal run scripts/modal_arabic_pipeline.py
Deployment
One-line saving to the Hugging Face Hub, preserving all adapters and VLA projection layers.
model.push_to_hub("hamzabouajila/fastvla-arabic-hero", token="your_hf_token")
🧪 RELIABILITY
- 100% TEST PASS RATE: Verified across full unit test suite.
- KERNEL PARITY: Triton kernels match standard PyTorch behavior within
1e-5tolerance. - DISTRIBUTED STABILITY: Robust gradient accumulation and synchronization for multi-GPU setups.
📜 LICENSE & CITATION
FastVLA is released under the Apache-2.0 License.
@software{fastvla2026,
author = {Bouajila Hamza and FastVLA Team},
title = {FastVLA: High-Performance VLA Fine-Tuning},
url = {https://github.com/BouajilaHamza/fastvla},
year = {2026}
}
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file fastvla-0.2.0.tar.gz.
File metadata
- Download URL: fastvla-0.2.0.tar.gz
- Upload date:
- Size: 132.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.17
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
9387a6c72c92a33f5a989ead87647e532c69cb001e6c8c36b86bafb07b3b514c
|
|
| MD5 |
4d9fef18f7066729daebcaba5e32a741
|
|
| BLAKE2b-256 |
6ef8fab93c8788e8221c7e34d8e1d387728f60c3afa24f7691f41b79c8f25229
|
File details
Details for the file fastvla-0.2.0-py3-none-any.whl.
File metadata
- Download URL: fastvla-0.2.0-py3-none-any.whl
- Upload date:
- Size: 44.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.17
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
e00a276efc1b3b8820fe925799fd337f743d2d597ea92bb507971f0ca2dddf55
|
|
| MD5 |
777d3b662dcfa3f82734e5d84004714b
|
|
| BLAKE2b-256 |
aa9938342fddc387523dd749a942151ded018d84d420edfed6d175614893b0cb
|