
🚀 FastVLA: Ultra-Efficient VLA Fine-Tuning

FastVLA is a high-performance library designed to bring Vision-Language-Action (VLA) fine-tuning to commodity hardware. By leveraging Triton-accelerated kernels, 4-bit QLoRA, and Unsloth-inspired memory optimizations, FastVLA enables training 7B+ parameter models on a single Tesla T4 (15GB VRAM).

> [!IMPORTANT]
> **Goal: Democratize VLA training.** If you have a free Google Colab session or a T4 instance, you can now fine-tune state-of-the-art robotics models.


📈 Real-World Benchmark: PushT (Tesla T4)

We've verified FastVLA by fine-tuning OpenVLA-7B on the lerobot/pusht_image dataset (Real Robotics Data).

| Metric | Result on Tesla T4 (15 GB) | Status |
| --- | --- | --- |
| VRAM usage | 5.38 GB (training peak) | 🚀 Ultra-light |
| Throughput | 1.42 s/step (~0.7 steps/s) | ⚡ Fast |
| Model size | 7.3 billion parameters (4-bit) | 🧠 Full scale |
| Learning signal | Loss 19.6 → 1.15 in 400 steps | ✅ Verified |

Key Takeaway: Using 4-bit QLoRA and our custom Triton Action Head, you can fine-tune a 7B VLA while leaving ~10GB of VRAM free for other processes.
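As a back-of-the-envelope sanity check (our illustrative arithmetic, not numbers from the benchmark above): weights alone for a 7.3B-parameter model drop from ~13.6 GB at 16-bit to ~3.4 GB at 4-bit; the rest of the training peak is LoRA adapters, optimizer state, and activations.

```python
# Rough VRAM math for a 7.3B-parameter model (weights only, illustrative).
params = 7.3e9

fp16_gb = params * 2 / 2**30    # 2 bytes per weight at 16-bit
int4_gb = params * 0.5 / 2**30  # 0.5 bytes per weight at 4-bit

print(f"16-bit weights: {fp16_gb:.1f} GB")  # ~13.6 GB
print(f"4-bit weights:  {int4_gb:.1f} GB")  # ~3.4 GB
```

Real 4-bit checkpoints are slightly larger than this lower bound because quantization scales and zero-points are stored alongside the packed weights.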


✨ Features

  • 4-bit QLoRA: Reduces 7B model memory from 28GB to 4.3GB with near-zero quality loss.
  • Triton Action Head: Fused Linear → ReLU → Linear → Tanh kernel with Gradient Checkpointing to save activation memory.
  • VLA Model Registry: One-line loading for any VLA — FastVLA.from_pretrained("openvla-7b")
    • Pre-registered: OpenVLA-7B, SmolVLA (135M), π₀-Base (2B)
    • Register custom models: any vision encoder + any LLM + any action head
  • Corrected VLA Objective: Proper discretized action prediction (256 bins) matching OpenVLA.
  • Robotics-First: Built-in support for PushT and LIBERO datasets.
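The fused action head computes a small MLP. As a functional sketch only (NumPy mirrors the math of the fused Linear → ReLU → Linear → Tanh chain; the hidden and action dimensions below are made-up examples, not FastVLA's actual sizes):

```python
import numpy as np

def action_head(x, w1, b1, w2, b2):
    """Reference for the Linear -> ReLU -> Linear -> Tanh computation.

    The Triton kernel fuses these four ops into one launch; this version
    only mirrors the math, not the memory layout or performance.
    """
    h = np.maximum(x @ w1 + b1, 0.0)  # Linear + ReLU
    return np.tanh(h @ w2 + b2)       # Linear + Tanh, outputs land in [-1, 1]

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 4096))            # hidden state from the LLM
w1 = rng.standard_normal((4096, 512)) * 0.02  # example hidden width: 512
w2 = rng.standard_normal((512, 7)) * 0.02     # example 7-DoF action output
a = action_head(x, w1, np.zeros(512), w2, np.zeros(7))
print(a.shape)  # one action vector, bounded by the final tanh
```

The final `tanh` is what keeps predicted actions in a bounded range, which pairs naturally with normalized robot action spaces.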

🛠️ Installation

FastVLA uses uv for lightning-fast dependency management.

```bash
# Clone the repo
git clone https://github.com/BouajilaHamza/FastVLA.git
cd FastVLA

# Install dependencies using uv
uv sync
```

📖 Quick Start

Fine-Tune on PushT (Real Robotics)

Run our optimized PushT script. It automatically handles image normalization and action discretization.

```bash
uv run scripts/finetune_pusht.py --steps 2000 --batch 1 --lr 1e-4
```

High-Level API

```python
from fastvla import FastVLAModel

# Load any VLA with 4-bit QLoRA optimization
model = FastVLAModel.from_pretrained(
    vision_encoder_name="google/vit-base-patch16-224",
    llm_name="meta-llama/Llama-2-7b-hf",
    load_in_4bit=True,
    use_peft=True,
    gradient_checkpointing=True,
)

# Inference (predict continuous actions)
action = model.generate(images=image_tensor, input_ids=text_ids)
```

🏗️ Project Structure

  • fastvla/: Core library containing the model architecture and Triton kernels.
    • kernels/: Fused Triton kernels for Action Heads and Fusion.
  • scripts/: Production-ready fine-tuning and benchmarking scripts.
    • finetune_pusht.py: The recommended script for PushT fine-tuning.
    • finetune_libero.py: Configuration for the LIBERO simulation benchmark.
  • results/: Standardized output for training logs and benchmark JSONs.
  • tests/: Comprehensive test suite for numerical parity and kernel stability.

🧪 Testing & Validation

We enforce strict numerical parity between our Triton kernels and their PyTorch reference implementations.

```bash
# Run all tests
uv run pytest tests/ -v

# Run GPU kernel benchmarks
uv run python scripts/benchmark_gpu.py
```
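A parity test of this kind typically compares the kernel output against an eager reference within a tolerance. A schematic version (NumPy stands in for both sides here so it runs anywhere; the real tests compare Triton against PyTorch on a GPU):

```python
import numpy as np

def reference(x, w):
    # Eager reference: same math, computed in float64 for a trusted baseline.
    return np.tanh(x.astype(np.float64) @ w.astype(np.float64))

def fused_like(x, w):
    # Stand-in for the fused kernel: identical math in float32.
    return np.tanh(x @ w)

rng = np.random.default_rng(42)
x = rng.standard_normal((8, 64), dtype=np.float32)
w = rng.standard_normal((64, 7), dtype=np.float32)

out = fused_like(x, w)
ref = reference(x, w).astype(np.float32)
assert np.allclose(out, ref, atol=1e-4), "numerical parity failure"
```

The tolerance absorbs the accumulation-order differences that are expected between a fused kernel and an eager reference; a parity failure beyond it indicates a real kernel bug.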

🤝 Contributing & Roadmap

  • Unsloth v2 Integration: Direct patching for vision encoders.
  • FlashAttention-3: Support for latest Hopper/Ada kernels.
  • Multi-Camera Fusion: Optimized packing for 3+ camera setups.

📜 License

MIT License. Created by the FastVLA Research Team.
