
A high-performance Vision-Language-Action (VLA) model fine-tuning library optimized for NVIDIA L4 and T4 hardware.


FASTVLA

I trained a 7B-parameter Robot to understand Arabic for $0.48/hr. Stop renting H100s.

FastVLA vs Traditional VLA Comparison


Launch on Lightning AI | Model on HF Hub


🌍 The Gap: Arabic Physical AI

In 2026, 81% of Arabic AI research is still text-only. Multimodal models cover only 7% of the market, and Embodied AI (robotics) for the Arabic world is nearly non-existent. FastVLA is the first bridge, enabling localized robotics policies to run on budget cloud infrastructure (an NVIDIA L4) for less than the price of a cup of coffee per hour.

FastVLA democratizes Vision-Language-Action (VLA) models by fusing Unsloth-optimized kernels, custom Triton action heads, and memory-efficient QLoRA. Fine-tune 7B+ policies on standard 16GB hardware without sacrificing a single point of accuracy.
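Under the hood, the memory savings come from standard QLoRA machinery. As a hedged sketch only, using the public BitsAndBytes and PEFT APIs, the combination of NF4 4-bit quantization and LoRA adapters looks roughly like this (the hyperparameter values here are illustrative, not FastVLA's actual settings):

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# NF4 4-bit quantization config (BitsAndBytes), as used in QLoRA-style training.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # NormalFloat4, mentioned above
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,      # second-level quantization of the scales
)

# LoRA adapter config (PEFT); rank and alpha are illustrative placeholders.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
)
```

Freezing the quantized backbone and training only the low-rank adapters is what keeps a 7B model inside a 16GB VRAM budget.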


📊 PERFORMANCE & ACCURACY (ARABIC HERO / NVIDIA L4)

FastVLA preserves full model accuracy while delivering massive speedups. Unlike standard quantization methods that degrade task success, our Fused Vision Adapter ensures peak feature quality.

Metric              | OpenVLA (Base) | FastVLA (Fine-tuned) | Improvement
--------------------|----------------|----------------------|----------------------
Inference Latency   | 1420.0 ms      | 198.2 ms             | 7.16x faster
Peak VRAM Usage     | 5.50 GB        | 4.45 GB              | 19.1% reduction
Action Error (L2)   | 28.5 px        | 12.4 px              | 2.30x more accurate
Training Time/Step  | ~14,000 ms     | ~3,800 ms            | 3.68x faster

🚀 Real-time Ready: By dropping latency from ~1.4s to under 200ms, FastVLA enables 5Hz control loops on budget L4 GPUs. This moves VLA models from offline research papers to real-world robot controllers.
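The control-rate claim follows directly from the latency numbers; a quick sanity check in plain Python (no FastVLA dependency):

```python
def max_control_hz(latency_ms: float) -> float:
    """Maximum control-loop frequency achievable at a given end-to-end latency."""
    return 1000.0 / latency_ms

base_hz = max_control_hz(1420.0)  # OpenVLA baseline latency from the table
fast_hz = max_control_hz(198.2)   # FastVLA on an NVIDIA L4

# A 5 Hz loop has a 200 ms budget; ~198 ms latency just fits inside it,
# while the ~1.4 s baseline caps out below 1 Hz.
print(f"baseline: {base_hz:.2f} Hz, FastVLA: {fast_hz:.2f} Hz")
```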


⚡ CORE FEATURES

  • [V] SURGICAL VISION EXTRACTION: Intelligent loading that extracts raw vision encoders from complex wrappers, ensuring peak visual feature quality.
  • [L] 4-BIT LANGUAGE BACKBONES: Seamless integration with Llama-2 and SmolVLA, utilizing BitsAndBytes NF4 quantization and Unsloth's 2x-faster kernels.
  • [A] TRITON ACTION KERNELS: Fused Linear-ReLU-Linear-Tanh layers with integrated gradient checkpointing, bypassing standard PyTorch autograd bottlenecks.
  • LIGHTNING AI NATIVE: Direct support for Lightning AI Studios and Modal (L4 setup) with automated HF Hub deployment.
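The Triton kernels themselves live in the library; as a point of reference, an unfused PyTorch version of the Linear-ReLU-Linear-Tanh action head described above might look like the following (layer sizes are illustrative, and FastVLA's fused kernel collapses these four ops into far fewer kernel launches):

```python
import torch
import torch.nn as nn

class ReferenceActionHead(nn.Module):
    """Unfused PyTorch reference for a Linear-ReLU-Linear-Tanh action head.

    Sizes are placeholders: hidden_dim matches a 7B backbone's hidden size,
    action_dim a typical 7-DoF end-effector action.
    """
    def __init__(self, hidden_dim: int = 4096, action_dim: int = 7):
        super().__init__()
        self.fc1 = nn.Linear(hidden_dim, hidden_dim // 4)
        self.fc2 = nn.Linear(hidden_dim // 4, action_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Tanh bounds each action dimension to [-1, 1] for normalized control.
        return torch.tanh(self.fc2(torch.relu(self.fc1(x))))

head = ReferenceActionHead()
actions = head(torch.randn(2, 4096))
print(actions.shape)  # batch of 2 bounded action vectors
```

A reference module like this is also what a fused kernel gets compared against in parity testing (see the reliability section).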

📥 INSTALLATION

1. Requirements

FastVLA requires Python 3.10+ and PyTorch 2.4+.
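A quick way to confirm an environment meets these floors before installing; the version thresholds below mirror the stated requirements, and the helper itself is just an illustrative snippet, not part of the FastVLA API:

```python
import sys

def check_env(min_py=(3, 10), min_torch=(2, 4)):
    """Return (ok, message) against the Python and PyTorch version floors."""
    if sys.version_info < min_py:
        return False, f"Python {min_py[0]}.{min_py[1]}+ required"
    try:
        import torch
    except ImportError:
        return False, "PyTorch not installed"
    # torch.__version__ may carry a local suffix like "2.4.0+cu121".
    tv = tuple(int(p) for p in torch.__version__.split(".")[:2])
    if tv < min_torch:
        return False, f"PyTorch {min_torch[0]}.{min_torch[1]}+ required, found {torch.__version__}"
    return True, f"OK: Python {sys.version_info.major}.{sys.version_info.minor}, torch {torch.__version__}"

ok, msg = check_env()
print(msg)
```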

2. Hardware Compatibility

FastVLA is designed to be highly versatile across budget cloud hardware:

  • NVIDIA L4 (Recommended): Primary target for the latest production runs, fine-tuning, and translation. The performance benchmarks above were measured on an L4.
  • NVIDIA T4 / 2x T4: The original development and testing bed. Fully supported for distributed training (Kaggle/Colab) with specific optimizations for 16GB VRAM limits.
  • Lightning AI / Modal: Native support for L4/T4 instances.

3. Using uv (Recommended)

git clone https://github.com/BouajilaHamza/fastvla.git
cd fastvla
uv sync

🚀 QUICKSTART

Loading a Quantized VLA

FastVLA integrates with the Transformers ecosystem to load models with PEFT adapters and BitsAndBytes 4-bit quantization.

from fastvla import FastVLAModel

# Load OpenVLA-7B with 4-bit quantization and LoRA
model = FastVLAModel.from_pretrained(
    "openvla-7b",
    load_in_4bit=True,
    use_peft=True
)

Training on Modal

Launch a distributed training job on an L4 GPU with a single command:

modal run scripts/modal_arabic_pipeline.py

Deployment

Save to the Hugging Face Hub in one line; all adapters and VLA projection layers are preserved.

model.push_to_hub("hamzabouajila/fastvla-arabic-hero", token="your_hf_token")

🧪 RELIABILITY

  • 100% TEST PASS RATE: Verified across the full unit test suite.
  • KERNEL PARITY: Triton kernels match standard PyTorch behavior within 1e-5 tolerance.
  • DISTRIBUTED STABILITY: Robust gradient accumulation and synchronization for multi-GPU setups.
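The parity claim can be illustrated without the Triton kernels themselves. A minimal sketch of this kind of check, comparing a candidate implementation against an op-by-op PyTorch reference at the stated 1e-5 tolerance (both implementations here are stand-ins, not FastVLA's actual kernels):

```python
import torch

def reference_head(x, w1, b1, w2, b2):
    # Unfused Linear -> ReLU -> Linear -> Tanh, evaluated op by op.
    return torch.tanh(torch.relu(x @ w1.T + b1) @ w2.T + b2)

def candidate_head(x, w1, b1, w2, b2):
    # Stand-in for a fused implementation; a real Triton kernel would go here.
    h = torch.addmm(b1, x, w1.T).clamp_min_(0.0)
    return torch.tanh(torch.addmm(b2, h, w2.T))

torch.manual_seed(0)
x = torch.randn(8, 64)
w1, b1 = torch.randn(32, 64), torch.randn(32)
w2, b2 = torch.randn(7, 32), torch.randn(7)

max_err = (reference_head(x, w1, b1, w2, b2)
           - candidate_head(x, w1, b1, w2, b2)).abs().max().item()
assert max_err < 1e-5, f"parity violated: {max_err:.2e}"
print(f"max abs error: {max_err:.2e}")
```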

📜 LICENSE & CITATION

FastVLA is released under the Apache-2.0 License.

@software{fastvla2026,
  author = {Bouajila Hamza and FastVLA Team},
  title = {FastVLA: High-Performance VLA Fine-Tuning},
  url = {https://github.com/BouajilaHamza/fastvla},
  year = {2026}
}
