OpenEMMA is an open-source implementation of Waymo's End-to-End Multimodal Model for Autonomous Driving (EMMA).

OpenEMMA

OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving

OpenEMMA is an open-source implementation of Waymo's End-to-End Multimodal Model for Autonomous Driving (EMMA), offering an end-to-end framework for motion planning in autonomous vehicles. OpenEMMA leverages the pretrained world knowledge of Vision Language Models (VLMs), such as GPT-4 and LLaVA, to integrate text and front-view camera inputs, predicting future ego waypoints and providing decision rationales. Our goal is to provide accessible tools for researchers and developers to advance autonomous driving research and applications.

EMMA diagram

Figure 1. EMMA: Waymo's End-to-End Multimodal Model for Autonomous Driving.

OpenEMMA diagram

Figure 2. OpenEMMA: Our Open-Source End-to-End Autonomous Driving Framework based on Pre-trained VLMs.

News

  • [2024/12/19] 🔥 We released OpenEMMA, an open-source project for end-to-end motion planning in autonomous driving. See our paper for more details.

Installation

To get started with OpenEMMA, follow these steps to set up your environment and dependencies.

  1. Environment Setup
    Set up a Conda environment for OpenEMMA with Python 3.8:

    conda create -n openemma python=3.8
    conda activate openemma
    
  2. Clone OpenEMMA Repository
    Clone the OpenEMMA repository and navigate to the root directory:

    git clone git@github.com:taco-group/OpenEMMA.git
    cd OpenEMMA
    
  3. Install Dependencies
    Ensure you have cudatoolkit installed. If not, use the following command:

    conda install nvidia/label/cuda-12.4.0::cuda-toolkit
    

    To install the core packages required for OpenEMMA, run the following command:

    pip install -r requirements.txt
    

    This will install all dependencies, including those for YOLO-3D, an external tool used for critical object detection. The weights needed to run YOLO-3D will be automatically downloaded during the first execution.

  4. Set up GPT-4 API Access
    To enable GPT-4’s reasoning capabilities, obtain an API key from OpenAI. You can add your API key directly in the code where prompted or set it up as an environment variable:

    export OPENAI_API_KEY="your_openai_api_key"
    

    This allows OpenEMMA to access GPT-4 for generating future waypoints and decision rationales.
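
Before launching a run with the GPT-4 backend, it can help to confirm that the key is actually visible to the Python process. The following is a minimal sketch and not part of OpenEMMA: the helper name `get_openai_api_key` is hypothetical, and the only thing it relies on is the `OPENAI_API_KEY` variable shown above.

```python
import os


def get_openai_api_key(env=None):
    """Return the OpenAI API key from the environment, or raise a clear error.

    Hypothetical helper for illustration only; not part of the OpenEMMA code.
    `env` defaults to os.environ and can be replaced by a dict for testing.
    """
    env = os.environ if env is None else env
    # Treat a missing or whitespace-only value as "not set".
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before using the gpt model."
        )
    return key
```

Calling `get_openai_api_key()` at the start of a script fails early with a readable message instead of partway through a run.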

Usage

After setting up the environment, you can start using OpenEMMA with the following instructions:

  1. Prepare Input Data
    Download and extract the nuScenes dataset.

  2. Run OpenEMMA
    Use the following command to execute OpenEMMA's main script:

    python main.py \
        --model-path qwen \
        --dataroot [dir-of-nuscenes-dataset] \
        --version [version-of-nuscenes-dataset] \
        --method openemma
    

    Currently, we support the following models: GPT-4o, LLaVA-1.6-Mistral-7B, Llama-3.2-11B-Vision-Instruct, and Qwen2-VL-7B-Instruct. To use a specific model, pass gpt, llava, llama, or qwen, respectively, as the argument to --model-path.

  3. Output Interpretation
    After running the model, OpenEMMA writes the following output to the ./qwen-results directory:

    • Waypoints: A list of future waypoints predicting the ego vehicle’s trajectory.

    • Decision Rationales: Text explanations of the model’s reasoning, including scene context, critical objects, and behavior decisions.

    • Annotated Images: Visualizations of the planned trajectory and detected critical objects overlaid on the original images.

    • Compiled Video: A video (e.g., output_video.mp4) created from the annotated images, showing the predicted path over time.
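
The run command from step 2 can also be assembled programmatically, for example to loop over the supported models. This is a minimal sketch under stated assumptions: `build_command` and `results_dir` are hypothetical helpers, the flags are exactly those shown above, `v1.0-mini` is only an example nuScenes version string, and the `./<model>-results` naming is inferred from the `qwen` example, so check it against your own run.

```python
import shlex
import sys

# Short names accepted by --model-path, as listed in step 2.
SUPPORTED_MODELS = ("gpt", "llava", "llama", "qwen")


def build_command(model, dataroot, version, python=sys.executable):
    """Build the argument list for main.py (hypothetical helper)."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(
            f"unknown model {model!r}; expected one of {SUPPORTED_MODELS}"
        )
    return [
        python, "main.py",
        "--model-path", model,
        "--dataroot", dataroot,
        "--version", version,
        "--method", "openemma",
    ]


def results_dir(model):
    """Guess the output directory; the naming pattern is an assumption."""
    return f"./{model}-results"


if __name__ == "__main__":
    # Placeholder path and example version; replace with your own values.
    cmd = build_command("qwen", "/path/to/nuscenes", "v1.0-mini")
    print(shlex.join(cmd))
    print("expected output directory:", results_dir("qwen"))
```

The script only prints the command. To execute it, pass the returned list to `subprocess.run(cmd, check=True)` from the repository root.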

Contact

For help or issues using this package, please submit a GitHub issue.

For personal communication related to this project, please contact Shuo Xing (shuoxing@tamu.edu).

Citation

We are more than happy if this code is helpful to your work. If you use our code or extend our work, please consider citing our paper:

@article{openemma,
	author = {Xing, Shuo and Qian, Chengyuan and Wang, Yuping and Hua, Hongyuan and Tian, Kexin and Zhou, Yang and Tu, Zhengzhong},
	title = {OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving},
	journal = {arXiv},
	year = {2024},
	month = dec,
	eprint = {2412.15208},
	doi = {10.48550/arXiv.2412.15208}
}
