
NVIDIA Isaac GR00T N1


NVIDIA Isaac GR00T N1 is the world's first open foundation model for generalized humanoid robot reasoning and skills. This cross-embodiment model takes multimodal input, including language and images, to perform manipulation tasks in diverse environments.

GR00T N1 is trained on an expansive humanoid dataset, consisting of real captured data, synthetic data generated using the components of NVIDIA Isaac GR00T Blueprint (examples of neural-generated trajectories), and internet-scale video data. It is adaptable through post-training for specific embodiments, tasks and environments.

(Figures: examples of real and simulated robot data)

The neural network architecture of GR00T N1 combines a vision-language foundation model with a diffusion transformer head that denoises continuous actions. Here is a schematic diagram of the architecture:

(Figure: model architecture diagram)

Here is the general procedure to use GR00T N1:

  1. Collect a dataset of robot demonstrations in the form of (video, state, action) triplets.
  2. Convert the demonstration data into the LeRobot-compatible data schema (more info in getting_started/LeRobot_compatible_data_schema.md), which is compatible with the upstream Hugging Face LeRobot.
  3. Use the example configurations our repo provides for training with different robot embodiments.
  4. Use the provided scripts to finetune the pre-trained GR00T N1 model on your data and run inference.
  5. Connect the Gr00tPolicy to the robot controller to execute actions on your target hardware.
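Steps 4 and 5 above can be sketched as a simple control loop. The `policy.get_action(observation)` call mirrors the inference example later in this document; the `robot` object and its `get_observation`/`apply_action` methods are hypothetical stand-ins for whatever controller interface your hardware exposes:

```python
def run_control_loop(policy, robot, num_steps):
    """Run a policy against a robot controller for a fixed number of steps.

    `policy` needs a get_action(observation) method (as Gr00tPolicy provides);
    `robot` is a hypothetical controller with get_observation() and
    apply_action(action) methods -- adapt these names to your own stack.
    """
    executed = []
    for _ in range(num_steps):
        observation = robot.get_observation()          # (video, state) inputs
        action_chunk = policy.get_action(observation)  # model's predicted actions
        robot.apply_action(action_chunk)               # execute on hardware
        executed.append(action_chunk)
    return executed
```

The loop returns the executed chunks mainly so they can be logged or inspected offline.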

Target Audience

GR00T N1 is intended for researchers and professionals in humanoid robotics. This repository provides tools to:

  • Leverage a pre-trained foundation model for robot control
  • Fine-tune on small, custom datasets
  • Adapt the model to specific robotics tasks with minimal data
  • Deploy the model for inference

The focus is on enabling customization of robot behaviors through finetuning.

Prerequisites

  • Finetuning: we have tested the code on Ubuntu 20.04 and 22.04 with H100, L40, RTX 4090, and A6000 GPUs, Python 3.10, and CUDA 12.4.
  • Inference: we have tested on Ubuntu 20.04 and 22.04 with RTX 3090, RTX 4090, and A6000 GPUs.
  • If you haven't installed CUDA 12.4, please follow the instructions here to install it.
  • Please make sure the following dependencies are installed on your system: ffmpeg, libsm6, libxext6.
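To confirm that your installed CUDA toolkit matches the tested version, you can parse the output of `nvcc --version`. This is a small sketch; the sample output string in the comment is illustrative:

```python
import re
import subprocess

REQUIRED_CUDA = (12, 4)

def parse_nvcc_version(nvcc_output: str) -> tuple:
    """Extract (major, minor) from `nvcc --version` output, e.g.
    'Cuda compilation tools, release 12.4, V12.4.131' -> (12, 4)."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("could not find a CUDA release version in nvcc output")
    return (int(match.group(1)), int(match.group(2)))

def cuda_version_matches() -> bool:
    """Return True if the locally installed toolkit matches the tested version."""
    output = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True
    ).stdout
    return parse_nvcc_version(output) == REQUIRED_CUDA
```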

Installation Guide

Clone the repo:

git clone https://github.com/NVIDIA/Isaac-GR00T
cd Isaac-GR00T

Create a new conda environment and install the dependencies. We recommend Python 3.10:

Note: please make sure your CUDA version is 12.4; otherwise you may have difficulty configuring the flash-attn module properly.

conda create -n gr00t python=3.10
conda activate gr00t
pip install --upgrade setuptools
pip install -e .
pip install --no-build-isolation flash-attn==2.7.1.post4 

Getting started with this repo

We provide accessible Jupyter notebooks and detailed documentation in the ./getting_started folder. Utility scripts can be found in the ./scripts folder.

1. Data Format & Loading

  • To load and process the data, we use the Hugging Face LeRobot data format, extended with a more detailed modality and annotation schema (we call it the "LeRobot compatible data schema").
  • An example LeRobot dataset is stored in ./demo_data/robot_sim.PickNPlace (with an additional modality.json file).
  • A detailed explanation of the dataset format is available in getting_started/LeRobot_compatible_data_schema.md.
  • Once your data is organized in this format, you can load it using the LeRobotSingleDataset class.
from gr00t.data.dataset import LeRobotSingleDataset
from gr00t.data.embodiment_tags import EmbodimentTag
from gr00t.data.dataset import ModalityConfig
from gr00t.experiment.data_config import DATA_CONFIG_MAP

# get the data config
data_config = DATA_CONFIG_MAP["gr1_arms_only"]

# get the modality configs and transforms
modality_config = data_config.modality_config()
transforms = data_config.transform()

# This is a LeRobotSingleDataset object that loads the data from the given dataset path.
dataset = LeRobotSingleDataset(
    dataset_path="demo_data/robot_sim.PickNPlace",
    modality_configs=modality_config,
    transforms=None,  # we can choose to not apply any transforms
    embodiment_tag=EmbodimentTag.GR1, # the embodiment to use
)

# This is an example of how to access the data.
dataset[5]

2. Inference

from gr00t.model.policy import Gr00tPolicy
from gr00t.data.embodiment_tags import EmbodimentTag

# 1. Load the modality config and transforms, or reuse the ones defined above
modality_config = ComposedModalityConfig(...)
transforms = ComposedModalityTransform(...)

# 2. Load the dataset
dataset = LeRobotSingleDataset(.....<Same as above>....)

# 3. Load pre-trained model
policy = Gr00tPolicy(
    model_path="nvidia/GR00T-N1-2B",
    modality_config=modality_config,
    modality_transform=transforms,
    embodiment_tag=EmbodimentTag.GR1,
    device="cuda"
)

# 4. Run inference
action_chunk = policy.get_action(dataset[0])

You can also run the inference service using the provided script. The service can run in either server mode or client mode.

python scripts/inference_service.py --model_path nvidia/GR00T-N1-2B --server

On a different terminal, run the client mode to send requests to the server.

python scripts/inference_service.py  --client

3. Fine-Tuning

Run the finetuning script below to finetune the model on the example dataset. A tutorial is available in getting_started/2_finetuning.ipynb.

# first run --help to see the available arguments
python scripts/gr00t_finetune.py --help

# then run the script
python scripts/gr00t_finetune.py --dataset-path ./demo_data/robot_sim.PickNPlace --num-gpus 1

# run with LoRA parameter-efficient fine-tuning
python scripts/gr00t_finetune.py  --dataset-path ./demo_data/robot_sim.PickNPlace --num-gpus 1 --lora_rank 64  --lora_alpha 128  --batch-size 32
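As a back-of-the-envelope check on why LoRA finetuning fits on smaller GPUs: a LoRA adapter of rank r on a d_out x d_in weight matrix adds only r x (d_in + d_out) trainable parameters, and the learned update is scaled by alpha / r. The 2048 x 2048 projection below is illustrative, not GR00T's actual hidden size:

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added by one LoRA adapter (A: rank x d_in, B: d_out x rank)."""
    return rank * d_in + d_out * rank

def lora_scale(alpha: int, rank: int) -> float:
    """Scaling factor applied to the LoRA update B @ A."""
    return alpha / rank

# Illustrative 2048 x 2048 projection with the flags from the command above
full_params = 2048 * 2048                              # frozen weight: 4,194,304 params
adapter_params = lora_param_count(2048, 2048, rank=64)  # 262,144 trainable params
print(f"trainable fraction: {adapter_params / full_params:.1%}")  # 6.2%
print(f"scale (alpha=128, rank=64): {lora_scale(128, 64)}")       # 2.0
```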

You can also download a sample dataset from our Hugging Face sim data release here:

huggingface-cli download  nvidia/PhysicalAI-Robotics-GR00T-X-Embodiment-Sim \
  --repo-type dataset \
  --include "gr1_arms_only.CanSort/**" \
  --local-dir $HOME/gr00t_dataset

The recommended finetuning configuration is to increase your batch size to the maximum your hardware allows and train for 20k steps.

Hardware Performance Considerations

  • Finetuning Performance: We used a single H100 or L40 node for optimal finetuning. Other hardware configurations (e.g. A6000, RTX 4090) will also work but may take longer to converge. The exact batch size depends on the hardware and on which component of the model is being tuned.
  • LoRA finetuning: We used 2 A6000 GPUs or 2 RTX 4090 GPUs for LoRA finetuning. You can try out different configurations for effective finetuning.
  • Inference Performance: For real-time inference, most modern GPUs perform similarly when processing a single sample. Our benchmarks show minimal difference between L40 and RTX 4090 for inference speed.

For finetuning on a new embodiment, check out our notebook in getting_started/3_new_embodiment_finetuning.ipynb.

4. Evaluation

To conduct an offline evaluation of the model, we provide a script that evaluates the model on a dataset and plots the results. Quick try: python scripts/eval_policy.py --plot --model_path nvidia/GR00T-N1-2B

Or you can run the newly trained model in client-server mode.

Run the newly trained model

python scripts/inference_service.py --server \
    --model_path <MODEL_PATH> \
    --embodiment_tag new_embodiment \
    --data_config <DATA_CONFIG>

Run the offline evaluation script

python scripts/eval_policy.py --plot \
    --dataset_path <DATASET_PATH> \
    --embodiment_tag new_embodiment \
    --data_config <DATA_CONFIG>

You will then see a plot of ground-truth vs. predicted actions, along with the unnormalized MSE of the actions. This gives you an indication of whether the policy is performing well on the dataset.
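The unnormalized MSE reported here is, conceptually, the mean squared difference between ground-truth and predicted actions in the original (un-normalized) action units. A minimal illustration of that metric, independent of the repo's actual implementation:

```python
def unnormalized_mse(ground_truth, predicted):
    """Mean squared error between two equal-length action sequences,
    computed in raw action units (no normalization applied)."""
    if len(ground_truth) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared_errors = [(g - p) ** 2 for g, p in zip(ground_truth, predicted)]
    return sum(squared_errors) / len(squared_errors)
```

A low value means the predicted actions track the demonstrations closely; because the error is in raw units, what counts as "low" depends on the embodiment's action scale.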

FAQ

Does it work on CUDA ARM Linux?

I have my own data, what should I do next for finetuning?

  • This repo assumes that your data is already organized according to the LeRobot format.

What is Modality Config? Embodiment Tag? and Transform Config?

  • Embodiment Tag: Defines the robot embodiment used; any embodiment tag not seen during pretraining is treated as a new embodiment tag.
  • Modality Config: Defines the modalities used in the dataset (e.g. video, state, action).
  • Transform Config: Defines the data transforms applied to the data during data loading.
  • For more details, see getting_started/4_deeper_understanding.md
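To make these three concepts concrete, here is a toy illustration of the kind of grouping a modality config expresses. The dictionary keys and field names below are hypothetical; the real classes live in gr00t (e.g. ModalityConfig) and the exact schema is documented in the files linked above and in modality.json:

```python
# Hypothetical sketch of what a modality config groups together; the actual
# schema is defined by the repo, not by this example.
toy_modality_config = {
    "video": {"keys": ["ego_view"]},             # camera streams fed to the model
    "state": {"keys": ["arm_joints", "hand"]},   # proprioceptive inputs
    "action": {"keys": ["arm_joints", "hand"]},  # outputs the model predicts
}

def modalities(config: dict) -> list:
    """List the modality names a config defines, in sorted order."""
    return sorted(config)

print(modalities(toy_modality_config))  # ['action', 'state', 'video']
```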

What is the inference speed for Gr00tPolicy?

Below are benchmark results based on a single L40 GPU. Performance is approximately the same on consumer GPUs like RTX 4090 for inference (single sample processing):

| Module | Inference Speed |
|---|---|
| VLM Backbone | 22.92 ms |
| Action Head with 4 diffusion steps | 4 x 9.90 ms = 39.61 ms |
| Full Model | 62.53 ms |

We noticed that 4 denoising steps are sufficient during inference.
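The benchmark numbers are internally consistent: the action head runs 4 denoising steps at roughly 9.90 ms each, and backbone plus head approximately matches the full-model latency. A quick sanity check, which also converts the latency into an approximate control rate:

```python
# Numbers taken from the benchmark table above (single L40 GPU)
BACKBONE_MS = 22.92
PER_DIFFUSION_STEP_MS = 9.90
NUM_STEPS = 4
FULL_MODEL_MS = 62.53

action_head_ms = NUM_STEPS * PER_DIFFUSION_STEP_MS  # ~39.6 ms
estimated_total_ms = BACKBONE_MS + action_head_ms   # ~62.5 ms, close to the table
control_rate_hz = 1000.0 / FULL_MODEL_MS            # ~16 policy calls per second

print(f"action head: {action_head_ms:.2f} ms")
print(f"backbone + head: {estimated_total_ms:.2f} ms (table: {FULL_MODEL_MS} ms)")
print(f"approx. control rate: {control_rate_hz:.1f} Hz")
```

Note that the model outputs an action chunk per call, so the effective action rate on the robot can be higher than the raw policy-call rate.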

Contributing

For more details, see CONTRIBUTING.md

License

# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
