Skip to main content

Safari SDK: the SDK for Google DeepMind Gemini Robotics models

Project description

Safari SDK: the SDK for Google DeepMind Gemini Robotics models 🦓🦄🐘🐒🐍

Disclaimer

This is not an officially supported Google product.

Safari SDK provides full lifecycle toolings necessary for using Gemini Robotics models, including but not limited to, access checkpoint, serving a model, evaluate the model on robot and in sim, upload data, finetuning the model, download the finetuned checkpoint, etc. Most of the functionality requires you to join Gemini Robotics Trusted Tester Program to use. See details in Gemini Robotics main page.

Source Code

The source code can be found in GitHub.

Note: Unless stated otherwise, the commands below assume that your working directory is the root of your repository clone (ie, the same directory which contains the pyproject.toml file).

Code Structure

The safari-sdk is structured and distributed as a Python pip package, along with 1 internal dependency:

  • safari-sdk: This is the root level package, and the only one that users should directly install or use. It is a pure-python package.
  • safari-sdk-logging: This is an internal dependency which contains the C++ and pybind11 logging code.

Both packages place their importable files into the safari_sdk top-level namespace, according to PEP 420 – Implicit Namespace Packages

Why Separate Packages?

The safari-sdk package is often installed from source code, for development and debugging. However, C++ build dependencies take a long time (30+ minutes) to compile from source. Putting these in separate packages allows prebuilt versions of those to be used when the (pure-python) base safari-sdk package is installed from source, which in turn reduces the time needed to build and install that package to a few seconds (at least in cases where there are no changes to the C++ logging code).

Installing the SDK

Installing Pip Dependencies

The pyproject.toml file specifies the dependencies for the Safari SDK, including allowed version ranges for some packages. The requirements.txt file specifies exact versions and hashes of each of those dependencies; internal unit tests use these versions, and installing them in a virtual environment before installing the SDK is recommended to avoid failures due to upstream package changes:

pip install -r requirements.txt

Updating Pip Dependencies

A script is provided which updates the requirements.txt file to the most recent versions of dependencies which satisfy the constraints in the pyproject.toml file. This allows those updates to be done at discrete times (rather than whenever those updates are pushed to PyPi) and tested in isolation before being pushed to all users. To run that script:

scripts/update_pip_dependencies.sh

Installing from PyPi

The Safari SDK can be easily installed via PyPI.

pip install safari-sdk

Installing from Source Code

The Safari SDK can also be installed from the source code.

pip install -e .[dev]

Building the Wheel

To build a Python wheel, run the following command from the root of the repository.

scripts/build_wheel.sh

This script will build a pip installable wheel for the Safari SDK, and print the file's path to stdout.

Model support

Safari SDK aims to support all models in the Gemini Robotics model series.

Trusted Testers can access the Gemini Robotics On Device model from SDK v2.4.1.

Libraries

Libraries related to robot data logging is in safari_sdk/logging.

Libraries related to model inference and interface with model servers are in safari_sdk/model.

Libraries and binary related to accessing model checkpoints, upload data and request of model finetune can be found in safari_sdk/flywheel.

Examples, including robot and simulation evaluation of models are in examples/. Aloha specific eval code are in examples/aloha.

Flywheel CLI

The flywheel CLI is a convenient CLI tool available after installation of the pip package. It provides a set of commands to interact with the Gemini Robotics platform, such as training models, serving models, managing data, and downloading artifacts.

To use the CLI

flywheel-cli <command> [--flags] [--flags]

Supported commands are:

  • train: Train a model. Requires specifying task ID, start date, and end date.
  • serve: Serve a model. Requires specifying the training job ID.
  • list: List available training jobs.
  • list_serve: List available serving jobs.
  • data_stats: Show data statistics available for training.
  • download: Download artifacts from a training job or a specific artifact ID.
  • upload_data: Upload data to the data ingestion service.
  • version: Show the version of the SDK.
  • help: Show this help message with all the available commands and flags.

Agent

The Safari SDK includes a comprehensive agent framework for building interactive robotics agents powered by Gemini models. See YouTube Video: Gemini Robotics 1.5: Using agentic capabilities. The framework is located in safari_sdk/agent/framework and provides a modular architecture for creating agents that can perceive their environment, reason about tasks, and control robot hardware.

Key Components

Agents (safari_sdk/agent/framework/agents/): Base agent classes that integrate with the Gemini Live API to provide conversational interaction and tool use capabilities.

Embodiments (safari_sdk/agent/framework/embodiments/): Hardware-specific interfaces that connect agents to physical robot systems (e.g., Aloha robot). Each embodiment provides tools for robot control.

Tools (safari_sdk/agent/framework/tools/): Modular capabilities that agents can use, including:

  • Run instruction
  • Success detection
  • Scene description
  • etc.

Event Bus (safari_sdk/agent/framework/event_bus/): Asynchronous publish-subscribe system for communication between agent components.

Configuration (safari_sdk/agent/framework/config.py): Centralized configuration management using AgentFrameworkConfig, supporting both programmatic configuration and flag-based setup.

Aloha Agent Example

The examples/aloha/agent/ directory contains agent implementations for the Aloha robot platform.

The primary example is simple_agent.py, which provides a conversational agent that can control the Aloha robot using natural language instructions.

To run the Aloha agent, use the provided run.py script:

python examples/aloha/agent/run.py --agent_name=simple_agent

The Aloha agent demonstrates integration of vision-based robot control, multi-camera perception, and conversational interaction with Gemini models.

Alternatively, you build your own agent using the agent framework.

The codebase is still in active development. We will update our most updated user guide with Trusted Testers of Gemini Robotics.

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

safari_sdk-2.141.0.tar.gz (448.8 kB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

safari_sdk-2.141.0-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (5.8 MB view details)

Uploaded CPython 3.13manylinux: glibc 2.27+ x86-64manylinux: glibc 2.28+ x86-64

safari_sdk-2.141.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (5.8 MB view details)

Uploaded CPython 3.12manylinux: glibc 2.27+ x86-64manylinux: glibc 2.28+ x86-64

safari_sdk-2.141.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (5.8 MB view details)

Uploaded CPython 3.11manylinux: glibc 2.27+ x86-64manylinux: glibc 2.28+ x86-64

File details

Details for the file safari_sdk-2.141.0.tar.gz.

File metadata

  • Download URL: safari_sdk-2.141.0.tar.gz
  • Upload date:
  • Size: 448.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for safari_sdk-2.141.0.tar.gz
Algorithm Hash digest
SHA256 7debfb7edbc727f5c77cc2757f310aeefdd797f2dfadf3795c31b428563f2371
MD5 0035500733003f0f4f50867fef9fae2a
BLAKE2b-256 f043672d8d6decd88538e0abb2f8d1c2402f9f07097808219398c1bb070b8bd2

See more details on using hashes here.

File details

Details for the file safari_sdk-2.141.0-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for safari_sdk-2.141.0-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 ccc7abcbac8b1d7a362196d339b9351d9256a9e9397aa2f581035c67d86e2fde
MD5 fdc1e4315e6bbb828c8a744d8c2e7c95
BLAKE2b-256 e50aaa597294b1b8764345634a38dbfc9908a8ae06e1985534237d28209982e7

See more details on using hashes here.

File details

Details for the file safari_sdk-2.141.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for safari_sdk-2.141.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 ca21817d9a5a6c3fb80bc466d51077214258e74f91691b638359878d4a3aaf14
MD5 0083db60616200b7ee06ade3d5d5206b
BLAKE2b-256 e81726bb45a0a1aea33b1af5040d262a5d90ec820646fccfb683bf512399ddc0

See more details on using hashes here.

File details

Details for the file safari_sdk-2.141.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for safari_sdk-2.141.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 0847842353d54492fb0eda8c9087e621ebd607fe102f707d41c59ddb916b04a4
MD5 0e9e4245ba229e9ad3333cbf45a6b172
BLAKE2b-256 18c677dc7727b11e7d1f014e33c30d3d79cd69affc3c6aac347d9c5c88abfa6f

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page