Google DeepMind Robotics Interfaces and utilities.

gdm_robotics: The Google DeepMind Robotics interfaces

This package describes a set of interfaces for Python reinforcement learning (RL) environments. It consists of the following core components:

  • gdm_robotics.interfaces.Environment: An abstract base class for RL environments.
  • gdm_robotics.interfaces.Policy: An abstract base class for Agent policies.
  • gdm_robotics.interfaces.EpisodicLogger: An abstract base class for loggers for Agent/Environment interaction.
  • gdm_robotics.runtime.RunLoop: A concrete RunLoop class that runs a policy against an environment and logs their interaction.

The core classes

The Environment interface.

The Environment interface is very similar to the dm_env.Environment interface and in fact inherits from it. It provides an abstraction for any "controllable" system, as seen from the perspective of an RL agent.

An Environment mainly exposes two methods:

  • reset, which resets the environment to a known state and returns the "timestep" representing that state, and
  • step, which applies a given action and returns a new "timestep".

A Timestep is a tuple grouping:

  • observation: the actual observations provided by the environment
  • reward: the reward associated with this specific step.
  • discount: the discount associated with this specific step.
  • step_type: a value specifying whether this is the Timestep returned by reset (step_type == FIRST), an intermediate timestep (step_type == MID), or the last timestep of the episode (step_type == LAST). For a LAST timestep, users need to check the discount to tell whether the episode ended by termination (zero-like discount) or by truncation (non-zero discount).
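The termination/truncation check described in the last bullet can be sketched as follows. This is a minimal illustration using only the standard library: StepType, Timestep and episode_end_kind are hypothetical stand-ins written for this example, not the classes shipped by gdm_robotics or dm_env, and the discount is assumed to be a plain float.

```python
import enum
from typing import Any, NamedTuple, Optional


class StepType(enum.IntEnum):
    """Stand-in for the step types described above (hypothetical)."""
    FIRST = 0
    MID = 1
    LAST = 2


class Timestep(NamedTuple):
    """Stand-in tuple with the four fields listed above (hypothetical)."""
    step_type: StepType
    reward: float
    discount: float
    observation: Any


def episode_end_kind(timestep: Timestep) -> Optional[str]:
    """Returns 'termination', 'truncation', or None if the episode continues."""
    if timestep.step_type != StepType.LAST:
        return None
    # A zero discount marks a true terminal state; any other value means the
    # episode was cut short (truncated).
    return "termination" if timestep.discount == 0.0 else "truncation"
```

A learning agent would typically bootstrap its value estimate from the last observation after a truncation, but not after a termination, which is why the distinction matters.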

Additionally, an Environment returns specifications (in the form of dm_env.specs.Array objects) describing the actions it accepts (action_spec) and the timesteps it produces (timestep_spec).

This Environment abstract class provides typing support and is stricter than the dm_env equivalent.
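To make the reset/step contract concrete, here is a toy environment written against the shape described in this section. It is only a sketch: it does not subclass gdm_robotics.interfaces.Environment, it omits action_spec and timestep_spec, and StepType, Timestep and CounterEnvironment are names invented for this example. The reward and discount values returned by reset are also an assumption.

```python
import enum
from typing import NamedTuple


class StepType(enum.IntEnum):
    """Stand-in step type enum (hypothetical)."""
    FIRST = 0
    MID = 1
    LAST = 2


class Timestep(NamedTuple):
    """Stand-in timestep tuple (hypothetical)."""
    step_type: StepType
    reward: float
    discount: float
    observation: int


class CounterEnvironment:
    """Toy environment: the observation is a counter that actions increment."""

    def __init__(self, target: int = 3):
        self._target = target
        self._count = 0

    def reset(self) -> Timestep:
        # Bring the system back to a known state and describe it.
        self._count = 0
        return Timestep(StepType.FIRST, 0.0, 1.0, self._count)

    def step(self, action: int) -> Timestep:
        # Apply the action, then report the resulting state.
        self._count += action
        if self._count >= self._target:
            # Goal reached: a termination, so the discount is zero.
            return Timestep(StepType.LAST, 1.0, 0.0, self._count)
        return Timestep(StepType.MID, 0.0, 1.0, self._count)
```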

The Policy interface.

The Policy interface provides an abstraction for Agent policies. The Policy interface assumes a stateless Policy, in the sense that the state should be explicitly provided to the policy when calling its methods.

Note that it is nevertheless possible to have implicit state and make the class stateful.

A Policy should implement:

  • initial_state: returning the initial state of the Policy.
  • step: given a Timestep and a Policy state, generate the next action (and return the next Policy state).

Like the Environment, a Policy also provides specifications, by implementing the step_spec method.
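The explicit-state convention can be illustrated with a small sketch. The class below does not inherit from gdm_robotics.interfaces.Policy and leaves out step_spec; PolicyState and AlternatingPolicy are hypothetical names, and the (action, next_state) return order is an assumption made for this example.

```python
from typing import Any, NamedTuple, Tuple


class PolicyState(NamedTuple):
    """Hypothetical policy state: how many actions have been produced so far."""
    steps_taken: int


class AlternatingPolicy:
    """Emits 0, 1, 0, 1, ... while keeping no state on the instance itself."""

    def initial_state(self) -> PolicyState:
        return PolicyState(steps_taken=0)

    def step(self, timestep: Any, prev_state: PolicyState) -> Tuple[int, PolicyState]:
        # The action depends only on the inputs, never on hidden attributes.
        action = prev_state.steps_taken % 2
        next_state = PolicyState(steps_taken=prev_state.steps_taken + 1)
        return action, next_state
```

Because the state is threaded through the caller, the same policy object can be reused across episodes simply by calling initial_state again.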

The Logger interface

The EpisodicLogger class describes the interface for a logger responsible for logging the interaction between a Policy and an Environment during a single episode. As such it exposes:

  • reset: This logs a "reset" Timestep, i.e. the first timestep of the episode.
  • record_action_and_next_timestep: This records a Policy's action and the timestep that has been generated by applying this action to the Environment.
  • write: Marks the episode as finished, which (depending on the implementation) triggers a flush.
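As a rough illustration of these three methods, the sketch below keeps episodes in memory. It is not derived from gdm_robotics.interfaces.EpisodicLogger: the class name, the argument names and the record layout are all assumptions chosen for this example.

```python
from typing import Any, Dict, List


class InMemoryEpisodicLogger:
    """Hypothetical logger that buffers one episode and stores it on write."""

    def __init__(self) -> None:
        self._current: List[Dict[str, Any]] = []
        self.episodes: List[List[Dict[str, Any]]] = []

    def reset(self, timestep: Any) -> None:
        # The first record of an episode has no action attached to it.
        self._current = [{"action": None, "timestep": timestep}]

    def record_action_and_next_timestep(self, action: Any, next_timestep: Any) -> None:
        # Each later record pairs an action with the timestep it produced.
        self._current.append({"action": action, "timestep": next_timestep})

    def write(self) -> None:
        # "Flush" the finished episode; a real logger might write to disk here.
        self.episodes.append(self._current)
        self._current = []
```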

The Runloop class

The Runloop is a concrete class responsible for running one or more episodes of a Policy interacting with an Environment.

The Runloop requires at least a single Environment, a single Policy and a collection of EpisodicLoggers.

When run (or run_single_episode) is called, the Runloop takes care of correctly stepping the Policy and the Environment and of logging the generated data.
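The order of calls in a single episode can be sketched roughly as below. This is not the Runloop implementation: run_single_episode here is a free function written for illustration, and the environment, policy and logger classes are hypothetical stand-ins that only mimic the method names described in the previous sections. The is_last field stands in for step_type == LAST.

```python
from typing import Any, List, NamedTuple


class Timestep(NamedTuple):
    """Hypothetical timestep; is_last stands in for step_type == LAST."""
    observation: Any
    reward: float
    discount: float
    is_last: bool


class CountdownEnvironment:
    """Toy environment that ends after a fixed number of steps."""

    def __init__(self, start: int = 3):
        self._start = start
        self._remaining = start

    def reset(self) -> Timestep:
        self._remaining = self._start
        return Timestep(self._remaining, 0.0, 1.0, False)

    def step(self, action: Any) -> Timestep:
        self._remaining -= 1
        done = self._remaining == 0
        return Timestep(self._remaining, 1.0 if done else 0.0,
                        0.0 if done else 1.0, done)


class ConstantPolicy:
    """Toy policy that always returns action 0 and carries no state."""

    def initial_state(self) -> None:
        return None

    def step(self, timestep: Timestep, state: None):
        return 0, state


class ListLogger:
    """Toy logger that records (action, timestep) pairs."""

    def __init__(self) -> None:
        self.records: List[tuple] = []
        self.episodes_written = 0

    def reset(self, timestep: Timestep) -> None:
        self.records = [(None, timestep)]

    def record_action_and_next_timestep(self, action: Any, next_timestep: Timestep) -> None:
        self.records.append((action, next_timestep))

    def write(self) -> None:
        self.episodes_written += 1


def run_single_episode(environment, policy, loggers) -> int:
    """Runs one episode and returns the number of environment steps taken."""
    timestep = environment.reset()
    for logger in loggers:
        logger.reset(timestep)
    policy_state = policy.initial_state()
    num_steps = 0
    while not timestep.is_last:
        action, policy_state = policy.step(timestep, policy_state)
        timestep = environment.step(action)
        for logger in loggers:
            logger.record_action_and_next_timestep(action, timestep)
        num_steps += 1
    for logger in loggers:
        logger.write()
    return num_steps
```

Running several episodes then amounts to calling this function in a loop, which is the part the real Runloop wraps together with the customisation hooks.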

Runloop customisation

The Runloop can be customised by passing different options.

  1. Signal handlers. By default the Runloop does not intercept SIGINT, and it is the responsibility of the caller to handle it. In some cases it might be beneficial to let the Runloop handle it instead. To do so, pass handle_sigint=True to the Runloop initializer.
  2. Provide reset options to the Environment. If your Environment accepts reset options (for example because it wraps a Gymnasium environment), you might want to provide them at reset time. To do so, pass a callable as the reset_options_provider argument of the initializer; it will be called before every episode reset.
  3. More complex customisation can be done by using the RunloopRuntimeOperations class. The Runloop initializer accepts a collection of RunloopRuntimeOperations objects.
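As an illustration of the second option, the sketch below shows one possible shape for a reset options provider. The text above does not specify the callable's signature or return type, so this example assumes a zero-argument callable returning a dictionary that is forwarded to a Gymnasium-style reset(options=...). make_reset_options_provider, OptionRecordingEnvironment and reset_for_episodes are hypothetical names and are not part of gdm_robotics.

```python
import itertools
from typing import Any, Callable, Dict, List, Optional


def make_reset_options_provider() -> Callable[[], Dict[str, Any]]:
    """Builds a provider that returns fresh options for each episode."""
    episode_counter = itertools.count()

    def provider() -> Dict[str, Any]:
        # Assumed signature: no arguments, returns an options dictionary.
        return {"episode_index": next(episode_counter)}

    return provider


class OptionRecordingEnvironment:
    """Toy environment that remembers the options passed to each reset."""

    def __init__(self) -> None:
        self.received_options: List[Optional[Dict[str, Any]]] = []

    def reset(self, options: Optional[Dict[str, Any]] = None) -> None:
        self.received_options.append(options)


def reset_for_episodes(environment: OptionRecordingEnvironment,
                       reset_options_provider: Callable[[], Dict[str, Any]],
                       num_episodes: int) -> None:
    """Calls the provider before every reset, as the option above describes."""
    for _ in range(num_episodes):
        environment.reset(options=reset_options_provider())
```

A provider like this is a convenient place to vary the initial conditions, for example a per-episode task variant, without changing the environment itself.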

Adapters

Environments from common RL libraries, such as dm_env.Environment and gymnasium.Env, can be exposed as gdm_robotics.interfaces.Environment objects using the wrappers provided in the adapters sub-package. For example, to wrap a dm_env.Environment object:

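import dm_env
from gdm_robotics.adapters import dm_env_to_gdmr_env_wrapper

# Any existing dm_env.Environment instance; its construction is left out here.
original_env: dm_env.Environment = ...
# Expose it through the gdm_robotics Environment interface.
env = dm_env_to_gdmr_env_wrapper.DmEnvToGdmrEnvWrapper(original_env)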

Installation

gdm_robotics can be installed from PyPI using pip:

pip install gdm_robotics

Licence and Disclaimer

Copyright 2025 Google LLC

All software is licensed under the Apache License, Version 2.0 (Apache 2.0); you may not use this file except in compliance with the Apache 2.0 license. You may obtain a copy of the Apache 2.0 license at: https://www.apache.org/licenses/LICENSE-2.0

All other materials are licensed under the Creative Commons Attribution 4.0 International License (CC-BY). You may obtain a copy of the CC-BY license at: https://creativecommons.org/licenses/by/4.0/legalcode

Unless required by applicable law or agreed to in writing, all software and materials distributed here under the Apache 2.0 or CC-BY licenses are distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the licenses for the specific language governing permissions and limitations under those licenses.

This is not an official Google product.
