
turingpoint

Turing point is a Reinforcement Learning (RL) library. It supports multiple (heterogeneous) agents seamlessly. Per-agent partial observation is natural with Turing point, and different agents can act at different frequencies. You may keep using the libraries you already use, such as Gym, Stable-Baselines3, RLlib, etc.; Turing point integrates easily with existing RL libraries and with your own custom code. Integrating RL agents into target applications should be significantly easier with Turing point.
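
To give a first taste of the pattern, here is a minimal, library-agnostic sketch (an illustration only, not turingpoint's actual API) of heterogeneous participants that share state through a plain dict and act at different frequencies:

from random import random

def fast_agent(parcel: dict) -> None:
    # Acts on every step, and sees only the part of the shared state it needs.
    parcel['action_fast'] = "left" if random() < 0.5 else "right"

def slow_agent(parcel: dict) -> None:
    # Acts only every third step - a different frequency than fast_agent.
    if parcel['step'] % 3 == 0:
        parcel['action_slow'] = "push"

def environment(parcel: dict) -> None:
    # Consumes whatever actions are present and advances a toy state.
    parcel['position'] = parcel.get('position', 0) + (1 if parcel['action_fast'] == "right" else -1)

parcel = {}
for step in range(9):
    parcel['step'] = step
    for participant in (fast_agent, slow_agent, environment):
        participant(parcel)

The turingpoint examples below follow the same idea; there, the shared dict is called a parcel and the loop is managed by an assembly.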

Consider a Gym/SB3 training realm:

import gym

from stable_baselines3 import A2C

# Creating the specific Gym environment.
env = gym.make("CartPole-v1")

# An agent is created, it is injected with the environment.
# The agent probably makes a copy of the passed environment, wraps it etc.
model = A2C("MlpPolicy", env, verbose=1)

# The agent is trained against its environment.
# We can assume what is happening there (obs, action, reward, obs, ..), yet it is not explicit.
model.learn(total_timesteps=10_000)

# We now evaluate the performance of our agent with the help of the environment that the agent maintains.
vec_env = model.get_env()
obs = vec_env.reset()
for i in range(1000):
    # The parameter for predict is the observation,
    #  which is good as our application (e.g. an actual cartpole robot) can indeed provide such observations and use the returned action.
    # Note: the action space as well as the observation space are defined in the environment.
    # Also note: the environment is aware of the agent. This is how the environment was designed.
    # The action space of the agent is coded in the environment.
    # The observation space is intended for the agent and probably also reflects what the agent should know about itself.
    # The _state output is a bit suspicious; it is there because predict can also return the policy's internal state (relevant for recurrent policies).
    action, _state = model.predict(obs, deterministic=True)
    # Here the reward, done, and info outputs are just for our evaluation.
    # Mainly what is happening here is that the environment moves to a new state.
    # The reward and done flag are specific to the agent.
    # If there are other entities in the environment, they may continue to live after done=True and may not care (directly) about this specific reward.
    obs, reward, done, info = vec_env.step(action)
    # We render here. We did not render during the training (learn), which probably makes sense performance-wise.
    vec_env.render()
    # VecEnv resets automatically
    # if done:
    #   obs = vec_env.reset()

# Note: we reset the environment; the model itself is supposed to be memory-less (MDP assumption).

In the comments above, we've tried to give some intuition for why additional thinking is needed about the software used to provision those environment/agent realms.

Let's see how the above can be described with Turing point:

import gym

from stable_baselines3 import A2C

from turingpoint.gym_utils import (
  AgentParticipant,
  EnvironmentParticipant,
  RenderParticipant,
  GymAssembly
)


# Creating the specific Gym environment.
env = gym.make("CartPole-v1")

# An agent is created, it is injected with the environment.
# The agent probably makes a copy of the passed environment, wraps it etc.
model = A2C("MlpPolicy", env, verbose=1)

# The agent is trained against its environment.
# We can assume what is happening there (obs, action, reward, obs, ..), yet it is not explicit.
model.learn(total_timesteps=10_000)

# above starts the same

# now ..

vec_env = model.get_env()
assembly = GymAssembly(vec_env, [
    AgentParticipant(model),
    EnvironmentParticipant(vec_env),
    RenderParticipant(vec_env)
])

for i in range(1000):
    assembly.launch()

What did we gain, and was it worth the extra code? Let's add a second agent to the environment: wind. Or maybe it is part of an augmented environment; it does not really matter. Let's just add it.

from random import random

import gym

from stable_baselines3 import A2C

from turingpoint.gym_utils import (
  AgentParticipant,
  EnvironmentParticipant,
  RenderParticipant,
  GymAssembly
)


# Creating the specific Gym environment.
env = gym.make("CartPole-v1")

# An agent is created, it is injected with the environment.
# The agent probably makes a copy of the passed environment, wraps it etc.
model = A2C("MlpPolicy", env, verbose=1)

# The agent is trained against its environment.
# We can assume what is happening there (obs, action, reward, obs, ..), yet it is not explicit.
model.learn(total_timesteps=10_000)

def wind(parcel: dict) -> None:
    action_wind = "blow left" if random() < 0.5 else "blow right"
    parcel['action_wind'] = action_wind

def wind_impact(parcel: dict) -> None:
    action_wind = parcel['action_wind']
    # We'll modify the agent's action based on the wind,
    # as we don't have access here to the state of the environment.
    parcel['action'] = ...

vec_env = model.get_env()
assembly = GymAssembly(vec_env, [
    AgentParticipant(model),
    wind,
    wind_impact,
    EnvironmentParticipant(vec_env),
    RenderParticipant(vec_env)
])

for i in range(1000):
    assembly.launch()
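
Because a participant is just a callable that takes the parcel, plugging your own code into the loop is straightforward. As an illustration (the log_progress function below is ours, not part of turingpoint), a small logging participant can be slotted into the same assembly:

def log_progress(parcel: dict) -> None:
    # A trivial custom participant: inspect whatever the other participants
    # have put into the parcel so far (e.g. 'action' and 'action_wind').
    print({key: parcel[key] for key in ('action', 'action_wind') if key in parcel})

assembly = GymAssembly(vec_env, [
    AgentParticipant(model),
    wind,
    wind_impact,
    EnvironmentParticipant(vec_env),
    log_progress,
    RenderParticipant(vec_env)
])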

To install, use for example:

pip install turingpoint

The examples can be found on the homepage (GitHub) under the 'examples' folder.

