No project description provided
Project description
DSMC-Tool
DSMC-Tool is a package for Deep Statistical Model Checking (DSMC) of Deep Reinforcement Learning agents. This application allows users to evaluate RL agents based on a wide array of properties, making it an essential tool for ensuring robust agent performance. The package is developed to be fully compatible with Gymnasium environments.
Table of Contents
DSMC
In a nutshell, DSMC (Deep Statistical Model Checking) leverages a DRL (Deep Reinforcement Learning) agent to run a series of episodes in a given environment and statistically estimate a property of interest. The process involves creating a confidence interval around the estimate, based on the provided parameter kappa, and comparing it against the specified accuracy parameter epsilon. This estimation process operates iteratively: after generating a certain number of episodes, the algorithm checks whether the estimate satisfies the desired accuracy. If not, additional episodes are generated, and the process repeats. The loop ensures termination as the confidence interval becomes tighter with more episodes, eventually meeting the specified criteria.
Evaluator Initialisation
The first step in using DSMC-Tool is initialising an Evaluator object from dsmc_tool.evaluator. The following constructor parameters can be set:
| Parameter | Type | Default Value | Description |
|---|---|---|---|
| env | Env | None | Your RL environment, defined as a Gymnasium environment |
| initial_episodes | int | 100 | The number of episodes generated in the initial run of evaluation |
| subsequent_episodes | int | 50 | The number of episodes generated in every run after the initial run |
Note that the number of initial episodes should be chosen relatively high to avoid premature termination.
Properties
After initialisation, one or more properties have to be created and registered for evaluation. Properties are implementations of the abstract class Property, which is provided in dsmc_tool.property:
class Property:
def __init__(self, name: str):
self.name = name
self.json_filename = name + ".json"
pass
def check(self, trajectory: List[Tuple[Any, Any, Any]]) -> float:
pass
By default, the name of the JSON file containing the output is identical to the property name, but this can be changed in your implementation or in the register function. The check function is used to derive a result for the given episode, provided in the form of trajectory. This is a List of tuples with each tuple describing a single time step by containing the current observation, action, and reward. For example, this is the implementation of a Property calculating the return:
class ReturnProperty(Property):
def __init__(self, name: str = "return", gamma: float = 0.99):
super().__init__(name)
self.gamma = gamma
self.binomial = False
def check(self, trajectory: List[Tuple[Any, Any, Any]]) -> float:
ret = 0
for t in range(len(trajectory)):
ret += trajectory[t][2] * np.power(self.gamma, t)
return ret
Additionally to the possibility of creating custom properties, there is a library of pre-implemented properties:
| Name | Additional Inputs | Description |
|---|---|---|
| ActionDiversityProperty | Number of actions | Calculates the ratio of unique actions taken in the episode to the total number of actions |
| ActionEntropyProperty | Number of actions | Calculates action entropy, a measure of how much randomness is involved in the action decisions |
| ActionTakenProperty | Action | Tests whether the given action was applied in the episode |
| ActionThresholdProperty | Action, Threshold | Tests whether the given action was applied a number of times higher than or equal to the threshold |
| ActionVarietyProperty | Threshold | Tests whether a number of actions higher than or equal to the threshold was applied |
| ConsecutiveSameActionProperty | Action, Threshold | Tests whether the given action was applied in a consecutive number of time steps higher than or equal to the threshold |
| EarlyTerminationProperty | Step maximum | Tests whether the episode terminated within a number of time steps lower than or equal to the step maximum |
| EpisodeLengthProperty | None | Calculates the length of the episode |
| GoalBeforeStepLimitProperty | Goal reward, Step limit | Tests whether the goal, signified by a unique goal reward, was reached within a number of time steps lower than or equal to the step limit |
| GoalReachingProbabilityProperty | Goal reward | Tests whether the goal, signified by a unique goal reward, was reached |
| NormalizedReturnProperty | Gamma | Calculates the return discounted with discount factor gamma and normalized by the episode length |
| PathEfficiencyProperty | Path | Calculates the percentage of taken actions that correspond to the actions taken in the given path |
| PathLengthEfficiencyProperty | Path length | Calculates the ratio of the length of the episode to the given path length |
| ReturnProperty | Gamma | Calculates the return discounted with discount factor gamma |
| ReturnThresholdProperty | Gamma, Threshold | Tests whether the return discounted with discount factor gamma is higher than or equal to the threshold |
| RewardToLengthRatioProperty | Gamma, Threshold | Calculates the ratio of the sum of all rewards to the length of the episode |
| RewardVarianceProperty | Gamma, Threshold | Calculates the variance in the rewards accumulated in the episode |
| StateCoverageProperty | Number of states | Calculates the ratio of unique states visited in the episode to the total number of states |
| StateTransitionSmoothnessProperty | Number of states | Calculates the ratio of unique states visited in the episode to the total number of states (assumes states are represented as vectors) |
| StateVisitProperty | State | Tests whether the given state was visited in the episode |
A property can be registered for evaluation using the evaluator's register functions. Additionally to the property object, you can provide a custom name for the output JSON file here.
Evaluation
Once at least one property has been registered, the eval function can be called. This function provides a lot of configuration via the input variables, presented here:
| Parameter | Type | Default Value | Description |
|---|---|---|---|
| agent | None given | None | Your DRL agent implementation |
| epsilon | float | 0.1 | The accuracy parameter (see section DSMC) |
| kappa | float | 0.05 | The confidence parameter (see section DSMC) |
| exploration_rate | float | None | Probability of choosing a random action during evaluation |
| act_function | None given | None | The function your agent implementation uses to decide on an action |
| save_interim_results | bool | False | Whether interim results should be saved in the output files |
| interim_interval | int | None | How many episodes should be between the interim results |
| output_full_results_list | bool | False | Whether a list of all results should be saved in the output files |
| relative_epsilon | bool | False | Whether epsilon should be used relative to the estimate |
| truncation_steps | int | None | After how many steps evaluation episodes should be truncated |
If save_interim_results is False, the results will only be saved one time, once the evaluation has ended for all properties. Otherwise, the results are saved every interim_interval episodes. In general, the output files have the JSON format, and hold information about the mean, variance, standard deviation, and confidence interval (according to kappa) in regard to the corresponding property's results. Additionally to these files, the function returns a dictionary results_per_property, which holds an Evaluation_results object for every evaluated property. These objects allow you to do all calculations from the output files manually.
Example
Here is an implementation example of all of this combined:
import gymnasium as gym
from gymnasium.wrappers import FlattenObservation
from stable_baselines3 import DQN
from dsmc_tool.evaluator import Evaluator
import dsmc_tool.property as prop
env = gym.make("CartPole-v1")
env = FlattenObservation(env)
agent = DQN("MlpPolicy", env, verbose=1)
agent.learn(total_timesteps=1000)
evaluator = Evaluator(env=env, initial_episodes=100, subsequent_episodes=50)
property = prop.ReturnProperty()
evaluator.register_property(property)
results = evaluator.eval(agent, epsilon=2, kappa=0.05, act_function=agent.predict, save_interim_results=True)
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file dsmc_tool-0.2.1.tar.gz.
File metadata
- Download URL: dsmc_tool-0.2.1.tar.gz
- Upload date:
- Size: 683.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.5
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
6b8ef07696486f4a80da9e86d37d1118bc0c74142d7f8e11eef5bd1216109c67
|
|
| MD5 |
a3792a8670272fb85ecca8d762bbc974
|
|
| BLAKE2b-256 |
96b8463b23d0f15a79f7347374e59ba3f26d3aebcd835a9727006c28818ac115
|
File details
Details for the file dsmc_tool-0.2.1-py3-none-any.whl.
File metadata
- Download URL: dsmc_tool-0.2.1-py3-none-any.whl
- Upload date:
- Size: 699.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.5
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
f099b3d29f2a53a172d1ff3cc5b2ec50ebc101c2594af1a2349a0a44b90ebc6f
|
|
| MD5 |
e2428b916d2fe528425ec81352afdd35
|
|
| BLAKE2b-256 |
837561e08d29c709b35b7bfec6b66ca8b339cb409cf2214fd2fd17ec52df844c
|