No project description provided

These details have not been verified by PyPI

Project description

DSMC-Tool

DSMC-Tool is a package for Deep Statistical Model Checking (DSMC) of Deep Reinforcement Learning agents. This application allows users to evaluate RL agents based on a wide array of properties, making it an essential tool for ensuring robust agent performance. The package is developed to be fully compatible with Gymnasium environments.

DSMC
Evaluator Initialization
Properties
Evaluation
Example

DSMC

In a nutshell, DSMC (Deep Statistical Model Checking) leverages a DRL (Deep Reinforcement Learning) agent to run a series of episodes in a given environment and statistically estimate a property of interest. The process involves creating a confidence interval around the estimate, based on the provided parameter kappa, and comparing it against the specified accuracy parameter epsilon. This estimation process operates iteratively: after generating a certain number of episodes, the algorithm checks whether the estimate satisfies the desired accuracy. If not, additional episodes are generated, and the process repeats. The loop ensures termination as the confidence interval becomes tighter with more episodes, eventually meeting the specified criteria.

Evaluator Initialisation

The first step in using DSMC-Tool is initialising an Evaluator object from dsmc_tool.evaluator. The following constructor parameters can be set:

Parameter	Type	Default Value	Description
env	Env	None	Your RL environment, defined as a Gymnasium environment
initial_episodes	int	100	The number of episodes generated in the initial run of evaluation
subsequent_episodes	int	50	The number of episodes generated in every run after the initial run

Note that the number of initial episodes should be chosen relatively high to avoid premature termination.

Properties

After initialisation, one or more properties have to be created and registered for evaluation. Properties are implementations of the abstract class Property, which is provided in dsmc_tool.property:

class Property:
    def __init__(self, name: str):
        self.name = name
        self.json_filename = name + ".json"
        pass

    def check(self, trajectory: List[Tuple[Any, Any, Any]]) -> float:
        pass

By default, the name of the JSON file containing the output is identical to the property name, but this can be changed in your implementation or in the register function. The check function is used to derive a result for the given episode, provided in the form of trajectory. This is a List of tuples with each tuple describing a single time step by containing the current observation, action, and reward. For example, this is the implementation of a Property calculating the return:

class ReturnProperty(Property):
    def __init__(self, name: str = "return", gamma: float = 0.99):
        super().__init__(name)
        self.gamma = gamma
        self.binomial = False

    def check(self, trajectory: List[Tuple[Any, Any, Any]]) -> float:
        ret = 0
        for t in range(len(trajectory)):
            ret += trajectory[t][2] * np.power(self.gamma, t)
        return ret

Additionally to the possibility of creating custom properties, there is a library of pre-implemented properties:

Name	Additional Inputs	Description
ActionDiversityProperty	Number of actions	Calculates the ratio of unique actions taken in the episode to the total number of actions
ActionEntropyProperty	Number of actions	Calculates action entropy, a measure of how much randomness is involved in the action decisions
ActionTakenProperty	Action	Tests whether the given action was applied in the episode
ActionThresholdProperty	Action, Threshold	Tests whether the given action was applied a number of times higher than or equal to the threshold
ActionVarietyProperty	Threshold	Tests whether a number of actions higher than or equal to the threshold was applied
ConsecutiveSameActionProperty	Action, Threshold	Tests whether the given action was applied in a consecutive number of time steps higher than or equal to the threshold
EarlyTerminationProperty	Step maximum	Tests whether the episode terminated within a number of time steps lower than or equal to the step maximum
EpisodeLengthProperty	None	Calculates the length of the episode
GoalBeforeStepLimitProperty	Goal reward, Step limit	Tests whether the goal, signified by a unique goal reward, was reached within a number of time steps lower than or equal to the step limit
GoalReachingProbabilityProperty	Goal reward	Tests whether the goal, signified by a unique goal reward, was reached
NormalizedReturnProperty	Gamma	Calculates the return discounted with discount factor gamma and normalized by the episode length
PathEfficiencyProperty	Path	Calculates the percentage of taken actions that correspond to the actions taken in the given path
PathLengthEfficiencyProperty	Path length	Calculates the ratio of the length of the episode to the given path length
ReturnProperty	Gamma	Calculates the return discounted with discount factor gamma
ReturnThresholdProperty	Gamma, Threshold	Tests whether the return discounted with discount factor gamma is higher than or equal to the threshold
RewardToLengthRatioProperty	Gamma, Threshold	Calculates the ratio of the sum of all rewards to the length of the episode
RewardVarianceProperty	Gamma, Threshold	Calculates the variance in the rewards accumulated in the episode
StateCoverageProperty	Number of states	Calculates the ratio of unique states visited in the episode to the total number of states
StateTransitionSmoothnessProperty	Number of states	Calculates the ratio of unique states visited in the episode to the total number of states (assumes states are represented as vectors)
StateVisitProperty	State	Tests whether the given state was visited in the episode

A property can be registered for evaluation using the evaluator's register functions. Additionally to the property object, you can provide a custom name for the output JSON file here.

Evaluation

Once at least one property has been registered, the eval function can be called. This function provides a lot of configuration via the input variables, presented here:

Parameter	Type	Default Value	Description
agent	None given	None	Your DRL agent implementation
epsilon	float	0.1	The accuracy parameter (see section DSMC)
kappa	float	0.05	The confidence parameter (see section DSMC)
exploration_rate	float	None	Probability of choosing a random action during evaluation
act_function	None given	None	The function your agent implementation uses to decide on an action
save_interim_results	bool	False	Whether interim results should be saved in the output files
interim_interval	int	None	How many episodes should be between the interim results
output_full_results_list	bool	False	Whether a list of all results should be saved in the output files
relative_epsilon	bool	False	Whether epsilon should be used relative to the estimate
truncation_steps	int	None	After how many steps evaluation episodes should be truncated

If save_interim_results is False, the results will only be saved one time, once the evaluation has ended for all properties. Otherwise, the results are saved every interim_interval episodes. In general, the output files have the JSON format, and hold information about the mean, variance, standard deviation, and confidence interval (according to kappa) in regard to the corresponding property's results. Additionally to these files, the function returns a dictionary results_per_property, which holds an Evaluation_results object for every evaluated property. These objects allow you to do all calculations from the output files manually.

Example

Here is an implementation example of all of this combined:

import gymnasium as gym
from gymnasium.wrappers import FlattenObservation
from stable_baselines3 import DQN
from dsmc_tool.evaluator import Evaluator
import dsmc_tool.property as prop

env = gym.make("CartPole-v1")
env = FlattenObservation(env)
agent = DQN("MlpPolicy", env, verbose=1)
agent.learn(total_timesteps=1000)

evaluator = Evaluator(env=env, initial_episodes=100, subsequent_episodes=50)
property = prop.ReturnProperty()
evaluator.register_property(property)
results = evaluator.eval(agent, epsilon=2, kappa=0.05, act_function=agent.predict, save_interim_results=True)

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

0.2.1

Jan 6, 2025

0.2.0

Nov 25, 2024

0.1.8

Nov 18, 2024

0.1.7

Nov 8, 2024

0.1.6

Nov 1, 2024

0.1.5

Oct 20, 2024

0.1.4

Oct 17, 2024

0.1.3

Oct 16, 2024

0.1.2

Oct 13, 2024

0.1.1

Oct 12, 2024

0.1.0

Oct 12, 2024

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dsmc_tool-0.2.1.tar.gz (683.5 kB view details)

Uploaded Jan 6, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

dsmc_tool-0.2.1-py3-none-any.whl (699.9 kB view details)

Uploaded Jan 6, 2025 Python 3

File details

Details for the file dsmc_tool-0.2.1.tar.gz.

File metadata

Download URL: dsmc_tool-0.2.1.tar.gz
Upload date: Jan 6, 2025
Size: 683.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.12.5

File hashes

Hashes for dsmc_tool-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`6b8ef07696486f4a80da9e86d37d1118bc0c74142d7f8e11eef5bd1216109c67`
MD5	`a3792a8670272fb85ecca8d762bbc974`
BLAKE2b-256	`96b8463b23d0f15a79f7347374e59ba3f26d3aebcd835a9727006c28818ac115`

See more details on using hashes here.

File details

Details for the file dsmc_tool-0.2.1-py3-none-any.whl.

File metadata

Download URL: dsmc_tool-0.2.1-py3-none-any.whl
Upload date: Jan 6, 2025
Size: 699.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.12.5

File hashes

Hashes for dsmc_tool-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f099b3d29f2a53a172d1ff3cc5b2ec50ebc101c2594af1a2349a0a44b90ebc6f`
MD5	`e2428b916d2fe528425ec81352afdd35`
BLAKE2b-256	`837561e08d29c709b35b7bfec6b66ca8b339cb409cf2214fd2fd17ec52df844c`

See more details on using hashes here.

dsmc-tool 0.2.1

Navigation

Verified details

Maintainers

Unverified details

Meta

Classifiers

Project description

DSMC-Tool

Table of Contents

DSMC

Evaluator Initialisation

Properties

Evaluation

Example

Project details

Verified details

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes