Skip to main content

No project description provided

Project description

PyDSMC

Statistical Model Checking for Neural Agents Using the Gymnasium Interface

Docker Info

This artifact evaluation is based on a docker. You will need to install docker, both docker Desktop and the docker engine. In the terminal, please make sure you are logged in with your docker account (docker login -u <username>).

Artifact Evaluation

1. Loading/Building the Docker Image

We highly recommend loading the docker image with docker image load -i pydsmc-dockerimg.tar.

Alternatively, you can build the image with docker build -t pydsmc-image . (Linux, amd64). On Mac (with M processors, arm64), you can build the image with docker buildx build --platform linux/amd64 -t pydsmc-image:latest .. However, note that depending on the different architectures, building (sometimes even using) the image on Mac (with M processors) might lead to errors that are based in the used benchmarks and are, therefore, beyond our control. If you encounter problems using the image, you should follow the steps described at Running locally. When running locally outside docker, we observed that all described experiments can be reproduced, independently from the processor architecture.

In case the reviewers want to inspect or locally execute it: The docker image's entry point is entrypoint.py.

2. Reproducing the motivation figures (Fig. 1 - 3)

We provide a jupyter notebook which generates the exact motivation figures contained in the manuscript.

a) Reproduce the results from precomputed logs/samples

To run it and generate these plots yourself, run the following command and open the notebook in your browser.

docker run -it --rm \
    -v $(pwd)/output:/output \
    -v $(pwd)/motivation/data:/workdir/motivation/data \
    -p 8888:8888 \
    pydsmc-image:latest \
    --notebook

On Windows, run

docker run -it --rm -v %cd%/output:/output -v %cd%/motivation/data:/workdir/motivation/data -p 8888:8888 pydsmc-image:latest --notebook

b) Reproduce the results via bootstrapping

For Fig. 2 and Fig. 3, the notebook additionally provides the option to rerun the bootstrapping to reproduce our results based on the precomputed logs/sample runs throughout the training.

To generate the bootstrapped data for the red confidence intervals in figure 2 and 3 yourself, do not execute the notebook's cell loading the csv data:

# Load cached bootstrapped data. If not executed, the script will generate and save the data
bootstrapped_mjc_data = pd.read_csv("data/bootstrapped_mjc_data.csv").values.tolist()
bootstrapped_rt_data = pd.read_csv("data/bootstrapped_rt_data.csv").values.tolist()
bootstrapped_mg_data = pd.read_csv("data/bootstrapped_mg_data.csv").values.tolist()

c) Reproduce the results from scratch

Additionally, for all Figures (1 - 3) you can use the jupyter notebook to first retrain all neural agents and simultaneously evaluate the agents to reproduce these plots. Note that, on a standard machine, this will take a very long time, at least a week. We generated our data on a cluster and parallelized the training of the different agents since they are independent.

3. Reproducing the exemplary evaluation table (section 6.3; table 1)

All following commands can be run with different entry point flags. Use all to run the command on all benchmarks. Alternatively, you can use any subset of the following flags to run the command for the specific benchmark subset:

  • ant
  • humanoid
  • humanoidstandup
  • halfcheetah
  • mountaincar
  • pgtg
  • minigrid
  • breakout

(For example, replacing all with pgtg ant will run the evaluation for pgtg and ant only).

We make use of parallel environments to speed up the computation. If you run into memory problems, use **todo** to decrease the number of parallel environments.

We provide three different ways of reproducing our results:

a) Reproduce the results from the contained logs/samples

The following command will create the markdown tables for all environments in output/_markdown from the contained logs.

docker run -it --rm \
    -v $(pwd)/output:/output \
    pydsmc-image:latest \
    all

This command will be processed quickly, within seconds.

b) Reproduce the results from the pretrained neural agents

To actually run the evaluation on the pretrained agents on your machine, add the --eval flag:

docker run -it --rm \
    -v $(pwd)/output:/output \
    pydsmc-image:latest \
    --eval \
    all

Depending on your machine, this will take up to a few hours per benchmark (with breakout being the most expensive one). See paper, table 1 for rough time estimates. In sum, this can take up to one day.

c) Reproduce the results from scracth

You can also decide to train the agents on your machine instead of using the contained ones. For that simply add the --train flag:

docker run -it --rm \
    -v $(pwd)/output:/output \
    pydsmc-image:latest \
    --train --eval \
    all

Depending on your machine, this experiment will take several hours up until a few days.

4. Running locally

To not use docker and instead run the code locally, set up a python environment and install the necessary dependencies from requirements.txt (Linux, Windows) or requirements_mac.txt (Mac) For example, with virtualenvwrapper, you can execute the following commands:

mkvirtualenv artifact-pydsmc --python=3.10
pip install -r requirements_mac.txt
pip install .

Then, while having the virtual environment activated, execute the following commands to run the artifact:

a) Notebook

./entrypoint.py --relative --notebook

b) Tables

./entrypoint.py --relative all
./entrypoint.py --relative --eval all
./entrypoint.py --relative --train --eval all

As mentioned above, you can simply replace all with any subset of environments.

Tool Description

PyDSMC is an open-source Python library for statistical model checking of neural agents on arbitrary Gymnasium environments. It is designed to be lightweight and easy-to-use while being based on established statistical methods to provide guarantees on the investigated properties' performances. Implementing the Gymnasium interface, PyDSMC is widely applicable and fully agnostic to the environments underlying implementation.

PyDSMC is based on Deep Statistical Model Checking and aims to facilitate greater adoption of statistical model checking by simplifying its usage.

License

The code introduced by this project is licensed under the MIT license. Please consult the bundled LICENSE file for the full license text.

Statistical Method Selection

Figure

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pydsmc-0.2.5.tar.gz (21.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pydsmc-0.2.5-py3-none-any.whl (21.8 kB view details)

Uploaded Python 3

File details

Details for the file pydsmc-0.2.5.tar.gz.

File metadata

  • Download URL: pydsmc-0.2.5.tar.gz
  • Upload date:
  • Size: 21.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.2 CPython/3.13.3 Darwin/24.4.0

File hashes

Hashes for pydsmc-0.2.5.tar.gz
Algorithm Hash digest
SHA256 44c9d60fce9f1b883d14527031f9c6e3f11e2db63165ce5728d55e6acce30d33
MD5 2e6beae3dcb8a51130b7a1988ce4b4e0
BLAKE2b-256 84eb7ab69d87e7e21ca25cf9bb86c47cae20af596449bf95da8a16645966ad87

See more details on using hashes here.

File details

Details for the file pydsmc-0.2.5-py3-none-any.whl.

File metadata

  • Download URL: pydsmc-0.2.5-py3-none-any.whl
  • Upload date:
  • Size: 21.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.2 CPython/3.13.3 Darwin/24.4.0

File hashes

Hashes for pydsmc-0.2.5-py3-none-any.whl
Algorithm Hash digest
SHA256 9f91e2a630182a2f77cb92c3de44f50ac7002174fe71811d387eab633746c592
MD5 61bbf196241dd54d5b0f3c9032b007fe
BLAKE2b-256 dcfeb34631dc8d48e5939df3324c25c7572167d3bb61cff273416126126745c9

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page