No project description provided

These details have not been verified by PyPI

Project description

PyDSMC

Statistical Model Checking for Neural Agents Using the Gymnasium Interface

Docker Info

This artifact evaluation is based on a docker. You will need to install docker, both docker Desktop and the docker engine. In the terminal, please make sure you are logged in with your docker account (docker login -u <username>).

Artifact Evaluation

1. Loading/Building the Docker Image

We highly recommend loading the docker image with docker image load -i pydsmc-dockerimg.tar.

Alternatively, you can build the image with docker build -t pydsmc-image . (Linux, amd64). On Mac (with M processors, arm64), you can build the image with docker buildx build --platform linux/amd64 -t pydsmc-image:latest .. However, note that depending on the different architectures, building (sometimes even using) the image on Mac (with M processors) might lead to errors that are based in the used benchmarks and are, therefore, beyond our control. If you encounter problems using the image, you should follow the steps described at Running locally. When running locally outside docker, we observed that all described experiments can be reproduced, independently from the processor architecture.

In case the reviewers want to inspect or locally execute it: The docker image's entry point is entrypoint.py.

2. Reproducing the motivation figures (Fig. 1 - 3)

We provide a jupyter notebook which generates the exact motivation figures contained in the manuscript.

a) Reproduce the results from precomputed logs/samples

To run it and generate these plots yourself, run the following command and open the notebook in your browser.

docker run -it --rm \
    -v $(pwd)/output:/output \
    -v $(pwd)/motivation/data:/workdir/motivation/data \
    -p 8888:8888 \
    pydsmc-image:latest \
    --notebook

On Windows, run

docker run -it --rm -v %cd%/output:/output -v %cd%/motivation/data:/workdir/motivation/data -p 8888:8888 pydsmc-image:latest --notebook

b) Reproduce the results via bootstrapping

For Fig. 2 and Fig. 3, the notebook additionally provides the option to rerun the bootstrapping to reproduce our results based on the precomputed logs/sample runs throughout the training.

To generate the bootstrapped data for the red confidence intervals in figure 2 and 3 yourself, do not execute the notebook's cell loading the csv data:

# Load cached bootstrapped data. If not executed, the script will generate and save the data
bootstrapped_mjc_data = pd.read_csv("data/bootstrapped_mjc_data.csv").values.tolist()
bootstrapped_rt_data = pd.read_csv("data/bootstrapped_rt_data.csv").values.tolist()
bootstrapped_mg_data = pd.read_csv("data/bootstrapped_mg_data.csv").values.tolist()

c) Reproduce the results from scratch

Additionally, for all Figures (1 - 3) you can use the jupyter notebook to first retrain all neural agents and simultaneously evaluate the agents to reproduce these plots. Note that, on a standard machine, this will take a very long time, at least a week. We generated our data on a cluster and parallelized the training of the different agents since they are independent.

3. Reproducing the exemplary evaluation table (section 6.3; table 1)

All following commands can be run with different entry point flags. Use all to run the command on all benchmarks. Alternatively, you can use any subset of the following flags to run the command for the specific benchmark subset:

ant
humanoid
humanoidstandup
halfcheetah
mountaincar
pgtg
minigrid
breakout

(For example, replacing all with pgtg ant will run the evaluation for pgtg and ant only).

We make use of parallel environments to speed up the computation. If you run into memory problems, use **todo** to decrease the number of parallel environments.

We provide three different ways of reproducing our results:

a) Reproduce the results from the contained logs/samples

The following command will create the markdown tables for all environments in output/_markdown from the contained logs.

docker run -it --rm \
    -v $(pwd)/output:/output \
    pydsmc-image:latest \
    all

This command will be processed quickly, within seconds.

b) Reproduce the results from the pretrained neural agents

To actually run the evaluation on the pretrained agents on your machine, add the --eval flag:

docker run -it --rm \
    -v $(pwd)/output:/output \
    pydsmc-image:latest \
    --eval \
    all

Depending on your machine, this will take up to a few hours per benchmark (with breakout being the most expensive one). See paper, table 1 for rough time estimates. In sum, this can take up to one day.

c) Reproduce the results from scracth

You can also decide to train the agents on your machine instead of using the contained ones. For that simply add the --train flag:

docker run -it --rm \
    -v $(pwd)/output:/output \
    pydsmc-image:latest \
    --train --eval \
    all

Depending on your machine, this experiment will take several hours up until a few days.

4. Running locally

To not use docker and instead run the code locally, set up a python environment and install the necessary dependencies from requirements.txt (Linux, Windows) or requirements_mac.txt (Mac) For example, with virtualenvwrapper, you can execute the following commands:

mkvirtualenv artifact-pydsmc --python=3.10
pip install -r requirements_mac.txt
pip install .

Then, while having the virtual environment activated, execute the following commands to run the artifact:

a) Notebook

./entrypoint.py --relative --notebook

b) Tables

./entrypoint.py --relative all
./entrypoint.py --relative --eval all
./entrypoint.py --relative --train --eval all

As mentioned above, you can simply replace all with any subset of environments.

Tool Description

PyDSMC is an open-source Python library for statistical model checking of neural agents on arbitrary Gymnasium environments. It is designed to be lightweight and easy-to-use while being based on established statistical methods to provide guarantees on the investigated properties' performances. Implementing the Gymnasium interface, PyDSMC is widely applicable and fully agnostic to the environments underlying implementation.

PyDSMC is based on Deep Statistical Model Checking and aims to facilitate greater adoption of statistical model checking by simplifying its usage.

License

The code introduced by this project is licensed under the MIT license. Please consult the bundled LICENSE file for the full license text.

Statistical Method Selection

Figure

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

0.3.6

May 7, 2026

0.3.5

Mar 3, 2026

0.3.4

Feb 9, 2026

0.3.3

Oct 27, 2025

0.3.2

Oct 2, 2025

0.3.1

Oct 2, 2025

0.3.0

Aug 24, 2025

This version

0.2.5

May 12, 2025

0.2.4

Apr 17, 2025

0.2.3

Mar 18, 2025

0.2.2

Feb 6, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pydsmc-0.2.5.tar.gz (21.7 kB view details)

Uploaded May 12, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

pydsmc-0.2.5-py3-none-any.whl (21.8 kB view details)

Uploaded May 12, 2025 Python 3

File details

Details for the file pydsmc-0.2.5.tar.gz.

File metadata

Download URL: pydsmc-0.2.5.tar.gz
Upload date: May 12, 2025
Size: 21.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.2 CPython/3.13.3 Darwin/24.4.0

File hashes

Hashes for pydsmc-0.2.5.tar.gz
Algorithm	Hash digest
SHA256	`44c9d60fce9f1b883d14527031f9c6e3f11e2db63165ce5728d55e6acce30d33`
MD5	`2e6beae3dcb8a51130b7a1988ce4b4e0`
BLAKE2b-256	`84eb7ab69d87e7e21ca25cf9bb86c47cae20af596449bf95da8a16645966ad87`

See more details on using hashes here.

File details

Details for the file pydsmc-0.2.5-py3-none-any.whl.

File metadata

Download URL: pydsmc-0.2.5-py3-none-any.whl
Upload date: May 12, 2025
Size: 21.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.2 CPython/3.13.3 Darwin/24.4.0

File hashes

Hashes for pydsmc-0.2.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9f91e2a630182a2f77cb92c3de44f50ac7002174fe71811d387eab633746c592`
MD5	`61bbf196241dd54d5b0f3c9032b007fe`
BLAKE2b-256	`dcfeb34631dc8d48e5939df3324c25c7572167d3bb61cff273416126126745c9`

See more details on using hashes here.

pydsmc 0.2.5

Navigation

Verified details

Maintainers

Unverified details

Meta

Classifiers

Project description

PyDSMC

Docker Info

Artifact Evaluation

1. Loading/Building the Docker Image

2. Reproducing the motivation figures (Fig. 1 - 3)

a) Reproduce the results from precomputed logs/samples

b) Reproduce the results via bootstrapping

c) Reproduce the results from scratch

3. Reproducing the exemplary evaluation table (section 6.3; table 1)

a) Reproduce the results from the contained logs/samples

b) Reproduce the results from the pretrained neural agents

c) Reproduce the results from scracth

4. Running locally

a) Notebook

b) Tables

Tool Description

License

Statistical Method Selection

Figure

Project details

Verified details

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes