Evaluate whether AI agents can create efficient, reproducible, and safe clinical trial designs
Project description
trialdesignbench 
TrialDesignBench provides tooling for evaluating whether AI agents can reproduce clinical trial designs from Statistical Analysis Plans and protocols.
This baseline implements workflow step 1:
- Create a local benchmark workspace.
- Convert a SAP/protocol PDF to Mathpix Markdown, with optional LaTeX ZIP output.
- Build the standard TrialDesignBench reproduction prompt.
- Run the prompt against a locally installed Codex SDK/runtime and save the run artifacts.
Installation
uv add trialdesignbench
For development:
git clone https://github.com/BBSW-org/TrialDesignBench.git
cd TrialDesignBench
uv sync
The experimental Codex Python SDK is declared as a Git source dependency for
uv environments until it is published on PyPI. From a clone of this
repository, uv sync installs both openai-codex and its pinned local runtime.
When trialdesignbench is installed from PyPI (and openai-codex is not yet available there), add the SDK's Git
source explicitly in the consuming project:
uv add "openai-codex @ git+https://github.com/openai/codex.git#subdirectory=sdk/python"
Quick Start
uv run tdb init tdb-workspace
uv run tdb configure --workspace tdb-workspace
uv run tdb run path/to/sap.pdf --workspace tdb-workspace --case-id tdb-001
Use --no-codex to exercise only the Mathpix ingestion portion:
uv run tdb run path/to/sap.pdf --workspace tdb-workspace --no-codex
The workspace .env file stores MATHPIX_APP_ID, MATHPIX_APP_KEY,
CODEX_MODEL, and optionally CODEX_BIN. The default Codex model is
gpt-5.5, and the default reasoning effort is high. The generated workspace
.gitignore excludes credentials and output artifacts by default.
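For your own tooling that reads the same workspace .env file, a minimal sketch follows. It assumes simple KEY=VALUE lines (no export prefix or variable interpolation), treats MATHPIX_APP_ID and MATHPIX_APP_KEY as required (an assumption), and fills in the documented gpt-5.5 model default. All function names are hypothetical. The README does not name a .env key for reasoning effort, so the sketch keeps that default as a constant only; this is not how tdb itself necessarily loads its settings.

```python
from pathlib import Path

# Defaults documented in the README. CODEX_BIN is optional and has no documented default.
DEFAULT_CODEX_MODEL = "gpt-5.5"
DEFAULT_REASONING_EFFORT = "high"  # no .env key for this is documented, so it is not read here
REQUIRED_KEYS = ("MATHPIX_APP_ID", "MATHPIX_APP_KEY")  # assumption: needed for every run


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def apply_defaults(values: dict[str, str]) -> dict[str, str]:
    """Return a copy with the default model filled in; raise if Mathpix keys are missing."""
    merged = dict(values)
    if not merged.get("CODEX_MODEL"):
        merged["CODEX_MODEL"] = DEFAULT_CODEX_MODEL
    missing = [key for key in REQUIRED_KEYS if not merged.get(key)]
    if missing:
        raise ValueError(f"missing Mathpix credentials: {', '.join(missing)}")
    return merged


def load_workspace_env(workspace: str) -> dict[str, str]:
    """Read <workspace>/.env and apply the defaults above (hypothetical helper)."""
    text = Path(workspace, ".env").read_text(encoding="utf-8")
    return apply_defaults(parse_env(text))
```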
Download files
File details
Details for the file trialdesignbench-0.2.0.tar.gz.
File metadata
- Download URL: trialdesignbench-0.2.0.tar.gz
- Upload date:
- Size: 13.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.14 (macOS)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6f65af4c3caffe897bd7a516fdc4a9614589fc6c23e88da2c4b86e6fa24a7fd7 |
| MD5 | 690aa31221ea1a7726fa91d581d805a1 |
| BLAKE2b-256 | e5f19f6a1c8c8b8b18848f1bf193a74e705be4603026d314d7cf07112bf10442 |
File details
Details for the file trialdesignbench-0.2.0-py3-none-any.whl.
File metadata
- Download URL: trialdesignbench-0.2.0-py3-none-any.whl
- Upload date:
- Size: 14.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.14 (macOS)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 854bb28f8e80efc5ff57956cb4ccb8a3b1d5a20b0a1e09f3b3d69f47599b0b0b |
| MD5 | 0fc8d1a0d9fbeadb76ae91da43a9cf65 |
| BLAKE2b-256 | 2cf6f2175f3216f6e45ac55069fcb1beadc9038585098c681fbea0c9684fbf1c |
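To check a downloaded file against the published digests, a minimal sketch using the standard library's hashlib is shown below. It computes SHA256 and BLAKE2b-256 (BLAKE2b with a 32-byte digest) in one streaming pass; MD5 is left out because it is not collision-resistant. The helper names are hypothetical, and the expected values are copied from the source distribution's hash table.

```python
import hashlib
from typing import Iterable

# Published digests for trialdesignbench-0.2.0.tar.gz, copied from its hash table.
SDIST_DIGESTS = {
    "sha256": "6f65af4c3caffe897bd7a516fdc4a9614589fc6c23e88da2c4b86e6fa24a7fd7",
    "blake2b-256": "e5f19f6a1c8c8b8b18848f1bf193a74e705be4603026d314d7cf07112bf10442",
}


def digests_of_chunks(chunks: Iterable[bytes]) -> dict[str, str]:
    """Feed byte chunks to SHA256 and BLAKE2b-256 in a single pass."""
    sha256 = hashlib.sha256()
    blake2b_256 = hashlib.blake2b(digest_size=32)  # 32 bytes = 256 bits
    for chunk in chunks:
        sha256.update(chunk)
        blake2b_256.update(chunk)
    return {"sha256": sha256.hexdigest(), "blake2b-256": blake2b_256.hexdigest()}


def file_digests(path: str, chunk_size: int = 1 << 16) -> dict[str, str]:
    """Stream a file in fixed-size chunks so large downloads need not fit in memory."""
    with open(path, "rb") as fh:
        return digests_of_chunks(iter(lambda: fh.read(chunk_size), b""))


def matches(path: str, expected: dict[str, str]) -> bool:
    """True only if every expected digest equals the computed one (case-insensitive)."""
    actual = file_digests(path)
    return all(actual[name] == digest.lower() for name, digest in expected.items())
```

For example, matches("trialdesignbench-0.2.0.tar.gz", SDIST_DIGESTS) returns True only if both digests agree. For installs, pip's hash-checking mode (--require-hashes with pinned --hash=sha256:... entries) performs the SHA256 comparison automatically.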