Local-first Accelerated Prompt Stress Testing starter kit
Project description
APST Starter Kit
APST stands for Accelerated Prompt Stress Testing. It is a depth-oriented LLM safety and reliability evaluation workflow that repeatedly samples the same prompts under controlled conditions to estimate empirical failure probability under repeated inference.
This v0.1 starter kit is local-first and conference-friendly. You can run the mock demo without an API key, then swap in OpenAI, Together.ai, or an OpenAI-compatible local endpoint when you are ready to test a real model.
Quickstart
Install from PyPI after the first release:
pip install apst-starter-kit
apst init my-apst-demo
cd my-apst-demo
apst run --config configs/demo_mock.yaml
apst report --results outputs/demo_results.csv --lang both
Or run from a Git checkout:
git clone <repo-url>
cd apst-starter-kit
pip install -e .
apst run --config configs/demo_mock.yaml
apst report --results outputs/demo_results.csv --lang both
The demo writes:
outputs/demo_results.csvoutputs/demo_results.jsonoutputs/demo_results_report_both.md
What The Demo Does
The mock run:
- loads a small prompt set from
data/prompts/demo_prompts.json - repeatedly samples each prompt at two temperatures
- judges every response with the local
rulejudge mode - computes APST reliability and repeated-use risk metrics
- exports CSV and JSON result files
- generates English, Chinese, or bilingual Markdown reports
No API key or network access is required for configs/demo_mock.yaml.
Configuring Models
Use the mock provider for local demos:
models:
- name: mock-apst-model
model_id: mock-apst-model
provider: mock
judge_mode: rule
Use OpenAI with an LLM judge:
export OPENAI_API_KEY=...
apst run --config configs/openai_example.yaml
Use an OpenAI-compatible local endpoint, such as Ollama or vLLM exposing
/v1/chat/completions. APST uses the same client path for both: model, base_url, and a
placeholder api_key. With a local base_url, prompts, outputs, and labels are sent only to the
local server you configured.
models:
- model: llama3.1
base_url: http://localhost:11434/v1
api_key: local-not-needed
judge_mode: rule
The OpenAI-compatible client is installed with the base package, so Ollama and vLLM only need the local server running:
apst run --config configs/ollama_local.yaml
apst run --config configs/vllm_local.yaml
For local LLM-as-judge, point judge_model at another OpenAI-compatible local server:
judge_mode: llm
judge_model:
model: llama3.1
base_url: http://localhost:11434/v1
api_key: local-not-needed
See docs/local_model_servers.md for Ollama and vLLM server commands.
Together.ai models are also supported through the existing provider extra:
pip install -e ".[providers]"
export TOGETHER_API_KEY=...
apst run --config configs/openai_example.yaml --models meta-llama/Llama-3.3-70B-Instruct-Turbo
Judge Modes
rule: local deterministic checks for refusal, harmful operational guidance, crisis-support handling, and gibberish. Good for demos, local LLMs, and fast smoke tests.heuristic: local malformed-output check only. Good when you want a very conservative no-network sanity pass.llm: LLM-as-judge classification usingjudge_model. Good for richer audits and real model comparisons.
APST Metrics
Each prompt/model/temperature config reports:
empirical_failure_probability: observed failures divided by repeated samplesreliability:1 - empirical_failure_probabilityapst_risk_at_10: estimated chance of at least one failure across 10 independent attemptsfailure_probability_ci_lowandfailure_probability_ci_high: Wilson interval boundsfailure_mode_distribution: counts by judge label
The repeated-use estimate is intentionally simple:
Risk@N = 1 - (1 - empirical_failure_probability)^N
Reports
apst report --results outputs/demo_results.csv --lang en
apst report --results outputs/demo_results.csv --lang zh
apst report --results outputs/demo_results.csv --lang both
Use --audit-contact to put a real intake link, email address, or conference note in the report.
See docs/enterprise_audits.md for the enterprise audit handoff text.
Legacy Commands
The older llm-eval entry point and APST/AIRBench commands are still available:
llm-eval run-apst --config configs/smoke.yaml
llm-eval freeze-airbench --region us --per-l4 5 --output data/prompts/airbench_us_v1.json
Local Checks
pytest
Publishing
See docs/publishing.md for GitHub release and PyPI publishing steps. The package includes an
apst init command so PyPI users can scaffold the runnable configs, prompt files, and docs without
cloning the repository.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file apst_starter_kit-0.1.0.tar.gz.
File metadata
- Download URL: apst_starter_kit-0.1.0.tar.gz
- Upload date:
- Size: 47.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.11
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
804e1123622829df92bf4a7c45e92399ce3f37af6095e4b6d4d72ddd7a1a6e6d
|
|
| MD5 |
8e38740a2b51f6bde230afc78be6d3a8
|
|
| BLAKE2b-256 |
6f25eb13af02ba24b5a99f2f43d06dbeb0ef17cd8c49a4d55a0ef9872e12ba16
|
File details
Details for the file apst_starter_kit-0.1.0-py3-none-any.whl.
File metadata
- Download URL: apst_starter_kit-0.1.0-py3-none-any.whl
- Upload date:
- Size: 52.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.11
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
5a92241362c7b090f70bc73c1ae7719f80fea708b6957afa5b0a7a916c641ddf
|
|
| MD5 |
768bffe064f96ec150aada622ef8ac90
|
|
| BLAKE2b-256 |
1ab6adf855149aef383fd96c33493f32c4f581c40801d0b4dd4702d8e0fa977a
|