Skip to main content

Local-first Accelerated Prompt Stress Testing starter kit

Project description

APST Starter Kit

APST stands for Accelerated Prompt Stress Testing. It is a depth-oriented LLM safety and reliability evaluation workflow that repeatedly samples the same prompts under controlled conditions to estimate empirical failure probability under repeated inference.

This v0.1 starter kit is local-first and conference-friendly. You can run the mock demo without an API key, then swap in OpenAI, Together.ai, or an OpenAI-compatible local endpoint when you are ready to test a real model.

Quickstart

Install from PyPI after the first release:

pip install apst-starter-kit
apst init my-apst-demo
cd my-apst-demo

apst run --config configs/demo_mock.yaml
apst report --results outputs/demo_results.csv --lang both

Or run from a Git checkout:

git clone <repo-url>
cd apst-starter-kit
pip install -e .

apst run --config configs/demo_mock.yaml
apst report --results outputs/demo_results.csv --lang both

The demo writes:

  • outputs/demo_results.csv
  • outputs/demo_results.json
  • outputs/demo_results_report_both.md

What The Demo Does

The mock run:

  • loads a small prompt set from data/prompts/demo_prompts.json
  • repeatedly samples each prompt at two temperatures
  • judges every response with the local rule judge mode
  • computes APST reliability and repeated-use risk metrics
  • exports CSV and JSON result files
  • generates English, Chinese, or bilingual Markdown reports

No API key or network access is required for configs/demo_mock.yaml.

Configuring Models

Use the mock provider for local demos:

models:
  - name: mock-apst-model
    model_id: mock-apst-model
    provider: mock
judge_mode: rule

Use OpenAI with an LLM judge:

export OPENAI_API_KEY=...
apst run --config configs/openai_example.yaml

Use an OpenAI-compatible local endpoint, such as Ollama or vLLM exposing /v1/chat/completions. APST uses the same client path for both: model, base_url, and a placeholder api_key. With a local base_url, prompts, outputs, and labels are sent only to the local server you configured.

models:
  - model: llama3.1
    base_url: http://localhost:11434/v1
    api_key: local-not-needed
judge_mode: rule

The OpenAI-compatible client is installed with the base package, so Ollama and vLLM only need the local server running:

apst run --config configs/ollama_local.yaml
apst run --config configs/vllm_local.yaml

For local LLM-as-judge, point judge_model at another OpenAI-compatible local server:

judge_mode: llm
judge_model:
  model: llama3.1
  base_url: http://localhost:11434/v1
  api_key: local-not-needed

See docs/local_model_servers.md for Ollama and vLLM server commands.

Together.ai models are also supported through the existing provider extra:

pip install -e ".[providers]"
export TOGETHER_API_KEY=...
apst run --config configs/openai_example.yaml --models meta-llama/Llama-3.3-70B-Instruct-Turbo

Judge Modes

  • rule: local deterministic checks for refusal, harmful operational guidance, crisis-support handling, and gibberish. Good for demos, local LLMs, and fast smoke tests.
  • heuristic: local malformed-output check only. Good when you want a very conservative no-network sanity pass.
  • llm: LLM-as-judge classification using judge_model. Good for richer audits and real model comparisons.

APST Metrics

Each prompt/model/temperature config reports:

  • empirical_failure_probability: observed failures divided by repeated samples
  • reliability: 1 - empirical_failure_probability
  • apst_risk_at_10: estimated chance of at least one failure across 10 independent attempts
  • failure_probability_ci_low and failure_probability_ci_high: Wilson interval bounds
  • failure_mode_distribution: counts by judge label

The repeated-use estimate is intentionally simple:

Risk@N = 1 - (1 - empirical_failure_probability)^N

Reports

apst report --results outputs/demo_results.csv --lang en
apst report --results outputs/demo_results.csv --lang zh
apst report --results outputs/demo_results.csv --lang both

Use --audit-contact to put a real intake link, email address, or conference note in the report. See docs/enterprise_audits.md for the enterprise audit handoff text.

Legacy Commands

The older llm-eval entry point and APST/AIRBench commands are still available:

llm-eval run-apst --config configs/smoke.yaml
llm-eval freeze-airbench --region us --per-l4 5 --output data/prompts/airbench_us_v1.json

Local Checks

pytest

Publishing

See docs/publishing.md for GitHub release and PyPI publishing steps. The package includes an apst init command so PyPI users can scaffold the runnable configs, prompt files, and docs without cloning the repository.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

apst_starter_kit-0.1.0.tar.gz (47.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

apst_starter_kit-0.1.0-py3-none-any.whl (52.4 kB view details)

Uploaded Python 3

File details

Details for the file apst_starter_kit-0.1.0.tar.gz.

File metadata

  • Download URL: apst_starter_kit-0.1.0.tar.gz
  • Upload date:
  • Size: 47.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.11

File hashes

Hashes for apst_starter_kit-0.1.0.tar.gz
Algorithm Hash digest
SHA256 804e1123622829df92bf4a7c45e92399ce3f37af6095e4b6d4d72ddd7a1a6e6d
MD5 8e38740a2b51f6bde230afc78be6d3a8
BLAKE2b-256 6f25eb13af02ba24b5a99f2f43d06dbeb0ef17cd8c49a4d55a0ef9872e12ba16

See more details on using hashes here.

File details

Details for the file apst_starter_kit-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for apst_starter_kit-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5a92241362c7b090f70bc73c1ae7719f80fea708b6957afa5b0a7a916c641ddf
MD5 768bffe064f96ec150aada622ef8ac90
BLAKE2b-256 1ab6adf855149aef383fd96c33493f32c4f581c40801d0b4dd4702d8e0fa977a

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page