ASQI quality checks for AI systems
Project description
ASQI Engineer
ASQI (AI Systems Quality Index) Engineer is a comprehensive framework for systematic testing and quality assurance of AI systems. Developed from Resaro's experience bridging governance, technical and business requirements, ASQI Engineer enables rigorous evaluation of AI systems through containerized test packages, automated assessment, and durable execution workflows.
ASQI Engineer is in active development, and we welcome contributions of new test packages, shared score cards and test plans, and help defining common schemas to meet industry needs. Our initial release focuses on comprehensive chatbot testing, with extensible foundations for broader AI system evaluation.
Key Features
Modular Test Execution
- Durable execution: DBOS-powered fault tolerance with automatic retry and recovery
- Concurrent testing: Parallel test execution with configurable concurrency limits
- Container isolation: Each test runs in isolated Docker containers for consistency and reproducibility
Flexible Scenario-based Testing
- Core schema definition: Specifies the underlying contract between test packages and users running tests, enabling an extensible approach to scale to new use cases and test modules
- Multi-system orchestration: Tests can coordinate multiple AI systems (target, simulator, evaluator) in complex workflows
- Flexible configuration: Test packages specify input systems and parameters that can be customised for individual use cases
Automated Assessment
- Structured reporting: JSON output with detailed metrics and assessment outcomes
- Configurable score cards: Define custom evaluation criteria with flexible assessment logic
Developer Experience
- Type-safe configuration: Pydantic schemas with JSON Schema generation for IDE support
- Rich CLI interface: Typer-based commands with comprehensive help and validation
- Real-time feedback: Live progress reporting with structured logging and tracing
LLM Testing
For our first release, we have introduced the llm_api system type and contributed four test packages for comprehensive LLM system testing. We have also open-sourced a draft ASQI score card for customer chatbots that maps technical metrics to business-relevant assessment criteria.
LLM Test Containers
- Garak: Security vulnerability assessment with 40+ attack vectors and probes
- DeepTeam: Red teaming library for adversarial robustness testing
- TrustLLM: Comprehensive framework and benchmarks to evaluate trustworthiness of LLM systems
- Resaro Chatbot Simulator: Persona and scenario based conversational testing with multi-turn dialogue simulation
The llm_api system type uses OpenAI-compatible API interfaces. Through LiteLLM integration, ASQI Engineer provides unified access to 100+ LLM providers including OpenAI, Anthropic, AWS Bedrock, Azure OpenAI, and custom endpoints. This standardisation enables test containers to work seamlessly across different LLM providers while supporting complex multi-system test scenarios (e.g., using different models for simulation, evaluation, and target testing).
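Because the llm_api system type speaks the OpenAI-compatible wire format, a request looks identical whether it targets a provider directly or goes through the LiteLLM proxy. A minimal sketch of the request body for the standard /v1/chat/completions endpoint (the proxy URL and model name are assumptions taken from the dev-container defaults later in this README):

```python
import json

# Assumption: the local LiteLLM proxy from the dev container setup.
BASE_URL = "http://localhost:4000"


def build_chat_request(model: str, user_message: str) -> dict:
    """Build an OpenAI-compatible /v1/chat/completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }


payload = build_chat_request("gpt-4o-mini", "Say hello")
print(json.dumps(payload))
```

Any OpenAI-compatible client pointed at `BASE_URL` can send this payload unchanged, which is what lets the same test container run against different providers.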
Quick Start
Option 1: Dev Container (Recommended)
The easiest way to get started is using a dev container with all dependencies pre-configured:
Prerequisites:
- Docker Desktop or Docker Engine
- VS Code with Dev Containers extension
What's Included:
- Python 3.12+ with uv package manager
- PostgreSQL database (for DBOS durability)
- LiteLLM proxy server (for unified LLM API access)
- All development dependencies pre-installed
Using VS Code:
git clone <repository-url>
cd asqi
cp .env.example .env
code .
# VS Code will prompt to "Reopen in Container" - click Yes
Note that you may need to change the host ports used by the dev container services (see the next section) to avoid conflicts with existing local services. Edit the host machine ports in .devcontainer/docker-compose.yml as needed.
Docker Compose DevContainer Services:
- PostgreSQL: localhost:5432 (user: postgres, password: asqi, database: asqi_starter)
- LiteLLM Proxy: http://localhost:4000 (OpenAI-compatible API endpoint); visit the UI at http://localhost:4000/ui
- Jaeger: http://localhost:16686 (distributed tracing UI)
Verify setup:
asqi --help
Option 2: Local Development
If you prefer local development:
Prerequisites:
- Python 3.12+
- Docker (for running test containers)
- uv (Python package manager)
Installation:
Clone and setup:
git clone <repository-url>
cd asqi
uv sync --dev # Install dependencies including dev tools
Set up Postgres for DBOS. See .devcontainer/docker-compose.yaml for an example configuration.
Verify installation:
# source ./.venv/bin/activate
asqi --help
Environment Configuration
ASQI supports multiple LLM providers via the llm_api system type, configured through environment variables. Set these in a .env file in the project root.
Required Environment Variables
# Copy the example file and configure your API keys
cp .env.example .env
LLM Provider API Keys:
LITELLM_MASTER_KEY=sk-1234
OPENAI_API_KEY=sk-your-openai-key
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key
AWS_BEARER_TOKEN_BEDROCK=your-bedrock-token
BASE_URL=http://localhost:4000
API_KEY=sk-1234
How Environment Variables Work
- Systems Configuration: Systems can specify base_url and optionally reference an env_file for API keys
- Environment Fallbacks: If not specified, ASQI uses BASE_URL and API_KEY from .env
- Provider Keys: Specific provider keys (e.g., OPENAI_API_KEY) are passed to test containers
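The fallback behavior described above can be sketched in a few lines. This is illustrative only, not ASQI's actual implementation: explicit system params win, and the BASE_URL / API_KEY environment variables (loaded from .env) are used otherwise.

```python
import os


def resolve_llm_endpoint(params: dict) -> tuple[str, str]:
    """Sketch of the fallback order: explicit system params first,
    then the BASE_URL / API_KEY environment variables from .env."""
    base_url = params.get("base_url") or os.environ.get("BASE_URL", "")
    api_key = params.get("api_key") or os.environ.get("API_KEY", "")
    return base_url, api_key


# Simulate the .env defaults from the section above:
os.environ["BASE_URL"] = "http://localhost:4000"
os.environ["API_KEY"] = "sk-1234"

# An explicit base_url overrides the environment fallback:
print(resolve_llm_endpoint({"base_url": "https://api.openai.com/v1"}))
# → ('https://api.openai.com/v1', 'sk-1234')
```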
Example Systems Configuration
systems:
  # Recommended: Uses env_file for API key security
  direct_openai:
    type: "llm_api"
    params:
      base_url: "https://api.openai.com/v1"
      model: "gpt-4o-mini"
      env_file: ".env" # References OPENAI_API_KEY from .env file
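A system can also omit base_url and rely on the environment fallbacks described above. A hypothetical entry (the proxy_gpt name is illustrative) that routes through the local LiteLLM proxy via BASE_URL and API_KEY from .env might look like:

```yaml
systems:
  # Hypothetical entry: no base_url given, so the BASE_URL / API_KEY
  # fallbacks from .env (the LiteLLM proxy defaults) apply instead.
  proxy_gpt:
    type: "llm_api"
    params:
      model: "gpt-4o-mini"
```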
Usage
ASQI provides four main execution modes via typer subcommands:
1. Validation Mode
Validates configurations without executing tests:
asqi validate \
--test-suite-config config/suites/demo_suite.yaml \
--systems-config config/systems/demo_systems.yaml \
--manifests-dir test_containers/
2. Test Execution Only
First, build the required test container:
cd test_containers/mock_tester
docker build -t my-registry/mock_tester:latest .
cd ../..
Then run tests without score card evaluation:
asqi execute-tests \
--test-suite-config config/suites/demo_suite.yaml \
--systems-config config/systems/demo_systems.yaml \
--output-file results.json
# Or with short flags:
asqi execute-tests -t config/suites/demo_suite.yaml -s config/systems/demo_systems.yaml -o results.json
3. Score Card Evaluation Only
Evaluates existing test results against score card criteria:
asqi evaluate-score-cards \
--input-file results.json \
--score-card-config config/score_cards/example_score_card.yaml \
--output-file results_with_score_card.json
# Or with short flags:
asqi evaluate-score-cards --input-file results.json -r config/score_cards/example_score_card.yaml -o results_with_score_card.json
4. End-to-End Execution
Combines test execution and score card evaluation:
asqi execute \
--test-suite-config config/suites/demo_suite.yaml \
--systems-config config/systems/demo_systems.yaml \
--score-card-config config/score_cards/example_score_card.yaml \
--output-file results_with_score_card.json
# Or with short flags:
asqi execute -t config/suites/demo_suite.yaml -s config/systems/demo_systems.yaml -r config/score_cards/example_score_card.yaml -o results_with_score_card.json
Architecture
Core Components
- Main Entry Point (src/asqi/main.py): CLI interface using typer for subcommands
- Workflow System (src/asqi/workflow.py): DBOS-based durable execution with fault tolerance
- Container Manager (src/asqi/container_manager.py): Docker integration for test containers
- Score Card Engine (src/asqi/score_card_engine.py): Configurable assessment and grading system
- Configuration System (src/asqi/schemas.py, src/asqi/config.py): Pydantic-based type-safe configs
Key Concepts
- Systems: AI systems being tested (APIs, models, etc.), defined in config/systems/
- Test Suites: Collections of tests defined in config/suites/
- Test Containers: Docker images in test_containers/ with embedded manifest.yaml
- Score Cards: Assessment criteria defined in config/score_cards/ for automated grading
- Manifests: Metadata describing test container capabilities and schemas
Available Test Containers
Mock Tester
Basic test container for development and validation:
cd test_containers/mock_tester
docker build -t my-registry/mock_tester:latest .
Garak Security Tester
Real-world LLM security testing:
# Requires API keys for target LLM services
export OPENAI_API_KEY="your_api_key_here"
cd test_containers/garak
docker build -t my-registry/garak:latest .
# Run security tests
asqi execute-tests \
--test-suite-config config/suites/security_test.yaml \
--systems-config config/systems/demo_systems.yaml \
--output-file garak_results.json
# Or with short flags:
asqi execute-tests -t config/suites/security_test.yaml -s config/systems/demo_systems.yaml -o garak_results.json
Score Cards
ASQI includes a simple grading engine for automated test result evaluation:
score_card_name: "Example Assessment"
indicators:
  - name: "Test success requirement"
    apply_to:
      test_name: "run_mock_on_compatible_sut"
    metric: "success"
    assessment:
      - { outcome: "PASS", condition: "equal_to", threshold: true }
      - { outcome: "FAIL", condition: "equal_to", threshold: false }
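Each assessment entry maps a metric value to an outcome via a condition and threshold. A minimal sketch of that evaluation logic (illustrative only, not ASQI's actual score card engine; equal_to comes from the example above, the other condition names are hypothetical):

```python
def assess(metric_value, assessment: list[dict]) -> str:
    """Return the outcome of the first assessment rule whose condition matches."""
    conditions = {
        "equal_to": lambda v, t: v == t,
        "greater_equal": lambda v, t: v >= t,  # hypothetical extra condition
        "less_equal": lambda v, t: v <= t,     # hypothetical extra condition
    }
    for rule in assessment:
        if conditions[rule["condition"]](metric_value, rule["threshold"]):
            return rule["outcome"]
    return "NO_MATCH"


# Rules from the example score card above:
rules = [
    {"outcome": "PASS", "condition": "equal_to", "threshold": True},
    {"outcome": "FAIL", "condition": "equal_to", "threshold": False},
]
print(assess(True, rules))   # PASS
print(assess(False, rules))  # FAIL
```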
Development
Running Tests
uv run pytest # Run all tests
uv run pytest --cov=src # Run with coverage
Adding New Test Containers
- Create directory under test_containers/
- Add Dockerfile, entrypoint.py, and manifest.yaml
- Ensure entrypoint accepts --systems-params and --test-params JSON arguments
- Output test results as JSON to stdout
Example manifest.yaml:
name: "my_test_framework"
version: "1.0.0"
input_systems:
  - name: "system_under_test"
    type: "llm_api"
    required: true
output_metrics: ["success", "score"]
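A minimal entrypoint.py satisfying this contract might look like the sketch below. The argument names come from the steps listed earlier; the exact JSON shapes of the inputs and the result dictionary are assumptions for illustration.

```python
import argparse
import json


def run_test(systems_params: dict, test_params: dict) -> dict:
    """Placeholder test logic emitting the metrics declared in manifest.yaml."""
    # A real container would exercise the system under test here.
    return {"success": True, "score": 1.0}


def main(argv=None) -> None:
    parser = argparse.ArgumentParser()
    # Both arguments arrive as JSON strings, per the container contract.
    parser.add_argument("--systems-params", required=True)
    parser.add_argument("--test-params", required=True)
    args = parser.parse_args(argv)
    result = run_test(json.loads(args.systems_params), json.loads(args.test_params))
    print(json.dumps(result))  # test results go to stdout as JSON


if __name__ == "__main__":
    main()
```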
Building and Distribution
ASQI can be packaged and distributed as a Python wheel for easy installation and sharing.
Building the Package
# Build only wheel
uv build --wheel
This creates files in dist/:
- asqi-[version]-py3-none-any.whl (wheel - binary distribution)
CLI Entry Point
The asqi command maps to src/asqi/main.py and provides all functionality:
asqi execute --test-suite-config config/suites/demo_suite.yaml --systems-config config/systems/demo_systems.yaml
Contributing
- Install development dependencies: uv sync --dev
- Run tests: uv run pytest
- Check code quality: uv run ruff check && uv run ruff format
- Run security scan: uv run bandit -r src/