GMS-Harness — provider-agnostic DOE-driven black-box testing platform for LLM agents


knowlytix-harness

Geometric Memory Systems Harness: black-box agentic testing driven by Design of Experiments (DOE), with graph-verified ground truth and DOE factor analysis. Provider-agnostic: switch between Anthropic, OpenAI, Bedrock, Azure, or local Ollama by changing environment variables, not code.

knowlytix-harness is the headline package in the Geometric Memory Systems family. Use it to turn ad-hoc "does this agent work?" evaluations into repeatable, statistically grounded campaigns with typed verdicts, a failure taxonomy, cost tracking, and release gates. The same install also bundles the runtime-governance surface (knowlytix.harness.governance) for production-grade governed agentic systems; no extra step is needed.

Package: knowlytix-harness
License: Apache-2.0
Python: 3.12+
Status: alpha (v0.x)

Install

pip install knowlytix-harness

Installs knowlytix-core, knowlytix-knowledge, and knowlytix-benchmark at matching ~=0.1.0 versions; the packages are released in lockstep, so their versions stay in sync. LLM calls route through LiteLLM, so one library covers every LiteLLM-supported provider.
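Because the three sibling packages must stay on matching ~=0.1.0 versions, checking what is actually installed can catch a partial upgrade. The sketch below is a hypothetical helper, not part of knowlytix-harness: it reads installed versions with importlib.metadata and applies the PEP 440 compatible-release rule (~=0.1.0 means >=0.1.0 and ==0.1.*). It only parses plain numeric versions such as 0.1.3.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical helper (not part of knowlytix-harness). Package names come from
# the install notes above.
LOCKSTEP = ("knowlytix-core", "knowlytix-knowledge", "knowlytix-benchmark")


def _release(v: str) -> tuple[int, ...]:
    # Plain numeric releases only ("0.1.3"); pre-release, dev, and local
    # version suffixes raise ValueError.
    return tuple(int(part) for part in v.split("."))


def is_compatible(installed: str, base: str) -> bool:
    """PEP 440 '~=base': installed >= base and matches base minus its last part."""
    got, want = _release(installed), _release(base)
    width = max(len(got), len(want))
    got += (0,) * (width - len(got))      # treat 0.1 as 0.1.0
    want += (0,) * (width - len(want))
    prefix = _release(base)[:-1]          # ~=0.1.0 pins the 0.1 prefix
    return got >= want and got[: len(prefix)] == prefix


def check_lockstep(base: str = "0.1.0") -> dict[str, str]:
    """Map each sibling package to 'ok', 'missing', or a short problem note."""
    report: dict[str, str] = {}
    for dist in LOCKSTEP:
        try:
            installed = version(dist)
            ok = is_compatible(installed, base)
        except PackageNotFoundError:
            report[dist] = "missing"
            continue
        except ValueError:
            report[dist] = f"unparsed version {installed}"
            continue
        report[dist] = "ok" if ok else f"incompatible ({installed})"
    return report
```

check_lockstep() returns one entry per sibling package; "missing" means the distribution is not installed in the current environment.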

Provider setup (pick one)

The same knowlytix-harness wheel runs against any supported provider. Set the right env vars and go — no code changes.

Anthropic

export ANTHROPIC_API_KEY=sk-ant-...
export GMS_LLM_MODEL=anthropic/claude-opus-4-6

OpenAI

export OPENAI_API_KEY=sk-...
export GMS_LLM_MODEL=openai/gpt-4o-mini

AWS Bedrock

export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export AWS_REGION=us-west-2
export GMS_LLM_MODEL=bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0

Azure OpenAI

export AZURE_API_KEY=...
export AZURE_API_BASE=https://your-resource.openai.azure.com
export AZURE_API_VERSION=2024-02-15-preview
export GMS_LLM_MODEL=azure/your-deployment-name

Local Ollama (no API key)

export OLLAMA_BASE_URL=http://localhost:11434
export GMS_LLM_MODEL=ollama/llama3

The full list, including Google, Mistral, Cohere, Together, and more, is in .env.example in the source repo.

Tutorials

Two hands-on tutorial tracks ship inside the wheel:

Track	Notebooks	Path
Testing — DOE-driven black-box testing, calibration, release gates	24	`knowlytix/harness/testing/tutorials/notebooks/`
Governance — USER_GUIDE companion exercises	27	`knowlytix/harness/governance/tutorials/notebooks/`

Install the tutorial extras (Anthropic SDK, JupyterLab, matplotlib, shap):

pip install "knowlytix-harness[tutorials]"
export ANTHROPIC_API_KEY=sk-ant-...   # tutorials call claude-sonnet-4-6 directly

Launch:

jupyter lab "$(python -c "from importlib.resources import files; print(files('knowlytix.harness.testing.tutorials') / 'notebooks')")"
# or navigate manually to either track's notebooks/ path inside your site-packages

Post-install verification

After pip install knowlytix-harness, the three commands below install JupyterLab if needed, confirm your stack is healthy, and open the interactive exploration notebook:

pip install jupyterlab                            # if not already installed
knowlytix-smoke                                   # 5-step key-free assertion suite
jupyter lab $(knowlytix-smoke --notebook-path)    # interactive walkthrough (requires [tutorials])

knowlytix-smoke exits 0 on a healthy install. On failure it exits 1 and names which of the 5 checks failed: imports and __all__, Settings defaults, importlib.resources fixture reachability, knowlytix.benchmark.score_answer on the shipped predictions, and the harness DOE fixture schema. The notebook ships as package data inside this wheel, so no repo clone is needed; knowlytix-smoke --notebook-path prints it as an absolute, symlink-resolved filesystem path.
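Because knowlytix-smoke reports health through its exit status, it can serve as a CI gate. A minimal Python wrapper, assuming only the exit-code contract described above; run_smoke is a hypothetical name:

```python
import shutil
import subprocess
import sys


def run_smoke(cmd: tuple[str, ...] = ("knowlytix-smoke",)) -> bool:
    """True when the smoke suite exits 0; echoes its output on failure."""
    if shutil.which(cmd[0]) is None:
        print(f"{cmd[0]} not found on PATH", file=sys.stderr)
        return False
    proc = subprocess.run(list(cmd), capture_output=True, text=True)
    if proc.returncode != 0:
        # A failing run names the check that failed; surface it in the CI log.
        print(proc.stdout + proc.stderr, file=sys.stderr)
    return proc.returncode == 0
```

In a CI step, sys.exit(0 if run_smoke() else 1) propagates the result.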

CLI quickstart

# 1. Verify install
knowlytix-harness --help

# 2. Smoke test against the bundled fixture (no external data, no API key needed
#    if you use a dry-run evaluator)
knowlytix-harness run --fixture doe_smoke.json --dry-run

# 3. Live run with an LLM evaluator
knowlytix-harness run --markdown report.md --factor-group query_core --n-runs 32

# Alias — knowlytix-harness and knowlytix-testing are the same entry point
knowlytix-testing run --campaign campaigns/regression.yaml

Programmatic quickstart — one DOE campaign end-to-end

import os

from gms import get_llm, ModelPurpose
from knowlytix.harness.testing import (
    DOEGMSBenchmark, DOEHarnessConfig,
    make_evaluator, HallucinationOracle,
)

config = DOEHarnessConfig(
    markdown_path="report.md",      # document under test
    factor_group="query_core",      # DOE factor group
    n_runs=32,
    enable_hallucination_testing=True,
    enable_cost_tracking=True,
)

bench = DOEGMSBenchmark(config)
bench.ingest()                      # ingest the document into a geometric memory store

# make_evaluator(target_type, target_model, client=None, harness=None)
evaluator = make_evaluator(
    target_type="llm",
    target_model=os.environ["GMS_LLM_MODEL"],   # raises KeyError if unset
    client=get_llm(ModelPurpose.DEFAULT),
)
result = bench.run(evaluator=evaluator)   # execute the DOE run matrix

analyzer = bench.analyze(result)   # returns a DOEAnalyzer (from graphdoe)
print(analyzer.summary())          # summary() is illustrative; confirm the method name in the DOEAnalyzer API
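bench.analyze() delegates to graphdoe's DOEAnalyzer, whose methods are not documented here. For orientation, the sketch below shows the basic quantity a two-level DOE analysis starts from: the main effect of each factor, meaning the mean response at the high level minus the mean response at the low level. It is a generic illustration with hypothetical factor names and synthetic response values, not graphdoe's implementation.

```python
from itertools import product
from statistics import mean


def main_effects(design: list[dict[str, int]], response: list[float]) -> dict[str, float]:
    """Main effect per two-level factor coded -1/+1: mean(y | +1) - mean(y | -1)."""
    effects = {}
    for factor in design[0]:
        high = [y for row, y in zip(design, response) if row[factor] == 1]
        low = [y for row, y in zip(design, response) if row[factor] == -1]
        effects[factor] = mean(high) - mean(low)
    return effects


# Toy 2^2 full factorial: hypothetical factor names, synthetic responses.
design = [dict(zip(("factor_a", "factor_b"), levels)) for levels in product((-1, 1), repeat=2)]
scores = [1.0, 2.0, 5.0, 8.0]
print(main_effects(design, scores))  # {'factor_a': 5.0, 'factor_b': 2.0}
```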

Configuration reference

`GMSH_*` — harness tuning

Variable	Default	Meaning
`GMSH_DOE_N_RUNS`	`32`	Runs per DOE campaign.
`GMSH_DOE_SEED`	`42`	RNG seed for run selection.
`GMSH_DOE_SLA_LATENCY_MS`	`5000`	Per-call latency SLA.
`GMSH_DOE_COST_BUDGET_USD`	`10.0`	Campaign-level USD ceiling.
`GMSH_DOE_HALLUCINATION_THRESHOLD`	`0.1`	Max tolerated hallucination rate.
`GMSH_MAX_WORKERS`	`4`	Parallel eval worker count.
`GMSH_QUESTION_TIMEOUT_S`	`60`	Per-question timeout.
`GMSH_MAX_RETRIES`	`2`	Retry count on evaluator error.
`GMSH_MAX_TURNS`	`8`	Multi-turn conversation cap.
`GMSH_TRUNCATE_RESULT_AT`	`10000`	Character cap on captured outputs.
`GMSH_STORES_DIR`	`./gms_stores`	Where ingested stores live.
`GMSH_TRACING_DIR`	`./doe_tracing_store`	Trace artifact root.
`GMSH_RUNS_DIR`	`./runs`	Run records output dir.
`GMSH_CAMPAIGNS_DIR`	`./campaigns`	Campaign manifests.
`GMSH_SESSION_STORE_PATH`	`./harness_session_store`	Session state.
`GMSH_LIVE_DASHBOARD_PORT`	`8765`	Live WebSocket dashboard port.
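The three DOE limits in this table (latency SLA, cost budget, hallucination threshold) are the natural inputs to a pass/fail release decision. The sketch below is a hypothetical gate, not the harness's release-gate logic. It assumes that a single call over the SLA fails the gate, that the budget applies to total campaign cost, and that the hallucination rate is hallucinated answers divided by answers scored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateLimits:
    # Defaults copied from the GMSH_DOE_* rows of the table above.
    sla_latency_ms: float = 5000
    cost_budget_usd: float = 10.0
    hallucination_threshold: float = 0.1


def release_gate(latencies_ms: list[float], total_cost_usd: float,
                 hallucinated: int, answered: int,
                 limits: GateLimits = GateLimits()) -> list[str]:
    """Return the violated limits; an empty list means the gate passes."""
    failures = []
    if any(ms > limits.sla_latency_ms for ms in latencies_ms):
        failures.append("latency SLA exceeded")
    if total_cost_usd > limits.cost_budget_usd:
        failures.append("cost budget exceeded")
    if answered and hallucinated / answered > limits.hallucination_threshold:
        failures.append("hallucination rate above threshold")
    return failures
```

Under the defaults, release_gate([1200, 4800], 3.5, 1, 32) returns an empty list.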

Twenty-one GMSH_ENABLE_* feature flags toggle optional subsystems (typed verdicts, provenance, gateway fault injection, policy engine, stateful testing, hallucination oracle, calibration, multi-agent, cross-document, disambiguation, invariance, streaming, live dashboard, and more). See the USER_GUIDE.md shipped in the wheel for the full list.

`GMS_LLM_*` — LLM routing

Variable	Meaning
`GMS_LLM_MODEL`	Base LiteLLM model string. Required unless every purpose is overridden.
`GMS_LLM_MODEL_JUDGE`	Override for judge/verifier calls.
`GMS_LLM_MODEL_GENERATOR`	Override for question-generation calls.
`GMS_LLM_MODEL_SCORER`	Override for scoring calls.
`GMS_LLM_TIMEOUT_SECONDS`	Per-call timeout. Default `60`.
`GMS_LLM_MAX_RETRIES`	Transient retries. Default `2`.
`GMS_LLM_TEMPERATURE`	Sampling temperature. Default `0.0`.

Architecture in one paragraph

knowlytix-harness decomposes "did my agent behave correctly?" into (1) document ingestion via knowlytix-knowledge → geometric memory store, (2) auto-generation of graph-verified questions via knowlytix-benchmark + geometric generators, (3) DOE factor-group sweep producing a structured run matrix, (4) typed verdict verification against provable graph traversals, (5) failure taxonomy + severity classification + cost/latency tracking, (6) release-gate decision with audit packet. Every step is provider-agnostic — the same campaign YAML runs unchanged against any supported LLM.
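Step (3) produces a structured run matrix. As a sketch of the idea, assuming factors with discrete levels, the helper below enumerates the full factorial and, if the result exceeds the run budget, keeps a seeded subset. Its defaults mirror GMSH_DOE_N_RUNS and GMSH_DOE_SEED. Uniform random subsampling is a stand-in chosen for brevity. A production DOE tool would more likely use a fractional-factorial or other structured design, and this is not the harness's selection algorithm.

```python
import random
from itertools import product


def run_matrix(factors: dict[str, list], n_runs: int = 32, seed: int = 42) -> list[dict]:
    """Full factorial over `factors`, cut to `n_runs` rows with a seeded RNG.

    Hypothetical: uniform sampling stands in for a structured fractional design.
    """
    names = list(factors)
    full = [dict(zip(names, combo)) for combo in product(*factors.values())]
    if len(full) <= n_runs:
        return full
    # Same seed -> same subset, so a campaign can be re-run exactly.
    return random.Random(seed).sample(full, n_runs)
```

With two factors of 2 and 3 levels (6 combinations), every row is kept. With six two-level factors (64 combinations), 32 rows are drawn without replacement.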

For runtime governance of agentic systems in production, the wheel also ships knowlytix.harness.governance (the governed harness): triple-gate tool gateway (schema + policy + plausibility), typed claim verification routed to GMS primitives, behavioral FSM contracts, governance bundle signing, runtime gates, and drift monitoring. Same wheel; same install — pip install knowlytix-harness gives you both the black-box testing and the governed-runtime surface.

Public API

The wheel ships two subpackages — black-box testing (knowlytix.harness.testing) and runtime governance (knowlytix.harness.governance):

# Black-box DOE testing (the headline product)
from knowlytix.harness.testing import (
    # Core
    DOEGMSBenchmark, DOEHarnessConfig, GMSHSettings,
    # Evaluators + judges
    LLMEvaluator, AgentEvaluator, make_evaluator, GMSJudge,
    # Oracles + taxonomy
    HallucinationOracle, SeverityClassifier, CompositeOracle,
    # Agentic testing
    ToolGateway, PolicyEngine, CampaignManager,
    # …195 symbols total in __all__
)

# Runtime governance — the governed harness
from knowlytix.harness.governance import (
    # Triple-gate tool gateway: schema validation + policy + GMS plausibility
    GovernedToolGateway,
    # Typed claim verification routed to GMS primitives
    ClaimRouter, TypedClaim,
    # Behavioral FSM contracts (advisory / recommendation / action-taking)
    BehavioralContract,
    # End-to-end orchestrator + lifecycle state machine
    GovernedOrchestrator,
    # Runtime gates + drift monitoring + bundle signing
    RuntimeGate, DriftMonitor, GovernanceBundle,
)

See the top of harness/testing/__init__.py and harness/governance/__init__.py for the full declarations, or USER_GUIDE.md for task-oriented navigation.
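To check the symbol count for the installed version directly, read __all__ from the module. A minimal sketch using only the standard library; public_symbols is a hypothetical name:

```python
import importlib


def public_symbols(module_name: str) -> list[str]:
    """Sorted names a module declares in __all__ (empty if it declares none)."""
    module = importlib.import_module(module_name)
    return sorted(getattr(module, "__all__", []))
```

For example, len(public_symbols("knowlytix.harness.testing")) can be compared against the 195 noted in the import comment.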

Related packages

Package	Role
`knowlytix-core`	Geometric memory engine
`knowlytix-knowledge`	Document ingest + query front-end
`knowlytix-benchmark`	Structured-retrieval benchmark

Project details

Maintainers

wingyanlau

Release history

This version

0.0.2

May 18, 2026

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

knowlytix_harness-0.0.2-py3-none-any.whl (6.1 kB)

Uploaded May 18, 2026 Python 3

File details

Details for the file knowlytix_harness-0.0.2-py3-none-any.whl.

File metadata

Download URL: knowlytix_harness-0.0.2-py3-none-any.whl
Upload date: May 18, 2026
Size: 6.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for knowlytix_harness-0.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`fd96d09c82c56c521c2dcb258aa0a1468c56ee097736153a8f3307ac2da20f81`
MD5	`b08db2750249920928f39f96dc9c8e96`
BLAKE2b-256	`4f552efd0960252ab0f613c82e195da065d16f85dbb16a924d4cedbdbcbffbd8`

Provenance

The following attestation bundles were made for knowlytix_harness-0.0.2-py3-none-any.whl:

Publisher: publish-pypi.yml on knowlytix/GMS

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: knowlytix_harness-0.0.2-py3-none-any.whl
- Subject digest: fd96d09c82c56c521c2dcb258aa0a1468c56ee097736153a8f3307ac2da20f81
- Sigstore transparency entry: 1565585344
- Sigstore integration time: May 18, 2026
Source repository:
- Permalink: knowlytix/GMS@d3dc0ca80da49e06700ca6b3737ea1729cf06c3a
- Branch / Tag: refs/heads/pypi-stub-0.0.1-v2
- Owner: https://github.com/knowlytix
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@d3dc0ca80da49e06700ca6b3737ea1729cf06c3a
- Trigger Event: workflow_dispatch
