LLM-based gray-zone verifier for axor-core anomaly detection

These details have not been verified by PyPI

Project description

axor-classifier-llm

LLM-based gray-zone verifier for axor-core anomaly detection.

LLMAnomalyVerifier uses Anthropic Claude to evaluate ambiguous behavioral sequences that a statistical model cannot confidently resolve.

Zero required dependencies — the Anthropic SDK is an optional extra.

What it does

MLAnomalyDetector (from axor-classifier-simple) scores NormalizedIntent windows and returns a risk class. When a score falls in the gray zone — suspicious but not definitively critical — the detector can delegate to an LLMVerifier for a second opinion.

LLMAnomalyVerifier implements that protocol: it formats NormalizedIntent fields into a structured prompt, calls Claude, parses the JSON response, and returns an AnomalyResult.

Security isolation

The verifier receives only NormalizedIntent fields — behavioral abstractions describing what the agent tried to do, not what it saw or produced. Raw tool outputs, webpage content, file contents, and chain-of-thought never enter the prompt.

This design isolates the verifier from content-level prompt injection: a malicious webpage cannot influence the safety verdict by embedding instructions in its text.

Installation

pip install axor-classifier-llm[llm]

The [llm] extra installs anthropic>=0.25. Without it, the package installs with no dependencies but raises ImportError on instantiation with an actionable message.

Quick start

import anthropic
from axor_classifier_llm import LLMAnomalyVerifier

verifier = LLMAnomalyVerifier(
    client=anthropic.AsyncAnthropic(),   # must be async
    model="claude-haiku-4-5-20251001",   # default — fast and cheap
    max_tokens=256,
)

result = await verifier.verify(
    window=normalized_intents,           # list[NormalizedIntent]
    task_signal_hint="focused_mutative", # optional: helps Claude interpret context
    policy_name="focused_mutative",      # optional: helps Claude interpret context
)

print(result.score)    # float 0.0 – 1.0
print(result.cls)      # AnomalyClass.NORMAL | SUSPICIOUS | CRITICAL
print(result.reasons)  # ("secret_access_after_external_read", ...)

Integration with MLAnomalyDetector

The typical integration is as a gray_zone_verifier in MLAnomalyDetector:

import anthropic
from axor_classifier_simple import MLAnomalyDetector
from axor_classifier_llm import LLMAnomalyVerifier

verifier = LLMAnomalyVerifier(client=anthropic.AsyncAnthropic())

detector = MLAnomalyDetector(
    gray_zone_verifier=verifier,
    gray_zone_threshold=0.50,   # escalate when score >= 0.50 in suspicious range
)

result = await detector.score(window=intents)

Escalation logic in MLAnomalyDetector:

Score	Class	LLM called?
`< 0.40`	`NORMAL`	No
`[0.40, 0.50)`	`SUSPICIOUS`	No (below threshold)
`[0.50, 0.75)`	`SUSPICIOUS`	Yes — verifier result returned
`>= 0.75`	`CRITICAL`	No

If the LLM call raises any exception, MLAnomalyDetector falls back to the ML-derived score and logs a warning.

What the LLM sees

Each NormalizedIntent in the window is formatted as a single line:

tool=bash op=execute_generated_code target=workdir dest=none prov=repo flow=local_to_local [executes_generated_code, after_external_read]

The flags in brackets (reads_secret, writes_outside_workdir, etc.) appear only when set.

The system prompt instructs Claude to evaluate behavioral patterns only:

External read → secret access → outbound network → critical
Cloud metadata probe → critical
Docker socket access → critical
Unexpected tool class for stated task → suspicious
Normal coding / research patterns → normal

Claude responds with JSON only:

{
  "score": 0.82,
  "class": "critical",
  "reasons": ["secret_access_after_external_read", "network_after_secret"]
}

Malformed responses are caught: _parse_response returns score=0.5 / SUSPICIOUS / ("verifier_parse_error",) and logs the exception at WARNING level.

Parameters

Parameter	Default	Description
`client`	required	`anthropic.AsyncAnthropic` instance
`model`	`claude-haiku-4-5-20251001`	Anthropic model ID
`max_tokens`	`256`	Max tokens for the verifier response

AnomalyResult contract

Defined in axor_core.contracts.anomaly:

@dataclass
class AnomalyResult:
    score: float              # 0.0 – 1.0 risk score
    cls: AnomalyClass         # NORMAL | SUSPICIOUS | CRITICAL
    reasons: tuple[str, ...]  # human-readable trigger reasons

Score thresholds:

Class	Range
`NORMAL`	`[0.0, 0.40)`
`SUSPICIOUS`	`[0.40, 0.75)`
`CRITICAL`	`[0.75, 1.0]`

Development

git clone https://github.com/Bucha11/axor-classifier-llm
cd axor-classifier-llm
pip install -e ".[dev]"
pytest tests/

Tests mock the Anthropic client and do not make real API calls.

License

MIT

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

0.2.1

Jun 2, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

axor_classifier_llm-0.2.1.tar.gz (8.6 kB view details)

Uploaded Jun 2, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

axor_classifier_llm-0.2.1-py3-none-any.whl (6.6 kB view details)

Uploaded Jun 2, 2026 Python 3

File details

Details for the file axor_classifier_llm-0.2.1.tar.gz.

File metadata

Download URL: axor_classifier_llm-0.2.1.tar.gz
Upload date: Jun 2, 2026
Size: 8.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for axor_classifier_llm-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`4fc7aa304a5c89bff14863557c9cac5c3b137df1c7ed2380f74b2238781c99a4`
MD5	`0891873a410d2c0a14a66adcd9315260`
BLAKE2b-256	`5cad07738fbe3fa0f53cd2f8b21ab8805e2ff534a22e79be2c792b4ff488931a`

See more details on using hashes here.

Provenance

The following attestation bundles were made for axor_classifier_llm-0.2.1.tar.gz:

Publisher: ci.yml on Bucha11/axor-classifier-llm

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: axor_classifier_llm-0.2.1.tar.gz
- Subject digest: 4fc7aa304a5c89bff14863557c9cac5c3b137df1c7ed2380f74b2238781c99a4
- Sigstore transparency entry: 1706209714
- Sigstore integration time: Jun 2, 2026
Source repository:
- Permalink: Bucha11/axor-classifier-llm@89de49aeba963e98a8c3bba0814c33f3c2c553b9
- Branch / Tag: refs/tags/v0.2.1
- Owner: https://github.com/Bucha11
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@89de49aeba963e98a8c3bba0814c33f3c2c553b9
- Trigger Event: push

File details

Details for the file axor_classifier_llm-0.2.1-py3-none-any.whl.

File metadata

Download URL: axor_classifier_llm-0.2.1-py3-none-any.whl
Upload date: Jun 2, 2026
Size: 6.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for axor_classifier_llm-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`bd5b7611b487857b02dbc451f63011365403afdc34efb564e84b34ca63a85425`
MD5	`73bc1249c423df288c0276fcd6be8e03`
BLAKE2b-256	`b30667a1c699df68fcb307368f4c62bfb3077f5d79432efce9fbba5045159884`

See more details on using hashes here.

Provenance

The following attestation bundles were made for axor_classifier_llm-0.2.1-py3-none-any.whl:

Publisher: ci.yml on Bucha11/axor-classifier-llm

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: axor_classifier_llm-0.2.1-py3-none-any.whl
- Subject digest: bd5b7611b487857b02dbc451f63011365403afdc34efb564e84b34ca63a85425
- Sigstore transparency entry: 1706209746
- Sigstore integration time: Jun 2, 2026
Source repository:
- Permalink: Bucha11/axor-classifier-llm@89de49aeba963e98a8c3bba0814c33f3c2c553b9
- Branch / Tag: refs/tags/v0.2.1
- Owner: https://github.com/Bucha11
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@89de49aeba963e98a8c3bba0814c33f3c2c553b9
- Trigger Event: push

axor-classifier-llm 0.2.1

Navigation

Verified details

Maintainers

Unverified details

Meta

Project description

axor-classifier-llm

What it does

Security isolation

Installation

Quick start

Integration with MLAnomalyDetector

What the LLM sees

Parameters

AnomalyResult contract

Development

License

Project details

Verified details

Maintainers

Unverified details

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance