HB-Eval SDK for reliable agent evaluation, semantic memory, and LangChain integration

HB-Eval SDK

The official Python SDK for HB-Eval OS, the reliability operating system for agentic AI. In a few lines of code, evaluate any agent trajectory against five reliability metrics and receive a tier certification.

Install

pip install hb-eval-sdk

For the LangChain integration:

pip install "hb-eval-sdk[langchain]"

Quick start

from hb_eval_sdk import HBEvalClient

client = HBEvalClient(
    api_key="...",          # identifies your project
    aes_key="...",          # encrypts your payload (base64, 32 bytes)
    signing_secret="...",   # signs your request (base64; never transmitted)
)

result = client.evaluate({
    "trajectory": [
        {"step": 1, "action": "chain_start"},
        {"step": 2, "action": "tool_call", "tool": "search"},
        {"step": 3, "action": "chain_end"},
    ],
    "sub_tasks": 3,
    "constraint_violations": 0,
    "recovery_episodes": [],
    "agent_id": "my-agent",
})

print(result.verdict, result.tier)
print(result.metrics)   # pei, irs, frr, ti, csi

The five metrics

Every evaluation returns five reliability metrics. Any of them may be None when it is genuinely undefined for a given run, and None always means "not measured" — never "scored zero".

  • PEI — Planning Efficiency Index
  • IRS — Intentional Recovery Score (None when the run had no faults)
  • FRR — Failure Resilience Rate
  • TI — Traceability Index (None when no judge evaluation was made)
  • CSI — Consistency Stability Index (None without enough history)
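Because None and 0 mean different things, code that reads the metrics should test for None explicitly instead of relying on truthiness. The sketch below assumes the metrics can be read as a mapping keyed by the lowercase names shown in the quick start comment (pei, irs, frr, ti, csi); that shape, the helper names, and the sample values are illustrative assumptions, not documented SDK behavior.

```python
from typing import Dict, List, Mapping, Optional, Tuple

# Metric names as listed in the quick start comment.
METRIC_NAMES = ("pei", "irs", "frr", "ti", "csi")


def format_metric(name: str, value: Optional[float]) -> str:
    """Render one metric, keeping 'not measured' distinct from a score of zero."""
    if value is None:
        return f"{name.upper()}: not measured"
    return f"{name.upper()}: {value:.2f}"


def split_metrics(
    metrics: Mapping[str, Optional[float]],
) -> Tuple[Dict[str, float], List[str]]:
    """Return (measured values, names of metrics that were not measured)."""
    measured: Dict[str, float] = {}
    not_measured: List[str] = []
    for name in METRIC_NAMES:
        value = metrics.get(name)
        # `is None` rather than `not value`: a genuine 0.0 must stay measured.
        if value is None:
            not_measured.append(name)
        else:
            measured[name] = value
    return measured, not_measured


# Illustrative values only, not real evaluation output.
sample = {"pei": 0.82, "irs": None, "frr": 0.0, "ti": None, "csi": 0.91}
measured, not_measured = split_metrics(sample)
for name in METRIC_NAMES:
    print(format_metric(name, sample[name]))
```

A pattern such as `metrics.get("irs") or 0.0` would silently turn "not measured" into "scored zero", which is exactly the confusion the None convention is meant to prevent.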

LangChain

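from hb_eval_sdk import HBEvalCallback

# The callback takes the same three credentials as HBEvalClient.
callback = HBEvalCallback(api_key="...", aes_key="...", signing_secret="...")

# `agent` and `task` are placeholders for your own LangChain agent and its input.
agent.run(task, callbacks=[callback])

# Read the verdict of the most recent run from the callback.
print(callback.last_result.verdict)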

The callback observes the real run — counting genuine tool errors and detecting actual fault-and-recovery patterns — rather than assuming a clean execution.
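To make "counting tool errors and detecting fault-and-recovery patterns" concrete, here is a minimal sketch of one way such bookkeeping could work. It is not the SDK's implementation: the event format (`step`, `tool`, `status`), the pairing rule (an error is recovered by the next success of the same tool), and the shape of each recovery episode are all assumptions made for illustration.

```python
from typing import Any, Dict, Iterable, List


def summarize_run(events: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Count tool errors and pair each fault with the next success of the same tool."""
    tool_errors = 0
    recovery_episodes: List[Dict[str, Any]] = []
    pending: Dict[str, int] = {}  # tool name -> step of its first unresolved error

    for event in events:
        tool = event["tool"]
        if event["status"] == "error":
            tool_errors += 1
            # Keep the first failing step; repeated failures extend the same fault.
            pending.setdefault(tool, event["step"])
        elif event["status"] == "ok" and tool in pending:
            recovery_episodes.append(
                {
                    "tool": tool,
                    "fault_step": pending.pop(tool),
                    "recovered_step": event["step"],
                }
            )

    return {
        "tool_errors": tool_errors,
        "recovery_episodes": recovery_episodes,
        "unrecovered": sorted(pending),  # tools that failed and never succeeded again
    }


# Hypothetical event log: "search" fails twice then recovers, "fetch" never recovers.
events = [
    {"step": 1, "tool": "search", "status": "error"},
    {"step": 2, "tool": "search", "status": "error"},
    {"step": 3, "tool": "search", "status": "ok"},
    {"step": 4, "tool": "fetch", "status": "error"},
]
summary = summarize_run(events)
print(summary)
```

The point of the sketch is the distinction the paragraph above draws: a run with no errors yields an empty list of recovery episodes, while a run with faults records whether each one was actually recovered.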

Credentials

Your project has three credentials, issued together when the project is created. The API key is sent on each request to identify you. The AES key encrypts your payload locally. The signing secret signs your request and is never transmitted — it proves the request genuinely came from you, even to an observer who has seen your API key.
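Since all three credentials are secrets, a common pattern is to load them from the environment and check their format before constructing a client. The sketch below assumes hypothetical environment variable names (`HB_EVAL_API_KEY`, `HB_EVAL_AES_KEY`, `HB_EVAL_SIGNING_SECRET`) and relies only on what the quick start states: the AES key is base64 for 32 bytes, and the signing secret is base64. The helper names are illustrative and not part of the SDK.

```python
import base64
import binascii
import os
from typing import Dict, Mapping


def _decode_b64(name: str, value: str) -> bytes:
    """Decode strict base64 or raise a ValueError naming the offending credential."""
    try:
        return base64.b64decode(value, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"{name} is not valid base64") from exc


def load_credentials(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read the three credentials from `env` and sanity-check their format."""
    # Variable names are hypothetical; use whatever your deployment defines.
    names = {
        "api_key": "HB_EVAL_API_KEY",
        "aes_key": "HB_EVAL_AES_KEY",
        "signing_secret": "HB_EVAL_SIGNING_SECRET",
    }
    creds: Dict[str, str] = {}
    for field, var in names.items():
        value = env.get(var)
        if not value:
            raise ValueError(f"missing environment variable {var}")
        creds[field] = value

    # The quick start describes the AES key as base64 encoding 32 bytes.
    if len(_decode_b64("aes_key", creds["aes_key"])) != 32:
        raise ValueError("aes_key must decode to exactly 32 bytes")
    # The signing secret is described as base64; only the encoding is checked.
    _decode_b64("signing_secret", creds["signing_secret"])
    return creds
```

The returned dictionary uses the same keyword names as the quick start, so it can be passed along as `HBEvalClient(**load_credentials())`. Failing early on a malformed key gives a clearer error than a failed request later.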

License

MIT
