Benchmark for structured information retrieval from financial documents using graph-verifiable questions

knowlytix-benchmark

Benchmark for Structured Retrieval from Financial Documents. Auto-generates questions from document-graph topology and scores LLM predictions against a provably correct graph-traversal baseline. (Internally referred to as the FinStructBench module — knowlytix.benchmark.*.)

knowlytix-benchmark is one of four packages in the Geometric Memory Systems family. Use it to answer questions like "how close does my RAG pipeline get to the graph-verified ground truth on this financial report?" — with metrics that resist gaming because the ground truth comes from graph operations, not a held-out human-labeled test set.
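To make the idea of graph-derived ground truth concrete, here is a minimal sketch under stated assumptions. The toy graph, its section names, and the helper `reachable_sections` are hypothetical and are not part of knowlytix-benchmark; the package's own question generators and its `DocumentGraph` type may work quite differently. The sketch only illustrates why an answer computed by traversal is deterministic and can be checked without human labels.

```python
from collections import deque

# Hypothetical toy document graph: each section maps to the sections it cites.
# Illustration only; this is not the package's DocumentGraph structure.
TOY_GRAPH = {
    "summary": ["capital_ratios", "stress_results"],
    "capital_ratios": ["rwa_table"],
    "stress_results": ["rwa_table", "scenario_defs"],
    "rwa_table": [],
    "scenario_defs": [],
}


def reachable_sections(graph, start):
    """Return every section reachable from `start` via citation edges (BFS)."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# A question derived from topology: "Which sections does 'stress_results' depend on?"
ground_truth = reachable_sections(TOY_GRAPH, "stress_results")
prediction = {"rwa_table"}  # e.g. an incomplete answer from a retrieval pipeline

# Fraction of the graph-derived answer that the prediction recovered.
recall = len(prediction & ground_truth) / len(ground_truth)
print(sorted(ground_truth), recall)
```

Because the expected answer is recomputed from the graph each time, there is no fixed answer key for a pipeline to memorize.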

Package: knowlytix-benchmark
License: Apache-2.0
Python: 3.12+
Status: alpha (v0.x)

Install

pip install knowlytix-benchmark

Depends on knowlytix-core (pinned ~=0.1.0). LLM-mode scoring routes through LiteLLM: set GMS_LLM_MODEL and your provider's API key, and any model that LiteLLM supports should work.

Quickstart — score a prediction set

import json
from importlib.resources import files

from knowlytix.benchmark import score_answer

# Smoke fixtures shipped with the wheel:
questions = json.loads((files("knowlytix.benchmark.fixtures.smoke") / "questions.json").read_text())
predictions = json.loads((files("knowlytix.benchmark.fixtures.smoke") / "predictions.json").read_text())

by_id = {p["id"]: p["answer"] for p in predictions["predictions"]}
for q in questions["questions"]:
    result = score_answer(by_id[q["id"]], q["ground_truth"])
    mark = "correct" if result.correct else "wrong"
    print(f"{q['id']}: {mark}  partial={result.partial_score:.2f}  ({result.detail})")
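The loop above prints one line per question. If an aggregate is more useful, the per-question results could be rolled up as in the sketch below. `StubResult` is a hypothetical stand-in that mirrors only the `correct` and `partial_score` attributes used in the quickstart, so the example runs without the package installed; `summarize` is likewise an illustrative helper, not a function the package provides.

```python
from dataclasses import dataclass


@dataclass
class StubResult:
    """Hypothetical stand-in exposing the two fields used in the quickstart."""
    correct: bool
    partial_score: float


def summarize(results):
    """Return (accuracy, mean partial score) for a list of scored answers."""
    if not results:
        return 0.0, 0.0
    accuracy = sum(1 for r in results if r.correct) / len(results)
    mean_partial = sum(r.partial_score for r in results) / len(results)
    return accuracy, mean_partial


sample = [
    StubResult(True, 1.0),
    StubResult(False, 0.5),
    StubResult(False, 0.0),
    StubResult(True, 1.0),
]
accuracy, mean_partial = summarize(sample)
print(f"accuracy={accuracy:.2f}  mean_partial={mean_partial:.3f}")
```

In real use, the list passed to `summarize` would hold the objects returned by `score_answer`, assuming they expose the same two attributes.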

Quickstart — run the full benchmark

from knowlytix.benchmark import Benchmark, get_instance_path

bench = Benchmark(get_instance_path("model_validation"))
result = bench.run()          # graph-only mode (no LLM, no API key needed)
bench.print_results(result)

To evaluate an LLM against the same ground-truth graph:

from knowlytix.benchmark.llm_caller import create_client

# Reuses `bench` from the previous snippet.
client = create_client()       # reads GMS_LLM_MODEL_SCORER, falling back to GMS_LLM_MODEL
result = bench.run(llm_client=client)
bench.print_results(result)

CLI

benchmark --instance model_validation
benchmark --instance credit_portfolio --llm-model anthropic/claude-opus-4-6

Configuration

`FINSTRUCTBENCH_*` — scoring tolerances

| Variable | Default | Meaning |
| --- | --- | --- |
| `FINSTRUCTBENCH_FLOAT_TOL` | `1e-6` | Absolute tolerance for float comparisons. |
| `FINSTRUCTBENCH_CLOSE_THRESHOLD` | `0.01` | Relative tolerance for "close enough" financial values. |
| `FINSTRUCTBENCH_TUPLE_ELEMENT_TOL` | `1e-3` | Tolerance per element inside tuple answers. |
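The difference between an absolute and a relative tolerance is easy to miss, so here is a small sketch of how the first two settings might interact. This is an assumption about the general technique, not the package's scoring code: `classify_numeric` is a hypothetical helper, its labels are invented for illustration, and it takes the documented defaults as plain arguments rather than reading the environment. Tuple answers are not covered.

```python
def classify_numeric(predicted, expected, float_tol=1e-6, close_threshold=0.01):
    """Hypothetical helper: label a numeric answer 'exact', 'close', or 'wrong'."""
    error = abs(predicted - expected)
    # Absolute tolerance: a fixed band, independent of the value's magnitude.
    if error <= float_tol:
        return "exact"
    # Relative tolerance: the band scales with the size of the expected value.
    if error <= close_threshold * abs(expected):
        return "close"
    return "wrong"


for guess in (250.0000004, 251.5, 260.0):
    print(guess, classify_numeric(guess, 250.0))
```

With an expected value of 250.0, the relative band is 2.5 wide on each side, whereas the absolute band stays at one millionth regardless of scale.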

`GMS_LLM_*` — LLM routing (only needed for LLM-mode scoring)

| Variable | Meaning |
| --- | --- |
| `GMS_LLM_MODEL` | Base LiteLLM model string. |
| `GMS_LLM_MODEL_SCORER` | Override for scoring calls (recommended). |
| `GMS_LLM_TIMEOUT_SECONDS` | Per-call timeout in seconds. Default `60`. |

See .env.example in the source repo for the full provider key reference.
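As a rough illustration of the precedence described above, the sketch below resolves a scorer model and timeout from a mapping of environment variables. It assumes that `GMS_LLM_MODEL_SCORER` wins over `GMS_LLM_MODEL` when both are set; `resolve_scorer_config` is a hypothetical helper written for this example and is not the package's `create_client`.

```python
import os


def resolve_scorer_config(env=None):
    """Hypothetical helper: pick the scorer model and timeout from env variables."""
    env = os.environ if env is None else env
    # Prefer the scorer-specific override, then fall back to the base model.
    model = env.get("GMS_LLM_MODEL_SCORER") or env.get("GMS_LLM_MODEL")
    if not model:
        raise RuntimeError(
            "Set GMS_LLM_MODEL (or GMS_LLM_MODEL_SCORER) for LLM-mode scoring"
        )
    # Documented default is 60 seconds when the variable is unset.
    timeout = float(env.get("GMS_LLM_TIMEOUT_SECONDS", "60"))
    return model, timeout


print(resolve_scorer_config({"GMS_LLM_MODEL": "anthropic/claude-opus-4-6"}))
```

Passing a plain dict instead of reading `os.environ` directly keeps the helper easy to exercise in tests.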

Included benchmark instances

Five synthetic financial-domain instances ship with the wheel:

| Instance | Topic |
| --- | --- |
| `basel_capital` | Bank capital adequacy under Basel III |
| `credit_portfolio` | Credit risk portfolio analysis |
| `fair_lending` | Fair lending compliance testing |
| `model_validation` | Model validation report (largest) |
| `stress_test` | Stress testing scenarios |

All synthetic — no real institution, person, or market event is depicted.

Public API

from knowlytix.benchmark import (
    Benchmark, BenchmarkResult,
    DocumentGraph, ENMEntry, ENMKey, PhaseEncoder,
    FinStructBenchSettings,
    GeneratedQuestion, ScoreResult, score_answer,
    default_generators, get_instance_path, ingest_markdown, list_instances,
)

GeneratedQuestion is a stable contract consumed by knowlytix.harness.testing.bridge — don't rename without coordinating (see CLAUDE.md §Coding standards).

Related packages

| Package | Role |
| --- | --- |
| `knowlytix-core` | Geometric memory engine (required runtime dep) |
| `knowlytix-knowledge` | Document-graph ingest + query front-end |
| `knowlytix-harness` | DOE-driven testing + runtime governance (consumes `GeneratedQuestion`) |

Project details

These details have been verified by PyPI

Maintainers

wingyanlau

Release history

This version

0.0.2

May 18, 2026

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

knowlytix_benchmark-0.0.2-py3-none-any.whl (3.8 kB)

Uploaded May 18, 2026 Python 3

File details

Details for the file knowlytix_benchmark-0.0.2-py3-none-any.whl.

File metadata

Download URL: knowlytix_benchmark-0.0.2-py3-none-any.whl
Upload date: May 18, 2026
Size: 3.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for knowlytix_benchmark-0.0.2-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `d16c4fa111ccfd81fbecd0e90b83185ca2331116c77d2a8ccafefe0727eaf390` |
| MD5 | `c59a0a8075645901579f756154668d83` |
| BLAKE2b-256 | `3a250dad6fcd92a9f73b188dc4cbfacdf1cd83b7ae47cc085ef23586c19ad9b0` |
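A downloaded wheel can be checked against the SHA256 digest listed above using only the standard library. The sketch below is one way to do it; the function names are illustrative, and the local file path is whatever location the wheel was saved to.

```python
import hashlib
import hmac

# SHA256 digest published for knowlytix_benchmark-0.0.2-py3-none-any.whl.
EXPECTED_SHA256 = "d16c4fa111ccfd81fbecd0e90b83185ca2331116c77d2a8ccafefe0727eaf390"


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 digest matches the expected value."""
    return hmac.compare_digest(sha256_of_file(path), expected.lower())
```

Calling `verify_wheel("knowlytix_benchmark-0.0.2-py3-none-any.whl")` on the downloaded file should return `True` if the file is intact. pip can enforce the same check at install time through its hash-checking mode.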

Provenance

The following attestation bundles were made for knowlytix_benchmark-0.0.2-py3-none-any.whl:

Publisher: publish-pypi.yml on knowlytix/GMS

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: knowlytix_benchmark-0.0.2-py3-none-any.whl
- Subject digest: d16c4fa111ccfd81fbecd0e90b83185ca2331116c77d2a8ccafefe0727eaf390
- Sigstore transparency entry: 1565585104
- Sigstore integration time: May 18, 2026
Source repository:
- Permalink: knowlytix/GMS@d3dc0ca80da49e06700ca6b3737ea1729cf06c3a
- Branch / Tag: refs/heads/pypi-stub-0.0.1-v2
- Owner: https://github.com/knowlytix
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@d3dc0ca80da49e06700ca6b3737ea1729cf06c3a
- Trigger Event: workflow_dispatch
