fakellm-cli

Inspect and manage fakellm-assert frozen judgment snapshots from the terminal. Part of the fakellm family.

fakellm-assert freezes a judge's verdict about a fuzzy assertion, such as satisfies("apologizes for the delay"), to .fakellm/judgments/judgments.json and replays it on every later run. Those frozen verdicts are artifacts you review in a git diff. Once you have more than a handful, you want to look at them, sanity-check them, and clean them up without hand-editing JSON. That's what this CLI is for.

pip install fakellm-cli

Requires Python 3.9+. fakellm-assert itself is not a hard dependency: the CLI reuses its types when they're importable, but it can also inspect a checked-in .fakellm/ store on a machine that has only the snapshots. To install both together, run pip install "fakellm-cli[assert]" (the quotes keep shells such as zsh from treating the brackets as a glob pattern).
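The "reuse the types when importable" behaviour is a common optional-dependency pattern. A minimal sketch of how such a check can look, assuming the library's import name is fakellm_assert (a guess; this README does not state it):

```python
import importlib.util


def has_optional(module_name: str) -> bool:
    """Return True if a top-level module can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None


# "fakellm_assert" is an assumed import name, used here only for illustration.
USE_LIBRARY_TYPES = has_optional("fakellm_assert")
```

When the check is false, a tool in this position would fall back to treating the snapshot records as plain dictionaries.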

Commands

fakellm-cli list                         # every frozen verdict, with pass/fail counts
fakellm-cli list --verdict fail          # just the failures
fakellm-cli show "apologizes"            # one verdict in full: reasoning + response excerpt
fakellm-cli show aaaa1111                #   (by fingerprint prefix or criterion substring)
fakellm-cli verify                       # integrity check: schema, verdict values, key match
fakellm-cli prune --verdict fail         # preview removing all failing verdicts (dry run)
fakellm-cli prune --verdict fail --yes   #   actually remove them
fakellm-cli diff main/ feature/          # what changed between two snapshot dirs
fakellm-cli init                         # scaffold .fakellm/ and a conftest.py judge stub
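To make the three checks that verify names (schema, verdict values, key match) concrete, here is a rough sketch of what such an integrity pass might do. The store layout is an assumption: a mapping from fingerprint to a record with "fingerprint", "criterion", and "verdict" fields. The real schema and the function name find_problems are not taken from the tool.

```python
from typing import Any

# Assumed field names and verdict values; the actual schema may differ.
REQUIRED_FIELDS = ("fingerprint", "criterion", "verdict")
ALLOWED_VERDICTS = {"pass", "fail"}


def find_problems(store: dict[str, Any]) -> list[str]:
    """Return a human-readable problem list; an empty list means the store looks sane."""
    problems = []
    for key, record in store.items():
        if not isinstance(record, dict):
            problems.append(f"{key}: record is not an object")
            continue
        missing = [name for name in REQUIRED_FIELDS if name not in record]
        if missing:
            problems.append(f"{key}: missing {', '.join(missing)}")
            continue
        if record["verdict"] not in ALLOWED_VERDICTS:
            problems.append(f"{key}: unexpected verdict {record['verdict']!r}")
        if record["fingerprint"] != key:
            problems.append(f"{key}: key does not match record fingerprint")
    return problems
```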

Every read command takes --store PATH (pointing at either the judgments dir or the judgments.json file; default .fakellm/judgments) and --json for machine-readable output. Commands return a non-zero exit code on the condition you'd want to gate CI on: verify fails on integrity problems, diff fails when a verdict flipped pass↔fail.
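Because the gating signal is just the exit code, a CI step can be a thin wrapper around the commands. A small sketch, assuming nothing beyond the non-zero-on-failure behaviour described above (the gate helper is hypothetical, not part of the CLI):

```python
import subprocess
import sys


def gate(commands: list[list[str]]) -> int:
    """Run each command in order; return the first non-zero exit code, or 0."""
    for cmd in commands:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"gate failed: {' '.join(cmd)} exited {result.returncode}",
                  file=sys.stderr)
            return result.returncode
    return 0


# Example CI entry point (requires fakellm-cli on PATH):
# sys.exit(gate([
#     ["fakellm-cli", "verify"],
#     ["fakellm-cli", "diff", "main/", "feature/"],
# ]))
```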

What it does not do: re-judge

There is deliberately no fakellm-cli rejudge. Two reasons, both structural:

  1. The store doesn't keep enough to re-judge. A frozen record holds only a 280-character excerpt of the response, not the full text. Re-judging needs the exact response to recompute the fingerprint, and that response lives in your test, not in the snapshot.
  2. Re-judging is a live model call that belongs in a reviewed run. fakellm-assert's whole point is that verdicts are produced exactly once, in an explicit pytest --fakellm-update run, where a human reads the diff. A CLI that judged live would route around the one safety property the library exists to provide.
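The first reason can be illustrated with a toy fingerprint. The hashing scheme below is an assumption made for the example (SHA-256 over criterion, judge model, and response); the only details taken from this README are that the fingerprint includes the response text and that the store keeps a 280-character excerpt.

```python
import hashlib

EXCERPT_CHARS = 280  # excerpt length kept in a frozen record


def fingerprint(criterion: str, judge_model: str, response: str) -> str:
    """Hypothetical fingerprint: SHA-256 over the three inputs, separator-joined."""
    payload = "\x1f".join((criterion, judge_model, response))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


full_response = "We apologize for the delay. " * 20  # 560 characters
excerpt = full_response[:EXCERPT_CHARS]

original = fingerprint("apologizes for the delay", "judge-model", full_response)
from_excerpt = fingerprint("apologizes for the delay", "judge-model", excerpt)

# The excerpt cannot reproduce the original fingerprint once text is truncated.
print(original == from_excerpt)
```

Any response longer than the excerpt hashes differently once truncated, so a record re-judged from the store alone would not line up with the assertion in your test.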

So the division of labor is: pytest --fakellm-update produces verdicts; fakellm-cli manages them. When verify or diff tells you a verdict is stale or wrong, the fix is to prune it here and re-judge in pytest there.

diff matches on criterion, not fingerprint

A fingerprint includes the response text, so the "same" assertion against a regenerated response has a different fingerprint by design. diff therefore pairs verdicts across two stores by (criterion, judge_model) so it can actually catch a pass→fail flip, rather than reporting every drifted response as an unrelated add+remove.
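A sketch of that pairing logic, under the assumption that each record exposes "criterion", "judge_model", and "verdict" fields (the field names and the find_flips helper are illustrative, not the CLI's internals):

```python
def find_flips(old_records, new_records):
    """Pair records by (criterion, judge_model) and report changed verdicts."""
    def index(records):
        return {(r["criterion"], r["judge_model"]): r["verdict"] for r in records}

    old_idx, new_idx = index(old_records), index(new_records)
    flips = []
    # Only keys present in both stores can flip; the rest are adds or removes.
    for key in sorted(old_idx.keys() & new_idx.keys()):
        if old_idx[key] != new_idx[key]:
            flips.append((key, old_idx[key], new_idx[key]))
    return flips


main = [{"fingerprint": "aaaa1111", "criterion": "apologizes for the delay",
         "judge_model": "judge-model", "verdict": "pass"}]
feature = [{"fingerprint": "bbbb2222", "criterion": "apologizes for the delay",
            "judge_model": "judge-model", "verdict": "fail"}]

print(find_flips(main, feature))
```

The two records have different fingerprints, yet they still pair up and surface as a single pass-to-fail flip. Pairing by fingerprint would instead show one removal and one unrelated addition.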

Typical workflow

fakellm-cli init                    # once, to scaffold
# ... write satisfies() assertions, then:
pytest --fakellm-update             # freeze verdicts (review the diff!)
fakellm-cli list                    # eyeball what got frozen
fakellm-cli verify                  # gate in CI alongside pytest
# later, when you intentionally change a prompt:
fakellm-cli prune --criterion "old wording" --yes
pytest --fakellm-update             # re-freeze
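The prune step in this workflow is safe to try because it previews by default and only deletes with --yes. A minimal sketch of that dry-run-first pattern, assuming the criterion filter is a substring match and that the store maps fingerprints to records (both are assumptions; the prune function here is not the CLI's code):

```python
def prune(store: dict, criterion_substring: str, yes: bool = False):
    """Return (resulting_store, matched_keys); nothing is removed unless yes=True."""
    matched = sorted(
        key for key, record in store.items()
        if criterion_substring in record["criterion"]
    )
    if not yes:
        return store, matched  # dry run: report matches, leave the store untouched
    kept = {key: record for key, record in store.items() if key not in matched}
    return kept, matched


store = {
    "aaaa1111": {"criterion": "old wording of the apology", "verdict": "pass"},
    "bbbb2222": {"criterion": "mentions a refund", "verdict": "fail"},
}
preview_store, would_remove = prune(store, "old wording")
pruned_store, removed = prune(store, "old wording", yes=True)
```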

License

MIT
