Evaluation scaffold for LLM research, benchmarking, and reproducible experiment runs.

Themis

Themis is a Python package for running reproducible LLM evaluations. It gives you a typed scaffold for defining datasets, generators, parsers, metrics, judge workflows, and persistent run artifacts without forcing you into one provider or benchmark.

The published package name is themis-eval. The Python import namespace and CLI command are both themis.

Install

uv add themis-eval

Optional extras:

  • uv add "themis-eval[openai]"
  • uv add "themis-eval[vllm]" (Linux only)
  • uv add "themis-eval[langgraph]"
  • uv add "themis-eval[datasets]"
  • uv add "themis-eval[mongodb]"
  • uv add "themis-eval[postgres]"
  • uv sync --extra docs for local documentation builds from a repo checkout
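
Because the distribution name (themis-eval) differs from the import namespace (themis), it can help to check both after installing. The sketch below uses only the Python standard library; the helper name describe_install is hypothetical and not part of Themis.

```python
from importlib import metadata, util


def describe_install(dist_name="themis-eval", module_name="themis"):
    """Report whether a distribution is installed and its module is importable.

    Hypothetical helper for illustration; it is not part of Themis.
    """
    try:
        # Looks up the installed distribution by its published package name.
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    # Checks the import namespace without actually importing the module.
    importable = util.find_spec(module_name) is not None
    return {
        "distribution": dist_name,
        "version": version,
        "module": module_name,
        "importable": importable,
    }


print(describe_install())
```

A version of None means the distribution is not installed in the current environment, which usually points to running Python outside the environment that uv manages.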

Quick Start

from themis import evaluate
from themis.core.models import Case, Dataset

# Run a single evaluation using the built-in demo generator, parser, and metric.
result = evaluate(
    model="builtin/demo_generator",
    data=[
        Dataset(
            dataset_id="sample",
            cases=[
                Case(
                    case_id="case-1",
                    input={"question": "2+2"},
                    expected_output={"answer": "4"},
                )
            ],
        )
    ],
    metric="builtin/exact_match",
    parser="builtin/json_identity",
)

# Print the run identifier and the final run status.
print(result.run_id, result.status.value)

Custom Extensions

Themis is designed to be extended. You can plug in custom generators, parsers, reducers, metrics, judge models, and store backends through the Python API or config-driven workflows.
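
To make the parser and metric roles concrete, here is a conceptual sketch in plain Python. It does not use Themis's extension interfaces, which this README does not describe; the names json_parser, exact_match, and score are hypothetical, and the sketch assumes that a parser turns raw model text into a structured value and that a metric compares that value with the expected output.

```python
import json
from typing import Any, Callable

# Hypothetical shapes for illustration only; Themis's real interfaces may differ.
Parser = Callable[[str], Any]
Metric = Callable[[Any, Any], float]


def json_parser(raw_output: str) -> Any:
    """Parse raw model text as JSON, returning None when it is not valid JSON."""
    try:
        return json.loads(raw_output)
    except json.JSONDecodeError:
        return None


def exact_match(parsed: Any, expected: Any) -> float:
    """Score 1.0 when the parsed value equals the expected value, else 0.0."""
    return 1.0 if parsed == expected else 0.0


def score(raw_output: str, expected: Any, parser: Parser, metric: Metric) -> float:
    """Apply the parser first, then hand its result to the metric."""
    return metric(parser(raw_output), expected)


print(score('{"answer": "4"}', {"answer": "4"}, json_parser, exact_match))
```

Keeping the parser and the metric as separate callables means either one can be swapped without touching the other, which is the kind of separation the extension points listed above suggest.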

CLI

After installation, the package exposes the themis CLI:

themis quick-eval inline \
  --model builtin/demo_generator \
  --metric builtin/exact_match \
  --parser builtin/json_identity \
  --input '{"question":"2+2"}' \
  --expected-output '{"answer":"4"}'
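
When the command is built from a script, hand-quoting the JSON arguments is error-prone. The sketch below assembles the same invocation as an argument list, using only the subcommand and flags shown above; the helper name build_quick_eval_command is hypothetical, and the sketch only prints the command rather than running it.

```python
import json
import shlex


def build_quick_eval_command(
    case_input,
    expected_output,
    model="builtin/demo_generator",
    metric="builtin/exact_match",
    parser="builtin/json_identity",
):
    """Build the argument list for `themis quick-eval inline`.

    Hypothetical helper; it uses only the flags shown in the CLI example.
    """

    def compact(obj):
        # Compact separators reproduce the '{"question":"2+2"}' form used above.
        return json.dumps(obj, separators=(",", ":"))

    return [
        "themis", "quick-eval", "inline",
        "--model", model,
        "--metric", metric,
        "--parser", parser,
        "--input", compact(case_input),
        "--expected-output", compact(expected_output),
    ]


command = build_quick_eval_command({"question": "2+2"}, {"answer": "4"})
# shlex.join quotes each argument so the printed line is safe to paste into a shell.
print(shlex.join(command))
```

With themis installed, the same list can be passed to subprocess.run(command, check=True), which avoids shell quoting entirely.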

Documentation

Build the docs locally with:

uv sync --extra docs
uv run mkdocs build --strict

Contributing

Contributor setup and release guidance live in CONTRIBUTING.md.

Download files

Source Distribution

themis_eval-4.0.1.tar.gz (83.5 kB view details)

Uploaded Source

Built Distribution

themis_eval-4.0.1-py3-none-any.whl (114.4 kB view details)

Uploaded Python 3

File details

Details for the file themis_eval-4.0.1.tar.gz.

File metadata

  • Download URL: themis_eval-4.0.1.tar.gz
  • Size: 83.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for themis_eval-4.0.1.tar.gz
Algorithm Hash digest
SHA256 5d459f255a1eccfa9fc612a0c767aed3b1e2c656e0af074b28754fc9fa191fcf
MD5 8baff42789d238f127dd1efe3bbd7bc3
BLAKE2b-256 b90e9f756ba55bfaebe2f137d4401256b9a38ba1e86f5cad265d51df7f85c02f

Provenance

The following attestation bundles were made for themis_eval-4.0.1.tar.gz:

Publisher: pypi.yaml on Pittawat2542/themis

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file themis_eval-4.0.1-py3-none-any.whl.

File metadata

  • Download URL: themis_eval-4.0.1-py3-none-any.whl
  • Size: 114.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for themis_eval-4.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 5519034525d5dc9356ff7a84ad94236141b511885a3b1eb80ebcd2a1a6152a19
MD5 27663183c72b915ca3f299edc00e29a3
BLAKE2b-256 c95dcf71635926a4785bd105cd7cca091fd3c1fb373adb1a416b9428ed7c9d4e

Provenance

The following attestation bundles were made for themis_eval-4.0.1-py3-none-any.whl:

Publisher: pypi.yaml on Pittawat2542/themis
