Evaluation scaffold for LLM research, benchmarking, and reproducible experiment runs.

Themis

Themis is a Python package for running reproducible LLM evaluations. It gives you a typed scaffold for defining datasets, generators, parsers, metrics, judge workflows, and persistent run artifacts without forcing you into one provider or benchmark.

The published package name is themis-eval. The Python import namespace and CLI command are both themis.
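Because the distribution name and the import name differ, metadata lookups use themis-eval while Python code imports themis. A minimal sketch of a version check using only the standard library; the helper name installed_version is hypothetical and not part of Themis:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "themis-eval") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; it is not provided by Themis.
    """
    try:
        # Metadata lookups take the distribution name ("themis-eval"),
        # not the import namespace ("themis").
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version())
```

If this prints None, the package is not installed in the active environment.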

Install

uv add themis-eval

Optional extras:

  • uv add "themis-eval[openai]"
  • uv add "themis-eval[vllm]" (on Linux)
  • uv add "themis-eval[langgraph]"
  • uv add "themis-eval[datasets]"
  • uv add "themis-eval[mongodb]"
  • uv add "themis-eval[postgres]"
  • uv sync --extra docs (for local documentation builds from a repo checkout)

Quick Start

from themis import evaluate
from themis.core.models import Case, Dataset

result = evaluate(
    model="builtin/demo_generator",
    data=[
        Dataset(
            dataset_id="sample",
            cases=[
                Case(
                    case_id="case-1",
                    input={"question": "2+2"},
                    expected_output={"answer": "4"},
                )
            ],
        )
    ],
    metric="builtin/exact_match",
    parser="builtin/json_identity",
)

print(result.run_id, result.status.value)

Custom Extensions

Themis is designed to be extended. You can plug in custom generators, parsers, reducers, metrics, judge models, and store backends through the Python API or config-driven workflows.
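To give a feel for the kind of logic a custom parser and metric might hold, here is a plain-Python sketch. It deliberately does not import Themis or use its extension interfaces, which are not shown in this README; the function names parse_json_answer and normalized_exact_match are hypothetical, and the real base classes and registration steps should be taken from the project documentation.

```python
import json
from typing import Any, Mapping


def parse_json_answer(raw_output: str) -> dict[str, Any]:
    """Hypothetical parser: decode a JSON object emitted by a generator."""
    parsed = json.loads(raw_output)
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object")
    return parsed


def _normalize(value: Any) -> Any:
    # Strings are compared case-insensitively and without surrounding whitespace;
    # every other type is compared as-is.
    return value.strip().lower() if isinstance(value, str) else value


def normalized_exact_match(
    predicted: Mapping[str, Any], expected: Mapping[str, Any]
) -> float:
    """Hypothetical metric: 1.0 if every expected field is present and matches
    after normalization, otherwise 0.0."""
    matches = all(
        key in predicted and _normalize(predicted[key]) == _normalize(value)
        for key, value in expected.items()
    )
    return float(matches)


if __name__ == "__main__":
    prediction = parse_json_answer('{"answer": " 4 "}')
    print(normalized_exact_match(prediction, {"answer": "4"}))
```

Keeping the parser and the metric as separate pure functions mirrors the split between parsers and metrics described above, and makes each piece easy to unit test before wiring it into an evaluation run.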

CLI

After installation, the package exposes the themis CLI:

themis quick-eval inline \
  --model builtin/demo_generator \
  --metric builtin/exact_match \
  --parser builtin/json_identity \
  --input '{"question":"2+2"}' \
  --expected-output '{"answer":"4"}'
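When the inline JSON comes from a script rather than a hand-typed shell line, building the argument list in Python avoids quoting mistakes. The sketch below uses only the subcommand and flags shown above; the helper build_quick_eval_command is hypothetical and not part of Themis.

```python
import json
import shlex
from typing import Any, Mapping


def build_quick_eval_command(
    model: str,
    metric: str,
    parser: str,
    case_input: Mapping[str, Any],
    expected_output: Mapping[str, Any],
) -> list[str]:
    """Build the argv list for `themis quick-eval inline` (hypothetical helper)."""
    return [
        "themis", "quick-eval", "inline",
        "--model", model,
        "--metric", metric,
        "--parser", parser,
        # Compact separators keep the JSON free of spaces, as in the shell example.
        "--input", json.dumps(case_input, separators=(",", ":")),
        "--expected-output", json.dumps(expected_output, separators=(",", ":")),
    ]


if __name__ == "__main__":
    command = build_quick_eval_command(
        model="builtin/demo_generator",
        metric="builtin/exact_match",
        parser="builtin/json_identity",
        case_input={"question": "2+2"},
        expected_output={"answer": "4"},
    )
    # shlex.join produces a shell-safe string for logging or copy-pasting.
    print(shlex.join(command))
```

With themis-eval installed, the list can be passed directly to subprocess.run(command, check=True), which bypasses the shell so no extra quoting is needed.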

Documentation

Build the docs locally with:

uv sync --extra docs
uv run mkdocs build --strict

Contributing

Contributor setup and release guidance live in CONTRIBUTING.md.
