
Evaluation scaffold for LLM research, benchmarking, and reproducible experiment runs.


Themis

Themis is a Python package for running reproducible LLM evaluations. It gives you a typed scaffold for defining datasets, generators, parsers, metrics, judge workflows, and persistent run artifacts without forcing you into one provider or benchmark.

The published package name is themis-eval. The Python import namespace and CLI command are both themis.
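Because the distribution name and the import name differ, it can help to check both after installing. The following is a minimal sketch using only the Python standard library; the helper names installed_version and is_importable are illustrative and not part of Themis.

```python
from importlib import metadata
from importlib.util import find_spec
from typing import Optional

DIST_NAME = "themis-eval"  # name used with `uv add` and on the package index
IMPORT_NAME = "themis"     # name used in `import` statements and for the CLI


def installed_version(dist_name: str = DIST_NAME) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def is_importable(module_name: str = IMPORT_NAME) -> bool:
    """Report whether a top-level module can be located on the import path."""
    return find_spec(module_name) is not None


if __name__ == "__main__":
    print("distribution version:", installed_version())
    print("import available:", is_importable())
```

Looking up the version requires the distribution name (themis-eval), while the importability check requires the module name (themis).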

Install

uv add themis-eval

Optional extras:

  • uv add "themis-eval[openai]"
  • uv add "themis-eval[vllm]" on Linux
  • uv add "themis-eval[langgraph]"
  • uv add "themis-eval[datasets]"
  • uv add "themis-eval[mongodb]"
  • uv add "themis-eval[postgres]"
  • uv sync --extra docs for local documentation builds from a repo checkout

Quick Start

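from themis import evaluate
from themis.core.models import Case, Dataset

# Evaluate a single hand-written case with the built-in demo generator,
# parser, and metric.
result = evaluate(
    model="builtin/demo_generator",
    data=[
        Dataset(
            dataset_id="sample",
            cases=[
                # Each Case pairs an input payload with its expected output.
                Case(
                    case_id="case-1",
                    input={"question": "2+2"},
                    expected_output={"answer": "4"},
                )
            ],
        )
    ],
    metric="builtin/exact_match",
    parser="builtin/json_identity",
)

# Print the run identifier and the run status.
print(result.run_id, result.status.value)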
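For more than one case, the keyword arguments can be generated from plain data. The sketch below assumes only the Case and Dataset fields shown in the Quick Start (case_id, input, expected_output, dataset_id, cases); the helper build_case_kwargs and the sample pairs are hypothetical, and the Themis import is guarded so the snippet still runs where the package is not installed.

```python
from typing import Any


def build_case_kwargs(pairs: list[tuple[str, str]]) -> list[dict[str, Any]]:
    """Turn (question, answer) pairs into keyword arguments for Case(...)."""
    return [
        {
            "case_id": f"case-{index}",
            "input": {"question": question},
            "expected_output": {"answer": answer},
        }
        for index, (question, answer) in enumerate(pairs, start=1)
    ]


# Hypothetical sample data, shaped like the Quick Start case.
pairs = [("2+2", "4"), ("3+3", "6")]
case_kwargs = build_case_kwargs(pairs)

try:
    from themis.core.models import Case, Dataset
except ImportError:
    # Themis is not installed here; only the plain dictionaries are built.
    dataset = None
else:
    dataset = Dataset(
        dataset_id="sample",
        cases=[Case(**kwargs) for kwargs in case_kwargs],
    )
```

When Themis is installed, the resulting dataset can be passed in the data list of evaluate exactly as in the Quick Start.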

Custom Extensions

Themis is designed to be extended. You can plug in custom generators, parsers, reducers, metrics, judge models, and store backends through the Python API or config-driven workflows.
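This README does not show the extension interfaces, so the following is only a plain-Python sketch of the kind of logic a custom metric and reducer might carry. The function names exact_match_score and mean_score, their signatures, and the scoring convention (1.0 for a match, 0.0 otherwise) are all assumptions for illustration. They are not Themis's API, and how such functions are registered is not covered here.

```python
from typing import Any, Mapping, Sequence


def exact_match_score(
    parsed_output: Mapping[str, Any],
    expected_output: Mapping[str, Any],
) -> float:
    """Hypothetical metric: 1.0 when both payloads are equal, else 0.0."""
    return 1.0 if dict(parsed_output) == dict(expected_output) else 0.0


def mean_score(scores: Sequence[float]) -> float:
    """Hypothetical reducer: average per-case scores, 0.0 for no scores."""
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


# Two illustrative cases: one match and one mismatch.
per_case = [
    exact_match_score({"answer": "4"}, {"answer": "4"}),
    exact_match_score({"answer": "5"}, {"answer": "6"}),
]
overall = mean_score(per_case)
```

Keeping the per-case metric separate from the aggregation step mirrors the split between metrics and reducers named above, though the actual Themis contracts may differ.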

CLI

After installation, the package exposes the themis CLI:

themis quick-eval inline \
  --model builtin/demo_generator \
  --metric builtin/exact_match \
  --parser builtin/json_identity \
  --input '{"question":"2+2"}' \
  --expected-output '{"answer":"4"}'
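The same command can be assembled from Python, which avoids shell-quoting mistakes in the JSON arguments. This sketch uses only the subcommand and flags shown above; the helper build_quick_eval_argv is hypothetical, and the command is run only when a themis executable is found on PATH.

```python
import json
import shutil
import subprocess


def build_quick_eval_argv(question: str, answer: str) -> list[str]:
    """Build the argument list for the `themis quick-eval inline` call above."""
    return [
        "themis", "quick-eval", "inline",
        "--model", "builtin/demo_generator",
        "--metric", "builtin/exact_match",
        "--parser", "builtin/json_identity",
        # json.dumps produces valid JSON without manual quoting.
        "--input", json.dumps({"question": question}),
        "--expected-output", json.dumps({"answer": answer}),
    ]


if __name__ == "__main__":
    argv = build_quick_eval_argv("2+2", "4")
    if shutil.which("themis") is None:
        print("themis CLI not found on PATH; would run:", argv)
    else:
        # Passing a list (no shell=True) hands each argument over unchanged.
        subprocess.run(argv, check=True)
```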

Documentation

Build the docs locally with:

uv sync --extra docs
uv run mkdocs build --strict

Contributing

Contributor setup and release guidance live in CONTRIBUTING.md.
