Trismik SDK

Overview
Quick Start
Features
- Progress Reporting
- Replay Functionality
Examples
Interpreting Results
- Theta (θ)
- Other Metrics
Contributing
License

Overview

Trismik is a Cambridge, UK-based startup offering adversarial testing for LLMs. The APIs provided through this library let you call our adaptive test engine and evaluate LLMs up to 95% faster (and cheaper!) than traditional evaluation techniques.

Our adaptive testing algorithm estimates the precision of a model by looking at only a small portion of a dataset. Through this library, we provide access to a number of open-source datasets covering several dimensions (reasoning, toxicity, tool use, and so on) to speed up model testing in scenarios such as foundation model training, supervised fine-tuning, and prompt engineering.

Quick Start

Installation

To use our API, you need to get an API key first. Please register on dashboard.trismik.com and obtain an API key.

Trismik is available via PyPI. To install it, run the following in your terminal (in a virtualenv, if you use one):

```
pip install trismik
```

API Key Setup

You can provide your API key in one of the following ways:

Environment Variable:
```
export TRISMIK_API_KEY="your-api-key"
```

.env File:

```
# .env
TRISMIK_API_KEY=your-api-key
```

Then load it with python-dotenv:

```
from dotenv import load_dotenv
load_dotenv()
```

Direct Initialization:

```
from trismik import TrismikClient
client = TrismikClient(api_key="YOUR_API_KEY")
```
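
The options above amount to a lookup order: a key passed explicitly, or the `TRISMIK_API_KEY` environment variable (which is also where python-dotenv places values read from `.env`). The sketch below illustrates one such order with a hypothetical helper, `resolve_api_key`. It is not part of the Trismik SDK, and the precedence it assumes (explicit argument first) is an assumption rather than documented behavior.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    explicit: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Hypothetical helper: prefer an explicit key, else TRISMIK_API_KEY from env."""
    key = explicit or env.get("TRISMIK_API_KEY")
    if not key:
        # Fail early with a clear message instead of sending an empty key
        raise RuntimeError(
            "No API key found: pass one explicitly or set TRISMIK_API_KEY"
        )
    return key
```

Failing early like this can make a missing key easier to diagnose than an authentication error raised later in a run.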

Basic Usage

Here's the simplest way to run an adaptive test:

```
from trismik import TrismikClient, TrismikRunMetadata
from trismik.types import TrismikItem

# Define your item processor
def model_inference(item: TrismikItem) -> str:
    # Your model inference logic here
    # See examples/ folder for real-world implementations
    return item.choices[0].id  # Example: pick first choice

# Run the test
with TrismikClient() as client:
    results = client.run(
        test_id="MMLUPro2024",
        project_id="your-project-id",  # Get from dashboard or create with client.create_project()
        experiment="my-experiment",
        run_metadata=TrismikRunMetadata(
            model_metadata={"name": "my-model", "provider": "local"},
            test_configuration={"task_name": "MMLUPro2024"},
            inference_setup={},
        ),
        item_processor=model_inference,
    )

    print(f"Theta: {results.score.theta}")
    print(f"Standard Error: {results.score.std_error}")
```

For async usage:

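```
import asyncio

from trismik import TrismikAsyncClient, TrismikRunMetadata

async def main():
    # model_inference is the item processor defined in the sync example
    async with TrismikAsyncClient() as client:
        results = await client.run(
            test_id="MMLUPro2024",
            project_id="your-project-id",
            experiment="my-experiment",
            run_metadata=TrismikRunMetadata(...),
            item_processor=model_inference,  # Can be sync or async
        )
    return results

# `async with` must run inside a coroutine, so drive it with asyncio.run
asyncio.run(main())
```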
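
In practice, an item processor usually has to turn a model's free-text reply into the id of one of the item's choices. The sketch below shows one simple way to do that. `Choice` and `Item` are stand-ins that mirror only the `choices` and `id` attributes used in the example above, and `extract_choice_id` is a hypothetical helper, not part of the SDK. It also assumes the model answers with a letter (A, B, C, ...) that refers to the position of a choice.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Choice:  # stand-in exposing only the `id` attribute
    id: str


@dataclass
class Item:  # stand-in for TrismikItem, exposing only `choices`
    choices: List[Choice]


def extract_choice_id(model_output: str, item: Item) -> str:
    """Map a reply such as 'Answer: B' to the id of the choice at that position."""
    # Look at every standalone letter and take the first that indexes a choice
    for letter in re.findall(r"\b([A-Z])\b", model_output.upper()):
        index = ord(letter) - ord("A")
        if index < len(item.choices):
            return item.choices[index].id
    # Fall back to the first choice when no usable letter is found
    return item.choices[0].id
```

This parsing is deliberately simplistic; a real processor would likely constrain the model's output format or use a stricter pattern.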

Features

Progress Reporting

Add optional progress tracking with a callback:

```
from tqdm.auto import tqdm
from trismik import TrismikClient
from trismik.settings import evaluation_settings

def create_progress_callback():
    # Start with the configured maximum; the callback corrects the total as the run reports it
    pbar = tqdm(total=evaluation_settings["max_iterations"], desc="Running test")

    def callback(current: int, total: int):
        pbar.total = total
        pbar.n = current
        pbar.refresh()
        if current >= total:
            pbar.close()

    return callback

# Use it in your run
with TrismikClient() as client:
    results = client.run(
        # ... other parameters ...
        on_progress=create_progress_callback(),
    )
```

The library is silent by default - progress reporting is entirely optional.
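
If you would rather not depend on tqdm, any callable that accepts `(current, total)` should work as a callback. The following is a minimal sketch using only the standard library; `make_text_progress` and its `write` parameter are hypothetical names introduced for illustration, not part of the SDK.

```python
from typing import Callable


def make_text_progress(
    write: Callable[[str], None] = print,
) -> Callable[[int, int], None]:
    """Build a (current, total) callback that emits one text line per update."""

    def callback(current: int, total: int) -> None:
        # Guard against a zero total so the first update cannot divide by zero
        percent = 100 * current // total if total else 0
        write(f"[{current}/{total}] {percent}%")

    return callback
```

Passing a different `write` function (for example a logger method) redirects the output without changing the callback itself.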

Replay Functionality

Replay the exact sequence of questions from a previous run to test model stability:

```
# metadata and new_metadata are TrismikRunMetadata instances (see Basic Usage)
with TrismikClient() as client:
    # Run initial test
    results = client.run(
        test_id="MMLUPro2024",
        project_id="your-project-id",
        experiment="experiment-1",
        run_metadata=metadata,
        item_processor=model_inference,
    )

    # Replay with same questions
    replay_results = client.run_replay(
        previous_run_id=results.run_id,
        run_metadata=new_metadata,
        item_processor=model_inference,
        with_responses=True,  # Include individual responses
    )
```
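
One way to judge stability is to check whether the theta from the original run and the theta from the replay differ by more than their combined uncertainty. The sketch below does this with a hypothetical helper, `thetas_agree`. It assumes that the reported standard errors describe approximately normal estimates and that the two estimates can be treated as independent; neither assumption is confirmed here.

```python
import math


def thetas_agree(
    theta_a: float,
    se_a: float,
    theta_b: float,
    se_b: float,
    z: float = 1.96,
) -> bool:
    """Return True when the estimates differ by less than z combined standard errors."""
    combined_se = math.sqrt(se_a**2 + se_b**2)
    return abs(theta_a - theta_b) < z * combined_se
```

If replay results expose the same `score` fields as a regular run (an assumption), the inputs would be `results.score.theta`, `results.score.std_error`, and their replay counterparts.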

Examples

Complete working examples are available in the examples/ folder:

- example_adaptive_test.py - Basic adaptive testing with both sync and async patterns, including replay functionality
- example_openai.py - Integration with OpenAI API models
- example_transformers.py - Integration with Hugging Face Transformers models

To run the examples:

```
# Clone the repository and install with examples dependencies
git clone https://github.com/trismik/trismik-python
cd trismik-python
poetry install --with examples

# Run an example
poetry run python examples/example_adaptive_test.py --dataset-name MMLUPro2024
```

Interpreting Results

Theta (θ)

Our adaptive test returns several values; however, you will be interested mainly in theta. Theta ($\theta$) is our metric; it measures the ability of the model on a certain dataset, and it can be used as a proxy to approximate the original metric used on that dataset. For example, on an accuracy-based dataset, a high theta correlates with a high accuracy, and low theta correlates with low accuracy.

$\theta$ is intrinsically linked to the difficulty of the items a model can answer correctly. On a dataset where item difficulties are balanced and evenly distributed, $\theta=0$ corresponds to a 50% chance of the model answering correctly; in other words, to an accuracy of 50%. A negative theta means the model gives more incorrect answers than correct ones, while a positive theta means it gives more correct answers than incorrect ones. Theta is unbounded in our implementation (i.e. $-\infty < \theta < \infty$), but in practice $\theta$ falls between -3 and 3 in most cases.
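
The link between $\theta$ and the chance of a correct answer can be illustrated with a one-parameter logistic (Rasch) model from item response theory. Whether Trismik uses exactly this model is an assumption, not something stated here; the function name `p_correct` and the difficulty parameter `b` are introduced only for illustration.

```python
import math


def p_correct(theta: float, b: float = 0.0) -> float:
    """Rasch model: probability of a correct answer for ability theta and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Probability of answering an item of average difficulty (b = 0) across the usual theta range
for theta in (-3, -1, 0, 1, 3):
    print(f"theta={theta:+d}  P(correct)={p_correct(theta):.3f}")
```

Under this model, $\theta=0$ on an item of difficulty 0 gives exactly 0.5, and the probability rises towards 1 as $\theta$ grows and falls towards 0 as it decreases.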

Compared to classical benchmark testing, $\theta$ from adaptive testing uses fewer but more informative items while avoiding noise from overly easy or difficult questions. This makes it a more efficient and stable measure, especially on very large datasets.
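
To see why overly easy or overly difficult items add little, consider how much information a single item carries about $\theta$. The sketch below assumes a one-parameter logistic (Rasch) model, under which an item's Fisher information is $p(1-p)$ and peaks when the item's difficulty matches the model's ability. This model, the helper names, and the difficulty values are all illustrative assumptions, not Trismik's actual selection algorithm.

```python
import math
from typing import Sequence


def p_correct(theta: float, b: float) -> float:
    """Rasch probability of a correct answer for ability theta and difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_information(theta: float, b: float) -> float:
    """Fisher information of a Rasch item: p * (1 - p)."""
    p = p_correct(theta, b)
    return p * (1.0 - p)


def most_informative(theta: float, difficulties: Sequence[float]) -> float:
    """Return the difficulty of the item that is most informative at theta."""
    return max(difficulties, key=lambda b: item_information(theta, b))


pool = [-2.5, -1.0, 0.4, 1.5, 3.0]  # made-up item difficulties
print(most_informative(0.5, pool))  # the item closest in difficulty to theta
```

An adaptive test built on this idea would repeatedly pick the most informative remaining item for the current estimate, which is how a small number of items can pin down $\theta$.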

Other Metrics

Standard Deviation (std):
- A measure of the uncertainty or error in the theta estimate
- A smaller std indicates a more precise estimate
- You should see a std around or below 0.25

Correct Responses (responsesCorrect):
- The number of correct answers delivered by the model
- Important note: A higher number of correct answers does not necessarily correlate with a high theta. Our algorithm navigates the dataset to find a balance of "hard" and "easy" items for your model, so by the end of the test, it encounters a representative mix of inputs it can and cannot handle. In practice, expect responsesCorrect to be roughly half of responsesTotal.

Total Responses (responsesTotal):
- The number of items processed before reaching a stable theta
- Expected range: 60 ≤ responsesTotal ≤ 150
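
The rules of thumb above can be turned into a quick sanity check on a finished run. The sketch below uses a hypothetical helper, `check_run`, with the thresholds taken from the list (std at or below 0.25, between 60 and 150 responses). The 95% interval of theta ± 1.96 × std assumes the estimate is approximately normally distributed, which is an assumption rather than something stated here.

```python
from typing import Dict


def check_run(
    theta: float,
    std: float,
    responses_correct: int,
    responses_total: int,
) -> Dict[str, object]:
    """Summarise a run against the rules of thumb for std and test length."""
    return {
        # Approximate 95% interval, assuming a normal estimate
        "interval_95": (theta - 1.96 * std, theta + 1.96 * std),
        "precise_enough": std <= 0.25,
        "length_in_expected_range": 60 <= responses_total <= 150,
        # Expected to be roughly 0.5 regardless of theta
        "fraction_correct": responses_correct / responses_total,
    }
```

For example, `check_run(0.8, 0.2, 40, 80)` reports a precise estimate of expected length with half of the responses correct.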

Contributing

See CONTRIBUTING.md.

License

This library is licensed under the MIT license. See the LICENSE file.

Release history

- 1.0.6 (this version): Jun 3, 2026
- 1.0.5: Mar 12, 2026
- 1.0.4: Feb 23, 2026
- 1.0.3: Feb 2, 2026
- 1.0.2: Oct 30, 2025
- 1.0.1: Oct 9, 2025
- 1.0.0: Oct 2, 2025
- 0.9.12: Sep 24, 2025
- 0.9.11: Sep 24, 2025
- 0.9.9: Sep 23, 2025
- 0.9.8: Sep 17, 2025
- 0.9.6: Sep 16, 2025
- 0.9.5: Sep 16, 2025
- 0.9.4: Sep 11, 2025
- 0.9.1: Apr 22, 2025
- 0.9.0: Sep 25, 2024
