Trismik SDK
Overview
Trismik is a Cambridge, UK-based startup offering adversarial testing for LLMs. The APIs we provide through this library let you call our adaptive test engine and evaluate LLMs up to 95% faster (and cheaper!) than traditional evaluation techniques.
Our adaptive testing algorithm estimates the precision of a model by looking at only a small portion of a dataset. Through this library, we provide access to a number of open source datasets covering several dimensions (reasoning, toxicity, tool use...) to speed up model testing in scenarios such as foundation model training, supervised fine-tuning, and prompt engineering.
Quick Start
Installation
To use our API, you need to get an API key first. Please register on dashboard.trismik.com and obtain an API key.
Trismik is available on PyPI. To install it, run the following in your terminal (in a virtualenv, if you use one):
pip install trismik
API Key Setup
You can provide your API key in one of the following ways:
- Environment Variable:
  export TRISMIK_API_KEY="your-api-key"
- .env File:
  # .env
  TRISMIK_API_KEY=your-api-key
  Then load it with python-dotenv:
  from dotenv import load_dotenv
  load_dotenv()
- Direct Initialization:
  client = TrismikClient(api_key="YOUR_API_KEY")
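The options above can be combined in your own code. The following is a minimal sketch, not part of the library: `resolve_api_key` is a hypothetical helper, and the precedence it applies (explicit argument first, then the `TRISMIK_API_KEY` environment variable) is an assumption rather than documented SDK behaviour.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the explicit key if given, else TRISMIK_API_KEY from the environment.

    Hypothetical helper: the precedence order here is an assumption.
    """
    env = os.environ if env is None else env
    key = (explicit or env.get("TRISMIK_API_KEY", "")).strip()
    if not key:
        # Fail early with a clear message instead of at the first API call.
        raise RuntimeError(
            "No API key found: set TRISMIK_API_KEY or pass api_key=..."
        )
    return key
```

The returned value can then be passed as `TrismikClient(api_key=resolve_api_key())`, which makes a missing key fail before any test run starts.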
Basic Usage
Here's the simplest way to run an adaptive test:
from trismik import TrismikClient, TrismikRunMetadata
from trismik.types import TrismikItem

# Define your item processor
def model_inference(item: TrismikItem) -> str:
    # Your model inference logic here
    # See examples/ folder for real-world implementations
    return item.choices[0].id  # Example: pick first choice

# Run the test
with TrismikClient() as client:
    results = client.run(
        test_id="MMLUPro2024",
        project_id="your-project-id",  # Get from dashboard or create with client.create_project()
        experiment="my-experiment",
        run_metadata=TrismikRunMetadata(
            model_metadata={"name": "my-model", "provider": "local"},
            test_configuration={"task_name": "MMLUPro2024"},
            inference_setup={},
        ),
        item_processor=model_inference,
    )
    print(f"Theta: {results.score.theta}")
    print(f"Standard Error: {results.score.std_error}")
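A real item processor usually has to turn free-form model output into one of the item's choice ids. The sketch below shows one way this might be done. `Choice` and `Item` are hypothetical stand-ins for the SDK types (only `item.choices[i].id` is taken from the example above), and matching ids case-insensitively is an assumption about how your choices are labelled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Choice:  # hypothetical stand-in for a choice on a TrismikItem
    id: str


@dataclass
class Item:  # hypothetical stand-in for TrismikItem
    choices: List[Choice]


def pick_choice_id(item: Item, raw_output: str) -> str:
    """Map free-form model output to one of the item's choice ids."""
    # Drop surrounding whitespace and common decoration such as "(B)" or "b."
    cleaned = raw_output.strip().strip(".()").upper()
    for choice in item.choices:
        if choice.id.upper() == cleaned:
            return choice.id
    # Fall back to the first choice when the output cannot be parsed.
    return item.choices[0].id
```

Falling back to the first choice keeps the run going when the model produces something unparseable; you may prefer to raise an error instead so that parsing problems are not silently scored as answers.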
For async usage:
import asyncio
from trismik import TrismikAsyncClient, TrismikRunMetadata

async def main():
    # model_inference is the item processor defined in the example above
    async with TrismikAsyncClient() as client:
        results = await client.run(
            test_id="MMLUPro2024",
            project_id="your-project-id",
            experiment="my-experiment",
            run_metadata=TrismikRunMetadata(...),
            item_processor=model_inference,  # Can be sync or async
        )

asyncio.run(main())
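To illustrate what "sync or async" means for an item processor, here is a minimal sketch of how a caller can accept both kinds of function. It is not the SDK's implementation: `call_processor` and the two example processors are hypothetical names used only for this illustration.

```python
import asyncio
import inspect
from typing import Any, Awaitable, Callable, Union

Processor = Callable[[Any], Union[str, Awaitable[str]]]


async def call_processor(processor: Processor, item: Any) -> str:
    """Call a processor and await its result only if it is awaitable."""
    result = processor(item)
    if inspect.isawaitable(result):
        result = await result
    return result


def sync_processor(item: Any) -> str:
    return f"sync:{item}"


async def async_processor(item: Any) -> str:
    await asyncio.sleep(0)  # placeholder for a real async model call
    return f"async:{item}"
```

An async processor is mainly useful when your model sits behind a network API, since the event loop can do other work while the request is in flight.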
Features
Progress Reporting
Add optional progress tracking with a callback:
from tqdm.auto import tqdm
from trismik.settings import evaluation_settings

def create_progress_callback():
    pbar = tqdm(total=evaluation_settings["max_iterations"], desc="Running test")

    def callback(current: int, total: int):
        pbar.total = total
        pbar.n = current
        pbar.refresh()
        if current >= total:
            pbar.close()

    return callback

# Use it in your run
with TrismikClient() as client:
    results = client.run(
        # ... other parameters ...
        on_progress=create_progress_callback(),
    )
The library is silent by default - progress reporting is entirely optional.
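If you would rather not depend on tqdm (for example in CI logs), a plain-text callback with the same `(current, total)` signature can be used instead. This is a sketch under that assumption; `make_text_progress` and its `every` parameter are hypothetical names, not part of the SDK.

```python
import sys
from typing import Callable, TextIO


def make_text_progress(
    stream: TextIO = sys.stderr, every: int = 10
) -> Callable[[int, int], None]:
    """Return a callback that writes one line every `every` items and at the end."""

    def callback(current: int, total: int) -> None:
        if current % every == 0 or current >= total:
            stream.write(f"{current}/{total} items processed\n")

    return callback
```

It would be passed in the same way as the tqdm version: `on_progress=make_text_progress()`.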
Replay Functionality
Replay the exact sequence of questions from a previous run to test model stability:
with TrismikClient() as client:
    # Run initial test
    results = client.run(
        test_id="MMLUPro2024",
        project_id="your-project-id",
        experiment="experiment-1",
        run_metadata=metadata,
        item_processor=model_inference,
    )
    # Replay with same questions
    replay_results = client.run_replay(
        previous_run_id=results.run_id,
        run_metadata=new_metadata,
        item_processor=model_inference,
        with_responses=True,  # Include individual responses
    )
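Once you have both results, one rough way to judge stability is to compare the two theta estimates against their combined standard error. The helper below is a heuristic sketch, not an SDK feature: `replay_is_consistent` is a hypothetical name, the 1.96 threshold assumes approximately normal errors, and because a replay reuses the same items the two estimates are not independent, so treat the outcome as indicative only.

```python
import math


def replay_is_consistent(
    theta_a: float,
    se_a: float,
    theta_b: float,
    se_b: float,
    z_crit: float = 1.96,
) -> bool:
    """Return True if two theta estimates differ by less than z_crit combined SEs."""
    combined_se = math.sqrt(se_a**2 + se_b**2)
    if combined_se == 0:
        # With no reported uncertainty, only identical estimates agree.
        return theta_a == theta_b
    return abs(theta_a - theta_b) / combined_se <= z_crit
```

With the objects from the example above, this could be called as `replay_is_consistent(results.score.theta, results.score.std_error, replay_results.score.theta, replay_results.score.std_error)`, assuming the replay result exposes the same `score` fields as a normal run.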
Examples
Complete working examples are available in the examples/ folder:
- example_adaptive_test.py - Basic adaptive testing with both sync and async patterns, including replay functionality
- example_openai.py - Integration with OpenAI API models
- example_transformers.py - Integration with Hugging Face Transformers models
To run the examples:
# Clone the repository and install with examples dependencies
git clone https://github.com/trismik/trismik-python
cd trismik-python
poetry install --with examples
# Run an example
poetry run python examples/example_adaptive_test.py --dataset-name MMLUPro2024
Interpreting Results
Theta (θ)
Our adaptive test returns several values; however, you will be interested mainly in theta. Theta ($\theta$) is our metric; it measures the ability of the model on a certain dataset, and it can be used as a proxy to approximate the original metric used on that dataset. For example, on an accuracy-based dataset, a high theta correlates with a high accuracy, and low theta correlates with low accuracy.
$\theta$ is intrinsically linked to the difficulty of the items a model can answer correctly. On a dataset where the item difficulties are balanced and evenly distributed, $\theta=0$ corresponds to a 50% chance for a model to get an answer right - in other words, to an accuracy of 50%. A negative theta means that the model will give more bad answers than good ones, while a positive theta means that the model will give more good answers than bad ones. While theta is unbounded in our implementation (i.e. $-\infty < \theta < \infty$), in practice $\theta$ will in most cases take values between -3 and 3.
Compared to classical benchmark testing, $\theta$ from adaptive testing uses fewer but more informative items while avoiding noise from overly easy or difficult questions. This makes it a more efficient and stable measure, especially on very large datasets.
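The relationship between theta and accuracy can be illustrated with a toy model. The sketch below assumes a one-parameter logistic (Rasch-style) response model, which is a common choice in adaptive testing but is an assumption here, not a statement about the model Trismik uses. The difficulty values are made up for illustration.

```python
import math
from typing import Sequence


def p_correct(theta: float, difficulty: float) -> float:
    """Probability of a correct answer under a one-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def expected_accuracy(theta: float, difficulties: Sequence[float]) -> float:
    """Average probability of a correct answer over a set of items."""
    return sum(p_correct(theta, b) for b in difficulties) / len(difficulties)


# Illustrative difficulties, balanced symmetrically around zero.
balanced = [-2.0, -1.0, 0.0, 1.0, 2.0]
```

Under this toy model, `expected_accuracy(0.0, balanced)` is 0.5 because the difficulties are symmetric around zero, and the value rises above 0.5 for positive theta and falls below it for negative theta.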
Other Metrics
- Standard Deviation (std):
  - A measure of the uncertainty or error in the theta estimate
  - A smaller std indicates a more precise estimate
  - You should see a std around or below 0.25
- Correct Responses (responsesCorrect):
  - The number of correct answers delivered by the model
  - Important note: A higher number of correct answers does not necessarily correlate with a high theta. Our algorithm navigates the dataset to find a balance of "hard" and "easy" items for your model, so by the end of the test, it encounters a representative mix of inputs it can and cannot handle. In practice, expect responsesCorrect to be roughly half of responsesTotal.
- Total Responses (responsesTotal):
  - The number of items processed before reaching a stable theta.
  - Expected range: 60 ≤ responsesTotal ≤ 150
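The rules of thumb above can be turned into a quick sanity check on a finished run. This is a sketch that takes plain numbers rather than SDK objects: `check_run` and `theta_interval` are hypothetical helpers, the 0.15 tolerance on the correct-answer ratio is an arbitrary assumption, and the interval assumes the error in theta is approximately normal.

```python
from typing import List, Tuple


def check_run(std: float, responses_correct: int, responses_total: int) -> List[str]:
    """Return notes for any value that falls outside the expected ranges."""
    notes: List[str] = []
    if std > 0.25:
        notes.append(f"std {std:.2f} is above 0.25: the estimate is imprecise")
    if not 60 <= responses_total <= 150:
        notes.append(f"{responses_total} responses is outside the expected 60-150")
    if responses_total > 0:
        ratio = responses_correct / responses_total
        # 0.15 is an assumed tolerance around the expected ratio of one half.
        if abs(ratio - 0.5) > 0.15:
            notes.append(f"correct ratio {ratio:.2f} is far from 0.5")
    return notes


def theta_interval(theta: float, std: float, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% interval for theta, assuming normally distributed error."""
    return (theta - z * std, theta + z * std)
```

An empty list from `check_run` means the run looks as expected; any notes suggest inspecting the run more closely before comparing it with others.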
Contributing
See CONTRIBUTING.md.
License
This library is licensed under the MIT license. See LICENSE file.