
Artefactual

Artefactual is a lightweight Python package for measuring model hallucination risk using entropy-based metrics. It is:

  • Practical: Precomputed calibration for several model families is included in src/artefactual/data and can be used by model name.
  • Flexible: Works with vLLM, OpenAI Chat Completions, and the OpenAI Responses API formats.
  • Detailed: Computes both sequence-level and token-level uncertainty scores to power downstream pipelines (e.g., answer filtering, reranking, human-in-the-loop triggers).

The package provides two primary uncertainty detectors:

  • EPR (Entropy Production Rate): a token- and sequence-level entropy-based metric exposing raw and (optionally) calibrated probabilities.
  • WEPR (Weighted EPR): a calibrated, learned weighted combination of entropy contributions yielding sequence- and token-level probabilities of hallucination.

The library includes pre-computed calibration coefficients and weights for a set of popular models so data scientists can use EPR/WEPR out-of-the-box without running a calibration pipeline.

Installation

  • Minimal (core) install — For most users who only want to compute EPR/WEPR using the precomputed files shipped in the package:
uv sync
# or for editable development install:
uv pip install -e .
  • With calibration (full) install — If you plan to run the calibration pipeline or train WEPR/EPR coefficients, install the calibration extra to pull heavier ML tooling and platform-specific dependencies:
uv pip install -e '.[calibration]'
# or non-editable:
uv pip install '.[calibration]'

Note: Typical packages pulled in by this extra include scikit-learn (training), vllm (model generation), ray (optional distributed processing), pandas, numpy, and tqdm. Installing these may require system-level libraries or CUDA support depending on your environment.

Basic usage (sequence-level scores)

EPR example:

from artefactual.preprocessing import parse_model_outputs
from artefactual.scoring import EPR

# Use precomputed calibration (model keys are defined in the registry)
epr = EPR(pretrained_model_name_or_path="mistralai/Ministral-8B-Instruct-2410")

# Compute sequence-level calibrated probabilities (list of floats)
parsed_logprobs = parse_model_outputs(response) # extract logprobs from the output
seq_scores_epr = epr.compute(parsed_logprobs)

# Compute token-level scores (list of numpy arrays)
token_scores_epr = epr.compute_token_scores(parsed_logprobs)

print("EPR sequence scores:", seq_scores_epr)

WEPR example:

from artefactual.scoring import WEPR

# WEPR requires a weight source (model key or local weights file)
wepr = WEPR(pretrained_model_name_or_path="mistralai/Ministral-8B-Instruct-2410")

# Compute sequence-level calibrated probabilities (list of floats)
parsed_logprobs = parse_model_outputs(response)
seq_scores_wepr = wepr.compute(parsed_logprobs)

# Compute token-level scores (list of numpy arrays)
token_scores_wepr = wepr.compute_token_scores(parsed_logprobs)

print("WEPR sequence scores:", seq_scores_wepr)

In both examples, the response object can have the following structure:

# Example: using an OpenAI Responses-like structure (minimal illustrative example)
response = {
	"object": "response",
	"output": [
		{
			"content": [
				{
					"logprobs": [
						{"top_logprobs": [{"logprob": -0.1}, {"logprob": -2.3}]},
						{"top_logprobs": [{"logprob": -0.05}, {"logprob": -3.1}]}
					]
				}
			]
		}
	]
}
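The nested logprobs in this structure can be pulled out with a simple traversal. Below is a minimal illustrative sketch of such a traversal; it is not the library's actual parse_model_outputs implementation, whose output format may differ:

```python
# Illustrative only: a minimal traversal of the Responses-like structure above.
# The real artefactual.preprocessing.parse_model_outputs may return a different shape.
def extract_top_logprobs(response):
    """Return, per output item, a list of top-logprob lists (one per token)."""
    sequences = []
    for item in response.get("output", []):
        for content in item.get("content", []):
            token_logprobs = [
                [alt["logprob"] for alt in tok["top_logprobs"]]
                for tok in content.get("logprobs", [])
            ]
            sequences.append(token_logprobs)
    return sequences

response = {
    "object": "response",
    "output": [
        {"content": [{"logprobs": [
            {"top_logprobs": [{"logprob": -0.1}, {"logprob": -2.3}]},
            {"top_logprobs": [{"logprob": -0.05}, {"logprob": -3.1}]},
        ]}]}
    ],
}

print(extract_top_logprobs(response))
# [[[-0.1, -2.3], [-0.05, -3.1]]]
```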

Notes:

  • EPR(pretrained_model_name_or_path=...) attempts to load calibration coefficients via artefactual.utils.io.load_calibration and will silently fall back to uncalibrated raw EPR scores if calibration is not found.
  • WEPR(pretrained_model_name_or_path) requires a weight source (either a known model key from the registry or a local JSON file) and will raise a ValueError if weights cannot be found.
  • Both EPR.compute(...) and WEPR.compute(...) return lists because the methods accept batch-style inputs (the top-level structure may contain multiple response objects). If you pass a single response object you'll receive a single-element list — index the first element (for example, seq_scores_epr[0] or seq_scores_wepr[0]) to obtain a single float probability.
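To make the entropy intuition behind these detectors concrete, here is a toy sketch of a mean token-entropy score computed from top-k logprobs. This is illustrative only, not the library's exact EPR formula:

```python
import math

# Illustrative sketch: average per-token entropy over renormalised top-k logprobs.
# The actual EPR formula in artefactual may differ; this only conveys the idea
# that higher average token entropy suggests higher hallucination risk.
def mean_token_entropy(token_top_logprobs):
    """token_top_logprobs: list of per-token lists of top-k logprobs."""
    entropies = []
    for logprobs in token_top_logprobs:
        # Renormalise the top-k alternatives so they form a distribution.
        probs = [math.exp(lp) for lp in logprobs]
        total = sum(probs)
        probs = [p / total for p in probs]
        entropies.append(-sum(p * math.log(p) for p in probs))
    return sum(entropies) / len(entropies)

confident = [[-0.01, -5.0], [-0.02, -6.0]]   # mass concentrated on one token
uncertain = [[-0.7, -0.7], [-0.69, -0.71]]   # mass split across alternatives
print(mean_token_entropy(confident) < mean_token_entropy(uncertain))  # True
```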

Further Examples

Example scripts such as examples/epr_usage_demo.py and examples/calibration_script.py demonstrate basic usage and the calibration pipeline.

Calibration logic

Whenever possible, we strongly recommend using calibrated detectors so that outputs can be interpreted as probabilities. Below we describe how to load existing weights, or how to run the full pipeline on a new model and/or corpus.

Registry / Precomputed files

Artefactual ships a small registry which maps canonical model identifiers to precomputed JSON files. These mappings are available in src/artefactual/utils/io.py under MODEL_WEIGHT_MAP and MODEL_CALIBRATION_MAP.

You can pass one of those strings directly to EPR or WEPR constructors (e.g., EPR(pretrained_model_name_or_path="mistralai/Ministral-8B-Instruct-2410")). Under the hood the package reads src/artefactual/data/<file>.json via importlib.resources.

If you prefer to provide a custom calibration or weight file, pass a filesystem path (e.g., WEPR('/path/to/my_weights.json')). See artefactual.utils.io.load_weights and load_calibration for the exact behavior.
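As a rough illustration of what such a weights file encodes, the sketch below assumes a hypothetical JSON layout with a single intercept and coefficient and applies the logistic mapping; the actual schema expected by load_weights / load_calibration may differ:

```python
import json
import math
import tempfile

# Hypothetical calibration-file layout (the real schema may differ).
weights = {"intercept": -2.0, "coefficient": 3.5}

with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(weights, f)
    path = f.name

# Load the file and map a raw uncertainty score to a calibrated probability.
with open(path) as f:
    w = json.load(f)

def calibrated_probability(raw_score, w):
    # Logistic mapping: sigmoid(intercept + coefficient * raw_score).
    z = w["intercept"] + w["coefficient"] * raw_score
    return 1.0 / (1.0 + math.exp(-z))

print(round(calibrated_probability(0.8, w), 3))  # 0.69
```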

Advanced: Calibration pipeline (for deep usage)

The calibration pipeline in this package produces the weights_*.json and calibration_*.json files used to turn raw entropy scores into calibrated probabilities. The implemented flow (all modules live under src/artefactual/calibration) is:

  1. Prepare a QA dataset of question/answers (e.g., web_question_qa.json) containing entries like:

    { "question": "where is roswell area 51?", "question_id": "d204f08c-fbcb-41cb-8e55-ee3879d68eea", "short_answer": "Roswell", "answer_aliases": [] }

  2. Run the generation utility src/artefactual/calibration/outputs_entropy.py to produce a JSON dataset that includes EPR/WEPR scores for each generated answer (this JSON contains generated_answers entries with an epr_score/wepr_score field).

  3. Use src/artefactual/calibration/rates_answers.py to have a judge LLM label each generated answer as True/False (correct/incorrect). This script produces a pandas DataFrame (or CSV) where each row contains uncertainty_score (EPR/WEPR) and judgment (the target).

  4. Train a calibration model by running src/artefactual/calibration/train_calibration.py on the DataFrame/CSV. This fits a logistic regression mapping uncertainty scores to probabilities and saves the resulting weights JSON (intercept and coefficient(s)).

  5. Add the produced weights_*.json or calibration_*.json to the package data registry (or point EPR/WEPR at the local file) so EPR(pretrained_model_name_or_path=...) / WEPR(...) can load the calibration when scoring.
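Step 4 above amounts to a one-feature logistic regression. The sketch below fits one with plain NumPy gradient descent on toy data; the actual train_calibration.py uses scikit-learn and real judge labels:

```python
import numpy as np

# Minimal sketch of step 4: fit a logistic regression mapping raw uncertainty
# scores to a probability of hallucination. Toy data only; the real pipeline
# fits on judge-labelled answers with scikit-learn.
rng = np.random.default_rng(0)

# Toy labels: higher uncertainty scores are more likely to be hallucinated (1).
scores = rng.uniform(0.0, 1.0, size=500)
labels = (rng.uniform(size=500) < scores).astype(float)

intercept, coef = 0.0, 0.0
lr = 0.5
for _ in range(2000):
    z = intercept + coef * scores
    p = 1.0 / (1.0 + np.exp(-z))          # predicted probabilities
    grad_b = np.mean(p - labels)          # gradient w.r.t. intercept
    grad_w = np.mean((p - labels) * scores)  # gradient w.r.t. coefficient
    intercept -= lr * grad_b
    coef -= lr * grad_w

# A positive coefficient: higher uncertainty -> higher hallucination probability.
print(f"intercept={intercept:.2f}, coefficient={coef:.2f}")
```

The resulting intercept and coefficient are exactly what the weights JSON files in the registry store, so a file produced this way can be fed back to EPR/WEPR via a filesystem path.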

Important notes for calibration:

  • The pipeline requires an LLM-as-a-judge, which can be chosen by the user (the default is "mistralai/Ministral-8B-Instruct-2410").
  • WEPR training learns multiple coefficient groups (e.g., mean_rank_i and max_rank_i) while EPR calibration is a single-intercept plus mean-entropy coefficient.
  • See the modules under src/artefactual/calibration for implementation details and plotting utilities.

Citation

If you find artefactual or any of its features useful for your research, consider citing our paper, accepted for publication at ECIR 2026:

@misc{moslonka2025learnedhallucinationdetectionblackbox,
      title={Learned Hallucination Detection in Black-Box LLMs using Token-level Entropy Production Rate},
      author={Charles Moslonka and Hicham Randrianarivo and Arthur Garnier and Emmanuel Malherbe},
      year={2025},
      eprint={2509.04492},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2509.04492},
}

License

This software is released under the MIT license, with no restriction on usage, including commercial applications.
