State-of-the-art uncertainty quantification methods for large language models.

Project description

omniuq

omniuq brings together rigorous, paper-faithful implementations of methods that measure when an LLM is unsure and why.


Install

pip install omniuq

For low-VRAM setups (e.g. Phi-4 14B on a 24 GB card), enable quantization:

pip install "omniuq[quantize]"

The clarifier and judge in the paper-faithful and mixed setups call the OpenAI API, so those setups need an API key (the fully local setup does not):

export OPENAI_API_KEY=sk-...
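
Because the key is only read when the OpenAI clients are created, a script can check for it up front and fail with a readable message instead of a bare KeyError. A minimal sketch using only the standard library; `require_openai_key` is a hypothetical helper for illustration and is not part of omniuq:

```python
import os


def require_openai_key(env=os.environ):
    """Return the OpenAI key from `env`, or raise with a readable message.

    Hypothetical helper for illustration; omniuq does not ship it.
    """
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it or use a fully local setup."
        )
    return key
```

Passing the environment as an argument keeps the check easy to exercise with a plain dict, without touching the real environment.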

AU / EU

Every uncertain LLM answer has two possible causes. omniuq separates them.

Aleatoric Uncertainty (AU) — uncertainty from the input itself. The question is ambiguous, underspecified, or has multiple valid interpretations. Cannot be reduced by a stronger model.

"Who won the World Series?" — high AU. Depends on year, league, team vs. player.

Epistemic Uncertainty (EU) — uncertainty from the model's lack of knowledge. The question is clear; the model just doesn't know. Can be reduced with retrieval, fine-tuning, or a stronger model.

"What is the capital of Wakanda?" — high EU. Question is clear; the model has no real answer.

Total uncertainty = AU + EU.
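
To make the additive split concrete, here is a minimal sketch of the classical entropy-based decomposition over discrete answer distributions. It is not omniuq's spectral method, and the function names, input format, and toy distributions are illustrative assumptions. Each row is the answer distribution obtained under one clarification of the question; the total is the entropy of the averaged distribution, which splits into an average within-clarification entropy plus a between-clarification remainder. Reading the between term as AU and the within term as EU follows the definitions above and is likewise an assumption.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


def decompose(dists):
    """Split total entropy into within- and between-row parts.

    dists: list of answer distributions, one per clarification
    (an illustrative input format, not omniuq's).
    Returns (total, within, between) with total == within + between.
    """
    n = len(dists)
    k = len(dists[0])
    mean = [sum(d[j] for d in dists) / n for j in range(k)]
    total = entropy(mean)
    within = sum(entropy(d) for d in dists) / n  # EU-like term
    between = total - within                     # AU-like term
    return total, within, between


# Ambiguous question: each clarification gives a confident but different answer.
print(decompose([[1.0, 0.0], [0.0, 1.0]]))
# Clear question, unsure model: the same flat distribution every time.
print(decompose([[0.5, 0.5], [0.5, 0.5]]))
```

In the first toy case all of the uncertainty lands in the between term, and in the second all of it lands in the within term, which mirrors the two examples above.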

Methods

| Method | Decomposes | Paper | Code | Reproduced | Status |
|---|---|---|---|---|---|
| Spectral Uncertainty (Walha et al., AAAI 2026) | AU + EU | arXiv | GitHub | TriviaQA: AUROC 89.66% vs. paper 91.92% (Colab) | ✅ Available |
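
The AUROC figures in the table measure how well a score ranks examples. As a rough sketch of the metric itself (not the reproduction's evaluation code, which lives in the linked Colab), and assuming the labels mark which answers were wrong, AUROC is the probability that a randomly chosen wrong answer gets a higher uncertainty score than a randomly chosen right one, with ties counted as half. The function name and toy data are illustrative assumptions:

```python
def auroc(scores, is_wrong):
    """Pairwise AUROC: P(score of a wrong answer > score of a right one).

    scores: uncertainty scores, higher = more uncertain.
    is_wrong: parallel list of booleans (True = incorrect answer).
    """
    pos = [s for s, w in zip(scores, is_wrong) if w]
    neg = [s for s, w in zip(scores, is_wrong) if not w]
    if not pos or not neg:
        raise ValueError("need at least one wrong and one right answer")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data: uncertainty is mostly, but not perfectly, higher on wrong answers.
scores = [0.9, 0.8, 0.3, 0.6, 0.1]
is_wrong = [True, True, False, False, True]
print(auroc(scores, is_wrong))
```

The pairwise form is quadratic in the number of examples, which is fine for a sanity check; a rank-based implementation is the usual choice for larger evaluation sets.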

Demo 1 — Spectral Uncertainty

Three ways to run it.

Paper-faithful

Phi-4 14B as target, GPT-4o as clarifier, GPT-4.1 as judge — exactly the paper's setup.

import os
from omniuq import (
    SpectralUncertainty,
    load_llm_model,
    load_openai_client,
)

# Target model: the LLM whose uncertainty is being measured.
tokenizer, model = load_llm_model("microsoft/phi-4")

# Clarifier and judge both go through the OpenAI API.
clarifier = load_openai_client(
    api_key=os.environ["OPENAI_API_KEY"],
    model="gpt-4o",
)
judge = load_openai_client(
    api_key=os.environ["OPENAI_API_KEY"],
    model="gpt-4.1",
)

uq = SpectralUncertainty(
    tokenizer, model,
    clarifier=clarifier,
    judge=judge,
)

# Score a single question.
print(uq.score("What is the capital of France?"))

Mixed: local target + OpenAI clarifier/judge

A smaller local model (Qwen2.5-7B-Instruct) serves as the target for sampling, while GPT-4o still writes the clarifications and GPT-4.1 still acts as judge.

import os
from omniuq import (
    SpectralUncertainty,
    load_llm_model,
    load_openai_client,
)

# Target model: a smaller local model used for sampling.
tokenizer, model = load_llm_model(
    "Qwen/Qwen2.5-7B-Instruct"
)

# Clarifier and judge both go through the OpenAI API.
clarifier = load_openai_client(
    api_key=os.environ["OPENAI_API_KEY"],
    model="gpt-4o",
)
judge = load_openai_client(
    api_key=os.environ["OPENAI_API_KEY"],
    model="gpt-4.1",
)

uq = SpectralUncertainty(
    tokenizer, model,
    clarifier=clarifier,
    judge=judge,
)

# Score a single question.
print(uq.score("What is the capital of France?"))

Fully local — no API calls

Same HuggingFace model used as target, clarifier, and judge. No OpenAI key needed.

from omniuq import SpectralUncertainty, load_llm_model

# One local model plays all three roles.
tokenizer, model = load_llm_model(
    "meta-llama/Llama-3.1-8B-Instruct"
)

# Pass (tokenizer, model) tuples instead of OpenAI clients.
uq = SpectralUncertainty(
    tokenizer, model,
    clarifier=(tokenizer, model),
    judge=(tokenizer, model),
)

# Score a single question.
print(uq.score("What is the capital of France?"))

Note: smaller open models tend to produce noisier clarifications than GPT-4o, so AU scores will be less reliable in fully-local mode.


License

MIT
