Uncertainty estimation for open-source generative models
Klarity
Toolkit for LLM behavior analysis & uncertainty mitigation
🐳 Now with reasoning model support to analyze chain-of-thought (CoT) entropy and improve RL datasets
🎯 Overview
Klarity is a toolkit for inspecting and debugging AI decision-making processes. By combining uncertainty analysis with reasoning insights, it helps you understand how models think and fix issues before they reach production.
- Dual Entropy Analysis: Measure model confidence through raw entropy and semantic similarity metrics
- Reasoning Analysis: Extract and evaluate step-by-step thinking patterns in model outputs
- Semantic Clustering: Group similar predictions to reveal decision-making pathways
- Structured Insights: Get detailed JSON analysis of both uncertainty patterns and reasoning steps
- AI-powered Report: Leverage capable models to interpret generation patterns and provide human-readable insights
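To make the raw-entropy side of the analysis concrete, here is a minimal sketch that computes the Shannon entropy of a single next-token distribution. It borrows the top_k and min_token_prob parameter names from the usage examples below. Keeping the top_k tokens, dropping tokens below min_token_prob, renormalizing, and using natural-log units are all assumptions for illustration; Klarity's internal computation may differ.

```python
import math


def raw_entropy(probs: dict[str, float], top_k: int = 100, min_token_prob: float = 0.01) -> float:
    """Shannon entropy (in nats) of a next-token distribution.

    Assumption: keep the top_k most likely tokens, drop tokens below
    min_token_prob, then renormalize the surviving probability mass.
    """
    kept = sorted(probs.values(), reverse=True)[:top_k]
    kept = [p for p in kept if p >= min_token_prob]
    total = sum(kept)
    if total == 0:
        return 0.0
    return -sum((p / total) * math.log(p / total) for p in kept)


# A peaked distribution has low entropy; a flat one has high entropy.
confident = {"Paris": 0.97, "Lyon": 0.02, "Nice": 0.01}
uncertain = {"Paris": 0.34, "Lyon": 0.33, "Nice": 0.33}
print(f"confident: {raw_entropy(confident):.3f}")
print(f"uncertain: {raw_entropy(uncertain):.3f}")
```

Raw entropy alone cannot tell "several spellings of the same answer" apart from "several different answers"; that is what the semantic entropy metric is meant to add.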
Reasoning Analysis Example - Understanding the model's step-by-step thinking process
Entropy Analysis Example - Analyzing token-level uncertainty patterns
🚀 Quick Start (Hugging Face)
Install directly from GitHub:
```bash
pip install git+https://github.com/klara-research/klarity.git
```
📝 Reasoning LLM Usage Example
To get insights and uncertainty analytics on a model's reasoning patterns, use the ReasoningAnalyzer:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, LogitsProcessorList
from klarity import UncertaintyEstimator
from klarity.core.analyzer import ReasoningAnalyzer

# Initialize a reasoning model (any tested reasoning model from the table below)
model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Create estimator with reasoning analyzer
estimator = UncertaintyEstimator(
    top_k=100,
    analyzer=ReasoningAnalyzer(
        min_token_prob=0.01,
        insight_model="together:meta-llama/Llama-3.3-70B-Instruct-Turbo",
        insight_api_key="your_api_key",
        reasoning_start_token="<think>",  # You can change this if you have different reasoning tokens
        reasoning_end_token="</think>",
    ),
)
uncertainty_processor = estimator.get_logits_processor()

# Generate with reasoning analysis
prompt = "Your prompt <think>"
inputs = tokenizer(prompt, return_tensors="pt")
generation_output = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=True,  # without this, transformers ignores temperature
    temperature=0.6,
    logits_processor=LogitsProcessorList([uncertainty_processor]),
    return_dict_in_generate=True,
    output_scores=True,
)

result = estimator.analyze_generation(
    generation_output,
    tokenizer,
    uncertainty_processor,
    prompt,
)

# Print reasoning analysis
print("\nReasoning Analysis:")
if result.overall_insight and "reasoning_analysis" in result.overall_insight:
    analysis = result.overall_insight["reasoning_analysis"]
    for step in analysis["steps"]:
        print(f"\nStep {step['step_number']}:")
        print(f"Content: {step['step_info']['content']}")
        if "analysis" in step:
            step_analysis = step["analysis"]["training_insights"]
            print("\nQuality Metrics:")
            for metric, score in step_analysis["step_quality"].items():
                print(f"  {metric}: {score}")
```
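The reasoning_start_token and reasoning_end_token parameters delimit the chain of thought; note that the example prompt itself ends with "<think>", so the model's reasoning starts right after it. The sketch below shows one way to pull that span out of decoded text and split it into steps. The function name is hypothetical, and splitting steps on blank lines is an illustrative assumption, not necessarily how Klarity segments steps.

```python
def extract_reasoning_steps(
    text: str, start_token: str = "<think>", end_token: str = "</think>"
) -> list[str]:
    """Return reasoning steps found between start_token and end_token.

    Assumption (illustrative only): steps are separated by blank lines.
    If end_token is missing (e.g. generation hit max_new_tokens), everything
    after start_token is treated as reasoning.
    """
    start = text.find(start_token)
    if start == -1:
        return []
    start += len(start_token)
    end = text.find(end_token, start)
    reasoning = text[start:] if end == -1 else text[start:end]
    return [block.strip() for block in reasoning.split("\n\n") if block.strip()]


output = (
    "What is 2 + 3 * 4? <think>Multiplication first: 3 * 4 = 12.\n\n"
    "Then add 2: 12 + 2 = 14.</think>The answer is 14."
)
for i, step in enumerate(extract_reasoning_steps(output), 1):
    print(f"Step {i}: {step}")
```

Treating a missing end token as "reasoning continues to the end" matters in practice, since a small max_new_tokens budget often truncates the chain of thought before "</think>" is emitted.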
📝 Standard LLM Usage Example
To mitigate the most common uncertainty scenarios and route requests to better models, use the EntropyAnalyzer:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, LogitsProcessorList
from klarity import UncertaintyEstimator
from klarity.core.analyzer import EntropyAnalyzer

# Initialize your model
model_name = "Qwen/Qwen2.5-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Create estimator (the same model is reused to write the insight report)
estimator = UncertaintyEstimator(
    top_k=100,
    analyzer=EntropyAnalyzer(
        min_token_prob=0.01,
        insight_model=model,
        insight_tokenizer=tokenizer,
    ),
)
uncertainty_processor = estimator.get_logits_processor()

# Set up generation
prompt = "Your prompt"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate with uncertainty analysis
generation_output = model.generate(
    **inputs,
    max_new_tokens=20,
    do_sample=True,  # without this, transformers ignores temperature and top_p
    temperature=0.7,
    top_p=0.9,
    logits_processor=LogitsProcessorList([uncertainty_processor]),
    return_dict_in_generate=True,
    output_scores=True,
)

# Analyze the generation
result = estimator.analyze_generation(
    generation_output,
    tokenizer,
    uncertainty_processor,
)

# Decode only the new tokens (sequences also contain the prompt)
prompt_length = inputs["input_ids"].shape[1]
generated_text = tokenizer.decode(
    generation_output.sequences[0][prompt_length:], skip_special_tokens=True
)

# Inspect results
print(f"\nPrompt: {prompt}")
print(f"Generated text: {generated_text}")
print("\nDetailed Token Analysis:")
for idx, metrics in enumerate(result.token_metrics):
    print(f"\nStep {idx}:")
    print(f"Raw entropy: {metrics.raw_entropy:.4f}")
    print(f"Semantic entropy: {metrics.semantic_entropy:.4f}")
    print("Top 3 predictions:")
    for i, pred in enumerate(metrics.token_predictions[:3], 1):
        print(f"  {i}. {pred.token} (prob: {pred.probability:.4f})")

# Show comprehensive insight
print("\nComprehensive Analysis:")
print(result.overall_insight)
```
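Once result.token_metrics is available, a natural next step is to flag the generation steps where the model was genuinely unsure. The sketch below relies only on the attributes used in the example above (raw_entropy, semantic_entropy, token_predictions with .token and .probability). The function name and both threshold values are hypothetical and would need tuning per model; SimpleNamespace objects stand in for real Klarity metrics so the snippet runs on its own.

```python
from types import SimpleNamespace


def flag_uncertain_steps(token_metrics, raw_threshold: float = 1.5, semantic_threshold: float = 1.0):
    """Return (step index, top token) for steps above BOTH entropy thresholds.

    Hypothetical helper: high raw entropy with low semantic entropy usually
    means the alternatives are rewordings of one meaning, so both must be high.
    """
    flagged = []
    for idx, metrics in enumerate(token_metrics):
        if metrics.raw_entropy > raw_threshold and metrics.semantic_entropy > semantic_threshold:
            preds = metrics.token_predictions
            top_token = preds[0].token if preds else None
            flagged.append((idx, top_token))
    return flagged


def fake_step(raw: float, semantic: float, top_token: str) -> SimpleNamespace:
    """Stand-in object mimicking one entry of result.token_metrics."""
    return SimpleNamespace(
        raw_entropy=raw,
        semantic_entropy=semantic,
        token_predictions=[SimpleNamespace(token=top_token, probability=0.4)],
    )


steps = [
    fake_step(2.0, 0.2, " large"),  # many wordings, one meaning
    fake_step(2.1, 1.8, " 1987"),   # genuinely competing answers
    fake_step(0.1, 0.1, " the"),    # confident
]
print(flag_uncertain_steps(steps))  # [(1, ' 1987')]
```

With a real run you would call flag_uncertain_steps(result.token_metrics) instead of passing the stand-ins.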
📊 Analysis Output
Klarity provides two types of analysis output:
Reasoning Analysis
You'll get detailed insights into the model's reasoning process:
```json
{
  "reasoning_analysis": {
    "steps": [
      {
        "step_number": 1,
        "step_info": {
          "content": "Step reasoning content",
          "type": "analysis"
        },
        "analysis": {
          "training_insights": {
            "step_quality": {
              "coherence": "0.8",
              "relevance": "0.9",
              "confidence": "0.7"
            },
            "improvement_targets": [
              {
                "aspect": "conciseness",
                "importance": "0.8",
                "current_issue": "verbose response",
                "training_suggestion": "reduce explanation steps"
              }
            ]
          }
        }
      }
    ]
  }
}
```
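Because the quality scores in this report are strings (for example "0.8"), they need converting before you can filter on them, e.g. when selecting weak reasoning steps to revise in an RL dataset. A minimal sketch follows; the function name and the 0.75 cutoff are hypothetical, and it walks only the keys shown in the sample report above.

```python
def weak_steps(insight: dict, metric_floor: float = 0.75) -> list[tuple[int, str, float]]:
    """List (step_number, metric, score) for quality metrics below metric_floor.

    Scores arrive as strings in the report, so they are converted to float;
    values that cannot be parsed are skipped. metric_floor is a hypothetical cutoff.
    """
    results = []
    for step in insight.get("reasoning_analysis", {}).get("steps", []):
        quality = step.get("analysis", {}).get("training_insights", {}).get("step_quality", {})
        for metric, raw_score in quality.items():
            try:
                score = float(raw_score)
            except (TypeError, ValueError):
                continue
            if score < metric_floor:
                results.append((step.get("step_number"), metric, score))
    return results


sample = {
    "reasoning_analysis": {
        "steps": [
            {
                "step_number": 1,
                "step_info": {"content": "Step reasoning content", "type": "analysis"},
                "analysis": {
                    "training_insights": {
                        "step_quality": {"coherence": "0.8", "relevance": "0.9", "confidence": "0.7"}
                    }
                },
            }
        ]
    }
}
print(weak_steps(sample))  # [(1, 'confidence', 0.7)]
```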
Entropy Analysis
For standard language models you will get a general uncertainty report:
```json
{
  "scores": {
    "overall_uncertainty": "<0-1>",
    "confidence_score": "<0-1>",
    "hallucination_risk": "<0-1>"
  },
  "uncertainty_analysis": {
    "high_uncertainty_parts": [
      {
        "text": "",
        "why": ""
      }
    ],
    "main_issues": [
      {
        "issue": "",
        "evidence": ""
      }
    ],
    "key_suggestions": [
      {
        "what": "",
        "how": ""
      }
    ]
  }
}
```
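The scores block is what you would use to route uncertain requests to a stronger model. The sketch below shows one possible routing rule; the function name, both thresholds, and the "escalate when scores are missing" policy are hypothetical choices, not part of Klarity. Scores are converted with float() because the report encodes them as strings.

```python
def should_escalate(report: dict, max_hallucination_risk: float = 0.5, min_confidence: float = 0.6) -> bool:
    """Decide whether to re-route a request to a stronger model.

    Hypothetical policy: escalate when hallucination risk is high or confidence
    is low. Missing or unparseable scores are treated as a reason to escalate.
    """
    scores = report.get("scores", {})
    try:
        risk = float(scores["hallucination_risk"])
        confidence = float(scores["confidence_score"])
    except (KeyError, TypeError, ValueError):
        return True
    return risk > max_hallucination_risk or confidence < min_confidence


report = {"scores": {"overall_uncertainty": "0.3", "confidence_score": "0.55", "hallucination_risk": "0.2"}}
print(should_escalate(report))  # True (confidence is below 0.6)
```

Failing closed (escalating) on a malformed report is a deliberate choice here, since small analysis models do not always return valid JSON.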
🤖 Supported Frameworks & Models
Model Frameworks
Currently supported:
- ✅ Hugging Face Transformers -> Full uncertainty analysis with raw and semantic entropy metrics
- ✅ Together AI -> Uncertainty analysis with raw log-probability metrics
Planned support:
- ⏳ PyTorch
Analysis Model Frameworks (for the insights)
Currently supported:
- ✅ Hugging Face Transformers
- ✅ Together AI API
Planned support:
- ⏳ PyTorch
Tested Target Models
| Model | Type | Status | Notes |
|---|---|---|---|
| Qwen2.5-0.5B | Base | ✅ Tested | Full Support |
| Qwen2.5-0.5B-Instruct | Instruct | ✅ Tested | Full Support |
| Qwen2.5-7B | Base | ✅ Tested | Full Support |
| Qwen2.5-7B-Instruct | Instruct | ✅ Tested | Full Support |
| Llama-3.2-3B-Instruct | Instruct | ✅ Tested | Full Support |
| DeepSeek-R1-Distill-Qwen-1.5B | Reasoning | ✅ Tested | Together API Insights |
| DeepSeek-R1-Distill-Qwen-7B | Reasoning | ✅ Tested | Together API Insights |
Analysis Models
| Model | Type | Status | JSON Reliability | Notes |
|---|---|---|---|---|
| Qwen2.5-0.5B-Instruct | Instruct | ✅ Tested | ⚡ Low | Consistently outputs unstructured analysis instead of JSON. Best used with structured prompting and validation. |
| Qwen2.5-7B-Instruct | Instruct | ✅ Tested | ⚠️ Moderate | Sometimes outputs well-formed JSON analysis. |
| Llama-3.3-70B-Instruct-Turbo | Instruct | ✅ Tested | ✅ High | Reliably outputs well-formed JSON analysis. Recommended for production use. |
JSON Output Reliability Guide:
- ✅ High: Consistently outputs valid JSON (>80% of responses)
- ⚠️ Moderate: Usually outputs valid JSON (50-80% of responses)
- ⚡ Low: Inconsistent JSON output (<50% of responses)
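With moderate- or low-reliability analysis models, the insight may arrive as prose that merely contains JSON, or as no JSON at all. A defensive parser like the sketch below can help; it assumes the insight may be either an already-parsed dict or a raw string, which is an assumption about how your setup surfaces the analysis model's output. The function name is hypothetical; json.JSONDecoder.raw_decode is standard library.

```python
import json
from typing import Any, Optional


def parse_insight(raw: Any) -> Optional[dict]:
    """Best-effort parsing of an analysis model's output into a dict.

    Accepts an already-parsed dict, or a string that may wrap the JSON in
    surrounding prose. Returns the first JSON object found, or None.
    """
    if isinstance(raw, dict):
        return raw
    if not isinstance(raw, str):
        return None
    decoder = json.JSONDecoder()
    start = raw.find("{")
    while start != -1:
        try:
            obj, _ = decoder.raw_decode(raw[start:])
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        start = raw.find("{", start + 1)
    return None


messy = 'Sure! Here is the analysis:\n{"scores": {"hallucination_risk": "0.3"}}\nLet me know if you need more.'
print(parse_insight(messy))
```

Returning None rather than raising lets the caller decide whether to retry with a stronger analysis model or fall back to the raw entropy metrics.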
🔍 Advanced Features
Custom Analysis Configuration
You can customize the analysis parameters:
```python
from klarity.core.analyzer import EntropyAnalyzer

analyzer = EntropyAnalyzer(
    min_token_prob=0.01,  # Minimum probability threshold
    semantic_similarity_threshold=0.8,  # Threshold for semantic grouping
)
```
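To give an intuition for what semantic_similarity_threshold controls, the sketch below groups candidate tokens whose embeddings are at least that cosine-similar and computes entropy over the resulting meaning clusters instead of individual tokens. The greedy clustering rule, the function names, and the toy 2-D embeddings are all illustrative assumptions; Klarity's actual grouping and embedding model may differ.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def semantic_entropy(
    predictions: list[tuple[str, float]],
    embeddings: dict[str, list[float]],
    semantic_similarity_threshold: float = 0.8,
) -> float:
    """Entropy (nats) over meaning clusters rather than individual tokens.

    Greedy clustering (an assumption): tokens are visited from most to least
    likely; each joins the first cluster whose representative is at least
    semantic_similarity_threshold cosine-similar, else it starts a new cluster.
    """
    clusters: list[tuple[str, float]] = []  # (representative token, summed probability)
    for token, prob in sorted(predictions, key=lambda tp: tp[1], reverse=True):
        for i, (rep, mass) in enumerate(clusters):
            if cosine(embeddings[token], embeddings[rep]) >= semantic_similarity_threshold:
                clusters[i] = (rep, mass + prob)
                break
        else:
            clusters.append((token, prob))
    total = sum(mass for _, mass in clusters)
    return -sum((m / total) * math.log(m / total) for _, m in clusters if m > 0)


# Toy embeddings: "big" and "large" point the same way, "small" does not.
emb = {"big": [1.0, 0.1], "large": [0.95, 0.15], "small": [0.1, 1.0]}
preds = [("big", 0.5), ("large", 0.4), ("small", 0.1)]
print(f"threshold 0.8: {semantic_entropy(preds, emb):.3f}")  # big+large merge
print(f"threshold 1.0: {semantic_entropy(preds, emb, 1.0):.3f}")  # no merging
```

Raising the threshold toward 1.0 makes grouping stricter, so semantic entropy approaches the token-level raw entropy; lowering it merges more alternatives and reports less uncertainty.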
🤝 Contributing
Contributions are welcome! Areas we're looking to improve:
- Additional framework support
- More tested models
- Enhanced semantic analysis
- Additional analysis metrics
- Documentation and examples
Please see our Contributing Guide for details.
📝 License
Apache 2.0 License. See LICENSE for more information.
📫 Community & Support
- Website
- Discord Community for discussions & chatting
- GitHub Issues for bugs and features
Download files
Source Distribution: klarity-0.1.0.tar.gz
Built Distribution: klarity-0.1.0-py3-none-any.whl
File details
Details for the file klarity-0.1.0.tar.gz.
File metadata
- Download URL: klarity-0.1.0.tar.gz
- Size: 19.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
7bdb18fccd602b44a5c554c92bf75ce7f49cee8a23262efe540a53bcdb799ed2
|
|
| MD5 |
26324aa7016afc260a01372329564d7f
|
|
| BLAKE2b-256 |
0ced32770e090011edfa5df7f3f85d4e16b049bceda3d85f2d1ead52e9445b5a
|
Provenance
The following attestation bundles were made for klarity-0.1.0.tar.gz:
- Publisher: python-publish.yml on klara-research/klarity
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: klarity-0.1.0.tar.gz
- Subject digest: 7bdb18fccd602b44a5c554c92bf75ce7f49cee8a23262efe540a53bcdb799ed2
- Sigstore transparency entry: 169865407
- Permalink: klara-research/klarity@a80fbfdc4fa161c28ab61b44207782d929ae641c
- Branch / Tag: refs/tags/v0.1
- Owner: https://github.com/klara-research
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@a80fbfdc4fa161c28ab61b44207782d929ae641c
- Trigger Event: release
File details
Details for the file klarity-0.1.0-py3-none-any.whl.
File metadata
- Download URL: klarity-0.1.0-py3-none-any.whl
- Size: 16.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
53fc28ed14f5010577394b25853f7aef89d9f72fce619277f1b9506372c6d600
|
|
| MD5 |
ac1a73ba73e63cbcb24fc78c7e8c7dea
|
|
| BLAKE2b-256 |
090c30167498242f519ebe9d6f6b4353278689a93589d689972d6ae734d892c8
|
Provenance
The following attestation bundles were made for klarity-0.1.0-py3-none-any.whl:
- Publisher: python-publish.yml on klara-research/klarity
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: klarity-0.1.0-py3-none-any.whl
- Subject digest: 53fc28ed14f5010577394b25853f7aef89d9f72fce619277f1b9506372c6d600
- Sigstore transparency entry: 169865408
- Permalink: klara-research/klarity@a80fbfdc4fa161c28ab61b44207782d929ae641c
- Branch / Tag: refs/tags/v0.1
- Owner: https://github.com/klara-research
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@a80fbfdc4fa161c28ab61b44207782d929ae641c
- Trigger Event: release