Anti-Hallucination & Token Optimization library for Groq, Gemini and local LLMs

Hallutok

Token optimization and hallucination detection for LLM applications.

Hallutok is a Python library that wraps LLM calls with two things most production apps need but rarely have built in: prompt compression to reduce token spend, and response scoring to catch hallucinations before they reach your users. It works with Groq, Gemini, Ollama, and HuggingFace. It also ships with a full CLI, so you can use every feature directly from your terminal.

Installation
CLI
- chat
- optimize
- validate
- session
- models
- stats
Quick Start
API Providers — Groq and Gemini
Runtime Engine — Local Models
Complete Runtime Example
Components Reference
Result Objects
Roadmap

Installation

# Groq support
pip install hallutok[groq]

# Gemini support
pip install hallutok[gemini]

# Both API providers
pip install hallutok[all]

For local model support via Ollama or HuggingFace, install the additional dependencies:

pip install ollama                          # for Ollama
pip install transformers torch             # for HuggingFace

CLI

Hallutok installs a hallutok command alongside the library. Every feature — chatting, optimizing prompts, validating text, managing sessions — is available from the terminal without writing any Python.

hallutok <command> [options]

Commands:
  chat        Chat with a model (Ollama, Groq, or Gemini)
  optimize    Compress a prompt and see token savings
  validate    Score any text for hallucination risk
  session     List, inspect, and export saved sessions
  models      List available Ollama models
  stats       Show installed dependencies and system info

hallutok chat

Send a prompt to any model and get a response with token savings and hallucination analysis printed inline.

# Ollama (default — requires Ollama running locally)
hallutok chat "What are black holes?"
hallutok chat "Explain quantum computing" --model phi3 --temperature 0.3

# Groq
hallutok chat "What causes inflation?" --groq gsk_your_key
hallutok chat "Summarize this" --groq gsk_your_key --model mixtral-8x7b-32768

# Gemini
hallutok chat "Explain neural networks" --gemini AIza_your_key

# With a system prompt
hallutok chat "What is the Higgs boson?" --system "You are a physics professor."

# Continue a named session
hallutok chat "What did we discuss?" --session my-research

# Save the session after chatting
hallutok chat "Tell me about supernovae" --save session.json

# Only print the response, no analytics
hallutok chat "What is DNA?" --quiet

# Output as JSON (useful for piping to other tools)
hallutok chat "What is AI?" --json

# Read a long prompt from a file
hallutok chat --file my_prompt.txt --model llama3

# Skip optimization or validation
hallutok chat "Hello" --no-optimize --no-validate

Options:

Flag	Default	Description
`--model`, `-m`	`llama3`	Model name (default is for Ollama; pass a provider-specific name with `--groq`)
`--groq`	—	Use Groq with this API key
`--gemini`	—	Use Gemini with this API key
`--temperature`, `-t`	`0.4`	Sampling temperature
`--max-tokens`	`1024`	Max response tokens
`--system`, `-s`	—	System prompt
`--session`	—	Session name to continue
`--save`	—	Save session to JSON file after chat
`--file`, `-f`	—	Read prompt from a file
`--no-validate`	off	Skip hallucination validation
`--no-optimize`	off	Skip token optimization
`--json`	off	Output result as JSON
`--quiet`, `-q`	off	Print only the response

hallutok optimize

Compress a prompt and see exactly how many tokens were saved, before sending anything to a model.

hallutok optimize "Please note that I would like you to explain in order to help me understand what black holes are."

# With a token limit
hallutok optimize "Your long prompt here..." --max-tokens 100

# From a file
hallutok optimize --file my_prompt.txt

# JSON output
hallutok optimize "Please explain..." --json

Example output:

Original Prompt  (18 tokens)
  Please note that I would like you to explain in order to help me understand what black holes are.

Optimized Prompt  (7 tokens)
  Explain what black holes are.

Token Optimization
  Before  : 18 tokens
  After   : 7 tokens
  Saved   : 11 tokens (61.1%)

hallutok validate

Score any text for hallucination risk using the Hallucination Risk Score (HRS), a composite of four sub-scores (SCS, ECS, CDS, FGS). Useful for auditing model outputs or any text before publishing.

hallutok validate "I think maybe studies show that eating chocolate probably cures cancer."

# From a file
hallutok validate --file response.txt

# Just print the risk level (LOW / MEDIUM / HIGH)
hallutok validate "Some text..." --quiet

# JSON output with full score breakdown
hallutok validate "Some text..." --json

Example output:

Hallucination Analysis
  HRS Score : ████░░░░░░░░░░░ 0.612
  Risk      : HIGH
  SCS=0.410  ECS=0.700  CDS=1.000  FGS=0.900

  Flags:
  - Hedging language detected: "I think", "maybe", "probably"
  - Ungrounded claim: "studies show" without citation

  Suggestions:
  - Remove hedging phrases or support claims with citations
  - Add specific sources for statistical claims

hallutok session

List and inspect saved session files, or export them to Markdown.

# List all session JSON files in current directory
hallutok session list

# List sessions in a specific directory
hallutok session list --dir ./sessions

# Show session stats and full chat history
hallutok session show my_session.json

# Export session as a readable Markdown chat log
hallutok session export my_session.json
hallutok session export my_session.json --output chat_log.md

# Show session as JSON
hallutok session show my_session.json --json

hallutok models

List all models currently available in your local Ollama installation.

hallutok models

# Specify a different Ollama host
hallutok models --host http://192.168.1.100:11434

# JSON output
hallutok models --json

Example output:

Available Ollama Models
  MODEL                               SIZE         MODIFIED
  llama3:latest                       4823 MB      2024-06-01
  mistral:latest                      4108 MB      2024-05-20
  phi3:latest                         2301 MB      2024-05-18

hallutok stats

Show system information, installed dependencies, and quick-start examples.

hallutok stats

# JSON output
hallutok stats --json

Example output:

Version Info
  hallutok_version       : 0.2.0
  python_version         : 3.11.4
  platform               : Darwin
  architecture           : arm64

Dependencies
  groq                      : installed
  google.generativeai        : installed
  ollama                    : installed
  transformers               : not installed
  torch                     : not installed

Quick Start

from hallutok import HallutokClient

client = HallutokClient.with_groq(
    api_key="gsk_your_key",
    model="llama3-8b-8192",
    temperature=0.3,
)

result = client.chat("Explain what black holes are.")

print(result.response)
print(result.token_report)
# {'tokens_before': 12, 'tokens_after': 9, 'tokens_saved': 3, 'percent_saved': 25.0}

if result.validation.is_likely_hallucination:
    print("Flags:", result.validation.flags)

API Providers — Groq and Gemini

Groq

from hallutok import HallutokClient

client = HallutokClient.with_groq(
    api_key="gsk_your_groq_key",
    model="llama3-8b-8192",
    temperature=0.3,
    max_response_tokens=1024,
    system_prompt="You are a factual assistant. Cite sources when possible.",
)

result = client.chat(
    "Please note that I would like you to explain in order to help me "
    "understand what black holes are and how they work in detail."
)

print(result.response)
print(result.token_report)
# {'tokens_before': 34, 'tokens_after': 13, 'tokens_saved': 21, 'percent_saved': 61.8}

if result.validation.is_likely_hallucination:
    print("Risk:", result.validation.risk_level)
    print("Flags:", result.validation.flags)
    print("Suggestions:", result.validation.suggestions)

Gemini

from hallutok import HallutokClient

client = HallutokClient.with_gemini(
    api_key="AIza_your_gemini_key",
    model="gemini-1.5-flash",
    temperature=0.4,
)

result = client.chat("Explain quantum entanglement to a 10-year-old.")
print(result.response)
print(result.token_report)

Custom provider setup

from hallutok import HallutokClient
from hallutok.providers import GroqProvider, GeminiProvider

provider = GroqProvider(api_key="gsk_...", model="mixtral-8x7b-32768")
# provider = GeminiProvider(api_key="AIza_...", model="gemini-1.5-pro")

client = HallutokClient(
    provider=provider,
    optimize_tokens=True,
    validate_responses=True,
    max_prompt_tokens=512,
    temperature=0.4,
    max_response_tokens=1024,
    system_prompt="You are a factual assistant.",
    cache_enabled=True,
)

result = client.chat("What causes inflation?")

Pre-flight token estimation

Check how many tokens a prompt will use before sending it:

estimate = client.estimate_cost_tokens(
    "Please note that I would like you to in order to help me explain "
    "how machine learning works and what it does."
)
print(estimate)
# {'tokens_before': 28, 'tokens_after': 11, 'tokens_saved': 17, 'percent_saved': 60.7}
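
The four report fields follow from two token counts. Below is a minimal sketch of that arithmetic, assuming `percent_saved` is rounded to one decimal place as in the sample outputs. `savings_report` here is a hypothetical stand-alone helper written for illustration, not the library's own function.

```python
def savings_report(tokens_before: int, tokens_after: int) -> dict:
    """Build a report dict shaped like the examples in this README (hypothetical helper)."""
    tokens_saved = tokens_before - tokens_after
    # Guard against an empty prompt to avoid dividing by zero.
    percent_saved = round(100 * tokens_saved / tokens_before, 1) if tokens_before else 0.0
    return {
        "tokens_before": tokens_before,
        "tokens_after": tokens_after,
        "tokens_saved": tokens_saved,
        "percent_saved": percent_saved,
    }


print(savings_report(28, 11))
# {'tokens_before': 28, 'tokens_after': 11, 'tokens_saved': 17, 'percent_saved': 60.7}
```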

Runtime Engine — Local Models

The HallutokEngine brings the full Hallutok pipeline to local models. Load any model from Ollama or HuggingFace and get token optimization, hallucination scoring, context window management, session persistence, and latency optimization out of the box — no API key required.

Loading a model

from hallutok.runtime import HallutokEngine

# From Ollama (requires Ollama running at localhost:11434)
engine = HallutokEngine.from_ollama("llama3")

# From HuggingFace Hub
engine = HallutokEngine.from_huggingface(
    "mistralai/Mistral-7B-Instruct-v0.2",
    device="auto",    # auto-detects cuda / mps / cpu
    quantize=True,    # 4-bit quantization to reduce memory
)

# From a local model directory
engine = HallutokEngine.from_local("/path/to/model")

Engine configuration options

engine = HallutokEngine.from_ollama(
    model="llama3",
    max_tokens=4096,              # total context window token budget
    trim_strategy="sliding",      # how to handle context overflow
    kv_cache=True,                # cache identical prompts
    warm_up=True,                 # pre-warm model to cut first-call latency
    stream=False,
    system_prompt="You are a concise, factual assistant.",
)

Complete Runtime Example

This single script demonstrates every runtime feature — context management, session tracking, latency optimization, hallucination detection, export, and engine stats. Copy and run it against any Ollama model.

from hallutok.runtime import HallutokEngine

# ── 1. Load the engine ────────────────────────────────────────────────────────
engine = HallutokEngine.from_ollama(
    model="llama3",
    max_tokens=4096,
    trim_strategy="sliding",
    kv_cache=True,
    warm_up=True,
    system_prompt="You are a factual assistant. Keep answers concise.",
)

# ── 2. Create a session ───────────────────────────────────────────────────────
session = engine.create_session(
    name="demo-session",
    system_prompt="You are a factual assistant.",
    max_tokens=4096,
    trim_strategy="sliding",
)

# ── 3. Multi-turn conversation ────────────────────────────────────────────────
questions = [
    "What are black holes?",
    "Please note that I would like you to explain how Hawking radiation works.",
    "How does the event horizon relate to the singularity?",
    "What would happen to a person falling into a black hole?",
]

for question in questions:
    result = session.chat(question, temperature=0.4, max_tokens=512)

    print(f"\nQ: {question}")
    print(f"A: {result.response[:200]}...")
    print(f"   Tokens saved   : {result.tokens_saved} ({result.tokens_saved_pct}%)")
    print(f"   HRS score      : {result.hallucination_score:.3f}")
    print(f"   Risk level     : {result.hallucination_risk}")
    print(f"   Latency        : {result.latency_ms:.0f}ms")
    print(f"   Cache hit      : {result.cache_hit}")
    print(f"   Context used   : {result.context_tokens_used} / {result.context_tokens_used + result.context_tokens_available} tokens")

    if result.is_hallucination:
        print(f"   Flags          : {result.hallucination_flags}")
        print(f"   Suggestions    : {result.suggestions}")

    # Math score breakdown
    print(f"   HRS breakdown  : {result.math_scores}")

# ── 4. Flag an important turn (never trimmed from context) ────────────────────
result = session.chat(
    "Summarize everything we discussed.",
    flag_turn=True,
    temperature=0.3,
)
print(f"\nSummary: {result.response[:300]}")

# ── 5. Session analytics ──────────────────────────────────────────────────────
stats = session.get_stats()
print("\n--- Session Stats ---")
print(f"Total turns           : {stats['total_turns']}")
print(f"Total tokens saved    : {stats['total_tokens_saved']}")
print(f"Avg tokens saved      : {stats['avg_tokens_saved_pct']}%")
print(f"Hallucinations caught : {stats['total_hallucinations_caught']}")
print(f"Avg HRS score         : {stats['avg_hallucination_score']}")
print(f"Avg latency           : {stats['avg_latency_ms']}ms")
print(f"Session duration      : {stats['session_duration_s']}s")
print(f"Context trims         : {stats['context_trims']}")

# ── 6. Engine-wide stats ──────────────────────────────────────────────────────
engine_stats = engine.get_stats()
print("\n--- Engine Stats ---")
print(f"Model          : {engine_stats['model']}")
print(f"Source         : {engine_stats['source']}")
print(f"Device         : {engine_stats['device']}")
print(f"Total sessions : {engine_stats['total_sessions']}")
print(f"Uptime         : {engine_stats['uptime_s']}s")
print(f"Latency stats  : {engine_stats['latency']}")

# ── 7. Export session ─────────────────────────────────────────────────────────
session.save("my_session.json")
session.export_markdown("chat_log.md")
session.export_csv("analytics.csv")

# ── 8. Load a saved session ───────────────────────────────────────────────────
restored = engine.load_session("my_session.json")
print(f"\nRestored session: {restored.name}")
print(f"Last response: {restored.last_response()[:100]}")

# ── 9. Clear caches ───────────────────────────────────────────────────────────
engine.clear_cache()

Components Reference

HallutokClient

The main entry point for Groq and Gemini API usage.

from hallutok import HallutokClient

client = HallutokClient(
    provider=provider,
    optimize_tokens=True,       # compress prompts before sending
    validate_responses=True,    # score responses for hallucination
    max_prompt_tokens=512,      # hard cap on prompt size (None = no cap)
    temperature=0.5,
    max_response_tokens=1024,
    system_prompt=None,
    cache_enabled=True,
)

Method	Description
`chat(prompt, ...)`	Send a prompt through the full pipeline
`estimate_cost_tokens(prompt)`	Preview token savings before sending
`clear_cache()`	Flush the optimizer prompt cache
`HallutokClient.with_groq(api_key, model, **kwargs)`	Factory for Groq
`HallutokClient.with_gemini(api_key, model, **kwargs)`	Factory for Gemini

TokenOptimizer

Compresses prompts before they are sent to any model.

from hallutok.optimizer import TokenOptimizer

opt = TokenOptimizer(cache_enabled=True)

raw = """
Please note that I would like you to, in order to be helpful,
can you please explain, it is important to note that, machine learning
is a subset of AI. Machine learning is a subset of AI.
"""

compressed = opt.optimize(raw, max_tokens=100)
report = opt.savings_report(raw, compressed)
print(report)
# {'tokens_before': 54, 'tokens_after': 12, 'tokens_saved': 42, 'percent_saved': 77.8}

The optimizer applies these steps in order:

Step	What it does
Whitespace normalization	Collapses spaces, trims blank lines
Boilerplate stripping	Removes "Please note that", "I would like you to", "It is important to note", etc.
Deduplication	Removes repeated sentences
Phrase compression	"in order to" -> "to", "due to the fact that" -> "because"
Truncation	Cuts to `max_tokens` at a sentence boundary
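
The five steps above can be sketched with the standard library alone. This is an illustration of the pipeline order, not Hallutok's implementation: the phrase lists contain only the examples from the table, `count_tokens` is a crude word count standing in for a real tokenizer, and the function names are hypothetical.

```python
import re
from typing import Optional

# Hypothetical phrase lists, taken from the examples in the table above.
BOILERPLATE = [
    r"\bplease note that\b",
    r"\bi would like you to\b",
    r"\bit is important to note that\b",
]
COMPRESSIONS = {
    r"\bin order to\b": "to",
    r"\bdue to the fact that\b": "because",
}
SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: count whitespace-separated words."""
    return len(text.split())


def optimize(prompt: str, max_tokens: Optional[int] = None) -> str:
    # 1. Whitespace normalization: collapse runs of spaces and newlines.
    text = re.sub(r"\s+", " ", prompt).strip()

    # 2. Boilerplate stripping (case-insensitive), plus an optional trailing comma.
    for phrase in BOILERPLATE:
        text = re.sub(phrase + r",?\s*", "", text, flags=re.IGNORECASE)

    # 3. Deduplication: keep the first occurrence of each sentence.
    seen, unique = set(), []
    for sentence in SENTENCE_BREAK.split(text):
        key = sentence.lower().strip()
        if key and key not in seen:
            seen.add(key)
            unique.append(sentence.strip())
    text = " ".join(unique)

    # 4. Phrase compression: swap wordy phrases for short equivalents.
    for pattern, short in COMPRESSIONS.items():
        text = re.sub(pattern, short, text, flags=re.IGNORECASE)

    # 5. Truncation: keep whole sentences while they fit in the budget.
    if max_tokens is not None:
        kept, used = [], 0
        for sentence in SENTENCE_BREAK.split(text):
            cost = count_tokens(sentence)
            if used + cost > max_tokens:
                break
            kept.append(sentence)
            used += cost
        text = " ".join(kept)
    return text


print(optimize("Please note that I would like you to explain   black holes. "
               "Explain black holes. It works in order to help."))
# explain black holes. It works to help.
```

The sketch does not re-capitalize the first word after stripping a leading phrase, and it truncates by whole sentences only, so a single sentence longer than `max_tokens` yields an empty string.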

HallucinationValidator

Scores any text for hallucination risk using the Hallucination Risk Score (HRS), a composite of four mathematical sub-scores.

from hallutok.antihallucination import HallucinationValidator

validator = HallucinationValidator()

response = "I think maybe studies show that eating chocolate probably cures cancer."
result = validator.validate(response)

print(result.confidence_score)         # 0.0–1.0, higher = more confident
print(result.risk_level)               # "LOW" | "MEDIUM" | "HIGH"
print(result.is_likely_hallucination)  # True / False
print(result.flags)                    # list of detected issues
print(result.warnings)                 # human-readable descriptions
print(result.suggestions)             # recommended actions
print(result.cleaned_response)         # response with disclaimer appended if flagged
print(result.math_scores)             # SCS, ECS, CDS, FGS, HRS breakdown

HRS scoring breakdown:

Score	Name	What it measures
SCS	Semantic Confidence Score	Hedging language ("I think", "maybe", "probably")
ECS	Evidence Consistency Score	Ungrounded claims ("Studies show", "Research suggests")
CDS	Contradiction Detection Score	Internal contradictions ("always" + "never" in same text)
FGS	Factual Grounding Score	Numeric anomalies, implausible figures
HRS	Hallucination Risk Score	Composite of all four

Detection layers:

Layer	Examples caught
Hedging	"I think", "maybe", "perhaps", "I'm not sure", "I believe"
Ungrounded claims	"Studies show", "Research suggests", "Experts say"
Numeric anomalies	Percentages over 100%, implausible statistics
Contradictions	Contradictory absolute terms in the same response
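
To make the detection layers concrete, here is a rough sketch of phrase and pattern matching for each layer. It is an assumption about how such checks could work, not the validator's actual logic: the phrase lists are copied from the tables above, `detect_flags` is a hypothetical name, and no scores are computed.

```python
import re

# Hypothetical phrase lists, copied from the tables above.
HEDGES = ["i think", "maybe", "perhaps", "probably", "i'm not sure", "i believe"]
UNGROUNDED = ["studies show", "research suggests", "experts say"]


def detect_flags(text: str) -> list[str]:
    """Return human-readable flags for the four detection layers (illustrative only)."""
    lowered = text.lower()
    flags = []

    # Hedging: match whole phrases so "maybe" does not fire inside another word.
    hedges = [h for h in HEDGES if re.search(r"\b" + re.escape(h) + r"\b", lowered)]
    if hedges:
        flags.append("Hedging language detected: " + ", ".join(hedges))

    # Ungrounded claims: appeal to unnamed sources.
    claims = [c for c in UNGROUNDED if c in lowered]
    if claims:
        flags.append("Ungrounded claim: " + ", ".join(claims))

    # Numeric anomaly: any percentage above 100.
    percents = [float(p) for p in re.findall(r"(\d+(?:\.\d+)?)\s*%", text)]
    if any(p > 100 for p in percents):
        flags.append("Numeric anomaly: percentage over 100%")

    # Contradiction: opposing absolute terms in the same text.
    if re.search(r"\balways\b", lowered) and re.search(r"\bnever\b", lowered):
        flags.append("Possible contradiction: 'always' and 'never' both present")

    return flags


for flag in detect_flags("I think maybe studies show that eating chocolate probably cures cancer."):
    print(flag)
# Hedging language detected: i think, maybe, probably
# Ungrounded claim: studies show
```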

HallutokEngine

The runtime engine for local model inference with the full Hallutok pipeline.

from hallutok.runtime import HallutokEngine

# Factory method signatures (arguments shown by name, not runnable as written)
engine = HallutokEngine.from_ollama(model, host, **kwargs)
engine = HallutokEngine.from_huggingface(model_id, device, quantize, token, **kwargs)
engine = HallutokEngine.from_local(path, device, **kwargs)

Constructor parameters:

Parameter	Type	Default	Description
`max_tokens`	int	4096	Context window token budget
`trim_strategy`	str	"sliding"	Context overflow strategy
`kv_cache`	bool	True	Cache identical prompt responses
`warm_up`	bool	True	Pre-warm model on load
`stream`	bool	False	Enable streaming responses
`system_prompt`	str	None	Default system instruction

Methods:

Method	Description
`create_session(name, system_prompt, max_tokens, trim_strategy)`	Create a new chat session
`load_session(path, max_tokens, trim_strategy)`	Restore session from JSON
`get_stats()`	Engine-wide performance stats
`clear_cache()`	Flush KV and optimizer caches

ContextWindowManager

Manages the token budget for a conversation and automatically trims messages when the budget is exceeded.

from hallutok.runtime.context_manager import ContextWindowManager

ctx = ContextWindowManager(
    max_tokens=4096,
    trim_strategy="sliding",
    reserve_tokens=512,
)

ctx.add_message("system", "You are a helpful assistant.", flagged=True)
ctx.add_message("user", "What are black holes?")
ctx.add_message("assistant", "Black holes are regions of extremely strong gravity.")

print(ctx.stats())
# {
#   'messages': 3,
#   'total_tokens': 28,
#   'available_tokens': 3556,
#   'budget': 4096,
#   'usage_percent': 0.7,
#   'trim_count': 0,
#   'strategy': 'sliding'
# }

Trim strategies:

Strategy	Behavior
`sliding`	Keep system messages and the last N conversation turns
`drop_oldest`	Remove oldest non-system, non-flagged messages first
`summarize`	Compress older messages into an extractive summary note
`priority`	Keep system messages, flagged turns, and the last 6 messages

Messages added with flagged=True are never removed by any trim strategy.
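
A minimal sketch of the `drop_oldest` behavior combined with the flagged-message rule may help here. It assumes a word count in place of a real tokenizer, and the `Message` class and `trim_drop_oldest` function are hypothetical names for illustration rather than part of the library.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    content: str
    flagged: bool = False


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: count whitespace-separated words."""
    return len(text.split())


def trim_drop_oldest(messages: list[Message], budget: int) -> list[Message]:
    """Drop the oldest unprotected messages until the total fits the budget."""
    kept = list(messages)
    total = sum(count_tokens(m.content) for m in kept)
    i = 0
    while total > budget and i < len(kept):
        m = kept[i]
        if m.role == "system" or m.flagged:
            i += 1  # protected: leave it in place and look at the next message
            continue
        total -= count_tokens(m.content)
        del kept[i]  # the next message shifts into index i
    return kept


history = [
    Message("system", "You are helpful"),
    Message("user", "one two three four"),
    Message("assistant", "five six"),
    Message("user", "seven eight nine", flagged=True),
    Message("user", "ten"),
]
print([m.content for m in trim_drop_oldest(history, budget=8)])
# ['You are helpful', 'seven eight nine', 'ten']
```

If the protected messages alone exceed the budget, this sketch returns them unchanged and stays over budget. How the library handles that case is not stated here.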

SessionManager

Tracks conversation history, computes per-session analytics, and handles persistence and export.

from hallutok.runtime.session_manager import SessionManager
from hallutok.runtime.context_manager import ContextWindowManager

ctx = ContextWindowManager(max_tokens=4096)
session = SessionManager(name="my-session", context_manager=ctx)

Methods:

Method	Description
`record_turn(prompt, optimized_prompt, response, token_report, validation_result, latency_ms)`	Record a completed turn
`get_stats()`	Return aggregated session analytics
`save(path)`	Save session to JSON
`SessionManager.load(path, context_manager)`	Load session from JSON
`export_markdown(path)`	Export readable chat log as Markdown
`export_csv(path)`	Export per-turn analytics as CSV
`last_response()`	Return the most recent assistant response
`clear()`	Clear history and context

SessionStats fields:

stats = session.get_stats()

stats.session_name
stats.total_turns
stats.total_tokens_before
stats.total_tokens_after
stats.total_tokens_saved
stats.avg_tokens_saved_pct
stats.total_hallucinations_caught
stats.avg_hallucination_score
stats.avg_latency_ms
stats.session_duration_s
stats.context_trims
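
As a rough illustration of where these numbers come from, the sketch below aggregates per-turn records into the same field names. It returns a plain dict, uses a hypothetical `Turn` record, and assumes the average savings figure is the overall ratio rather than a mean of per-turn percentages. The library may compute it differently.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    tokens_before: int
    tokens_after: int
    hallucination_score: float
    is_hallucination: bool
    latency_ms: float


def aggregate(turns: list[Turn]) -> dict:
    """Fold per-turn records into session-level totals and averages."""
    n = len(turns)
    before = sum(t.tokens_before for t in turns)
    after = sum(t.tokens_after for t in turns)
    return {
        "total_turns": n,
        "total_tokens_before": before,
        "total_tokens_after": after,
        "total_tokens_saved": before - after,
        # Assumption: overall ratio, not a mean of per-turn percentages.
        "avg_tokens_saved_pct": round(100 * (before - after) / before, 1) if before else 0.0,
        "total_hallucinations_caught": sum(t.is_hallucination for t in turns),
        "avg_hallucination_score": round(sum(t.hallucination_score for t in turns) / n, 3) if n else 0.0,
        "avg_latency_ms": round(sum(t.latency_ms for t in turns) / n, 1) if n else 0.0,
    }


turns = [Turn(34, 13, 0.2, False, 100.0), Turn(12, 9, 0.7, True, 200.0)]
print(aggregate(turns)["total_tokens_saved"])
# 24
```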

LatencyOptimizer

Manages KV caching, warm-up, and latency tracking for the runtime engine.

from hallutok.runtime.latency_optimizer import LatencyOptimizer

lat = LatencyOptimizer(
    kv_cache_enabled=True,
    kv_cache_size=64,
    stream=False,
    warm_up=True,
)

# Cache operations
lat.store_cache("What is AI?", "AI is artificial intelligence.")
cached = lat.get_cached("What is AI?")  # returns response or None

# Latency stats
print(lat.latency_stats())
# {
#   'calls': 12,
#   'avg_ms': 134.2,
#   'min_ms': 98.1,
#   'max_ms': 312.4,
#   'p95_ms': 280.0,
#   'cache_hits': 3,
#   'stream_mode': False
# }
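
For readers curious how a bounded prompt cache and a P95 figure can be produced, here is a small sketch under stated assumptions: the cache is a least-recently-used map keyed by the exact prompt string, and P95 uses the nearest-rank method. `PromptCache` and this `latency_stats` function are hypothetical and do not reflect the internals of `LatencyOptimizer`.

```python
import math
from collections import OrderedDict


class PromptCache:
    """Tiny LRU cache keyed by the exact prompt string (illustrative only)."""

    def __init__(self, max_size: int = 64):
        self.max_size = max_size
        self._store = OrderedDict()
        self.hits = 0

    def get(self, prompt: str):
        if prompt not in self._store:
            return None
        self._store.move_to_end(prompt)  # mark as most recently used
        self.hits += 1
        return self._store[prompt]

    def put(self, prompt: str, response: str) -> None:
        self._store[prompt] = response
        self._store.move_to_end(prompt)
        if len(self._store) > self.max_size:
            self._store.popitem(last=False)  # evict the least recently used entry


def latency_stats(samples_ms: list[float]) -> dict:
    """Summarise latencies; p95 uses the nearest-rank method."""
    if not samples_ms:
        return {"calls": 0}
    ordered = sorted(samples_ms)
    rank = math.ceil(95 * len(ordered) / 100)  # 1-based nearest rank
    return {
        "calls": len(ordered),
        "avg_ms": round(sum(ordered) / len(ordered), 1),
        "min_ms": ordered[0],
        "max_ms": ordered[-1],
        "p95_ms": ordered[rank - 1],
    }


cache = PromptCache(max_size=2)
cache.put("What is AI?", "AI is artificial intelligence.")
print(cache.get("What is AI?"))
# AI is artificial intelligence.
```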

Result Objects

ChatResult (API providers)

Returned by HallutokClient.chat().

Field	Type	Description
`response`	str	Final model response (with disclaimer if flagged)
`original_prompt`	str	The prompt as you wrote it
`optimized_prompt`	str	The prompt after token optimization
`token_report`	dict	tokens_before, tokens_after, tokens_saved, percent_saved
`validation`	ValidationResult	Full hallucination validation result
`provider`	str	"groq" or "gemini"
`warnings`	list[str]	Aggregated warnings from optimizer and validator

EngineResult (Runtime Engine)

Returned by session.chat().

Field	Type	Description
`response`	str	Final model response
`original_prompt`	str	Raw input prompt
`optimized_prompt`	str	Prompt after optimization
`tokens_before`	int	Token count before optimization
`tokens_after`	int	Token count after optimization
`tokens_saved`	int	Tokens saved
`tokens_saved_pct`	float	Percentage saved
`hallucination_score`	float	HRS composite score (0.0–1.0)
`hallucination_risk`	str	"LOW", "MEDIUM", or "HIGH"
`is_hallucination`	bool	Whether response is flagged
`hallucination_flags`	list[str]	Detected issues
`math_scores`	dict	SCS, ECS, CDS, FGS, HRS sub-scores
`latency_ms`	float	End-to-end latency in milliseconds
`cache_hit`	bool	True if served from KV cache
`context_tokens_used`	int	Tokens currently in context window
`context_tokens_available`	int	Tokens remaining in budget
`suggestions`	list[str]	Recommendations if hallucination detected

Roadmap

- Token optimization pipeline
- Hallucination detection with mathematical HRS scoring
- Groq and Gemini provider adapters
- Runtime Engine with Ollama and HuggingFace support
- Context Window Manager with four trim strategies
- Session Manager with history, analytics, and export
- Latency Optimizer with KV cache and P95 tracking
- CLI with chat, optimize, validate, session, models, and stats commands
- Async support via achat()
- Streaming responses
- OpenAI and Together AI provider adapters
- Self-consistency hallucination verification
- Per-call token budget enforcement

License

MIT License — see LICENSE for details.

Release history

Version	Released
0.2.0 (this version)	Jun 7, 2026
0.1.3	Jun 5, 2026
0.1.2	Jun 5, 2026
0.1.1	Jun 5, 2026
0.1.0	Jun 4, 2026

