81 projects
llm-retry-py
Exponential backoff with full jitter for LLM API calls. Sync + async. Built-in retryable-code presets for Anthropic, OpenAI, Bedrock, Gemini. Zero runtime deps.
llm-message-hash-py
Stable canonical sha256 hash of LLM request/message structures. Recursive key-sorted JSON canonicalization with per-provider presets that drop noise fields. For cache keys and idempotency. Zero runtime deps.
agentidemp-py
Idempotency keys for LLM agent retries. Deterministic content-derived keys (sha256-hex, UUIDv5, scoped). Pairs with cachebench's miss-aware retry. Zero runtime deps.
tool-output-truncate-py
Truncate LLM tool output with head/tail/middle/middle_lines strategies. UTF-8 safe. Zero runtime deps.
birddog
Audited Bright Data egress for AI scraping agents. Domain allowlist, per-domain rate caps, JSONL audit, Streamlit dashboard, optional Web Unlocker proxy, and optional on-close Merkle attestation via mantle-agent-attest.
geminilens
Drop-in observability for Gemini agents: traces, cost, drift, egress allowlist. Ships exporters for Arize Phoenix, Splunk HEC, Elastic, GitLab Observability, MongoDB Atlas, Dynatrace, TrueFoundry.
mantle-agent-attest
Verifiable agent-run attestations on the Mantle EVM L2. Hash an agent's JSONL audit log into a Merkle root, sign it, post it on-chain, and prove a single run later.
recruitertriage
Triage recruiter outreach with a small (<1B) language model. Built for the HuggingFace Build Small Hackathon.
token-budget-py
Thread-safe shared token + USD budget for concurrent LLM tasks. Raises BudgetExceeded when a push goes past the cap. Reserve/commit two-phase API for fan-out workloads. Zero runtime deps.
ragvitals
5-dimensional production drift detection for RAG systems.
ragdrift-py
5-dimensional drift detection for production RAG systems.
snipsplit
Token-aware text chunker for RAG ingestion. Sentence-respecting, overlap-friendly.
maskprompt
Sub-millisecond PII redaction for prompts before they reach an LLM.
bedrockstack
Low-level Python ergonomics for AWS Bedrock + Anthropic Claude: retries, cost ledger, streaming-error normalization.
toklab
Fast bulk tokenizer + token counter for OpenAI BPE encodings.
embedcache
Content-addressed local embedding cache. Skip duplicate embedding API calls.
agenttap
Wire-level prompt introspection for LLM SDK calls. See exactly what was sent, with credentials redacted by default. Anthropic, OpenAI, any httpx-based client.
llmfleet
Fleet-level batch dispatcher for LLM APIs. Pool requests across coroutines, route them to provider Batch APIs, and save up to 50% on cost without rewriting your agent loops.
bedrockcache
Audit and fix Anthropic prompt caching on AWS Bedrock through any abstraction stack.
cachebench
Prompt-cache observability for LLM APIs. Per-call hit ratios, cost saved, regression alerts, miss-aware retry. Anthropic + OpenAI + Bedrock.
bedrock-ops
Production-grade boto3 toolkit for AWS Bedrock: typed retry, per-model timeouts, capability lookup, full token usage with cache fields, PII-safe Guardrails.
embspec
Embedding pipeline ops + drift detection for production RAG: index manifests, version assertions, neighbor-stability eval, Drift-Adapter for in-place model migrations.
bedrock-kit
Small, opinionated AWS Bedrock client wrapper: adaptive throttle, cache-aware cost tracking, and structured-output parse-and-repair. Single-cloud, single-purpose.
driftvane
Compose drift detectors (embedding, retrieval, response, latency) into one report. Library-only, no server, no UI.
prompt-injection-shield-cli
CLI wrapper for prompt-injection-shield-py: scan a file or stdin for prompt-injection patterns.
ml-intern-lab
Tiny reproducible ML experiment runner.
agent-skills-playbook
Validate and render portable AI agent skills.
browser-research-agent
Research agent that turns sources into cited Markdown briefs.
personal-agent-harness
A tiny local-first personal AI agent harness.
llm-response-schema-lite-py
Tiny schema validator for structured LLM responses. Python port of @mukundakatta/llm-response-schema-lite.
prompt-version-diff-py
Diff prompt templates and flag risky instruction changes. Python port of @mukundakatta/prompt-version-diff.
prompt-token-trim-py
Trim prompt messages to fit a token budget while preserving priority. Python port of @mukundakatta/prompt-token-trim.
context-window-packer-py
Pack context chunks into a budget by relevance and priority. Python port of @mukundakatta/context-window-packer.
designlint-py
HTML/CSS accessibility and design linter: contrast, touch targets, headings, form labels, leaked secrets. Stdlib-only Python port of @mukundakatta/designlint.
context-forge-py
Context engineering toolkit for ranking, packing, and risk-scanning RAG context. Python port of @mukundakatta/context-forge.
context-drift-detector-py
Detect topic drift between user intent, retrieved context, and AI answers. Python port of @mukundakatta/context-drift-detector.
retrieval-acl-filter-py
Enforce document ACLs after retrieval and before prompting. Python port of @mukundakatta/retrieval-acl-filter.
rag-staleness-auditor-py
Find stale RAG chunks by age, version, and freshness requirements. Python port of @mukundakatta/rag-staleness-auditor.
skillint-py
Lint Claude Code SKILL.md files for frontmatter, required fields, descriptions, and hardcoded secrets. Stdlib-only Python port of @mukundakatta/skillint.
mcpcheck-py
Lint MCP config files for Claude Desktop, Claude Code, Cursor, Cline, Windsurf, and Zed. Stdlib-only Python port of @mukundakatta/mcpcheck.
kavach-py
Small, inspectable threat-scoring library for AI-app security monitoring. Zero-dep Python port of @mukundakatta/kavach.
consent-redaction-log-py
Record consent-aware redactions for privacy review trails. Zero-dep Python port of @mukundakatta/consent-redaction-log.
jailbreak-corpus-mini-py
Small local jailbreak and prompt-injection fixture set for tests. Python port of @mukundakatta/jailbreak-corpus-mini.
tool-result-taint-py
Track untrusted tool output before it enters prompts or actions. Python port of @mukundakatta/tool-result-taint.
ai-supply-chain-manifest-py
Build and validate lightweight AI model / data / tool manifests. Python port of @mukundakatta/ai-supply-chain-manifest.
model-router-policy-py
Policy-based model routing by capability, cost, latency, and privacy. Python port of @mukundakatta/model-router-policy.
model-fallback-planner-py
Plan model fallback chains from capability, cost, and health data. Python port of @mukundakatta/model-fallback-planner.
llm-trace-sampler-py
Sample LLM traces by risk, errors, latency, and deterministic ids. Python port of @mukundakatta/llm-trace-sampler.
eval-dataset-smith-py
Generate balanced AI eval fixtures from source examples, bugs, docs, and policies. Python port of @mukundakatta/eval-dataset-smith.
tool-permission-gate-py
Policy-check agent tool calls before execution. Python port of @mukundakatta/tool-permission-gate.
tool-call-contracts-py
Validate LLM tool-call payloads with small JSON-like contracts. Python port of @mukundakatta/tool-call-contracts.
agent-trajectory-replay-py
Replay and diff AI agent event trajectories for debugging regressions. Python port of @mukundakatta/agent-trajectory-replay.
agent-regression-lens-py
Detect regressions between baseline and current AI agent runs. Python port of @mukundakatta/agent-regression-lens.
agent-loop-breaker-py
Detect repeated agent steps and stop runaway loops. Python port of @mukundakatta/agent-loop-breaker.
mk-agentkit
The agent reliability stack in one install: agentfit + agentguard + agentsnap + agentvet + agentcast (Python ports).
embedding-dedupe
Deduplicate near-identical embedding records by cosine similarity. Pure Python, zero runtime deps. Python port of @mukundakatta/embedding-dedupe.
vector-poison-score
Score (query, document) pairs for vector/RAG poisoning signals: vector-text mismatch, instruction-like payloads, NaN values, suspiciously round numbers. Python port of @mukundakatta/vector-poison-score.
rag-quality-kit
Heuristic quality metrics for RAG retrieval and grounded answers. Python port of @mukundakatta/rag-quality-kit.
llm-cost-guard-py
Estimate LLM request cost and enforce per-request or per-session budgets. Python port of @mukundakatta/llm-cost-guard.
eval-flake-detector
Detect flaky LLM eval cases across repeated runs. Reports pass rate and standard deviation per case, with a per-case severity. Python port of @mukundakatta/eval-flake-detector.
semantic-cache-key
Stable semantic cache keys for LLM requests. Invariant to whitespace, casing, and key ordering; sensitive to model swaps, tool list, and retrieval context. Python port of @mukundakatta/semantic-cache-key.
system-prompt-leak-scan
Detect system prompt leakage in LLM outputs via known patterns, configured-prompt substring matching, and unique fingerprint phrases. Python port of @mukundakatta/system-prompt-leak-scan.
hallucination-risk-meter
Estimate hallucination risk in LLM answers from uncertainty language, unsupported specifics, citations, and context coverage. Python port of @mukundakatta/hallucination-risk-meter.
citation-integrity-check
Verify that answer citations refer to supplied source ids and that cited sources actually support the claims. Python port of @mukundakatta/citation-integrity-check.
llm-output-sanitizer-py
Sanitize LLM outputs before HTML, SQL, shell, or markdown sinks. Python port of @mukundakatta/llm-output-sanitizer.
prompt-injection-shield-py
Scan retrieved text for prompt-injection risk before adding it to model context. Python port of @mukundakatta/prompt-injection-shield.
pii-sentry-py
Detect and redact PII and secret-like values before logging or sending text to AI providers. Python port of @mukundakatta/pii-sentry.
partial-json-stream
Streaming JSON parser that yields partial valid trees as tokens arrive. For LLM tool calls, structured outputs, and partial recovery.
agentguard-firewall
Network egress firewall for AI agents. Declarative allow/deny list of hosts your agent tools may reach. Python port of @mukundakatta/agentguard.
agentcast-py
Structured output for any LLM call. Validate-and-retry loop for JSON responses; BYO LLM and validator. Python port of @mukundakatta/agentcast.
agentvet-py
Validate LLM-generated tool args before execution. Wraps tool functions with arg validation and raises ToolArgError with an LLM-friendly retry hint. Python port of @mukundakatta/agentvet.
agentsnap-py
Snapshot tests for AI agents. Record an agent's tool-call trace, diff against a baseline, fail CI on regressions. Python port of @mukundakatta/agentsnap.
agentfit-py
Fit your messages into the LLM context window. Token-aware truncation with multiple strategies, pluggable tokenizers. Python port of @mukundakatta/agentfit.
agent-run-diff
Compare baseline vs current agent runs and surface regressions as structured reasons: success loss, new errors, failed tool calls, output drift, step/latency/cost bloat.
llm-usage-report
Parse LLM API response logs (Anthropic, OpenAI, Google) and generate token / cost reports. Supports a --alert-at budget alarm that exits non-zero when total cost exceeds a threshold. No framework adoption required.
ai-eval-forge
Zero-dependency eval harness for LLM and agent regression testing. Scores outputs with exact, contains, regex, JSON, citation, and token-F1 checks. Compares two runs to flag regressions.
codex-skill-kit
Scaffold and validate Codex skills from the command line.
claude-hooks-check
Linter for Claude Code hooks configuration (the 'hooks' block of settings.json). Validates event names, matcher shape, command entries, and flags dangerous commands or hardcoded secrets.
claude-commands-check
Linter for Claude Code slash-command files (.claude/commands/*.md). Validates YAML frontmatter, allowed-tools shape, description quality, and flags hardcoded secrets.
mcp-config-check
Linter for MCP (Model Context Protocol) config files used by Claude Desktop, Cursor, Cline, Windsurf, and Zed. CLI + library API.
claude-skill-check
Linter for Claude Code SKILL.md files. Validates YAML frontmatter, required fields, description length, and common secret patterns.
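
Several entries in this list (llm-message-hash-py, semantic-cache-key, agentidemp-py) describe deriving a stable key from request content. As a rough illustration of that idea (recursive key-sorted JSON canonicalization, noise fields dropped, then sha256), here is a minimal Python sketch. It is not the API of any package listed here: the names `canonicalize`, `message_hash`, and `NOISE_FIELDS` are hypothetical, and the choice of noise fields is an assumption made only for this example.

```python
import hashlib
import json

# Hypothetical noise fields, assumed for illustration only.
# Real per-provider presets may drop different fields.
NOISE_FIELDS = frozenset({"id", "created", "request_id"})


def canonicalize(value, drop=NOISE_FIELDS):
    """Recursively sort dict keys and drop noise fields (assumes string keys)."""
    if isinstance(value, dict):
        return {
            key: canonicalize(value[key], drop)
            for key in sorted(value)
            if key not in drop
        }
    if isinstance(value, (list, tuple)):
        # List order is meaningful (e.g. message order), so it is preserved.
        return [canonicalize(item, drop) for item in value]
    return value


def message_hash(request, drop=NOISE_FIELDS):
    """Return a sha256 hex digest of the canonical JSON form of `request`."""
    canonical = json.dumps(
        canonicalize(request, drop),
        sort_keys=True,
        separators=(",", ":"),  # no insignificant whitespace
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Under these assumptions, two requests that differ only in key order or in a dropped field such as `request_id` hash to the same digest, while a change to the model name or message content produces a different one.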