Vetch SDK
Planet-aware observability for LLM inference.
Vetch is a Python SDK that wraps LLM API calls to log energy consumption, cost, and carbon per inference using live grid data. It never reads prompt or completion content, only usage metadata from the response.
Why Vetch?
Attributed Spend, Not Just Total Spend
Provider dashboards (OpenAI Usage, Anthropic Console, Google Cloud Billing) show you total spend. Vetch shows you attributed spend. Using tags, you can track cost-per-feature, cost-per-user, or cost-per-environment in real time, without building custom infrastructure.
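As an illustration of what tag attribution buys you, here is a toy roll-up over event dictionaries shaped like the ones Vetch emits (the `estimated_cost_usd` and `tags` field names are assumptions for this sketch):

```python
from collections import defaultdict

# Toy events; field names are assumed for illustration.
events = [
    {"estimated_cost_usd": 0.0031, "tags": {"feature": "chat", "env": "prod"}},
    {"estimated_cost_usd": 0.0008, "tags": {"feature": "summarize", "env": "prod"}},
    {"estimated_cost_usd": 0.0042, "tags": {"feature": "chat", "env": "prod"}},
]

# Roll up spend per feature instead of one opaque total.
spend_by_feature = defaultdict(float)
for event in events:
    spend_by_feature[event["tags"]["feature"]] += event["estimated_cost_usd"]

for feature, cost in spend_by_feature.items():
    print(f"{feature}: ${cost:.4f}")
```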
Sustainability Instrumentation
Begin tracking AI inference emissions for future CSRD (EU) and SEC (US) Scope 3 reporting. Note: Current estimates are Tier 3 (order-of-magnitude). Vetch provides the instrumentation infrastructure—audit-grade accuracy requires Tier 1/2 energy data from providers or calibrated measurements.
Design Guarantees
Fail-Open Architecture
Vetch is architected with a non-blocking, fail-open boundary. Every Vetch operation (patching, calculation, emission) is wrapped in isolated error handlers. If Vetch fails, your LLM call proceeds normally, and a tracking_disabled: true event is logged. Vetch will never cause an inference outage.
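The pattern looks roughly like this (a minimal sketch of fail-open wrapping, not Vetch's actual internals):

```python
import logging

logger = logging.getLogger("vetch")

def fail_open(instrumentation):
    """Run an instrumentation step; never let its failure reach the caller."""
    def safe(*args, **kwargs):
        try:
            return instrumentation(*args, **kwargs)
        except Exception:
            # Swallow the error, record that tracking was disabled,
            # and let the LLM call proceed untouched.
            logger.debug("tracking_disabled: true", exc_info=True)
            return None
    return safe

@fail_open
def record_usage(response):
    # Any bug here (missing attribute, bad arithmetic) is contained.
    return response.usage.total_tokens
```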
Privacy & Data Perimeter
Vetch never reads or stores prompt/completion content. It only extracts metadata (token counts, model names, timing) directly from SDK response objects. No PII or proprietary prompt data ever leaves your execution environment.
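Concretely, only the usage metadata on the response object is touched. With the OpenAI SDK, for example, the extraction boils down to something like this (illustrative, not Vetch's exact code):

```python
def extract_metadata(response):
    """Read token counts and the model name; never touch
    response.choices[...].message.content."""
    usage = response.usage
    return {
        "model": response.model,
        "input_tokens": usage.prompt_tokens,
        "output_tokens": usage.completion_tokens,
    }
```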
Thread Safety (v0.1.4+)
Vetch is fully thread-safe and supports multi-client isolation. It uses contextvars for async safety and WeakKeyDictionary for client patching, ensuring that unpatching one client doesn't affect another in the same process.
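The two building blocks named above look roughly like this in isolation (a sketch of the pattern, not Vetch's source):

```python
import weakref
from contextvars import ContextVar

# Each thread / async task sees its own current tracking context.
_current_ctx = ContextVar("vetch_ctx", default=None)

# Patch bookkeeping keyed by client identity: entries disappear when a
# client is garbage-collected, and unpatching one client can't touch another.
_patched = weakref.WeakKeyDictionary()

def remember_original(client, method):
    _patched[client] = method

def restore_original(client):
    return _patched.pop(client, None)
```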
Features
- Fail-Open: LLM calls always proceed even if Vetch fails
- Privacy-First: No prompt or completion data is ever read or buffered
- Multi-tier Caching: Memory → File → API → Regional averages for grid data (sketched after this list)
- Observability-Transparent: Works seamlessly with Datadog, OpenTelemetry, and Sentry
- Low Overhead: under 5 ms of overhead for sync calls; no added time-to-first-token (TTFT) latency for streaming
- MoE-Aware: Energy estimates account for active parameters in Mixture-of-Experts models
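For grid data, the cache tiers fall through in order, so a lookup degrades gracefully instead of failing. A minimal sketch of that chain, with stubbed tier functions and illustrative intensity figures (none of this is Vetch's actual code):

```python
REGIONAL_AVERAGES = {"us-east-1": 380.0}  # illustrative gCO2e/kWh values
GLOBAL_AVERAGE = 475.0

_memory = {}

def memory_cache_get(region):   # fastest tier: in-process dict
    return _memory.get(region)

def file_cache_get(region):     # on-disk cache, stubbed out for the sketch
    return None

def fetch_from_api(region):     # live Electricity Maps call, stubbed out
    return None

def grid_intensity(region):
    """Resolve gCO2e/kWh via Memory -> File -> API, else regional averages."""
    for tier in (memory_cache_get, file_cache_get, fetch_from_api):
        value = tier(region)
        if value is not None:
            _memory[region] = value  # promote hits to the fastest tier
            return value
    return REGIONAL_AVERAGES.get(region, GLOBAL_AVERAGE)

print(grid_intensity("us-east-1"))  # 380.0 from the regional fallback
```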
Installation
```
pip install vetch
```
Quick Start
```python
from vetch import wrap
from openai import OpenAI

client = OpenAI()

with wrap(region="us-east-1", tags={"team": "ml", "env": "prod"}) as ctx:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello world"}]
    )

# Access inference metadata
print(f"Energy: {ctx.event['estimated_energy_wh']} Wh")
print(f"Carbon: {ctx.event['estimated_carbon_g']} gCO2e")
```
CLI Usage
Estimate energy/carbon for a model without running code:
```
vetch estimate --model gpt-4o --input-tokens 1000 --output-tokens 500 --region us-east-1
```
Compare multiple models:
```
vetch compare --models gpt-4o,claude-3-opus,gemini-1.5-pro --tokens 1000
```
Analyze your token usage patterns:
```
vetch audit
```
Check your environment:
```
vetch check
```
Token Waste Audit
Vetch tracks token usage patterns across your session and provides actionable recommendations:
```python
from vetch import wrap, get_session_stats, generate_advisories

# Make multiple LLM calls
for _ in range(10):
    with wrap() as ctx:
        response = client.chat.completions.create(...)

# Analyze patterns
stats = get_session_stats()
advisories = generate_advisories(stats)

for a in advisories:
    print(f"[{a.level.value}] {a.title}")
    print(f"  {a.description}")
```
What it detects:
- Static system prompts: Repeated input token counts suggest cacheable prompts (a sketch of this heuristic follows the list)
- High input:output ratios: Large inputs producing small outputs
- Expensive model usage: Opportunities to use smaller, cheaper models
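The first heuristic, for example, amounts to counting repeated input sizes (a simplified sketch of the idea, not the library's code):

```python
from collections import Counter

def looks_like_static_prompt(input_token_counts, threshold=0.5):
    """Flag sessions where one input size dominates: a sign the prompt
    prefix is static and a candidate for prompt caching."""
    if not input_token_counts:
        return False
    _, most_common = Counter(input_token_counts).most_common(1)[0]
    return most_common / len(input_token_counts) >= threshold

print(looks_like_static_prompt([812, 812, 812, 812, 950]))  # True
```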
GPU Calibration (Local Inference)
For local inference (Ollama, vLLM, llama.cpp), calibrate energy measurements using actual GPU power draw:
```python
from vetch.calibrate import calibrate_model, format_calibration_result

def my_inference():
    # Run your inference workload and return (input_tokens, output_tokens)
    response = ollama.generate(model="llama3.1:8b", prompt="Hello world")
    return 100, 50  # Your actual token counts

result = calibrate_model("ollama", "llama3.1:8b", workload=my_inference)
print(format_calibration_result(result))

# Use calibrated values for accurate tracking
with wrap(energy_override=result.to_override()) as ctx:
    response = ollama.generate(...)
```
Check calibration status:
```
vetch calibrate --status
```
Requirements: NVIDIA GPU with pynvml (pip install nvidia-ml-py3)
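Under the hood, calibration rests on NVML power readings. The core measurement looks roughly like this (the pynvml calls are real; the sampling strategy is an assumption for the sketch, and in practice the sampler would run on a background thread alongside the workload):

```python
import time
import pynvml  # pip install nvidia-ml-py3

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# Sample instantaneous board power (NVML reports milliwatts).
samples = []
start = time.monotonic()
for _ in range(10):
    samples.append(pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0)  # -> watts
    time.sleep(0.1)
elapsed = time.monotonic() - start

avg_watts = sum(samples) / len(samples)
energy_wh = avg_watts * elapsed / 3600.0  # watt-seconds -> watt-hours
print(f"~{avg_watts:.0f} W over {elapsed:.1f} s = {energy_wh:.4f} Wh")

pynvml.nvmlShutdown()
```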
Historical Analysis & Reporting
Vetch can persist events to SQLite for historical FinOps analysis:
```python
from vetch import configure_storage, query_usage, wrap
from datetime import datetime, timedelta

# Enable persistent storage
configure_storage()  # Uses ~/.vetch/usage.db

# Your LLM calls are now tracked
with wrap(tags={"team": "ml", "feature": "chat"}) as ctx:
    response = client.chat.completions.create(...)

# Query historical usage
summary = query_usage(
    start=datetime.now() - timedelta(days=7),
    tags={"team": "ml"}
)

print(f"Total cost: ${summary.total_cost_usd:.2f}")
print(f"Total energy: {summary.total_energy_wh:.2f} Wh")
print(f"Requests: {summary.total_requests}")
```
Generate reports from CLI:
```
# Weekly report
vetch report --days 7

# Filter by team
vetch report --tags team=ml

# Show top consumers
vetch report --top --top-by team --days 30

# JSON output for dashboards
vetch report --format json
```
Energy Tiers
Vetch uses a tiered system for energy estimate confidence:
| Tier | Name | Uncertainty | Source |
|---|---|---|---|
| 0 | Measured | ±10-20% | Direct GPU measurement (pynvml) |
| 1 | Vendor-Published | ±20-50% | Official provider data |
| 2 | Validated | ±50-100% | Crowdsourced aggregates |
| 3 | Estimated | order of magnitude | Parameter-based calculation |
Run `vetch methodology` to see the full methodology documentation.
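One way to read the table: treat a reported estimate as a central value with tier-dependent multiplicative bounds. The factors below are a rough reading of the uncertainty column, not a Vetch API:

```python
# Rough multiplicative bounds per tier, read off the table above
# (Tier 3 "order of magnitude" taken as a factor of 10).
TIER_FACTOR = {0: 1.2, 1: 1.5, 2: 2.0, 3: 10.0}

def energy_bounds(estimate_wh, tier):
    f = TIER_FACTOR[tier]
    return estimate_wh / f, estimate_wh * f

print(energy_bounds(0.25, 3))  # (0.025, 2.5) -- order-of-magnitude spread
```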
Environment Variables
| Variable | Description |
|---|---|
| VETCH_DISABLED | Set to true to completely disable Vetch (emergency kill switch) |
| VETCH_REGION | Default grid region (e.g., us-east-1, eu-west-1) |
| VETCH_OUTPUT | Output target: none (default), stderr, or a file path |
| ELECTRICITY_MAPS_API_KEY | API key for live grid carbon intensity data |
| VETCH_CACHE_MODE | Set to memory-only for serverless/Lambda environments |
Alpha Limitations
This is an alpha release. Please be aware of:
- Energy estimates are uncertain: Most models use Tier 3 estimates (±10x uncertainty). See `vetch methodology` for details.
- Region inference is approximate: Without an explicit VETCH_REGION, timezone-based inference is only ~30% accurate. Set the region explicitly for accurate carbon calculations.
- Experimental modules: `vetch.calibrate`, `vetch.storage`, and `vetch.ci` emit FutureWarning and may change in future versions.
- Provider support: Currently supports OpenAI, Anthropic, and Vertex AI. Other providers are coming soon.
Troubleshooting
Vetch is blocking my LLM calls:
```
export VETCH_DISABLED=true  # Emergency kill switch
```
Too much output:
```
export VETCH_OUTPUT=none  # Silence all output
```
Need to debug:
```python
import logging
logging.getLogger("vetch").setLevel(logging.DEBUG)
```
License
Apache License 2.0. See LICENSE and NOTICE for details.