Offline AI developer tooling: token counting, cost estimation, context planning, chunking, validation, PII redaction, and diffing.
Aidex
Offline AI developer tooling that runs with zero network calls and drops straight into an agent. Token counting, cost estimation, context-window planning, text chunking, data validation, PII redaction, and diffing — one package, exposed three ways: a Python library, a CLI, and an agent tool registry with JSON Schemas.
No API keys. No HTTP clients in core. Works in air-gapped CI, locked-down enterprise environments, and pre-commit hooks where calling out to a pricing or tokenization API isn't an option.
Aidex deliberately makes a different trade than the cost-calculator crowd: it
never calls the network, so its model pricing is a bundled snapshot rather
than live data. When that snapshot goes stale, you override it locally without
waiting for a release — see Custom & updated pricing.
Non-OpenAI token counts are character heuristics labeled estimate, never
presented as exact (see Confidence labeling).
Naming note: this package is published on PyPI as aidex-tools (the CLI command is aidex-tools, the import path is aidex).
Note on tiktoken: OpenAI token counting uses tiktoken, which downloads its encoding files once on first use and caches them locally. After that first run, everything is fully offline.
Install
pip install aidex-tools
# or
uv add aidex-tools
Requires Python >= 3.11.
Quickstart (library)
The import path is aidex:
from aidex.tokens import count_tokens
from aidex.cost import estimate_cost
from aidex.context import plan_context
from aidex.chunk import chunk_text
from aidex.redact import redact_pii
from aidex.diff import diff_text
from aidex.validate.json import validate_json
from aidex.validate.jsonl import validate_jsonl
from aidex.validate.csv_module import validate_csv
# Count tokens for one model...
result = count_tokens("hello world", model="gpt-4o")
print(result.token_count, result.confidence) # 2 exact
# ...or compare across the default 6-model set
for r in count_tokens("hello world"):
    print(r.model, r.token_count, r.confidence)
# Estimate cost
cost = estimate_cost("some prompt", model="claude-sonnet-4-6", output_tokens=500)
print(f"${cost.total_cost_usd:.6f} ({cost.confidence})")
# Will it fit?
plan = plan_context(open("examples/big_doc.txt").read(), model="gpt-4o")
if not plan.fits and plan.suggestion:
    print(f"Chunk into ~{plan.suggestion.estimated_chunks} pieces")
# Chunk it
chunks = chunk_text(
    open("examples/big_doc.txt").read(), max_tokens=512, overlap_tokens=50
)
# Redact PII (one-way; audit trail never contains original values)
redacted = redact_pii("Contact bob@example.com or 555-867-5309")
print(redacted.redacted_text) # Contact [EMAIL] or [PHONE]
Every public function returns a Pydantic model that serializes cleanly to
JSON — the same shapes the CLI emits with --json and the agent registry
returns from call_tool.
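For intuition about what a cost estimate involves, here is a minimal sketch of per-million-token pricing arithmetic. It assumes prices are quoted per one million tokens, as the input_price_per_1m and output_price_per_1m fields of the models file suggest. The function name cost_usd is hypothetical and this is not Aidex's implementation; the prices are the my-finetune values from the example models file in this README.

```python
def cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_1m: float,
    output_price_per_1m: float,
) -> float:
    """Total cost in USD, assuming prices are quoted per 1,000,000 tokens."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (
        input_tokens * input_price_per_1m + output_tokens * output_price_per_1m
    ) / 1_000_000


# 1,200 prompt tokens and 500 completion tokens at $0.50 / $1.50 per 1M.
total = cost_usd(1200, 500, input_price_per_1m=0.5, output_price_per_1m=1.5)
print(f"${total:.6f}")  # $0.001350
```

If the input token count comes from a heuristic, the resulting cost is an estimate as well, which is why the confidence label travels with the cost.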
Confidence labeling (honesty is mandatory)
Token counts are only exact when the tokenizer is public. Aidex never presents an estimate as exact:
| Provider | Counting method | Confidence |
|---|---|---|
| OpenAI (gpt-5.5, gpt-5.4, gpt-4o, …) | tiktoken | exact |
| Anthropic, Google, others | per-provider character heuristic | estimate |
Every result — token counts, costs, context plans, chunks, token deltas —
carries a confidence field, and CLI comparison tables show a Confidence
column.
Heuristic accuracy
For models without a public tokenizer, Aidex estimates the token count as characters ÷ chars_per_token. The divisor is per provider, not a flat 4, because
tokenizers differ: Claude's runs denser than GPT's (more tokens, fewer
characters each), so a flat ÷4 understates Claude usage. Defaults are
rough English-prose averages (Claude ≈ 3.5, Gemini ≈ 4.0); they are still
estimates and remain labeled estimate.
You can tune the divisor per model with a chars_per_token field in your
external models file — useful if you measure
your own corpus (code and non-English text tokenize differently from prose).
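The effect of the divisor can be sketched in a few lines. This is an illustration of the characters ÷ chars_per_token formula only: rounding up is an assumption (Aidex's rounding is not stated here), and estimate_tokens and CHARS_PER_TOKEN are hypothetical names.

```python
import math

# The 3.5 and 4.0 values are the rough English-prose defaults quoted above;
# the dict itself is illustrative, not Aidex's internal table.
CHARS_PER_TOKEN = {"claude": 3.5, "gemini": 4.0}


def estimate_tokens(text: str, chars_per_token: float) -> int:
    """Estimate a token count as characters / chars_per_token, rounded up."""
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    return math.ceil(len(text) / chars_per_token)


prose = "x" * 1400  # stand-in for 1,400 characters of English prose
flat = estimate_tokens(prose, 4.0)
denser = estimate_tokens(prose, CHARS_PER_TOKEN["claude"])
print(flat, denser)  # 350 400
```

With the same 1,400 characters, the flat ÷4 rule reports 350 tokens while the 3.5 divisor reports 400, which is the understatement described above.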
Custom & updated pricing
The bundled model catalog is a point-in-time snapshot. Because Aidex never calls the network, new model launches and price changes don't reach it until the next release — so you can override the catalog locally instead of waiting.
Point AIDEX_MODELS_FILE (or the --models-file CLI flag) at a JSON file using
the same shape as the bundle. Entries are merged over the bundled catalog:
- a model whose id matches a bundled one replaces that entry (full entry),
- new ids are added (handy for private models or fine-tunes),
- an optional default_comparison_set overrides the default comparison list.
{
  "default_comparison_set": ["gpt-5.5", "my-finetune"],
  "models": [
    {
      "id": "gpt-5.5",
      "aliases": [],
      "context_window": 1050000,
      "input_price_per_1m": 4.0,
      "output_price_per_1m": 24.0,
      "counting_method": "tiktoken",
      "confidence": "exact"
    },
    {
      "id": "my-finetune",
      "aliases": ["ft"],
      "context_window": 32000,
      "input_price_per_1m": 0.5,
      "output_price_per_1m": 1.5,
      "counting_method": "heuristic",
      "confidence": "estimate",
      "chars_per_token": 3.7
    }
  ]
}
The optional chars_per_token tunes the heuristic divisor for that model
(see Heuristic accuracy); omit it to use the
per-provider default inferred from the model id.
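The merge rules above can be sketched as a small function. This is a reading of the documented behavior, not Aidex's code: merge_catalog is a hypothetical name, and the model ids and field values below are placeholders rather than real catalog data.

```python
def merge_catalog(bundled: dict, override: dict) -> dict:
    """Merge an override file over a bundled catalog, keyed by model id."""
    by_id = {model["id"]: model for model in bundled.get("models", [])}
    for entry in override.get("models", []):
        # A matching id replaces the whole entry; an unknown id is appended.
        by_id[entry["id"]] = entry
    merged = {"models": list(by_id.values())}
    comparison = override.get(
        "default_comparison_set", bundled.get("default_comparison_set")
    )
    if comparison is not None:
        merged["default_comparison_set"] = comparison
    return merged


bundled = {
    "default_comparison_set": ["model-a"],
    "models": [{"id": "model-a", "context_window": 1000, "aliases": ["a"]}],
}
override = {
    "default_comparison_set": ["model-a", "my-finetune"],
    "models": [
        {"id": "model-a", "context_window": 2000},
        {"id": "my-finetune", "context_window": 32000},
    ],
}
merged = merge_catalog(bundled, override)
print([m["id"] for m in merged["models"]])  # ['model-a', 'my-finetune']
print("aliases" in merged["models"][0])     # False
```

Because replacement is per entry rather than per field, the overriding model-a loses the bundled aliases key. An override entry should therefore repeat every field it wants to keep.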
# via flag (applies to every subcommand)
aidex-tools --models-file ./my_models.json cost estimate prompt.txt --model gpt-5.5
# or via environment variable
export AIDEX_MODELS_FILE=./my_models.json
aidex-tools models list
From the library, set the env var before the catalog is first read. The catalog
is process-cached; if you change it mid-process, call
aidex.models.load_catalog.cache_clear().
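Why the cache_clear() call matters can be shown with a stand-in loader. The sketch assumes the catalog loader behaves like a functools-cached function, which the cache_clear() method suggests. The load_catalog defined here is a hypothetical substitute, not the real aidex.models.load_catalog, and the model ids are placeholders.

```python
import functools
import json
import os
import tempfile


@functools.cache
def load_catalog() -> dict:
    """Stand-in loader: reads the file named by AIDEX_MODELS_FILE once."""
    path = os.environ.get("AIDEX_MODELS_FILE")
    if not path:
        return {"models": []}
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "a.json")
    second = os.path.join(tmp, "b.json")
    for path, model_id in ((first, "model-a"), (second, "model-b")):
        with open(path, "w", encoding="utf-8") as fh:
            json.dump({"models": [{"id": model_id}]}, fh)

    os.environ["AIDEX_MODELS_FILE"] = first
    before = load_catalog()["models"][0]["id"]  # read from a.json
    os.environ["AIDEX_MODELS_FILE"] = second
    stale = load_catalog()["models"][0]["id"]   # still the cached a.json
    load_catalog.cache_clear()
    fresh = load_catalog()["models"][0]["id"]   # re-read, now b.json

print(before, stale, fresh)  # model-a model-a model-b
```

Changing the environment variable alone has no effect once the first read has happened; only clearing the cache forces a re-read.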
CLI
Every subcommand supports --json for machine-readable output. Exit codes:
0 success, 1 runtime/validation error, 2 usage error. Errors go to
stderr (as {"error": "...", "code": "..."} in --json mode).
aidex-tools tokens count "How many tokens is this?" # compare 6 models
aidex-tools tokens count examples/prompt.txt --model gpt-4o --json
aidex-tools cost estimate examples/prompt.txt --model claude-sonnet-4-6 --output-tokens 1000
aidex-tools context plan examples/big_doc.txt --model gpt-4o --reserve-output 4096
aidex-tools chunk split examples/big_doc.txt --max-tokens 512 --overlap 50
aidex-tools validate json examples/config.json --schema examples/schema.json
aidex-tools validate jsonl examples/dataset.jsonl --check-keys
aidex-tools validate csv examples/data.csv --no-header
aidex-tools redact pii "email bob@example.com, key sk-abc123def456ghi789" \
--patterns email,api_key
aidex-tools diff examples/old_prompt.txt examples/new_prompt.txt --model gpt-4o
aidex-tools models list
aidex-tools models show claude-sonnet-4-6
aidex-tools tools list
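When scripting against the CLI, the exit-code contract above can be handled in one place. The following is a minimal sketch: interpret is a hypothetical helper, the result shape it returns is an illustration, and the sample payloads are placeholders modeled on the shapes shown in this README rather than captured output.

```python
import json


def interpret(returncode: int, stdout: str, stderr: str) -> dict:
    """Map a finished `--json` invocation onto the documented exit codes."""
    if returncode == 0:
        return {"ok": True, "data": json.loads(stdout)}
    # 1 = runtime/validation error, 2 = usage error, per the contract above.
    kind = {1: "runtime_or_validation", 2: "usage"}.get(returncode, "unknown")
    try:
        detail = json.loads(stderr)
    except json.JSONDecodeError:
        # Fall back to the raw text if stderr is not JSON.
        detail = {"error": stderr.strip(), "code": None}
    return {"ok": False, "kind": kind, "detail": detail}


ok = interpret(0, '{"model": "gpt-4o", "token_count": 1}', "")
failed = interpret(1, "", '{"error": "...", "code": "..."}')
misuse = interpret(2, "", "plain text message\n")
print(ok["data"]["token_count"], failed["kind"], misuse["kind"])
```

In practice the three arguments would come from subprocess.run([...], capture_output=True, text=True), using its returncode, stdout, and stderr attributes.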
Agent registry
All tools are exposed through a single registry with JSON Schema definitions, ready to wire into any agent framework:
from aidex.agent import list_tools, call_tool
tools = list_tools()
# [{"name": "count_tokens", "description": "...", "input_schema": {...}}, ...]
result = call_tool("count_tokens", {"text": "hello", "model": "gpt-4o"})
# {"model": "gpt-4o", "token_count": 1, "counting_method": "tiktoken",
# "confidence": "exact"}
Arguments are validated with Pydantic; results are JSON-serializable dicts.
An MCP server is planned as an optional extra (aidex-tools mcp serve is a
stub in v0.1).
Heads-up for agent integrations:
diff_text and validate_json accept either literal text or a file path: a string argument that names an existing file is read from disk. If tool arguments come from an untrusted source, treat this as a local file read capability.
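One way to narrow that capability is to screen arguments before they reach the registry. The sketch below is a hedged illustration, not part of Aidex: reject_existing_paths is a hypothetical guard, it inspects only top-level string values, and the "text" argument name is borrowed from the count_tokens example above.

```python
import os
import tempfile


def reject_existing_paths(arguments: dict) -> dict:
    """Raise if any top-level string argument names an existing path."""
    for name, value in arguments.items():
        if isinstance(value, str) and os.path.exists(value):
            raise ValueError(f"argument {name!r} names an existing path")
    return arguments


with tempfile.TemporaryDirectory() as tmp:
    secret = os.path.join(tmp, "secret.txt")
    with open(secret, "w", encoding="utf-8") as fh:
        fh.write("do not read me")

    # Literal text passes through unchanged.
    checked = reject_existing_paths({"text": "literal prompt text"})

    # A string that resolves to a real file is refused.
    try:
        reject_existing_paths({"text": secret})
        blocked = False
    except ValueError:
        blocked = True

print(checked, blocked)
```

A check like this is coarse: it also refuses legitimate text that happens to match a file name in the working directory, and it does not replace sandboxing the process that runs the tools.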
Tool reference
| Tool | Library function | CLI |
|---|---|---|
| Token calculator | aidex.tokens.count_tokens | aidex-tools tokens count |
| Cost estimator | aidex.cost.estimate_cost | aidex-tools cost estimate |
| Context planner | aidex.context.plan_context | aidex-tools context plan |
| Text chunker | aidex.chunk.chunk_text | aidex-tools chunk split |
| JSON validator | aidex.validate.json.validate_json | aidex-tools validate json |
| JSONL validator | aidex.validate.jsonl.validate_jsonl | aidex-tools validate jsonl |
| CSV validator | aidex.validate.csv_module.validate_csv | aidex-tools validate csv |
| PII redactor | aidex.redact.redact_pii | aidex-tools redact pii |
| Diff checker | aidex.diff.diff_text | aidex-tools diff |
Notes:
- Chunker uses recursive separator-aware splitting ("\n\n", "\n", ". ", " " by default) and hard-splits at character boundaries as a last resort when no separator can satisfy the token budget.
- JSON Schema validation implements a dependency-free subset: type, properties, required, items, enum, additionalProperties, minimum/maximum, minLength/maxLength.
- PII redaction is a best-effort, regex-only scrubber in v0.1 (email, phone, SSN, credit card, IPv4, API keys). It catches known token shapes, not names, addresses, or contextual PII; do not treat it as a compliance control. It is one-way: there is no unredact, and the audit trail records only type, span, and placeholder.
- Model pricing in aidex-tools models list is a bundled snapshot for offline estimation; verify against provider pricing pages before billing decisions, and override it locally via Custom & updated pricing when it goes stale.
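The chunker note describes a recursive, separator-aware strategy with a hard-split fallback. A minimal sketch of that technique follows. It is not Aidex's implementation: split and count are hypothetical names, the token counter is a 4-characters-per-token stand-in, overlap is omitted, and separators at chunk boundaries are dropped.

```python
import math

SEPARATORS = ("\n\n", "\n", ". ", " ")


def count(text: str) -> int:
    """Stand-in token counter: about 4 characters per token (an assumption)."""
    return math.ceil(len(text) / 4)


def split(text: str, max_tokens: int, separators=SEPARATORS) -> list[str]:
    """Split on the coarsest separator that works, recursing to finer ones."""
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    if count(text) <= max_tokens:
        return [text] if text else []
    if not separators:
        # Last resort: hard split at character boundaries.
        width = max_tokens * 4
        return [text[i:i + width] for i in range(0, len(text), width)]
    sep, finer = separators[0], separators[1:]
    pieces = text.split(sep)
    if len(pieces) == 1:
        return split(text, max_tokens, finer)
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if count(candidate) <= max_tokens:
            current = candidate  # keep packing pieces into this chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if count(piece) <= max_tokens:
            current = piece
        else:
            chunks.extend(split(piece, max_tokens, finer))
    if current:
        chunks.append(current)
    return chunks


print(split("one two three four five six", max_tokens=3))
# ['one two', 'three four', 'five six']
```

Trying coarse separators first keeps paragraphs and sentences intact where the budget allows, and the character-level fallback guarantees that every chunk stays within the budget even for text with no separators at all.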
Development
uv sync --extra dev
uv run pytest --cov=aidex
uv run ruff check .
uv run mypy src/aidex
uv run black --check .
License
MIT
File details
Details for the file aidex_tools-0.1.0.tar.gz.
File metadata
- Download URL: aidex_tools-0.1.0.tar.gz
- Upload date:
- Size: 130.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 174a5ece90626a5f755cba4493c421860138b04ecfa9f2f46d130ec2e4925990 |
| MD5 | e7f2533ce4b412c80187a5ec7e3493dd |
| BLAKE2b-256 | 36fbef275836cfb0b859ca9b97ec42c909a97c47177ffae40ef63915ffbe215a |
File details
Details for the file aidex_tools-0.1.0-py3-none-any.whl.
File metadata
- Download URL: aidex_tools-0.1.0-py3-none-any.whl
- Upload date:
- Size: 39.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 126c309006b6bef51677fc8f44f569b7f1df1eb5f6ac260b9b6492caddbd8eab |
| MD5 | 59375b2633cfb68229319b170336cb9f |
| BLAKE2b-256 | eba2cc80cf35eb865e46657f23fc8f663b5b453ce7004221047c04c6ad07869f |