Structured payload compaction and manifold-aware analytics for LLM tool outputs

BinomialHash

Content-addressed, schema-aware structured data compaction for LLM tool outputs.

BinomialHash intercepts large JSON payloads from tool calls, infers schema and statistics, deduplicates by content fingerprint, and returns compact summaries that fit in LLM context windows. Agent tools let the model retrieve, aggregate, query, group, and export data on demand without exhausting the token budget.
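The content-addressing step can be pictured as hashing a canonical serialization of the payload, so identical data is stored once regardless of formatting. A minimal sketch of the idea using only the standard library (illustrative only, not the library's actual implementation):

```python
import hashlib
import json

def content_fingerprint(payload) -> str:
    """Hash a canonical JSON serialization so equal payloads get
    the same fingerprint regardless of key order or whitespace."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]

a = content_fingerprint({"ticker": "AAPL", "price": 189.50})
b = content_fingerprint({"price": 189.50, "ticker": "AAPL"})
assert a == b  # same content, same fingerprint
```

Because the serialization is canonicalized before hashing, two tool calls returning the same rows in a different key order deduplicate to a single stored entry.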

Install

pip install binomialhash

With exact token counting (OpenAI / xAI):

pip install binomialhash[openai]

All optional dependencies:

pip install binomialhash[all]

Quickstart

import json
from binomialhash import BinomialHash

bh = BinomialHash()

data = [
    {"ticker": "AAPL", "price": 189.50, "volume": 54_000_000, "sector": "Technology"},
    {"ticker": "MSFT", "price": 378.20, "volume": 28_000_000, "sector": "Technology"},
    {"ticker": "JPM",  "price": 195.30, "volume": 12_000_000, "sector": "Financials"},
    # ... hundreds more rows ...
]

raw = json.dumps(data)
summary = bh.ingest(raw, "market_data")
# If len(raw) > 3000 chars: returns a compact schema + stats summary
# If small: passes through unchanged

# Query stored data
rows = bh.retrieve("market_data_abc123", offset=0, limit=10)
agg  = bh.aggregate("market_data_abc123", "price", "mean")
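The ingest gate above can be sketched as a simple size check: small payloads pass through untouched, large ones are replaced by a schema-plus-stats digest. A conceptual sketch (the digest fields and threshold handling here are illustrative, not the library's actual output format):

```python
import json
import statistics

THRESHOLD = 3000  # chars, mirroring the default mentioned above

def ingest_sketch(raw: str, label: str) -> str:
    if len(raw) <= THRESHOLD:
        return raw  # small payloads pass through unchanged
    rows = json.loads(raw)
    # numeric columns, judged from the first row (sketch-level inference)
    numeric = {k for k, v in rows[0].items() if isinstance(v, (int, float))}
    stats = {
        k: {"mean": statistics.fmean(r[k] for r in rows),
            "min": min(r[k] for r in rows),
            "max": max(r[k] for r in rows)}
        for k in numeric
    }
    digest = {"label": label, "rows": len(rows),
              "columns": sorted(rows[0]), "numeric_stats": stats}
    return json.dumps(digest)

small = json.dumps([{"x": 1}])
assert ingest_sketch(small, "t") == small  # under threshold: unchanged
```

The point of the digest is that its size is roughly constant in the number of columns, not the number of rows, which is what keeps large tool outputs inside the context budget.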

Provider Adapters

BinomialHash ships with 68 provider-neutral tool definitions that expose its full API (retrieve, aggregate, query, group, statistical analysis, causal inference, manifold navigation, spatial reasoning, export, etc.) to any LLM. Adapters translate these into provider-specific formats.
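What an adapter does is mostly mechanical schema translation between tool-spec dialects. A sketch of the idea — the ToolSpec shape here is a hypothetical plain dict, not the library's real type:

```python
def to_openai(spec: dict) -> dict:
    # OpenAI Responses API expects a flat function-tool object
    return {"type": "function", "name": spec["name"],
            "description": spec["description"],
            "parameters": spec["parameters"]}

def to_anthropic(spec: dict) -> dict:
    # Anthropic tools carry the JSON Schema under input_schema instead
    return {"name": spec["name"], "description": spec["description"],
            "input_schema": spec["parameters"]}

spec = {"name": "retrieve", "description": "Fetch stored rows",
        "parameters": {"type": "object",
                       "properties": {"handle": {"type": "string"}},
                       "required": ["handle"]}}
```

The underlying JSON Schema is identical across providers; only the envelope around it changes, which is why a single neutral definition can fan out to every API.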

OpenAI

from binomialhash import BinomialHash
from binomialhash.tools import get_all_tools
from binomialhash.adapters.openai import get_openai_tools, handle_openai_tool_call

bh = BinomialHash()
specs = get_all_tools(bh)
tools = get_openai_tools(specs)  # Responses API format (default)

# Pass to the API
response = client.responses.create(model="gpt-4o", tools=tools, input=messages)

# Handle function calls
for item in response.output:
    if item.type == "function_call":
        result = handle_openai_tool_call(specs, item.name, item.arguments)

For Chat Completions (legacy):

tools = get_openai_tools(specs, format="chat_completions")

Anthropic

from binomialhash.adapters.anthropic import get_anthropic_tools, handle_anthropic_tool_use

tools = get_anthropic_tools(specs)
# Pass to client.messages.create(tools=tools, ...)

# Handle tool_use blocks
result = handle_anthropic_tool_use(specs, block.name, block.input)

Google Gemini

from google.genai import types
from binomialhash.adapters.gemini import get_gemini_tools, handle_gemini_tool_call

decls = get_gemini_tools(specs)
gemini_tools = types.Tool(function_declarations=decls)

# Handle function_call parts
result = handle_gemini_tool_call(specs, fc.name, fc.args)

xAI / Grok

from binomialhash.adapters.xai import get_xai_tools, handle_xai_tool_call

tools = get_xai_tools(specs)
# Uses OpenAI-compatible format

Provider Router

from binomialhash.adapters import get_tools_for_provider

tools = get_tools_for_provider(specs, provider="openai")
tools = get_tools_for_provider(specs, provider="anthropic")

Middleware

Auto-intercept large tool outputs without modifying tool functions:

from binomialhash.middleware import bh_intercept, raw_mode

@bh_intercept(label="market_data")
def fetch_data(ticker: str) -> dict:
    return huge_json_response  # auto-compacted if > 3000 chars

# Bypass interception when you need the raw payload
with raw_mode():
    native = fetch_data("AAPL")  # returns original dict

Wrapper form for third-party functions:

from binomialhash.middleware import wrap_tool_with_bh

wrapped = wrap_tool_with_bh(third_party_fetch, label="external_data")

Both sync and async functions are supported.
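Supporting both call styles from one decorator is typically done by branching on whether the wrapped function is a coroutine. A minimal sketch of that pattern (a simplified stand-in, not the library's actual middleware logic):

```python
import asyncio
import functools
import inspect

def intercept(threshold: int = 3000):
    """Compact oversized string results; handles sync and async callables."""
    def compact(result):
        s = str(result)
        return s[:threshold] + "...[truncated]" if len(s) > threshold else result

    def decorator(fn):
        if inspect.iscoroutinefunction(fn):
            @functools.wraps(fn)
            async def awrapper(*args, **kwargs):
                return compact(await fn(*args, **kwargs))
            return awrapper

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            return compact(fn(*args, **kwargs))
        return wrapper
    return decorator

@intercept(threshold=10)
def sync_fetch():
    return "x" * 100

@intercept(threshold=10)
async def async_fetch():
    return "y" * 100
```

The coroutine check happens once at decoration time, so neither wrapper pays a per-call dispatch cost.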

Token Counting

from binomialhash.tokenizers import count_tokens, is_exact

n = count_tokens("Hello world", provider="openai")   # exact with tiktoken
n = count_tokens("Hello world", provider="anthropic") # heuristic (chars/4)

if is_exact("openai"):
    print("Using tiktoken")

Built-in context stats on every BinomialHash instance:

stats = bh.context_stats()
# {"tool_calls": 5, "chars_in_raw": 120000, "chars_out_to_llm": 8000,
#  "compression_ratio": 15.0, "est_tokens_out": 2000, ...}

Package Structure

binomialhash/
  core.py              # BinomialHash class — ingest, retrieve, aggregate, query
  schema.py            # Schema inference and column typing
  extract.py           # Row extraction from nested JSON
  predicates.py        # Predicate building and row filtering
  context.py           # Request-scoped contextvar helpers
  insights.py          # Objective-driven insight extraction
  middleware.py        # Auto-interception decorator and raw-mode bypass
  stats/               # 39 statistical tools across 7 stages
    regression.py      #   OLS, partial correlation, PCA
    quality.py         #   Outlier detection, missing data, distribution tests
    dependency.py      #   Correlation matrices, mutual information, Granger
    drivers.py         #   Feature importance, SHAP-style, interaction screening
    structure.py       #   Clustering, segmentation, latent structure
    causal.py          #   ATE estimation, synthetic control, counterfactuals
    dynamics.py        #   Trend decomposition, change-point, recurrence
    laws.py            #   Benford, Zipf, power-law diagnostics
  manifold/            # Manifold surface construction and navigation (14 tools)
    spatial.py         #   6 spatial reasoning tools (HKS, diffusion, Reeb, etc.)
  tools/               # 68 provider-neutral ToolSpec definitions
  adapters/            # OpenAI, Anthropic, Gemini, xAI schema translators
  exporters/           # Markdown, CSV, Excel, chunked artifacts
  tokenizers/          # Provider-aware token counting

Scope and Limitations

BinomialHash is a structured data compaction and analytics engine. Its manifold topology outputs are operational structural diagnostics — not proofs of true underlying manifold topology. The edge-incidence manifoldness gate is implemented; vertex-link validation and combinatorial orientability are on the roadmap.
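The edge-incidence gate mentioned above is a standard mesh diagnostic: in a 2-manifold triangle mesh, every undirected edge borders at most two faces. A self-contained sketch of that check (illustrative, not the library's code):

```python
from collections import Counter

def edge_manifold(faces: list[tuple[int, int, int]]) -> bool:
    """True if every undirected edge is shared by at most two triangles."""
    edges = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            edges[frozenset((u, v))] += 1  # undirected: ignore orientation
    return all(count <= 2 for count in edges.values())

# Two triangles sharing one edge: passes the gate
assert edge_manifold([(0, 1, 2), (1, 2, 3)])
# Three triangles meeting at the same edge (a "fin"): fails
assert not edge_manifold([(0, 1, 2), (1, 2, 3), (1, 2, 4)])
```

As the limitations note says, passing this gate is necessary but not sufficient for manifoldness; vertex-link validation catches cases such as two surface sheets pinched at a single vertex.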

Development

cd binomialhash
pip install -e ".[dev]"
python -m pytest tests/ -v

License

MIT
