Skip to main content

Python implementation of GCF (Graph Compact Format): token-optimized wire format for LLM tool responses

Project description

Blackwell Systems License

gcf-python

Python implementation of GCF (Graph Compact Format).

84% fewer tokens than JSON. 32% fewer than TOON. 100% LLM comprehension accuracy at 500 symbols, where JSON fails.

Install

pip install gcf-py

Zero dependencies. Pure Python. Python 3.9+. Includes CLI.

CLI

gcf encode < payload.json    # JSON to GCF
gcf decode < payload.gcf     # GCF to JSON
gcf stats  < payload.json    # token comparison with visual bar
Payload: 50 symbols, 20 edges

  JSON  ██████████████████████████████  4,200 tokens
  GCF   ████████░░░░░░░░░░░░░░░░░░░░░░  1,150 tokens

  Savings: 73% fewer tokens with GCF

Library

Quick Start

from gcf import encode, Payload, Symbol, Edge

p = Payload(
    tool="context_for_task",
    token_budget=5000,
    tokens_used=1847,
    symbols=[
        Symbol(qualified_name="pkg.AuthMiddleware", kind="function", score=0.78, provenance="lsp_resolved", distance=0),
        Symbol(qualified_name="pkg.NewServer", kind="function", score=0.54, provenance="lsp_resolved", distance=1),
    ],
    edges=[
        Edge(source="pkg.NewServer", target="pkg.AuthMiddleware", edge_type="calls"),
    ],
)

output = encode(p)

Output:

GCF tool=context_for_task budget=5000 tokens=1847 symbols=2
## targets
@0 fn pkg.AuthMiddleware 0.78 lsp_resolved
## related
@1 fn pkg.NewServer 0.54 lsp_resolved
## edges
@0<@1 calls

Decode

from gcf import decode

p = decode(input_text)
print(p.tool, len(p.symbols), "symbols", len(p.edges), "edges")

Session Deduplication

Track transmitted symbols across multiple tool responses. Previously-sent symbols become bare references instead of full declarations:

from gcf import encode_with_session, Session, Payload, Symbol

sess = Session()

out1 = encode_with_session(payload1, sess)  # full declarations
out2 = encode_with_session(payload2, sess)  # reused symbols as "@N  # previously transmitted"

By the 5th call in a session: 92.7% token savings vs JSON.

Delta Encoding

When the consumer already has a prior context pack, send only what changed:

from gcf import encode_delta, DeltaPayload, Symbol, Edge

delta = DeltaPayload(
    tool="context_for_task",
    base_root="aaa111",
    new_root="bbb222",
    removed=[Symbol(qualified_name="pkg.OldFunc", kind="function")],
    added=[Symbol(qualified_name="pkg.NewFunc", kind="function", score=0.85, provenance="rwr")],
    delta_tokens=30,
    full_tokens=200,
)

output = encode_delta(delta)

81.2% savings on re-queries where the pack changed slightly.

API

Function Description
encode(p: Payload) -> str Encode a payload to GCF text
decode(input_text: str) -> Payload Parse GCF text back to a Payload
encode_with_session(p: Payload, s: Session) -> str Encode with session deduplication
encode_delta(d: DeltaPayload) -> str Encode a delta (added/removed only)
Session() Create a new session tracker (thread-safe)

Types

Type Purpose
Payload Full GCF payload: tool, budget, symbols, edges, pack root
Symbol Graph node: qualified name, kind, score, provenance, distance
Edge Directed relationship: source, target, edge type
DeltaPayload Diff between two packs: added/removed symbols and edges
Session Thread-safe tracker for multi-call deduplication
KIND_ABBREV / KIND_EXPAND Bidirectional kind abbreviation dicts

Comprehension Eval

Rigorous 3-way benchmark (GCF vs TOON vs JSON) at 500 symbols, 200 edges. Six structured extraction questions sent to an LLM:

Format Accuracy Tokens vs JSON
GCF 100% (6/6) 11,090 79% fewer
TOON 100% (6/6) 16,378 69% fewer
JSON 66.7% (4/6) 53,341 baseline

JSON failed on counting tasks. GCF and TOON both achieved perfect accuracy. GCF does it in 32% fewer tokens.

Token Efficiency (TOON's Own Benchmark)

Running TOON's benchmark harness with GCF inserted (their datasets, their tokenizer):

Track GCF TOON Result
Mixed-structure (nested, semi-uniform) 169,554 227,896 GCF 34% smaller
Flat-only (tabular) 66,026 67,837 GCF 3% smaller
Semi-uniform event logs 107,269 154,032 GCF 44% smaller

GCF wins on every dataset except deeply nested config (75 tokens on a 618-token payload). On semi-uniform data, GCF uses 44% fewer tokens than TOON.

Reproducible: blackwell-systems/toon@gcf-comparison

Other Implementations

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gcf_python-0.1.0.tar.gz (14.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

gcf_python-0.1.0-py3-none-any.whl (13.1 kB view details)

Uploaded Python 3

File details

Details for the file gcf_python-0.1.0.tar.gz.

File metadata

  • Download URL: gcf_python-0.1.0.tar.gz
  • Upload date:
  • Size: 14.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for gcf_python-0.1.0.tar.gz
Algorithm Hash digest
SHA256 ef73584567198c0a2d3b13e5b10bfe729ccbfd306853dfc0469bae971f976959
MD5 0777729166a1fc8798aa70c856839d7c
BLAKE2b-256 eebc9042bb08444727b8336ca05ae25654d48d76f50c4998a8480aa040940a4c

See more details on using hashes here.

File details

Details for the file gcf_python-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: gcf_python-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 13.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for gcf_python-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 ff7d887e6ad18e6d2a36dde0d90b36999d04210af0865cfdc614ce8f8d948b8c
MD5 179b0de3d41841bba06fab2fa13aaf78
BLAKE2b-256 669827ba7589df2c7685ce3d590e80f2a2e89d83e10e3113cebcd71aa44081f7

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page