Skip to main content

Python implementation of GCF (Graph Compact Format): token-optimized wire format for LLM tool responses

Project description

Blackwell Systems License

gcf-python

Python implementation of GCF (Graph Compact Format).

84% fewer tokens than JSON. 32% fewer than TOON. 100% LLM comprehension accuracy at 500 symbols, where JSON fails.

Install

pip install gcf-python

Zero dependencies. Pure Python. Python 3.9+. Includes CLI.

CLI

gcf encode < payload.json    # JSON to GCF
gcf decode < payload.gcf     # GCF to JSON
gcf stats  < payload.json    # token comparison with visual bar
Payload: 50 symbols, 20 edges

  JSON  ██████████████████████████████  4,200 tokens
  GCF   ████████░░░░░░░░░░░░░░░░░░░░░░  1,150 tokens

  Savings: 73% fewer tokens with GCF

Library

Quick Start

from gcf import encode, Payload, Symbol, Edge

p = Payload(
    tool="context_for_task",
    token_budget=5000,
    tokens_used=1847,
    symbols=[
        Symbol(qualified_name="pkg.AuthMiddleware", kind="function", score=0.78, provenance="lsp_resolved", distance=0),
        Symbol(qualified_name="pkg.NewServer", kind="function", score=0.54, provenance="lsp_resolved", distance=1),
    ],
    edges=[
        Edge(source="pkg.NewServer", target="pkg.AuthMiddleware", edge_type="calls"),
    ],
)

output = encode(p)

Output:

GCF tool=context_for_task budget=5000 tokens=1847 symbols=2
## targets
@0 fn pkg.AuthMiddleware 0.78 lsp_resolved
## related
@1 fn pkg.NewServer 0.54 lsp_resolved
## edges
@0<@1 calls

Decode

from gcf import decode

p = decode(input_text)
print(p.tool, len(p.symbols), "symbols", len(p.edges), "edges")

Session Deduplication

Track transmitted symbols across multiple tool responses. Previously-sent symbols become bare references instead of full declarations:

from gcf import encode_with_session, Session, Payload, Symbol

sess = Session()

out1 = encode_with_session(payload1, sess)  # full declarations
out2 = encode_with_session(payload2, sess)  # reused symbols as "@N  # previously transmitted"

By the 5th call in a session: 92.7% token savings vs JSON.

Delta Encoding

When the consumer already has a prior context pack, send only what changed:

from gcf import encode_delta, DeltaPayload, Symbol, Edge

delta = DeltaPayload(
    tool="context_for_task",
    base_root="aaa111",
    new_root="bbb222",
    removed=[Symbol(qualified_name="pkg.OldFunc", kind="function")],
    added=[Symbol(qualified_name="pkg.NewFunc", kind="function", score=0.85, provenance="rwr")],
    delta_tokens=30,
    full_tokens=200,
)

output = encode_delta(delta)

81.2% savings on re-queries where the pack changed slightly.

Generic Encoding

Encode any Python value (not just graph payloads) into GCF tabular format:

from gcf import encode_generic

output = encode_generic({
    "employees": [
        {"id": 1, "name": "Alice", "department": "Engineering", "salary": 95000},
        {"id": 2, "name": "Bob", "department": "Sales", "salary": 72000},
    ],
})

Output:

## employees [2]{id,name,department,salary}
1|Alice|Engineering|95000
2|Bob|Sales|72000

Works on dicts, lists, and primitives. Lists of uniform dicts get tabular rows. Nested dicts use ## key section headers.

API

Function Description
encode(p: Payload) -> str Encode a graph payload to GCF text
encode_generic(data: Any) -> str Encode any value to GCF tabular format
decode(input_text: str) -> Payload Parse GCF text back to a Payload
encode_with_session(p: Payload, s: Session) -> str Encode with session deduplication
encode_delta(d: DeltaPayload) -> str Encode a delta (added/removed only)
Session() Create a new session tracker (thread-safe)

Types

Type Purpose
Payload Full GCF payload: tool, budget, symbols, edges, pack root
Symbol Graph node: qualified name, kind, score, provenance, distance
Edge Directed relationship: source, target, edge type
DeltaPayload Diff between two packs: added/removed symbols and edges
Session Thread-safe tracker for multi-call deduplication
KIND_ABBREV / KIND_EXPAND Bidirectional kind abbreviation dicts

Comprehension Eval

Rigorous 3-way benchmark (GCF vs TOON vs JSON) at 500 symbols, 200 edges. Six structured extraction questions sent to an LLM:

Format Accuracy Tokens vs JSON
GCF 100% (6/6) 11,090 79% fewer
TOON 100% (6/6) 16,378 69% fewer
JSON 66.7% (4/6) 53,341 baseline

JSON failed on counting tasks. GCF and TOON both achieved perfect accuracy. GCF does it in 32% fewer tokens.

Token Efficiency (TOON's Own Benchmark)

Running TOON's benchmark harness with GCF inserted (their datasets, their tokenizer):

Track GCF TOON Result
Mixed-structure (nested, semi-uniform) 169,554 227,896 GCF 34% smaller
Flat-only (tabular) 66,026 67,837 GCF 3% smaller
Semi-uniform event logs 107,269 154,032 GCF 44% smaller

GCF wins on every dataset except deeply nested config (75 tokens on a 618-token payload). On semi-uniform data, GCF uses 44% fewer tokens than TOON.

Reproducible: blackwell-systems/toon@gcf-comparison

Other Implementations

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gcf_python-0.1.1.tar.gz (17.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

gcf_python-0.1.1-py3-none-any.whl (15.2 kB view details)

Uploaded Python 3

File details

Details for the file gcf_python-0.1.1.tar.gz.

File metadata

  • Download URL: gcf_python-0.1.1.tar.gz
  • Upload date:
  • Size: 17.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for gcf_python-0.1.1.tar.gz
Algorithm Hash digest
SHA256 20cce6321a831eab3694de2ea53844dc01e7e9a005f813decd2827bf4aec24dc
MD5 5dbaa60ffef92faf738c79d7fe5ba8a6
BLAKE2b-256 84d1166167bcf0b99531986128894d27db63a29bc30f1f5f2ba2f9ef979bb357

See more details on using hashes here.

File details

Details for the file gcf_python-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: gcf_python-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 15.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for gcf_python-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 1a120870d2eca82673fa239eebdc59cee1fb70464e678213eb5c62ccfb956fe2
MD5 b4998840d436acde454b6ff7837255b5
BLAKE2b-256 0bfbd794761b688dd7ddaf2ce5baa2beed9a4bbd22b88cc8ff201f7b88631b11

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page