
Python implementation of GCF (Graph Compact Format): token-optimized wire format for LLM tool responses

gcf-python

Python implementation of GCF (Graph Compact Format).

84% fewer tokens than JSON. 32% fewer than TOON. 100% LLM comprehension accuracy at 500 symbols, where JSON reaches only 66.7% (4/6).

Install

pip install gcf-python

Zero dependencies. Pure Python. Python 3.9+. Includes CLI.

CLI

gcf encode < payload.json    # JSON to GCF
gcf decode < payload.gcf     # GCF to JSON
gcf stats  < payload.json    # token comparison with visual bar

Output of gcf stats:

Payload: 50 symbols, 20 edges

  JSON  ██████████████████████████████  4,200 tokens
  GCF   ████████░░░░░░░░░░░░░░░░░░░░░░  1,150 tokens

  Savings: 73% fewer tokens with GCF
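
The savings figure and the bars in the stats output above follow from simple arithmetic. The sketch below reproduces them, assuming a 30-cell bar scaled to the larger token count. `savings_percent` and `render_bar` are hypothetical helpers written for illustration, not part of the gcf CLI:

```python
def savings_percent(baseline_tokens: int, candidate_tokens: int) -> int:
    """Percentage of tokens saved relative to the baseline, rounded to an integer."""
    return round(100 * (1 - candidate_tokens / baseline_tokens))


def render_bar(tokens: int, max_tokens: int, width: int = 30) -> str:
    """Fixed-width bar: filled cells for the token share, light cells for the rest."""
    filled = round(width * tokens / max_tokens)
    return "█" * filled + "░" * (width - filled)


# Token counts taken from the example stats output above.
json_tokens, gcf_tokens = 4200, 1150
print("JSON ", render_bar(json_tokens, json_tokens))
print("GCF  ", render_bar(gcf_tokens, json_tokens))
print(f"Savings: {savings_percent(json_tokens, gcf_tokens)}% fewer tokens with GCF")
```

With these counts the GCF bar fills 8 of 30 cells and the savings round to 73%.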

Library

Quick Start

from gcf import encode, Payload, Symbol, Edge

p = Payload(
    tool="context_for_task",
    token_budget=5000,
    tokens_used=1847,
    symbols=[
        Symbol(qualified_name="pkg.AuthMiddleware", kind="function", score=0.78, provenance="lsp_resolved", distance=0),
        Symbol(qualified_name="pkg.NewServer", kind="function", score=0.54, provenance="lsp_resolved", distance=1),
    ],
    edges=[
        Edge(source="pkg.NewServer", target="pkg.AuthMiddleware", edge_type="calls"),
    ],
)

output = encode(p)

Output:

GCF tool=context_for_task budget=5000 tokens=1847 symbols=2
## targets
@0 fn pkg.AuthMiddleware 0.78 lsp_resolved
## related
@1 fn pkg.NewServer 0.54 lsp_resolved
## edges
@0<@1 calls
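
To make the layout of this output concrete, here is a small reader for two of its line types. It is a sketch based only on the example above, not the library's decoder: `parse_header` and `parse_edge` are hypothetical names, and reading `@0<@1 calls` as "symbol @1 calls symbol @0" is inferred from the Quick Start payload, where pkg.NewServer is the edge source.

```python
def parse_header(line: str) -> dict:
    """Split a 'GCF key=value ...' header line into a dict of strings."""
    magic, *fields = line.split()
    if magic != "GCF":
        raise ValueError("not a GCF header line")
    return dict(field.split("=", 1) for field in fields)


def parse_edge(line: str) -> tuple:
    """Read '@T<@S type' as (source_index, target_index, edge_type).

    Assumption: the symbol left of '<' is the target, the one right of it the source.
    """
    refs, edge_type = line.split()
    target, source = refs.split("<")
    return int(source.lstrip("@")), int(target.lstrip("@")), edge_type


header = parse_header("GCF tool=context_for_task budget=5000 tokens=1847 symbols=2")
names = ["pkg.AuthMiddleware", "pkg.NewServer"]  # symbols @0 and @1 from the example
source, target, edge_type = parse_edge("@0<@1 calls")
print(header["tool"], int(header["budget"]))
print(names[source], edge_type, names[target])
```

In application code, use decode from the library; this sketch only illustrates why index references keep the edge section short.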

Decode

from gcf import decode

p = decode(input_text)
print(p.tool, len(p.symbols), "symbols", len(p.edges), "edges")

Session Deduplication

Track transmitted symbols across multiple tool responses. Previously-sent symbols become bare references instead of full declarations:

from gcf import encode_with_session, Session, Payload, Symbol

sess = Session()

out1 = encode_with_session(payload1, sess)  # full declarations
out2 = encode_with_session(payload2, sess)  # reused symbols as "@N  # previously transmitted"

By the 5th call in a session: 92.7% token savings vs JSON.
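
The idea behind session deduplication can be sketched in a few lines: remember which qualified names have been sent, emit the full declaration the first time, and emit a bare reference afterwards. `SeenSymbols` is a hypothetical stand-in for illustration, not the library's Session. How the real Session assigns IDs is not documented here, so the numbering below is an assumption.

```python
import threading


class SeenSymbols:
    """Hypothetical tracker: remembers names already sent and the ID each one received."""

    def __init__(self) -> None:
        self._ids = {}
        self._lock = threading.Lock()  # guard the dict so concurrent callers stay consistent

    def render(self, name: str, declaration: str) -> str:
        """Return the full declaration on first sight, a bare reference afterwards."""
        with self._lock:
            if name in self._ids:
                return f"@{self._ids[name]}  # previously transmitted"
            self._ids[name] = len(self._ids)  # assumed scheme: next free index
            return f"@{self._ids[name]} {declaration}"


seen = SeenSymbols()
declaration = "fn pkg.AuthMiddleware 0.78 lsp_resolved"
first = seen.render("pkg.AuthMiddleware", declaration)
second = seen.render("pkg.AuthMiddleware", declaration)
print(first)
print(second)
```

The second line omits the kind, name, score, and provenance, which is why savings grow with each call in a session.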

Delta Encoding

When the consumer already has a prior context pack, send only what changed:

from gcf import encode_delta, DeltaPayload, Symbol, Edge

delta = DeltaPayload(
    tool="context_for_task",
    base_root="aaa111",
    new_root="bbb222",
    removed=[Symbol(qualified_name="pkg.OldFunc", kind="function")],
    added=[Symbol(qualified_name="pkg.NewFunc", kind="function", score=0.85, provenance="rwr")],
    delta_tokens=30,
    full_tokens=200,
)

output = encode_delta(delta)

81.2% savings on re-queries where the pack changed slightly.
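
The added and removed lists for a delta can be derived with plain set arithmetic over the qualified names of the old and new packs. This is a sketch under that assumption; `diff_packs` is a hypothetical helper, and the library may compute deltas differently.

```python
def diff_packs(old_names: set, new_names: set) -> tuple:
    """Return (added, removed) symbol names between two packs, sorted for stable output."""
    added = sorted(new_names - old_names)
    removed = sorted(old_names - new_names)
    return added, removed


old_pack = {"pkg.AuthMiddleware", "pkg.NewServer", "pkg.OldFunc"}
new_pack = {"pkg.AuthMiddleware", "pkg.NewServer", "pkg.NewFunc"}
added, removed = diff_packs(old_pack, new_pack)
print("added:", added)
print("removed:", removed)
```

The two unchanged symbols never appear in the delta, so its size tracks the size of the change and not the size of the pack.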

Generic Encoding

Encode any Python value (not just graph payloads) into GCF tabular format:

from gcf import encode_generic

output = encode_generic({
    "employees": [
        {"id": 1, "name": "Alice", "department": "Engineering", "salary": 95000},
        {"id": 2, "name": "Bob", "department": "Sales", "salary": 72000},
    ],
})

Output:

## employees [2]{id,name,department,salary}
1|Alice|Engineering|95000
2|Bob|Sales|72000

Works on dicts, lists, and primitives. Lists of uniform dicts get tabular rows. Nested dicts use ## key section headers.
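
The tabular layout above can be reproduced with a short function: one header line carrying the key, row count, and column names, then one pipe-separated line per row. This is a sketch of the format as shown in the example, not the library's encode_generic. `tabular_section` is a hypothetical name, and escaping of values that contain `|` is not handled.

```python
def tabular_section(key: str, rows: list) -> str:
    """Render a list of dicts sharing the same keys as a header plus pipe-separated rows."""
    columns = list(rows[0])
    if any(list(row) != columns for row in rows):
        raise ValueError("rows are not uniform")
    header = f"## {key} [{len(rows)}]" + "{" + ",".join(columns) + "}"
    lines = ["|".join(str(row[col]) for col in columns) for row in rows]
    return "\n".join([header, *lines])


employees = [
    {"id": 1, "name": "Alice", "department": "Engineering", "salary": 95000},
    {"id": 2, "name": "Bob", "department": "Sales", "salary": 72000},
]
print(tabular_section("employees", employees))
```

Column names appear once in the header and not once per row, which is where the savings over JSON come from on uniform data.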

API

Function                                              Description
encode(p: Payload) -> str                             Encode a graph payload to GCF text
encode_generic(data: Any) -> str                      Encode any value to GCF tabular format
decode(input_text: str) -> Payload                    Parse GCF text back to a Payload
encode_with_session(p: Payload, s: Session) -> str    Encode with session deduplication
encode_delta(d: DeltaPayload) -> str                  Encode a delta (added/removed only)
Session()                                             Create a new session tracker (thread-safe)

Types

Type                         Purpose
Payload                      Full GCF payload: tool, budget, symbols, edges, pack root
Symbol                       Graph node: qualified name, kind, score, provenance, distance
Edge                         Directed relationship: source, target, edge type
DeltaPayload                 Diff between two packs: added/removed symbols and edges
Session                      Thread-safe tracker for multi-call deduplication
KIND_ABBREV / KIND_EXPAND    Bidirectional kind abbreviation dicts
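
A bidirectional abbreviation table is usually a forward dict plus its inverse. The sketch below shows the pattern with the one mapping visible in the Quick Start output ("function" becomes "fn"). The `_SKETCH` names and the pass-through fallback for unknown kinds are assumptions for illustration, not the contents or behavior of the library's KIND_ABBREV and KIND_EXPAND.

```python
# Only "function" -> "fn" is visible in the Quick Start output; other entries are unknown here.
KIND_ABBREV_SKETCH = {"function": "fn"}

# Invert the forward table so abbreviations can be expanded again.
KIND_EXPAND_SKETCH = {short: full for full, short in KIND_ABBREV_SKETCH.items()}


def abbreviate(kind: str) -> str:
    """Return the short form, or the kind unchanged if no abbreviation is known (assumed)."""
    return KIND_ABBREV_SKETCH.get(kind, kind)


def expand(short: str) -> str:
    """Return the long form, or the input unchanged if it is not an abbreviation (assumed)."""
    return KIND_EXPAND_SKETCH.get(short, short)


print(abbreviate("function"), expand("fn"))
```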

Comprehension Eval

Rigorous 3-way benchmark (GCF vs TOON vs JSON) at 500 symbols, 200 edges. Six structured extraction questions sent to an LLM:

Format    Accuracy       Tokens    vs JSON
GCF       100% (6/6)     11,090    79% fewer
TOON      100% (6/6)     16,378    69% fewer
JSON      66.7% (4/6)    53,341    baseline

JSON failed on counting tasks. GCF and TOON both achieved perfect accuracy; GCF does it in 32% fewer tokens than TOON.
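
The percentages in this comparison follow directly from the token counts. The check below recomputes them as "percent fewer than a baseline"; `percent_fewer` is a hypothetical helper, and the counts are copied from the table above.

```python
def percent_fewer(candidate: int, baseline: int) -> int:
    """How many percent fewer tokens the candidate uses than the baseline, rounded."""
    return round(100 * (1 - candidate / baseline))


tokens = {"GCF": 11_090, "TOON": 16_378, "JSON": 53_341}

gcf_vs_json = percent_fewer(tokens["GCF"], tokens["JSON"])
toon_vs_json = percent_fewer(tokens["TOON"], tokens["JSON"])
gcf_vs_toon = percent_fewer(tokens["GCF"], tokens["TOON"])
print(gcf_vs_json, toon_vs_json, gcf_vs_toon)
```

Note that the baseline matters: "32% fewer than TOON" divides by TOON's count, not by JSON's.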

Token Efficiency (TOON's Own Benchmark)

Running TOON's benchmark harness with GCF inserted (their datasets, their tokenizer):

Track                                     GCF        TOON       Result
Mixed-structure (nested, semi-uniform)    169,554    227,896    GCF 26% smaller
Flat-only (tabular)                       66,026     67,837     GCF 3% smaller
Semi-uniform event logs                   107,269    154,032    GCF 30% smaller

GCF wins on every dataset except deeply nested config (a 75-token difference on a 618-token payload). On semi-uniform data, GCF uses 30% fewer tokens than TOON; equivalently, TOON uses 44% more tokens than GCF.

Reproducible: blackwell-systems/toon@gcf-comparison

Other Implementations

License

MIT

