Budget‑constrained JSON preview renderer (Python bindings)

headson

Terminal demo

head/tail for JSON and YAML, but structure‑aware. Get a compact preview that shows both the shape and representative values of your data, all within a strict byte budget. (Just like head/tail, headson can also work with unstructured text files.)

Available as a CLI app and as a Python library.

Install

Using Cargo:

cargo install headson

From source:

cargo build --release
target/release/headson --help

Features

  • Budgeted output: specify exactly how much you want to see
  • Output formats: auto | json | yaml | text
    • Styles: strict | default | detailed
      • JSON family: strict → strict JSON, default → human‑friendly Pseudo, detailed → JS with inline comments
      • YAML: always YAML; strict has no comments, default uses “# …”, detailed uses “# N more …”
      • Text: prints raw lines. In default style, omissions are shown as a single … line; in detailed, as … N more lines …. strict omits array‑level summaries.
  • Multiple inputs: preview many files at once with a shared or per‑file budget
  • Fast: processes gigabyte‑scale files in seconds (mostly disk‑bound)
  • Available as a CLI app and as a Python library

Fits into command line workflows

If you’re comfortable with tools like head and tail, use headson when you want a quick, structured peek into a JSON file without dumping the entire thing.

  • head/tail operate on bytes/lines - their output is not optimized for tree structures
  • jq: you need to craft filters to preview large JSON files
  • headson is like head/tail for trees: zero config but it keeps structure and represents content as much as possible

Usage

headson [FLAGS] [INPUT...]
  • INPUT (optional, repeatable): file path(s). If omitted, reads from stdin. Multiple input files are supported.
  • Prints the preview to stdout. On parse errors, exits non‑zero and prints an error to stderr.

Common flags:

  • -c, --bytes <BYTES>: per‑file output budget. For multiple inputs, default total budget is <BYTES> * number_of_inputs.
  • -C, --global-bytes <BYTES>: total output budget across all inputs. With --bytes, the effective total is the smaller of the two.
  • -f, --format <auto|json|yaml|text>: output format (default: auto).
    • Auto: stdin → JSON family; filesets → per‑file based on extension (.json → JSON family, .yaml/.yml → YAML, unknown → Text).
  • -t, --template <strict|default|detailed>: output style (default: default).
    • JSON family: strict → strict JSON; default → Pseudo; detailed → JS with inline comments.
    • YAML: always YAML; style only affects comments (strict none, default “# …”, detailed “# N more …”).
  • -i, --input-format <json|yaml|text>: ingestion format (default: json). For filesets in auto format, ingestion is chosen by extensions.
  • -m, --compact: no indentation, no spaces, no newlines
  • --no-newline: single line output
  • --no-space: no space after : in objects
  • --indent <STR>: indentation unit (default: two spaces)
  • --string-cap <N>: max graphemes to consider per string (default: 500)
  • --head: prefer the beginning of arrays when truncating (keep first N). Strings are unaffected. Display styles place omission markers accordingly; strict JSON remains unannotated. Mutually exclusive with --tail.
  • --tail: prefer the end of arrays when truncating (keep last N). Strings are unaffected. Display styles place omission markers accordingly; strict JSON remains unannotated. Mutually exclusive with --head.
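
The interaction between -c/--bytes and -C/--global-bytes can be worked through with a little arithmetic. The sketch below is only an illustration: effective_total_budget is a hypothetical helper, not part of headson, and it reads "the smaller of the two" as the smaller of the global budget and the per‑file budget multiplied by the number of inputs.

```python
from typing import Optional


def effective_total_budget(
    n_inputs: int,
    per_file_bytes: Optional[int] = None,
    global_bytes: Optional[int] = None,
) -> Optional[int]:
    """Return the total output budget implied by --bytes and --global-bytes.

    Hypothetical helper, not part of headson. Returns None when neither flag
    is given, because the CLI's built-in default is not modeled here.
    """
    candidates = []
    if per_file_bytes is not None:
        # --bytes is per file, so the total scales with the number of inputs.
        candidates.append(per_file_bytes * n_inputs)
    if global_bytes is not None:
        # --global-bytes caps the total across all inputs.
        candidates.append(global_bytes)
    # With both flags set, the smaller total wins (assumed reading).
    return min(candidates) if candidates else None


print(effective_total_budget(3, per_file_bytes=400))                     # 1200
print(effective_total_budget(3, per_file_bytes=400, global_bytes=1000))  # 1000
print(effective_total_budget(3, per_file_bytes=200, global_bytes=1000))  # 600
```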

Notes:

  • Multiple inputs:
    • With newlines enabled, file sections are rendered with human‑readable headers. In compact/single‑line modes, headers are omitted.
    • In --format auto, each file uses its own best format: JSON family for .json, YAML for .yaml/.yml.
    • Unknown extensions are treated as Text (raw lines), which is safe for logs and .txt files.
    • --global-bytes may truncate or omit entire files to respect the total budget.
    • The tool finds the largest preview that fits the budget; even if extremely tight, you still get a minimal, valid preview.
    • Directories and binary files are ignored; a notice is printed to stderr for each. Stdin reads the stream as‑is.
    • Head vs Tail sampling: these options bias which part of arrays are kept before rendering. Display styles may still insert internal gap markers to honor very small budgets; strict JSON stays unannotated.
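
The per‑file format choice in auto mode amounts to a lookup on the file extension. A minimal sketch of that mapping follows; guess_format is a hypothetical name, and lower‑casing the extension is an assumption (the notes do not say whether headson matches extensions case‑sensitively).

```python
from pathlib import Path

# Extension table taken from the notes; anything else falls back to text.
_FORMAT_BY_SUFFIX = {".json": "json", ".yaml": "yaml", ".yml": "yaml"}


def guess_format(path: str) -> str:
    """Hypothetical helper: pick a format name the way --format auto is described."""
    # Lower-casing the suffix is an assumption, not documented behavior.
    return _FORMAT_BY_SUFFIX.get(Path(path).suffix.lower(), "text")


for name in ["data.json", "config.yml", "app.log", "notes.txt", "README"]:
    print(f"{name} -> {guess_format(name)}")
```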

Quick one‑liners:

  • Peek a big JSON stream (keeps structure):

    zstdcat huge.json.zst | headson -c 800 -f json -t default
    
  • Many files with a fixed overall size:

    headson -C 1200 -f json -t strict logs/*.json
    
  • Glance at a file, JavaScript‑style comments for omissions:

    headson -c 400 -f json -t detailed data.json
    
  • YAML with detailed comments:

    headson -c 400 -f yaml -t detailed config.yaml
    

Text mode

  • Single file (auto):

    headson -c 200 notes.txt
    
  • Force Text ingest/output (useful when mixing with other extensions):

    headson -c 200 -i text -f text notes.txt
    
  • Many text files (fileset):

    headson -c 800 -i text -f text logs/*.txt
    
  • Styles on Text:

    • default: omission as a standalone … line.
    • detailed: omission as … N more lines ….
    • strict: no array‑level omission line (individual long lines may still truncate with …).
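
To make the difference between the three styles concrete, here is a toy model of the omission line. It is not how headson works internally: preview_lines is a hypothetical function, it keeps a fixed number of leading lines instead of fitting a byte budget, and the bare … marker for the default style is assumed from the JSON examples.

```python
def preview_lines(lines, keep, style="default"):
    """Keep the first `keep` lines and mark the omission according to `style`.

    Toy model only: headson picks what to keep from a byte budget, not a line count.
    """
    if style not in ("strict", "default", "detailed"):
        raise ValueError(f"unknown style: {style}")
    kept = list(lines[:keep])
    omitted = len(lines) - len(kept)
    if omitted > 0:
        if style == "default":
            kept.append("…")  # standalone omission line
        elif style == "detailed":
            kept.append(f"… {omitted} more lines …")  # omission line with a count
        # strict: no omission line is added
    return "\n".join(kept)


log = ["boot", "load config", "connect db", "serve", "shutdown"]
for style in ("strict", "default", "detailed"):
    print(f"--- {style}")
    print(preview_lines(log, 2, style))
```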

Show help:

headson --help

Note: flags align with head/tail conventions (-c/--bytes, -C/--global-bytes).

Examples: head vs headson

Input:

{"users":[{"id":1,"name":"Ana","roles":["admin","dev"]},{"id":2,"name":"Bo"}],"meta":{"count":2,"source":"db"}}

Naive cut (can break mid‑token):

jq -c . users.json | head -c 80
# {"users":[{"id":1,"name":"Ana","roles":["admin","dev"]},{"id":2,"name":"Bo"}],"m
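
The same byte‑level cut can be reproduced in Python to confirm that the result no longer parses. This sketch uses only the standard library and stands in for the jq/head pipeline; it does not call headson.

```python
import json

data = {
    "users": [
        {"id": 1, "name": "Ana", "roles": ["admin", "dev"]},
        {"id": 2, "name": "Bo"},
    ],
    "meta": {"count": 2, "source": "db"},
}

# Compact serialization, comparable to `jq -c .`
compact = json.dumps(data, separators=(",", ":"))

# Byte-level cut, comparable to `head -c 80`
cut = compact.encode("utf-8")[:80].decode("utf-8", errors="ignore")

try:
    json.loads(cut)
    parsed = True
except json.JSONDecodeError as exc:
    parsed = False
    print(f"cut output is not valid JSON: {exc.msg}")
```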

Structured preview with headson (JSON family, default style → Pseudo):

headson -c 120 -f json -t default users.json
# {
#   users: [
#     { id: 1, name: "Ana", roles: [ "admin", … ] },
#     …
#   ]
#   meta: { count: 2, … }
# }

Machine‑readable preview (JSON family, strict style → strict JSON):

headson -c 120 -f json -t strict users.json
# {"users":[{"id":1,"name":"Ana","roles":["admin"]}],"meta":{"count":2}}

Terminal Demos

Regenerate locally:

  • Place tapes under docs/tapes (e.g., docs/tapes/demo.tape)
  • Run: cargo make tapes
  • Outputs are written to docs/assets/tapes

Python Bindings

A thin Python extension module is available on PyPI as headson.

  • Install: pip install headson (ABI3 wheels for Python 3.10+ on Linux/macOS/Windows).
  • API:
    • headson.summarize(text: str, *, format: str = "auto", style: str = "default", input_format: str = "json", byte_budget: int | None = None, skew: str = "balanced") -> str
      • format: "auto" | "json" | "yaml" (auto maps to JSON family for single inputs)
      • style: "strict" | "default" | "detailed"
      • input_format: "json" | "yaml" (ingestion)
      • byte_budget: maximum output size in bytes (None uses the default of 500)
      • skew: "balanced" | "head" | "tail" (affects display styles; strict JSON remains unannotated)

Examples:

import json
import headson

data = {"foo": [1, 2, 3], "bar": {"x": "y"}}
preview = headson.summarize(json.dumps(data), format="json", style="strict", byte_budget=200)
print(preview)

# Prefer the tail of arrays (annotations show with style="default"/"detailed")
print(
    headson.summarize(
        json.dumps(list(range(100))),
        format="json",
        style="detailed",
        byte_budget=80,
        skew="tail",
    )
)

# YAML support
doc = "root:\n  items: [1,2,3,4,5,6,7,8,9,10]\n"
print(headson.summarize(doc, format="yaml", style="default", input_format="yaml", byte_budget=60))

Algorithm

Algorithm overview

Footnotes

  • [1] Optimized tree representation: An arena‑style tree stored in flat, contiguous buffers. Each node records its kind and value plus index ranges into shared child and key arrays. Arrays are ingested in a single pass and may be deterministically pre‑sampled: the first element is always kept; additional elements are selected via a fixed per‑index inclusion test; for kept elements, original indices are stored and full lengths are counted. This enables accurate omission info and internal gap markers later, while minimizing pointer chasing.
  • [2] Priority order: Nodes are scored so previews surface representative structure and values first. Arrays can favor head/mid/tail coverage (default) or strictly the head; tail preference flips head/tail when configured. Object properties are ordered by key, and strings expand by grapheme with early characters prioritized over very deep expansions.
  • [3] Choose top N nodes (binary search): Iteratively picks N so that the rendered preview fits within the byte budget, looping between “choose N” and a render attempt to converge quickly.
  • [4] Render attempt: Serializes the currently included nodes using the selected template. Omission summaries and per-file section headers appear in display templates (pseudo/js); json remains strict. For arrays, display templates may insert internal gap markers between non‑contiguous kept items using original indices.
  • [5] Diagram source: The Algorithm diagram is generated from docs/diagrams/algorithm.mmd. Regenerate the SVG with cargo make diagrams before releasing.
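
The "choose top N" step in footnote [3] can be illustrated with a plain binary search over a priority‑ordered list. This is a minimal sketch under stated assumptions, not headson's implementation: largest_fitting_prefix and render_list are hypothetical names, the input is a flat list and not a node tree, and the rendered size is assumed never to shrink as N grows.

```python
import json
from typing import Callable, Sequence


def largest_fitting_prefix(
    ranked: Sequence,
    render: Callable[[Sequence], str],
    byte_budget: int,
) -> int:
    """Return the largest N such that render(ranked[:N]) fits in byte_budget.

    Assumes the rendered size never shrinks as N grows. Returns 0 when nothing
    fits, even if the empty rendering itself exceeds the budget.
    """
    lo, hi = 0, len(ranked)
    while lo < hi:
        mid = (lo + hi + 1) // 2  # bias upward so the loop always makes progress
        if len(render(ranked[:mid]).encode("utf-8")) <= byte_budget:
            lo = mid  # mid items fit; try more
        else:
            hi = mid - 1  # too big; try fewer
    return lo


def render_list(items: Sequence) -> str:
    """Toy renderer: a JSON-style list with no omission markers."""
    return "[" + ", ".join(json.dumps(item) for item in items) + "]"


values = list(range(100))
n = largest_fitting_prefix(values, render_list, byte_budget=30)
print(n, render_list(values[:n]))
```

Each probe costs one render, so the search needs about log2(len(ranked)) render attempts.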

License

MIT
