Terse JSON — compact binary JSON for microcontrollers

TSON — Terse JSON Binary Format

A compact, schema-deduplicated binary format for JSON data, built for microcontrollers and constrained environments.

Core idea: in repetitive JSON (API payloads, telemetry, config), field names appear thousands of times. TSON stores them once in a definition block. Repeated strings are stored once in a dict block. The data stream is pure typed values, no key repetition, no duplicate strings.

JSON (890 bytes)               TSON binary (~374 bytes)
[{                              ┌── Header (13 B)
  "id": 1,                      │   version=1, def_off=13,
  "name": "Alice",              │   dict_off=110, data_off=122
  "age": 30,                    ├── Definition block (97 B)
  "address": {                  │   #0 Null  #1 Bool  #2 Int  #3 UInt
    "street": "123…",           │   #4 Float  #5 String
    "city": "Anytown",          │   #6 Array<String>
    "state": "CA",              │   #7 Object fields:
    "zip": "12345"              │      street:String city:String
  },                            │      state:String zip:String
  "hobbies": ["reading",        │   #8 Object fields:
    "hiking", "cooking"]        │      id:Int name:String age:Int
  },                            │      address:#7 hobbies:#6
  …                              │   #9 Array<Object>
]                               ├── Dict block (12 B, only
                                │   repeated strings)
                                ├── Data block (252 B)
                                │   Entry #9: 3 elements
                                │     [0]: #8 -> 1, "Alice", 30…
                                │     [1]: #8 -> 2, "Bob",   25…
                                │     [2]: #8 -> 3, "Charlie",35…
                                └── (end)
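
To make the definition-block idea concrete, here is a minimal sketch of shape deduplication. It is not the crate's implementation: `ShapeTable` and `intern_shape` are hypothetical names, and an object shape is reduced to its ordered list of field names.

```rust
use std::collections::HashMap;

/// Hypothetical illustration type, not part of the tson crate.
/// Maps an object's field-name list to a definition index so that
/// identical shapes share one definition.
#[derive(Default)]
pub struct ShapeTable {
    /// Definitions in discovery order; each is a list of field names.
    pub defs: Vec<Vec<String>>,
    index: HashMap<Vec<String>, usize>,
}

impl ShapeTable {
    /// Returns the definition index for `fields`, adding a new
    /// definition only the first time this shape is seen.
    pub fn intern_shape(&mut self, fields: &[&str]) -> usize {
        let key: Vec<String> = fields.iter().map(|f| f.to_string()).collect();
        if let Some(&i) = self.index.get(&key) {
            return i;
        }
        let i = self.defs.len();
        self.defs.push(key.clone());
        self.index.insert(key, i);
        i
    }
}
```

With a table like this, three user objects of the same shape cost one definition plus three rows of values, which is where the savings in the diagram come from.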

Features

  • Zero-dependency core: encode/decode/stream on &[u8] slices, only needs alloc.
  • Streaming reader: loads the tiny definition + dict blocks into memory, then yields data entries one-at-a-time - O(1) memory per entry.
  • Schema deduplication: identical object shapes share one definition. Field names stored once.
  • String interning (dict feature): repeated strings stored once in a dict block. StrRef points to them instead of repeating inline. Only strings that appear ≥2 times are included - no waste.
  • Hybrid string encoding: short strings (≤127 B) use 1-byte length, medium strings 2 bytes, long strings 4 bytes - saves space over flat u32.
  • no_std capable: disable the std feature for embedded targets - the core builds against alloc only (verified: cargo build --no-default-features).
  • Optional JSON bridge: serde_json-based compile/decompile behind the json feature.
  • Self-describing wire format: every compound value carries its definition index, enabling forward compatibility and partial decoding.
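
The hybrid string encoding can be illustrated with a variable-width length prefix. The sketch below is an assumption-heavy illustration, not the TSON wire format: the tag bits, the big-endian byte order, and the 16,383-byte boundary between "medium" and "long" are invented for the example (TSON-FORMAT.md is the authority on the real layout). Only the 1/2/4-byte widths and the 127-byte short-string limit come from the feature list above.

```rust
/// Illustrative only: tag bits and thresholds above 127 are assumptions.
/// Appends a 1-, 2-, or 4-byte length prefix to `out`.
pub fn encode_len(len: u32, out: &mut Vec<u8>) -> Result<(), &'static str> {
    if len <= 0x7F {
        // 0xxxxxxx: short string, one byte
        out.push(len as u8);
    } else if len <= 0x3FFF {
        // 10xxxxxx xxxxxxxx: medium string, two bytes
        out.extend_from_slice(&((len as u16) | 0x8000).to_be_bytes());
    } else if len <= 0x3FFF_FFFF {
        // 11xxxxxx + 3 bytes: long string, four bytes
        out.extend_from_slice(&(len | 0xC000_0000).to_be_bytes());
    } else {
        return Err("length too large");
    }
    Ok(())
}

/// Returns (length, bytes consumed), or None if the input is truncated.
pub fn decode_len(buf: &[u8]) -> Option<(u32, usize)> {
    let first = *buf.first()?;
    match first >> 6 {
        0b00 | 0b01 => Some((first as u32, 1)),
        0b10 => {
            let b = buf.get(..2)?;
            Some(((u16::from_be_bytes([b[0], b[1]]) & 0x3FFF) as u32, 2))
        }
        _ => {
            let b = buf.get(..4)?;
            Some((u32::from_be_bytes([b[0], b[1], b[2], b[3]]) & 0x3FFF_FFFF, 4))
        }
    }
}
```

Compared with a flat u32 prefix, a scheme of this kind saves three bytes on every short string, which matters when most values are short labels and status words.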

Install

# Rust (crates.io)
cargo add tson

# Python (PyPI) — distribution is `tson-bin`; you still `import tson`
pip install tson-bin

# Node.js (npm) — scoped package, ships a prebuilt addon per platform
npm install @siktec-lab/tson

Quick Start

// Round-trip a JSON string through TSON binary
let json = r#"{"name":"Alice","age":30}"#;

// JSON -> TSON document -> binary
let doc = tson::compile_json(json).unwrap();
let bytes = tson::to_bytes(&doc).unwrap();

// Binary -> TSON document -> JSON
let restored = tson::from_bytes(&bytes).unwrap();
let value = tson::decompile_to_value(&restored).unwrap();

assert_eq!(value.to_string(), r#"{"age":30,"name":"Alice"}"#);

Emit Mode (Bypass JSON)

Need TSON binary directly from structured data without parsing JSON? tson::emit() takes a TsonData tree and produces a complete TSON document.

use tson::{TsonData, emit};

// Build a sensor reading value tree directly
let reading = TsonData::Object(0, vec![
    TsonData::Float(22.5),                   // temperature
    TsonData::Int(61),                       // humidity
    TsonData::String("nominal".to_string()), // status
]);

// Emit as TSON binary - no JSON parse step
let bytes = emit(&reading).unwrap();

// Decode back
let doc = tson::from_bytes(&bytes).unwrap();
let value = tson::decompile_to_value(&doc).unwrap();
// value = {"f0": 22.5, "f1": 61, "f2": "nominal"}

Field names are synthetic ("f0", "f1", …) since TsonData values don't carry names. Definitions and the string dict are discovered automatically from the value tree.

Server Response Path - emit_with_context()

Reuse an incoming document's definitions and dict to emit a response - no schema re-discovery, no dict rebuild.

use tson::{TsonData, emit_with_context};

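// `incoming_defs` / `incoming_dict`: definitions and dict of the request document (obtained elsewhere).
// Object(6, ..) refers to definition #6 in `incoming_defs`.
let response = TsonData::Object(6, vec![
    TsonData::String("processed".to_string()),
    TsonData::Int(42),
]);
let bytes = emit_with_context(&response, &incoming_defs, &incoming_dict).unwrap();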

Field values must be in definition field order (alphabetical).

Direct Field Access - doc.get(), doc.index(), doc.get_by_index()

Extract values without decompiling to JSON. Lookup by name is a linear scan; access is O(1) once you pre-resolve the field index:

let doc = tson::compile_json(r#"{"name":"Alice","age":30}"#).unwrap();

// By name (linear scan)
let name = doc.get("name").unwrap();
let age = doc.get("age").unwrap();

// Or pre-resolve index for O(1) repeated access
let name_idx = doc.index("name").unwrap();
for _ in 0..1000 {
    let n = doc.get_by_index(name_idx).unwrap();
}
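
Why pre-resolving helps can be shown with a toy model. The `Record` type below is a hypothetical stand-in, not the crate's document type: it assumes field names live in a shared definition and values are stored positionally, so a name lookup is a scan followed by an array read.

```rust
/// Hypothetical stand-in for a decoded object: field names come from the
/// shared definition, values are stored positionally.
pub struct Record<'a> {
    pub fields: &'a [&'a str],
    pub values: Vec<i64>,
}

impl<'a> Record<'a> {
    /// Linear scan over field names: O(number of fields) per call.
    pub fn index(&self, name: &str) -> Option<usize> {
        self.fields.iter().position(|f| *f == name)
    }

    /// Positional lookup: O(1) once the index is known.
    pub fn get_by_index(&self, idx: usize) -> Option<i64> {
        self.values.get(idx).copied()
    }

    /// Name lookup is a scan followed by a positional read.
    pub fn get(&self, name: &str) -> Option<i64> {
        self.index(name).and_then(|i| self.get_by_index(i))
    }
}
```

In a hot loop, calling `index` once outside the loop and `get_by_index` inside it avoids repeating the scan on every iteration.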

Multi-Document Stream - TsonDocReader

Read length-prefixed TSON documents from any byte source (archives, TCP streams).

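use tson::stream::TsonDocReader;
use std::io::Cursor;

// `stream_bytes` is a placeholder for a Vec<u8> holding length-prefixed documents.
let cursor = Cursor::new(stream_bytes);

for doc in TsonDocReader::new(cursor) {
    println!("Defs: {}", doc.unwrap().definitions.len());
}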

Each document in the stream is a 4-byte little-endian u32 length followed by that many bytes of TSON binary.
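
For readers producing such a stream themselves, the framing can be sketched with the standard library alone. `write_frame` and `read_frame` are hypothetical helpers, not part of the tson crate, and the `max_len` cap is an added assumption to avoid allocating on a corrupt length.

```rust
use std::io::{self, Read, Write};

/// Writes one frame: 4-byte little-endian length, then the payload.
pub fn write_frame<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > u32::MAX as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "payload too large"));
    }
    w.write_all(&(payload.len() as u32).to_le_bytes())?;
    w.write_all(payload)
}

/// Reads one frame. Returns Ok(None) on a clean end of stream and an
/// error if the stream ends inside a length prefix or payload.
pub fn read_frame<R: Read>(r: &mut R, max_len: u32) -> io::Result<Option<Vec<u8>>> {
    let mut len_buf = [0u8; 4];
    let mut filled = 0;
    // Fill the prefix by hand so "no more frames" differs from truncation.
    while filled < 4 {
        let n = r.read(&mut len_buf[filled..])?;
        if n == 0 {
            if filled == 0 {
                return Ok(None);
            }
            return Err(io::ErrorKind::UnexpectedEof.into());
        }
        filled += n;
    }
    let len = u32::from_le_bytes(len_buf);
    if len > max_len {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut payload = vec![0u8; len as usize];
    r.read_exact(&mut payload)?;
    Ok(Some(payload))
}
```

Each payload returned by `read_frame` would then be handed to a decoder such as `tson::from_bytes`.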

Command-Line Tool

# Build
cargo build --release

# Compile JSON -> TSON binary
./target/release/tson-cli data.json         # writes data.tson

# Decompile TSON -> pretty JSON
./target/release/tson-cli data.tson         # prints JSON to stdout

# Stream-debug (inspect header, defs, dict, entries)
./target/release/tson-cli -s data.tson

Feature Flags

Feature  Default  Description
std      on       Enables std::io::Read helpers and the IoError variant. Off -> no_std + alloc.
json     on       Enables serde_json-based compile_json / decompile_to_value. Off -> pure core.
dict     on       Enables string interning (dict block). Strings appearing ≥2 times get StrRef instead of inline copies. When off, all strings are emitted inline - reduces compile memory at the cost of larger output.

# All features (default)
cargo build

# Core only (no serde, no std, no dict)
cargo build --no-default-features

# Core + std (no JSON bridge, no dict)
cargo build --no-default-features --features std

# Without dict (all strings inline - less compile memory)
cargo build --no-default-features --features std,json

Architecture

┌──────────────────────────────────────────────────────┐
│  Public API  (tson.rs)                               │
│  to_bytes / from_bytes / compile_json / stream …     │
├──────────────────────────────────────────────────────┤
│  Encode          Decode          Stream              │
│  (encode.rs)     (decode.rs)     (stream.rs)          │
│  13B header      13B header     TsonStreamReader      │
│  hybrid strings  sentinel+StrRef dict available        │
├──────────────────────────────────────────────────────┤
│  Type System     (structure.rs, error.rs)             │
│  TsonType, TsonData::StrRef, TsonDocument::dict      │
├──────────────────────────────────────────────────────┤
│  JSON Bridge     (compile.rs, decompile.rs)           │
│  lazy-promotion dict, inline↔StrRef resolution       │
└──────────────────────────────────────────────────────┘

All core modules (structure, encode, decode, stream) operate on &[u8] slices with zero system dependencies beyond alloc. The JSON bridge (compile, decompile) is feature-gated behind #[cfg(feature = "json")].

Benchmark

The project includes two human-readable benchmark tools plus a Criterion harness.

tson-bench - Compression Summary

Scans examples/ for .json files, compiles each to TSON, reports compression ratios with dict size and leaf entry counts.

cargo run --release --bin tson-bench                 # compression table
cargo run --release --bin tson-bench -- --perf        # + p50/p99 timing
╔══════════════════════╤══════════╤══════════╤══════════╤══════════╤══════════╤═════════╗
║ File                 │ JSON (B) │ TSON (B) │   Ratio  │    Defs  │    Dict  │ Entries ║
╠══════════════════════╪══════════╪══════════╪══════════╪══════════╪══════════╪═════════╣
║ telemetry.json       │    54.4K │    16.2K │    29.8% │       11 │       63 │     500 ║
║ config.json          │    27.9K │     8.4K │    30.3% │       16 │       20 │       1 ║
║ 128KB.json           │   249.2K │   104.3K │    41.9% │        8 │      601 │     788 ║
║ users-t1.json        │    890 B │    374 B │    42.0% │       10 │        1 │       3 ║
╟──────────────────────┼──────────┼──────────┼──────────┼──────────┼──────────┼─────────╢
║ TOTAL                │   331.0K │   129.2K │    39.0% │          │          │         ║
╚══════════════════════╧══════════╧══════════╧══════════╧══════════╧══════════╧═════════╝

comp-bench - Detailed Performance Breakdown

Measures 7 independent workloads: JSON parse, compile, encode, decode, decompile, streaming read, and full round-trip.

cargo run --release --bin comp-bench                            # users-t1.json
cargo run --release --bin comp-bench -- examples/telemetry.json
╔══════════════════════╤══════════════╤══════════════════╗
║  Operation           │    avg / iter│   % of round-trip ║
╠══════════════════════╪══════════════╪══════════════════╣
║  serde_json parse    │     2641 ns  │  15%  (baseline)   ║
║  TSON compile        │     8098 ns  │  46%               ║
║  TSON encode         │      453 ns  │   3%   (cheapest!) ║
║  TSON decode         │     2178 ns  │  12%               ║
║  TSON decompile      │     2035 ns  │  12%               ║
║  TSON stream (full)  │     2088 ns  │  12%               ║
╟──────────────────────┼──────────────┼──────────────────╢
║  Full round-trip     │    11987 ns  │  summed            ║
╚══════════════════════╧══════════════════════════════════╝

cargo bench - Criterion Micro-benchmarks

For statistically rigorous numbers (warmup, outlier detection), benches/core.rs measures compile/encode/decode/decompile/round-trip over examples/telemetry.json and examples/128KB.json:

cargo bench

Observations

  • Compile dominates (~46% of per-op time) - schema discovery + string interning + definition building.
  • Encode is the cheapest stage (~0.45µs) - values are appended directly into one shared output buffer, with no per-node allocation or copy.
  • Decode is competitive with JSON parse - cached definitions and O(1) index lookups.
  • Streaming loads defs+dict once, then yields entries without re-parsing.
  • Dict is empty for unique-only documents - lazy-promotion ensures no waste. Only strings appearing ≥2 times are included.
  • 70%+ savings on large repetitive telemetry (500 sensor readings with 6 repeated field names per reading).
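
The "only strings appearing ≥2 times" rule can be sketched as a two-pass count. This is an illustration under assumptions, not the crate's lazy-promotion algorithm, which may build the dict incrementally; `build_dict` is a hypothetical name.

```rust
use std::collections::HashMap;

/// Returns the strings that occur at least twice, in first-seen order.
/// Sketch only: the crate's lazy promotion may work differently.
pub fn build_dict<'a>(strings: &[&'a str]) -> Vec<&'a str> {
    // Pass 1: count occurrences of every string.
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for &s in strings {
        *counts.entry(s).or_insert(0) += 1;
    }
    // Pass 2: keep repeated strings once each, preserving first-seen order.
    let mut dict: Vec<&'a str> = Vec::new();
    for &s in strings {
        if counts[s] >= 2 && !dict.contains(&s) {
            dict.push(s);
        }
    }
    dict
}
```

A document whose strings are all unique yields an empty dict here, matching the observation that unique-only documents pay nothing for interning.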

Why TSON? Comparison with Other Formats

TSON occupies a unique position in the binary JSON landscape - it is neither a general-purpose serializer nor a schema-first code generator. It compiles JSON into a self-describing, compressed binary that is optimized for decoding on constrained devices.

Size Comparison

File                                  JSON      TSON      Savings
telemetry.json (500 sensor readings)  54.4 KB   16.2 KB   70.2%
config.json (200 routing rules)       27.9 KB   8.4 KB    69.7%
128KB.json (mixed documents)          249.2 KB  104.3 KB  58.1%
iot-t2.json                           1.3 KB    0.6 KB    49.1%
users-t1.json                         890 B     374 B     58.0%

For repetitive structured data, TSON cuts size by roughly 50-70% on the files above by deduplicating field names and interning repeated strings. The larger and more repetitive the input, the better the ratio.
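
The tson-bench table reports a Ratio column while this table reports Savings; the two are complements. A small sketch of the arithmetic, using a hypothetical helper name:

```rust
/// Ratio = TSON size / JSON size; savings = 100% - ratio.
/// Both values are returned as percentages.
pub fn ratio_and_savings(json_bytes: u64, tson_bytes: u64) -> (f64, f64) {
    let ratio = tson_bytes as f64 / json_bytes as f64 * 100.0;
    (ratio, 100.0 - ratio)
}
```

For users-t1.json, 374 B out of 890 B gives a ratio of about 42.0% and savings of about 58.0%, the figures shown in the two tables.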

Format Comparison

Feature TSON MessagePack CBOR serde_json Protobuf FlatBuffers
Self-describing
Schema discovery ✅ auto ❌ hardcoded
String interning ✅ per-doc
Field-name dedup ✅ auto ❌ repeats keys
Streaming decode ✅ O(1) mem
no_std + alloc ❌ std ❌ std ❌ std
Zero-copy strings ✅ StrRef
Security caps ✅ built-in
Hybrid str lengths ✅ 1/2/4 B
Human-readable ❌ binary ❌ binary ❌ binary ✅ text

When to Use Each Format

Scenario                                        Best Choice  Why
Browser ↔ server REST API                       JSON         Native support everywhere
General-purpose binary packing                  MessagePack  Good libraries, no schema needed
IoT with constrained nodes                      CBOR         RFC standard, concise encoding
High-performance RPC                            Protobuf     Schema-first, fast, compact
Microcontroller receiving structured telemetry  TSON         No schema file, streaming, zero-copy strings
Embedded device with limited RAM                TSON         no_std + alloc, O(1) per-entry memory
Config files needing human readability          JSON         Text is still the universal interface

Key Insight

TSON trades compile time for decode efficiency. The compiler does the heavy lifting - discovering schemas, interning strings, building definitions - so that the decoder on a microcontroller can process data without allocating field names and strings. For a server compiling millions of telemetry packets, the compile cost is amortized. For the microcontroller decoding thousands of entries, the payoff is lower memory use and an allocation-free decode path.

Security

TSON prioritizes safe decoding of untrusted input. The reference implementation includes:

  • Bounds-checked reads: every byte access is guarded, no panics on malformed input.
  • OOM caps: entry count (1M max), definition count (2048 max), fields per object (256 max).
  • Recursion guard: nesting depth limited to 128 - prevents stack overflow from circular definitions.
  • UTF-8 validation: all string data is validated; invalid sequences are rejected.
  • Header validation: offsets checked for consistency before use (def ≥ 13, dict ≥ def, data ≥ dict).

See the Security Considerations section in TSON-FORMAT.md for full details.
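
The header validation rule listed above translates into a few comparisons. The sketch below is not the reference implementation: `validate_offsets` is a hypothetical function, it takes already-parsed offsets rather than assuming a byte layout for the header, and the final check against the buffer length is an added assumption that the list does not state.

```rust
/// Size of the fixed header, per the layout diagram (13 B).
pub const HEADER_LEN: usize = 13;

/// Checks def >= 13, dict >= def, data >= dict, plus an assumed upper
/// bound (data_off <= total_len) that the README does not spell out.
pub fn validate_offsets(
    def_off: usize,
    dict_off: usize,
    data_off: usize,
    total_len: usize,
) -> Result<(), &'static str> {
    if def_off < HEADER_LEN {
        return Err("definition block overlaps header");
    }
    if dict_off < def_off {
        return Err("dict block starts before definitions");
    }
    if data_off < dict_off {
        return Err("data block starts before dict");
    }
    if data_off > total_len {
        return Err("data offset past end of buffer");
    }
    Ok(())
}
```

The offsets from the users example (def_off=13, dict_off=110, data_off=122 in a 374-byte file) pass all four checks.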

Testing

Three language bindings, one make target each.

Language  Command           Tests
Rust      make test-rust    48 unit + 3 doctests
Python    make test-python  9 tests (round-trip, file I/O, emit, compression)
Node.js   make test-node    8 tests (dumps/loads, file, emit, errors)
All       make test         Full cross-language suite

Quick Run

make help           # show all commands
make test-rust      # Rust only (always works)
make test-python    # requires: pip install maturin
make test-node      # requires: cd js && npm install
make test           # all three
make bench          # benchmarks

The Makefile builds the Python wheel (maturin) and the Node.js addon (napi-rs v3, via cd js && npm run build) automatically. Full reference:

make pre-push       # run every CI gate locally (fmt, clippy, features, test)
make fmt            # format code (rustfmt)
make clippy         # lint, warnings-as-errors (CI gate)
make features       # no_std / std / all-features build checks (CI gate)
make check          # cargo check --all-features
make build          # cargo build --release
make test           # run all tests
make bench          # run all benchmarks
make bench-size     # compression summary
make bench-perf     # detailed performance
make clean          # cargo clean
make all            # build everything (Rust + Python + Node)

Full Format Specification

See TSON-FORMAT.md for the complete binary wire protocol with byte-level examples and BNF grammar.

License

MIT
