Terse JSON — compact binary JSON for microcontrollers
TSON — Terse JSON Binary Format
A compact, schema-deduplicated binary format for JSON data, built for microcontrollers and constrained environments.
Core idea: in repetitive JSON (API payloads, telemetry, config), field names appear thousands of times. TSON stores them once in a definition block. Repeated strings are stored once in a dict block. The data stream is pure typed values, no key repetition, no duplicate strings.
JSON (890 bytes)            TSON binary (~374 bytes)
[{                          ┌── Header (13 B)
  "id": 1,                  │   version=1, def_off=13,
  "name": "Alice",          │   dict_off=110, data_off=122
  "age": 30,                ├── Definition block (97 B)
  "address": {              │   #0 Null  #1 Bool  #2 Int  #3 UInt
    "street": "123…",       │   #4 Float  #5 String
    "city": "Anytown",      │   #6 Array<String>
    "state": "CA",          │   #7 Object fields:
    "zip": "12345"          │      street:String city:String
  },                        │      state:String zip:String
  "hobbies": ["reading",    │   #8 Object fields:
    "hiking", "cooking"]    │      id:Int name:String age:Int
},                          │      address:#7 hobbies:#6
…                           │   #9 Array<Object>
]                           ├── Dict block (12 B, only
                            │   repeated strings)
                            ├── Data block (252 B)
                            │   Entry #9: 3 elements
                            │   [0]: #8 -> 1, "Alice", 30…
                            │   [1]: #8 -> 2, "Bob", 25…
                            │   [2]: #8 -> 3, "Charlie",35…
                            └── (end)
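The definition block in the diagram can be read as a table of object shapes, each stored once and referenced by index. A minimal sketch of that deduplication, assuming a shape is identified by its ordered list of field names and type names. `ShapeRegistry` and `intern_shape` are hypothetical names used for illustration only and are not part of the tson API:

```rust
use std::collections::HashMap;

/// Hypothetical registry mapping an object shape to a definition index.
/// A shape here is the ordered list of (field name, type name) pairs.
#[derive(Default)]
pub struct ShapeRegistry {
    index_of: HashMap<Vec<(String, String)>, usize>,
    shapes: Vec<Vec<(String, String)>>,
}

impl ShapeRegistry {
    /// Returns the existing index for a known shape, or registers a new one.
    pub fn intern_shape(&mut self, fields: &[(&str, &str)]) -> usize {
        let key: Vec<(String, String)> = fields
            .iter()
            .map(|(name, ty)| (name.to_string(), ty.to_string()))
            .collect();
        if let Some(&idx) = self.index_of.get(&key) {
            return idx; // identical shape: reuse the stored definition
        }
        let idx = self.shapes.len();
        self.shapes.push(key.clone());
        self.index_of.insert(key, idx);
        idx
    }

    /// Number of distinct shapes stored so far.
    pub fn len(&self) -> usize {
        self.shapes.len()
    }
}
```

Interning the shape `[("id", "Int"), ("name", "String")]` twice yields the same index both times, so a thousand objects with that shape would cost one definition plus a thousand index references.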
Features
- Zero-dependency core: encode/decode/stream on &[u8] slices, only needs alloc.
- Streaming reader: loads the tiny definition + dict blocks into memory, then yields data entries one at a time, with O(1) memory per entry.
- Schema deduplication: identical object shapes share one definition. Field names stored once.
- String interning (dict feature): repeated strings stored once in a dict block. StrRef points to them instead of repeating inline. Only strings that appear ≥2 times are included - no waste.
- Hybrid string encoding: short strings (≤127 B) use 1-byte length, medium strings 2 bytes, long strings 4 bytes - saves space over flat u32.
- no_std capable: disable the std feature for embedded targets - the core builds against alloc only (verified: cargo build --no-default-features).
- Optional JSON bridge: serde_json-based compile/decompile behind the json feature.
- Self-describing wire format: every compound value carries its definition index, enabling forward compatibility and partial decoding.
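The hybrid string encoding bullet can be illustrated with a variable-width length prefix. This is a sketch only: the 1/2/4-byte widths and the 127-byte threshold come from the list above, but the tag-bit layout (`0`, `10`, `11` in the top bits) and the names `write_len` and `read_len` are assumptions made for this example. The actual layout is whatever TSON-FORMAT.md specifies.

```rust
/// Appends a length prefix using a hypothetical 1/2/4-byte layout.
/// ASSUMPTION: the tag bits below are illustrative, not the real TSON layout.
pub fn write_len(out: &mut Vec<u8>, len: usize) -> Result<(), &'static str> {
    if len <= 0x7F {
        out.push(len as u8); // 0xxxxxxx: 7-bit length in one byte
    } else if len <= 0x3FFF {
        let v = 0x8000u16 | len as u16; // 10xxxxxx xxxxxxxx: 14-bit length
        out.extend_from_slice(&v.to_be_bytes());
    } else if len <= 0x3FFF_FFFF {
        let v = 0xC000_0000u32 | len as u32; // 11xxxxxx + 3 bytes: 30-bit length
        out.extend_from_slice(&v.to_be_bytes());
    } else {
        return Err("string too long for this sketch");
    }
    Ok(())
}

/// Reads a length prefix and returns (length, bytes consumed),
/// or None if the buffer is too short.
pub fn read_len(buf: &[u8]) -> Option<(usize, usize)> {
    let first = *buf.first()?;
    match first >> 6 {
        0b00 | 0b01 => Some((first as usize, 1)),
        0b10 => {
            let b = buf.get(..2)?;
            let v = u16::from_be_bytes([b[0], b[1]]) & 0x3FFF;
            Some((v as usize, 2))
        }
        _ => {
            let b = buf.get(..4)?;
            let v = u32::from_be_bytes([b[0], b[1], b[2], b[3]]) & 0x3FFF_FFFF;
            Some((v as usize, 4))
        }
    }
}
```

Under this layout a 5-byte string pays 1 byte of length overhead instead of the 4 a flat u32 would cost, which matters when most strings in a payload are short.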
Install
# Rust (crates.io)
cargo add tson
# Python (PyPI) — distribution is `tson-bin`; you still `import tson`
pip install tson-bin
# Node.js (npm) — scoped package, ships a prebuilt addon per platform
npm install @siktec-lab/tson
Quick Start
// Round-trip a JSON string through TSON binary
let json = r#"{"name":"Alice","age":30}"#;
// JSON -> TSON document -> binary
let doc = tson::compile_json(json).unwrap();
let bytes = tson::to_bytes(&doc).unwrap();
// Binary -> TSON document -> JSON
let restored = tson::from_bytes(&bytes).unwrap();
let value = tson::decompile_to_value(&restored).unwrap();
assert_eq!(value.to_string(), r#"{"age":30,"name":"Alice"}"#);
Emit Mode (Bypass JSON)
Need TSON binary directly from structured data without parsing JSON? tson::emit() takes a TsonData tree and produces a complete TSON document.
use tson::{TsonData, emit};
// Build a sensor reading value tree directly
let reading = TsonData::Object(0, vec![
TsonData::Float(22.5), // temperature
TsonData::Int(61), // humidity
TsonData::String("nominal".to_string()), // status
]);
// Emit as TSON binary - no JSON parse step
let bytes = emit(&reading).unwrap();
// Decode back
let doc = tson::from_bytes(&bytes).unwrap();
let value = tson::decompile_to_value(&doc).unwrap();
// value = {"f0": 22.5, "f1": 61, "f2": "nominal"}
Field names are synthetic ("f0", "f1", …) since TsonData values don't carry names. Definitions and the string dict are discovered automatically from the value tree.
Server Response Path - emit_with_context()
Reuse an incoming document's definitions and dict to emit a response - no schema re-discovery, no dict rebuild.
use tson::{TsonData, emit_with_context};
let response = TsonData::Object(6, vec![
TsonData::String("processed".to_string()),
TsonData::Int(42),
]);
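// incoming_defs and incoming_dict are the definitions and dict of the already-decoded incoming document
let bytes = emit_with_context(&response, &incoming_defs, &incoming_dict).unwrap();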
Field values must be in definition field order (alphabetical).
Direct Field Access - doc.get(), doc.index(), doc.get_by_index()
Extract values without decompiling to JSON. O(1) access when you pre-resolve field indices:
let doc = tson::compile_json(r#"{"name":"Alice","age":30}"#).unwrap();
// By name (linear scan)
let name = doc.get("name").unwrap();
let age = doc.get("age").unwrap();
// Or pre-resolve index for O(1) repeated access
let name_idx = doc.index("name").unwrap();
for _ in 0..1000 {
let n = doc.get_by_index(name_idx).unwrap();
}
Multi-Document Stream - TsonDocReader
Read length-prefixed TSON documents from any byte source (archives, TCP streams).
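use tson::stream::TsonDocReader;
use std::io::Cursor;
// stream_bytes is a placeholder: a buffer holding one or more length-prefixed TSON documents
let cursor = Cursor::new(stream_bytes);
for doc in TsonDocReader::new(cursor) {
    println!("Defs: {}", doc.unwrap().definitions.len());
}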
Each document is prefixed by a 4-byte LE length u32 followed by the TSON binary.
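The framing described above is simple enough to sketch without the crate. The following is an illustration of 4-byte little-endian length prefixing over any `std::io::Read` source, not the `TsonDocReader` implementation; `write_frame` and `read_frame` are hypothetical helper names, and the payload is treated as opaque bytes.

```rust
use std::io::{self, Read};

/// Appends one document to `out`, prefixed by its length as a 4-byte LE u32.
pub fn write_frame(out: &mut Vec<u8>, doc: &[u8]) {
    assert!(doc.len() <= u32::MAX as usize, "document exceeds u32::MAX bytes");
    let len = doc.len() as u32;
    out.extend_from_slice(&len.to_le_bytes());
    out.extend_from_slice(doc);
}

/// Reads the next frame, or returns Ok(None) on a clean end of stream.
pub fn read_frame<R: Read>(src: &mut R) -> io::Result<Option<Vec<u8>>> {
    let mut len_buf = [0u8; 4];
    // Read the first byte separately so EOF between frames is not an error.
    if src.read(&mut len_buf[..1])? == 0 {
        return Ok(None);
    }
    // A stream that ends inside the prefix or the body is an error.
    src.read_exact(&mut len_buf[1..])?;
    let len = u32::from_le_bytes(len_buf) as usize;
    let mut doc = vec![0u8; len];
    src.read_exact(&mut doc)?;
    Ok(Some(doc))
}
```

A reader facing untrusted input would also want to cap `len` before allocating; that check is left out here to keep the sketch short.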
Command-Line Tool
# Build
cargo build --release
# Compile JSON -> TSON binary
./target/release/tson-cli data.json # writes data.tson
# Decompile TSON -> pretty JSON
./target/release/tson-cli data.tson # prints JSON to stdout
# Stream-debug (inspect header, defs, dict, entries)
./target/release/tson-cli -s data.tson
Feature Flags
| Feature | Default | Description |
|---|---|---|
| std | on | Enables std::io::Read helpers and the IoError variant. Off -> no_std + alloc. |
| json | on | Enables serde_json-based compile_json / decompile_to_value. Off -> pure core. |
| dict | on | Enables string interning (dict block). Strings appearing ≥2 times get StrRef instead of inline copies. When off, all strings are emitted inline - reduces compile memory at the cost of larger output. |
# All features (default)
cargo build
# Core only (no serde, no std, no dict)
cargo build --no-default-features
# Core + std (no JSON bridge, no dict)
cargo build --no-default-features --features std
# Without dict (all strings inline - less compile memory)
cargo build --no-default-features --features std,json
Architecture
┌──────────────────────────────────────────────────────┐
│ Public API (tson.rs) │
│ to_bytes / from_bytes / compile_json / stream … │
├──────────────────────────────────────────────────────┤
│ Encode Decode Stream │
│ (encode.rs) (decode.rs) (stream.rs) │
│ 13B header 13B header TsonStreamReader │
│ hybrid strings sentinel+StrRef dict available │
├──────────────────────────────────────────────────────┤
│ Type System (structure.rs, error.rs) │
│ TsonType, TsonData::StrRef, TsonDocument::dict │
├──────────────────────────────────────────────────────┤
│ JSON Bridge (compile.rs, decompile.rs) │
│ lazy-promotion dict, inline↔StrRef resolution │
└──────────────────────────────────────────────────────┘
All core modules (structure, encode, decode, stream) operate on &[u8] slices with zero system dependencies beyond alloc. The JSON bridge (compile, decompile) is feature-gated behind #[cfg(feature = "json")].
Benchmark
The project includes two human-readable benchmark tools plus a Criterion harness.
tson-bench - Compression Summary
Scans examples/ for .json files, compiles each to TSON, reports compression ratios with dict size and leaf entry counts.
cargo run --release --bin tson-bench # compression table
cargo run --release --bin tson-bench -- --perf # + p50/p99 timing
╔══════════════════════╤══════════╤══════════╤══════════╤══════════╤══════════╤═════════╗
║ File │ JSON (B) │ TSON (B) │ Ratio │ Defs │ Dict │ Entries ║
╠══════════════════════╪══════════╪══════════╪══════════╪══════════╪══════════╪═════════╣
║ telemetry.json │ 54.4K │ 16.2K │ 29.8% │ 11 │ 63 │ 500 ║
║ config.json │ 27.9K │ 8.4K │ 30.3% │ 16 │ 20 │ 1 ║
║ 128KB.json │ 249.2K │ 104.3K │ 41.9% │ 8 │ 601 │ 788 ║
║ users-t1.json │ 890 B │ 374 B │ 42.0% │ 10 │ 1 │ 3 ║
╟──────────────────────┼──────────┼──────────┼──────────┼──────────┼──────────┼─────────╢
║ TOTAL │ 331.0K │ 129.2K │ 39.0% │ │ │ ║
╚══════════════════════╧══════════╧══════════╧══════════╧══════════╧══════════╧═════════╝
comp-bench - Detailed Performance Breakdown
Measures 7 independent workloads: JSON parse, compile, encode, decode, decompile, streaming read, and full round-trip.
cargo run --release --bin comp-bench # users-t1.json
cargo run --release --bin comp-bench -- examples/telemetry.json
╔══════════════════════╤══════════════╤══════════════════╗
║ Operation │ avg / iter│ % of round-trip ║
╠══════════════════════╪══════════════╪══════════════════╣
║ serde_json parse │ 2641 ns │ 15% (baseline) ║
║ TSON compile │ 8098 ns │ 46% ║
║ TSON encode │ 453 ns │ 3% (cheapest!) ║
║ TSON decode │ 2178 ns │ 12% ║
║ TSON decompile │ 2035 ns │ 12% ║
║ TSON stream (full) │ 2088 ns │ 12% ║
╟──────────────────────┼──────────────┼──────────────────╢
║ Full round-trip │ 11987 ns │ summed ║
╚══════════════════════╧══════════════╧══════════════════╝
cargo bench - Criterion Micro-benchmarks
For statistically rigorous numbers (warmup, outlier detection), benches/core.rs
measures compile/encode/decode/decompile/round-trip over examples/telemetry.json
and examples/128KB.json:
cargo bench
Observations
- Compile dominates (~46% of per-op time) - schema discovery + string interning + definition building.
- Encode is the cheapest stage (~0.45µs) - values are appended directly into one shared output buffer, with no per-node allocation or copy.
- Decode is competitive with JSON parse - cached definitions and O(1) index lookups.
- Streaming loads defs+dict once, then yields entries without re-parsing.
- Dict is empty for unique-only documents - lazy-promotion ensures no waste. Only strings appearing ≥2 times are included.
- 70%+ savings on large repetitive telemetry (500 sensor readings with 6 repeated field names per reading).
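The "only strings appearing ≥2 times" rule from the observations can be sketched as a counting pass followed by a selection pass. This two-pass version is a stand-in chosen for clarity; it is not a description of how the compiler's lazy promotion is implemented, and `build_dict` is a hypothetical name.

```rust
use std::collections::HashMap;

/// Returns the strings that occur at least twice, in first-seen order.
/// Strings that occur once are left out and would stay inline.
pub fn build_dict<'a>(strings: &[&'a str]) -> Vec<&'a str> {
    // Pass 1: count occurrences of each distinct string.
    let mut counts: HashMap<&'a str, usize> = HashMap::new();
    for s in strings {
        *counts.entry(*s).or_insert(0) += 1;
    }
    // Pass 2: keep repeated strings, adding each one only once.
    // The linear `contains` check is fine for a sketch with a small dict.
    let mut dict: Vec<&'a str> = Vec::new();
    for s in strings {
        if counts[s] >= 2 && !dict.contains(s) {
            dict.push(*s);
        }
    }
    dict
}
```

For an input of all-unique strings the returned dict is empty, which matches the observation that unique-only documents carry no dict overhead.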
Why TSON? Comparison with Other Formats
TSON occupies a unique position in the binary JSON landscape - it is neither a general-purpose serializer nor a schema-first code generator. It compiles JSON into a self-describing, compressed binary that is optimized for decoding on constrained devices.
Size Comparison
| File | JSON | TSON | Savings |
|---|---|---|---|
| telemetry.json (500 sensor readings) | 54.4 KB | 16.2 KB | 70.2% |
| config.json (200 routing rules) | 27.9 KB | 8.4 KB | 69.7% |
| 128KB.json (mixed documents) | 249.2 KB | 104.3 KB | 58.1% |
| iot-t2.json | 1.3 KB | 0.6 KB | 49.1% |
| users-t1.json | 890 B | 374 B | 58.0% |
For repetitive structured data, TSON cuts size by roughly 50-70% in the files above (about 70% on the telemetry and config files) by deduplicating field names and interning repeated strings. The larger and more repetitive the input, the higher the savings tend to be.
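The tson-bench table earlier reports a Ratio column while this table reports Savings; the two are complements of each other. A small sketch of the arithmetic, with hypothetical function names, checked against the users-t1.json row (890 B in, 374 B out):

```rust
/// TSON size as a fraction of the JSON size (the "Ratio" column).
pub fn ratio(json_bytes: u64, tson_bytes: u64) -> f64 {
    tson_bytes as f64 / json_bytes as f64
}

/// Fraction of bytes saved (the "Savings" column): 1 - ratio.
pub fn savings(json_bytes: u64, tson_bytes: u64) -> f64 {
    1.0 - ratio(json_bytes, tson_bytes)
}
```

For 890 and 374 bytes, `ratio` formats to 42.0% and `savings` to 58.0% at one decimal place, matching both tables.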
Format Comparison
| Feature | TSON | MessagePack | CBOR | serde_json | Protobuf | FlatBuffers |
|---|---|---|---|---|---|---|
| Self-describing | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Schema discovery | ✅ auto | ❌ | ❌ | ❌ | ❌ hardcoded | ❌ |
| String interning | ✅ per-doc | ❌ | ❌ | ❌ | ❌ | ❌ |
| Field-name dedup | ✅ auto | ❌ repeats keys | ❌ | ❌ | ❌ | ❌ |
| Streaming decode | ✅ O(1) mem | ❌ | ❌ | ❌ | ❌ | ✅ |
| no_std + alloc | ✅ | ❌ std | ❌ std | ❌ std | ❌ | ❌ |
| Zero-copy strings | ✅ StrRef | ❌ | ❌ | ❌ | ❌ | ✅ |
| Security caps | ✅ built-in | ❌ | ❌ | ❌ | ❌ | ❌ |
| Hybrid str lengths | ✅ 1/2/4 B | ❌ | ❌ | ❌ | ❌ | ❌ |
| Human-readable | ❌ binary | ❌ binary | ❌ binary | ✅ text | ❌ | ❌ |
When to Use Each Format
| Scenario | Best Choice | Why |
|---|---|---|
| Browser ↔ server REST API | JSON | Native support everywhere |
| General-purpose binary packing | MessagePack | Good libraries, no schema needed |
| IoT with constrained nodes | CBOR | RFC standard, concise encoding |
| High-performance RPC | Protobuf | Schema-first, fast, compact |
| Microcontroller receiving structured telemetry | TSON | No schema file, streaming, zero-copy strings |
| Embedded device with limited RAM | TSON | no_std + alloc, O(1) per-entry memory |
| Config files needing human readability | JSON | Text is still the universal interface |
Key Insight
TSON trades compile time for decode efficiency. The compiler does the heavy lifting - discovering schemas, interning strings, building definitions - so that the decoder on a microcontroller can process data without allocating field names and strings. For a server compiling millions of telemetry packets, the compile cost is amortized. For the microcontroller decoding thousands of entries, the memory savings and allocation-free path are transformative.
Security
TSON prioritizes safe decoding of untrusted input. The reference implementation includes:
- Bounds-checked reads: every byte access is guarded, no panics on malformed input.
- OOM caps: entry count (1M max), definition count (2048 max), fields per object (256 max).
- Recursion guard: nesting depth limited to 128 - prevents stack overflow from circular definitions.
- UTF-8 validation: all string data is validated; invalid sequences are rejected.
- Header validation: offsets checked for consistency before use (def ≥ 13, dict ≥ def, data ≥ dict).
See the Security Considerations section in TSON-FORMAT.md for full details.
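The header validation rule in the list above can be sketched as a plain ordering check on the three offsets. The function name `validate_offsets` and the error strings are hypothetical, and the final bound against the buffer length is an extra assumption added for this example rather than a rule quoted from the spec.

```rust
/// Fixed header size stated in this README.
pub const HEADER_LEN: usize = 13;

/// Checks the ordering rules: def >= 13, dict >= def, data >= dict.
/// ASSUMPTION: requiring data_off <= total_len is added for this sketch.
pub fn validate_offsets(
    def_off: usize,
    dict_off: usize,
    data_off: usize,
    total_len: usize,
) -> Result<(), &'static str> {
    if def_off < HEADER_LEN {
        return Err("definition block overlaps header");
    }
    if dict_off < def_off {
        return Err("dict block starts before definition block");
    }
    if data_off < dict_off {
        return Err("data block starts before dict block");
    }
    if data_off > total_len {
        return Err("data offset past end of buffer");
    }
    Ok(())
}
```

With the offsets from the diagram at the top (13, 110, 122 in a 374-byte document) the check passes; swapping any two offsets makes it fail before any block is read.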
Testing
Three language bindings, one make target each.
| Language | Command | Tests |
|---|---|---|
| Rust | make test-rust | 48 unit + 3 doctests |
| Python | make test-python | 9 tests (round-trip, file I/O, emit, compression) |
| Node.js | make test-node | 8 tests (dumps/loads, file, emit, errors) |
| All | make test | Full cross-language suite |
Quick Run
make help # show all commands
make test-rust # Rust only (always works)
make test-python # requires: pip install maturin
make test-node # requires: cd js && npm install
make test # all three
make bench # benchmarks
The Makefile builds the Python wheel (maturin) and the Node.js addon
(napi-rs v3, via cd js && npm run build) automatically. Full reference:
make pre-push # run every CI gate locally (fmt, clippy, features, test)
make fmt # format code (rustfmt)
make clippy # lint, warnings-as-errors (CI gate)
make features # no_std / std / all-features build checks (CI gate)
make check # cargo check --all-features
make build # cargo build --release
make test # run all tests
make bench # run all benchmarks
make bench-size # compression summary
make bench-perf # detailed performance
make clean # cargo clean
make all # build everything (Rust + Python + Node)
Full Format Specification
See TSON-FORMAT.md for the complete binary wire protocol with byte-level examples and BNF grammar.
License
MIT