Python parser and serializer for TOON (Token-Oriented Object Notation) - Reduce LLM token usage by 30-60%
TOON Parser (Python)
A Python parser and serializer for TOON (Token-Oriented Object Notation), a compact data format designed to reduce LLM token consumption by 30-60% compared to JSON.
Installation
pip install simple-toon
Quick Start
Functional API (Recommended for simple use cases)
from toon_parser import parse, stringify
# Convert TOON to JSON
toon_data = """
users[2]{id,name,active}:
1,Alice,true
2,Bob,false
"""
json_data = parse(toon_data)
# Result (Python objects): {"users": [{"id": 1, "name": "Alice", "active": True}, ...]}
# Convert JSON to TOON
json_obj = {
    "users": [
        {"id": 1, "name": "Alice", "active": True},
        {"id": 2, "name": "Bob", "active": False}
    ]
}
toon_string = stringify(json_obj)
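To make the tabular layout concrete, here is a minimal, library-independent sketch of how a uniform list of dictionaries could be rendered in the `name[N]{fields}:` form shown above. The helper name `to_toon_table` is hypothetical and is not part of `toon_parser`; the sketch assumes every row has the same keys and does not handle quoting or escaping of commas.

```python
def to_toon_table(name, rows):
    """Render a non-empty list of uniform dicts as a TOON-style table.

    Hypothetical helper for illustration; not the toon_parser implementation.
    """
    if not rows:
        raise ValueError("this sketch only handles non-empty arrays")
    fields = list(rows[0].keys())

    def fmt(value):
        # Booleans are written in lowercase, as in the example above.
        if value is True:
            return "true"
        if value is False:
            return "false"
        return str(value)

    # Header: array name, row count, and the shared field list.
    header = "%s[%d]{%s}:" % (name, len(rows), ",".join(fields))
    body = [",".join(fmt(row[field]) for field in fields) for row in rows]
    return "\n".join([header] + body)


users = [
    {"id": 1, "name": "Alice", "active": True},
    {"id": 2, "name": "Bob", "active": False},
]
print(to_toon_table("users", users))
```

The field names appear once in the header instead of once per row, which is where most of the size reduction for uniform arrays comes from.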
Object-Oriented API (Recommended for complex applications)
from toon_parser import ToonParser, ToonSerializer, ToonDocument
# Create configured parser
parser = ToonParser(advanced=True)
data = parser.parse(toon_string)
# Create configured serializer
serializer = ToonSerializer(advanced=True)
toon = serializer.stringify(data)
# Work with documents
doc = ToonDocument.from_file("data.toon")
active_users = doc.query("users", lambda u: u["active"])
doc.add_item("users", {"id": 99, "name": "New User"})
doc.save("updated.toon")
Advanced Features
Nested Objects
Automatically flatten and unflatten nested objects:
from toon_parser import stringify_advanced, parse_advanced
data = {
    "users": [
        {"id": 1, "name": "Alice", "address": {"city": "NYC", "zip": "10001"}},
        {"id": 2, "name": "Bob", "address": {"city": "LA", "zip": "90001"}}
    ]
}
# Serializes with dot notation: users[2]{id,name,address.city,address.zip}:
toon = stringify_advanced(data)
# Parse restores nested structure
result = parse_advanced(toon)
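The dot-notation round trip can be illustrated without the library. The sketch below is one possible way to flatten and restore nested dictionaries; `flatten` and `unflatten` are hypothetical names, and the approach assumes that no key contains the separator and that nested dictionaries are non-empty.

```python
def flatten(obj, sep=".", prefix=""):
    """Collapse nested dicts into a single level with joined key paths."""
    flat = {}
    for key, value in obj.items():
        path = prefix + sep + key if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, sep, path))
        else:
            flat[path] = value
    return flat


def unflatten(flat, sep="."):
    """Rebuild nested dicts from joined key paths."""
    nested = {}
    for path, value in flat.items():
        parts = path.split(sep)
        node = nested
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


user = {"id": 1, "name": "Alice", "address": {"city": "NYC", "zip": "10001"}}
flat = flatten(user)
print(list(flat))  # column names for the table header
assert unflatten(flat) == user
```

Passing a different `sep` (for example `"_"`) changes only how the column names are joined, which is why a separator that never occurs in real keys is the safer choice.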
Multiple Arrays
Handle multiple arrays in a single TOON document:
data = {
    "users": [{"id": 1, "name": "Alice"}],
    "products": [{"sku": "A001", "price": 19.99}]
}
toon = stringify_advanced(data)
# Both arrays in one document
parsed = parse_advanced(toon)
Streaming Parser & Serializer
Memory-efficient operations for large datasets:
from toon_parser import stream_parse, StreamingSerializer
# Streaming parser (read large files)
for array_name, items in stream_parse(large_toon_data):
    print(f"Processing {array_name}: {len(items)} items")
    for item in items:
        process(item)  # Process one at a time
# Streaming serializer (write large files)
with StreamingSerializer("output.toon") as writer:
    writer.begin_array("users", ["id", "name", "email"])
    for user in database.query_users():  # Stream from DB
        writer.write_row([user.id, user.name, user.email])
    writer.end_array()
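The idea behind streaming is that rows are handled one at a time instead of being collected into a full list first. The generator below is a rough sketch of that idea and is not how `stream_parse` is implemented: the name `iter_rows` is hypothetical, it yields one row per step rather than one array per step, it leaves every value as a string, and it assumes values contain no commas.

```python
import io
import re

# Matches headers such as: users[2]{id,name}:
HEADER = re.compile(r"^(\w+)\[(\d+)\]\{([^}]*)\}:$")


def iter_rows(lines):
    """Lazily yield (array_name, row_dict) pairs from an iterable of lines."""
    name, fields = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            # A new header switches the current array and its columns.
            name = match.group(1)
            fields = match.group(3).split(",")
            continue
        if name is None:
            raise ValueError("data row found before any header: %r" % line)
        yield name, dict(zip(fields, line.split(",")))


# io.StringIO stands in for an open file; any iterable of lines works.
source = io.StringIO(
    "users[2]{id,name}:\n1,Alice\n2,Bob\nproducts[1]{sku,price}:\nA001,19.99\n"
)
for array_name, row in iter_rows(source):
    print(array_name, row)
```

Because the generator reads from any iterable of lines, it can be pointed at an open file object and will hold only the current line and header in memory.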
Custom Configuration
from toon_parser import ToonConfig, stringify_advanced
config = ToonConfig(
    separator="_",        # Use underscore instead of dot
    indent_size=4,        # 4-space indentation
    max_nesting_depth=5   # Maximum nesting levels
)
toon = stringify_advanced(data, config)
Schema Validation
Define and validate data schemas:
from toon_parser import Field, FieldType, Schema, infer_schema
# Define schema manually
schema = Schema("users", [
    Field("id", FieldType.INTEGER),
    Field("name", FieldType.STRING),
    Field("email", FieldType.STRING, pattern=r"^[\w\.-]+@[\w\.-]+\.\w+$"),
    Field("age", FieldType.INTEGER, min_value=0, max_value=120)
])
# Validate data
schema.validate(data)
# Or infer schema from example data
schema = infer_schema(sample_data, "users")
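For readers who want to see what checks of this kind involve, the following is a small standalone sketch of per-field validation with a type, an optional regular expression, and an optional numeric range. `FieldRule` and `validate_rows` are hypothetical names chosen for illustration; they are not the library's `Field` or `Schema`, whose behavior and error reporting may differ.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldRule:
    """Hypothetical stand-in for a schema field (not toon_parser.Field)."""
    name: str
    expected_type: type
    pattern: Optional[str] = None
    min_value: Optional[float] = None
    max_value: Optional[float] = None

    def errors(self, row):
        """Return a list of problems found for this field in one row."""
        if self.name not in row:
            return ["%s: missing" % self.name]
        value = row[self.name]
        # bool is a subclass of int in Python, so reject it for int fields.
        wrong_type = not isinstance(value, self.expected_type) or (
            self.expected_type is int and isinstance(value, bool)
        )
        if wrong_type:
            return ["%s: expected %s" % (self.name, self.expected_type.__name__)]
        found = []
        if self.pattern is not None and isinstance(value, str):
            if re.fullmatch(self.pattern, value) is None:
                found.append("%s: does not match pattern" % self.name)
        if isinstance(value, (int, float)):
            if self.min_value is not None and value < self.min_value:
                found.append("%s: below minimum %s" % (self.name, self.min_value))
            if self.max_value is not None and value > self.max_value:
                found.append("%s: above maximum %s" % (self.name, self.max_value))
        return found


def validate_rows(rows, rules):
    """Collect every problem across all rows, prefixed with the row index."""
    problems = []
    for index, row in enumerate(rows):
        for rule in rules:
            problems.extend("row %d: %s" % (index, msg) for msg in rule.errors(row))
    return problems


rules = [
    FieldRule("id", int),
    FieldRule("email", str, pattern=r"^[\w\.-]+@[\w\.-]+\.\w+$"),
    FieldRule("age", int, min_value=0, max_value=120),
]
bad_rows = [{"id": "2", "email": "not-an-email", "age": 150}]
for problem in validate_rows(bad_rows, rules):
    print(problem)
```

Collecting all problems instead of stopping at the first one makes it easier to fix a whole file in a single pass.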
File I/O
Read and write TOON files with optional validation:
from toon_parser import read_toon, write_toon, convert_json_to_toon
# Write TOON file with validation
write_toon(data, "output.toon", advanced=True, schema=schema)
# Read TOON file
data = read_toon("input.toon", advanced=True)
# Convert between formats
convert_json_to_toon("input.json", "output.toon")
# Batch convert directory
from toon_parser import batch_convert
batch_convert("json_files/", "toon_files/", from_format="json", to_format="toon")
# Get file statistics
from toon_parser import get_file_stats
stats = get_file_stats("data.toon")
print(f"Total items: {stats['total_items']}")
What is TOON?
TOON is a token-efficient serialization format optimized for LLM input. It combines:
- YAML-style indentation for nested objects
- CSV-style tabular layout for uniform arrays
- Explicit schema declarations with [N]{field1,field2} headers
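As an illustration of how the `[N]{field1,field2}` header drives parsing, here is a minimal sketch that reads a single table, checks the declared row count, and converts simple scalar values. The names `parse_table` and `coerce` are hypothetical, the type rules are assumptions made for this example, and the sketch ignores nesting, quoting, and multiple arrays.

```python
import re

# Header form: name[count]{field1,field2}:
HEADER = re.compile(r"^(?P<name>\w+)\[(?P<count>\d+)\]\{(?P<fields>[^}]*)\}:$")


def coerce(token):
    """Convert a raw cell to bool, int, or float where it clearly is one."""
    if token == "true":
        return True
    if token == "false":
        return False
    if re.fullmatch(r"-?\d+", token):
        return int(token)
    if re.fullmatch(r"-?\d+\.\d+", token):
        return float(token)
    return token


def parse_table(text):
    """Parse one TOON-style table into {name: [row_dict, ...]}."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    match = HEADER.match(lines[0]) if lines else None
    if match is None:
        raise ValueError("expected a header like name[N]{a,b}:")
    fields = match["fields"].split(",")
    rows = [dict(zip(fields, map(coerce, line.split(",")))) for line in lines[1:]]
    # The declared count lets a reader detect truncated input.
    if len(rows) != int(match["count"]):
        raise ValueError("header declares %s rows, found %d" % (match["count"], len(rows)))
    return {match["name"]: rows}


doc = """
users[2]{id,name,active}:
1,Alice,true
2,Bob,false
"""
print(parse_table(doc))
```

The explicit count in the header gives a cheap integrity check: a truncated or padded table is rejected before any row is used.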
Performance
- 30-60% fewer tokens than JSON (up to 63% with nested objects)
- Lossless, deterministic round-trip conversion
- Optimized for uniform arrays (logs, user lists, analytics events)
- Streaming parser for memory-efficient processing of large files
Benchmarks
| Dataset | JSON Size | TOON Size | Savings |
|---|---|---|---|
| Simple arrays (50 items) | 3,536 chars | 1,362 chars | 61.5% |
| Nested objects (50 items) | 7,220 chars | 2,639 chars | 63.4% |
| Event data (10 items) | 845 bytes | 235 bytes | 72.2% |
| Multiple arrays | Varies | Varies | 30-60% |
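The Savings column follows directly from the two size columns. The short sketch below reproduces the three numeric rows; the function name `savings_percent` is chosen for illustration. Note that these figures compare characters or bytes, so token savings for a given model's tokenizer may differ.

```python
def savings_percent(json_size, toon_size):
    """Percentage reduction of TOON relative to JSON, rounded to one decimal."""
    return round((1 - toon_size / json_size) * 100, 1)


# (dataset, JSON size, TOON size) taken from the table above
benchmarks = [
    ("Simple arrays (50 items)", 3536, 1362),
    ("Nested objects (50 items)", 7220, 2639),
    ("Event data (10 items)", 845, 235),
]
for dataset, json_size, toon_size in benchmarks:
    print(dataset, savings_percent(json_size, toon_size))
```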
API Reference
Functional API
Basic Functions:
- parse(toon: str) -> Any - Parse TOON to JSON
- stringify(obj: Any) -> str - Serialize JSON to TOON
Advanced Functions:
- parse_advanced(toon: str, config: ToonConfig) -> Any - Parse with nested object support
- stringify_advanced(obj: Any, config: ToonConfig) -> str - Serialize with nested objects
- stream_parse(toon: str) -> Iterator - Memory-efficient streaming parser
Schema Validation:
- Schema(array_name, fields) - Define validation schema
- Field(name, field_type, **options) - Define field with constraints
- infer_schema(data, array_name) - Auto-generate schema from data
- MultiSchema(schemas) - Validate multiple arrays
File I/O:
- read_toon(path, advanced, schema) - Read and validate TOON file
- write_toon(data, path, advanced, schema) - Write and validate TOON file
- convert_json_to_toon(json_path, toon_path) - Convert JSON → TOON
- convert_toon_to_json(toon_path, json_path) - Convert TOON → JSON
- batch_convert(input_dir, output_dir) - Batch convert files
- get_file_stats(path) - Analyze file statistics
Streaming:
- StreamingSerializer(output) - Stream write large TOON files
- streaming_serializer(output) - Context manager for streaming
- stream_from_database(query_func, ...) - Stream from database to TOON
Object-Oriented API
Classes:
- ToonParser(advanced, config, schema) - Stateful parser
- ToonSerializer(advanced, config, schema) - Stateful serializer
- ToonDocument(data) - Document object model with query/manipulation methods
- ToonConverter(advanced, config) - Format converter with statistics
Examples
See the example files for detailed usage:
- example.py - Basic parsing and serialization (functional API)
- example_advanced.py - Nested objects, multiple arrays, configuration
- example_schema_io.py - Schema validation and file I/O
- example_oo_streaming.py - Object-oriented API and streaming serializer
Development
# Install development dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Run tests with coverage
pytest --cov=toon_parser
# Format code
black toon_parser/ tests/
# Lint
ruff check toon_parser/ tests/
# Type check
mypy toon_parser/
License
MIT