Skip to main content

A powerful context reduction tool centered around converting data (JSON, YAML, XML, CSV) to the token-optimized TOON format for efficient LLM interactions.

Project description

๐Ÿš€ TOON Converter (Python)

License: MIT Python 3.10 | 3.11 | 3.12 | 3.13 LLM APIs cost reduction PyPI Downloads

A powerful context reduction tool centered around converting data (JSON, YAML, XML, CSV) to TOON (Token-Oriented Object Notation) format for efficient LLM interactions.

Reduce your LLM token costs by up to 40% using the TOON format!

pip install toon-parse

๐Ÿ› ๏ธ CLI Utility

Convert data and validate formats directly from your terminal using the unified toon-parse command.

Features

  • Streamlined: Single command for all conversion types.
  • Piping: Full support for stdin and stdout.
  • Validation: Standalone format validation logic.
  • Security: Built-in support for all encryption modes.

Usage Examples

# 1. Basic Conversion (JSON to TOON)
echo '{"name": "Alice"}' | toon-parse --from json --to toon

# 2. File-based Conversion with Async core
toon-parse --from xml --to json --input data.xml --output data.json --async

# 3. Secure Export (JSON to Encrypted XML)
toon-parse --from json --to xml --mode export --key <my_key> --input data.json

# 4. Format Validation
toon-parse --validate toon --input my_data.toon

๐Ÿ”„ Unified Format Converters

Beyond TOON, you can now convert directly between JSON, YAML, XML, and CSV using dedicated converter classes.

from toon_parse import JsonConverter, YamlConverter, XmlConverter, CsvConverter

# JSON <-> XML
xml_output = JsonConverter.to_xml({"user": "Alice"})
json_output = XmlConverter.to_json(xml_output)

# CSV <-> YAML
yaml_output = CsvConverter.to_yaml("id,name\n1,Alice")
csv_output = YamlConverter.to_csv(yaml_output)

Key Features

  • Direct Conversion: No need to convert to TOON first.
  • Mixed Text Support: from_json, from_xml, and from_csv methods automatically extract data from unstructured text.
  • Return Types:
    • JsonConverter.from_toon and from_yaml support return_json=True (default) to return a JSON string, or False to return a dict/list.
    • YamlConverter.to_json supports return_json=True (default) to return a JSON string.
    • All other methods return strings (formatted xml, csv, yaml, etc.).

๐Ÿ” Using Encryption with Unified Converters

All new converters support the secure middleware pattern. Use the instance-based approach:

from toon_parse import JsonConverter, Encryptor

# 1. Setup Encryptor
enc = Encryptor(algorithm='fernet', key=my_key)

# 2. Initialize Converter with Encryptor
converter = JsonConverter(encryptor=enc)

# 3. Convert with security mode
# Example: Decrypt input JSON -> convert to XML -> Encrypt output
encrypted_xml = converter.to_xml(
    encrypted_json_input, 
    conversion_mode="middleware"
)

๐Ÿš€ Quick Start

Basic Usage (Synchronous)

from toon_parse import ToonConverter

# 1. Python Object to TOON
data = {"name": "Alice", "age": 30, "active": True}
toon_string = ToonConverter.from_json(data)
print(toon_string)
# Output:
# name: "Alice"
# age: 30
# active: true

# 2. TOON to Python Object
json_output = ToonConverter.to_json(toon_string)
print(json_output)
# Output: {'name': 'Alice', 'age': 30, 'active': True}

Mixed Text Support

The library can automatically extract and convert JSON, XML, and CSV data embedded within normal text. This is perfect for processing LLM outputs.

from toon_parse import ToonConverter

# Text with embedded JSON
mixed_text = """
Here is the user profile you requested:
{
    "id": 101,
    "name": "Bob",
    "roles": ["admin", "editor"]
}
Please verify this information.
"""

# Automatically finds JSON, converts it to TOON, and preserves surrounding text
result = ToonConverter.from_json(mixed_text)
print(result)

# Output:
# Here is the user profile you requested:
# id: 101
# name: "Bob"
# roles[2]: "admin", "editor"
# Please verify this information.

๐Ÿง  Smart Code Optimization (New!)

The library includes an intelligent Data Manager that preprocesses input to handle code blocks efficiently.

  • Code Preservation: Code snippets (e.g., inside markdown backticks or detected via heuristics) are identified and protected from conversion logic.
  • Context Reduction: Code blocks are automatically optimized to reduce token usage by:
    • Removing comments (# ..., // ...).
    • Compressing double newlines to single newlines.
    • Stripping unnecessary whitespace.

This ensures that while your data is converted to TOON for efficiency, any embedded code remains syntactically valid but token-optimized.

๐Ÿ“‰ Context Optimization (Expensive Words) (New!)

The library automatically identifies and replaces common "expensive" phrases with token-efficient alternatives to reduce the overall payload size required for LLM input.

Note: While most alterations significantly reduce token count, some replacements may only reduce character count while keeping the token count the same. This still helps in reducing the API payload size (in bytes), which can reduce latency and cost for bandwidth-constrained environments.

Examples:

  • "large language model" โ†’ "llm"
  • "frequently asked questions" โ†’ "faq"
  • "as soon as possible" โ†’ "asap"
  • "do not" โ†’ "don't"
  • "I am" โ†’ "i'm"

This feature is case-insensitive and ensures that words inside code blocks and data blocks are NOT altered.

๐Ÿ” Secure Conversion Middleware

The ToonConverter can act as a secure middleware for processing encrypted data streams (e.g., from microservices). It handles the full Decrypt -> Convert -> Encrypt pipeline internally.

Supported Algorithms

  • Fernet: High security (AES-128). Requires cryptography.
  • XOR: Lightweight obfuscation.
  • Base64: Encoding only.

Conversion Modes

  1. "middleware": Encrypted Input โ†’ Encrypted Output (Decrypt โ†’ Convert โ†’ Re-encrypt)
  2. "ingestion": Encrypted Input โ†’ Plain Output (Decrypt โ†’ Convert)
  3. "export": Plain Input โ†’ Encrypted Output (Convert โ†’ Encrypt)
  4. "no_encryption": Standard conversion (default)

Example Workflow

from toon_parse import ToonConverter, Encryptor
from cryptography.fernet import Fernet

# Setup
key = Fernet.generate_key()
enc = Encryptor(key=key, algorithm='fernet')
converter = ToonConverter(encryptor=enc)

# --- Mode 1: Middleware (Encrypted -> Encrypted) ---
raw_data = '{"user": "Alice", "role": "admin"}'
encrypted_input = enc.encrypt(raw_data)  # Simulate upstream encrypted data

# Converter decrypts, converts to TOON, and re-encrypts
encrypted_toon = converter.from_json(
    encrypted_input, 
    conversion_mode="middleware"
)
print(f"Secure Result: {encrypted_toon}")

# --- Mode 2: Ingestion (Encrypted -> Plain) ---
plain_toon = converter.from_json(
    encrypted_input,
    conversion_mode="ingestion"
)
print(f"Decrypted TOON: {plain_toon}")

# --- Mode 3: Export (Plain -> Encrypted) ---
my_data = {"status": "ok"}
secure_packet = converter.from_json(
    my_data,
    conversion_mode="export"
)
print(f"Encrypted Output: {secure_packet}")

โšก Async Usage

For non-blocking operations in async applications (e.g., FastAPI), use AsyncToonConverter.

import asyncio
from toon_parse import AsyncToonConverter, Encryptor

async def main():
    # 1. Standard Async Usage
    converter = AsyncToonConverter()
    text = 'Data: <user><name>Alice</name></user>'
    toon = await converter.from_xml(text)
    print(toon)

    # 2. Async with Secure Middleware
    enc = Encryptor(algorithm='base64')
    secure_converter = AsyncToonConverter(encryptor=enc)
    
    # Decrypt -> Convert -> Encrypt (Middleware Mode)
    encrypted_msg = "eyJrZXkiOiAidmFsIn0=" # Base64 for {"key": "val"}
    
    # Use conversion_mode to specify pipeline behavior
    result = await secure_converter.from_json(
        encrypted_msg, 
        conversion_mode="middleware"
    )
    print(result)

asyncio.run(main())

๐Ÿ“š Features & Support

Feature JSON XML CSV YAML TOON
Python Dict/List Input โœ… N/A N/A N/A N/A
Pure String Input โœ… โœ… โœ… โœ… โœ…
Mixed Text Support โœ… โœ… โœ… โŒ โŒ
Async Support โœ… โœ… โœ… โœ… โœ…
Encryption Support โœ… โœ… โœ… โœ… โœ…
  • Mixed Text: Finds occurrences of data formats in text (JSON, XML, CSV) and converts them in-place.
  • Encryption: Supports Fernet, XOR, and Base64 middleware conversions.

โš™๏ธ Static vs Instance Usage

Conversion Methods (from_json, to_json, etc.)

All conversion methods support both static and instance calling patterns:

from toon_parse import ToonConverter

# โœ… Static Usage (No Encryption)
toon = ToonConverter.from_json({"key": "value"})

# โœ… Instance Usage (Encryption Supported)
converter = ToonConverter(encryptor=enc)
toon = converter.from_json({"key": "value"}, conversion_mode="export")

Important:

  • Static calls (ToonConverter.from_json(...)) work but cannot use encryption features.
  • Instance calls are required to use conversion_mode and encryption middleware.

The same applies to async methods.

Validate Method

The validate() method is strictly static and does not support encryption:

# โœ… Correct Usage
result = ToonConverter.validate('key: "value"')

# โŒ Will NOT work with encryption
converter = ToonConverter(encryptor=enc)
result = converter.validate(encrypted_data)  # No decryption happens!

Why? Validation returns a dictionary (not a string), which cannot be encrypted. If you need to validate encrypted data, decrypt it first manually:

decrypted = enc.decrypt(encrypted_toon)
result = ToonConverter.validate(decrypted)

The same applies to AsyncToonConverter.validate().

๐Ÿ›  API Reference

Core Converters

ToonConverter (Legacy & Easy Use)

  • Static & Instance.
  • Central hub for converting TOON <-> Any Format.

JsonConverter

  • Focus: JSON <-> Any Format.
  • from_toon(..., return_json=True)
  • from_yaml(..., return_json=True)
  • to_xml, to_csv, to_yaml, to_toon

YamlConverter

  • Focus: YAML <-> Any Format.
  • to_json(..., return_json=True)
  • from_json, from_xml, from_csv, from_toon

XmlConverter

  • Focus: XML <-> Any Format.
  • to_json (returns JSON string), from_json, etc.

CsvConverter

  • Focus: CSV <-> Any Format.
  • to_json (returns JSON string), from_json, etc.

Note: All to_csv methods return a string. If the input is nested JSON/Object, it will be automatically flattened (e.g., user.name) to fit the CSV format. Conversely, from_csv will unflatten dotted keys back into objects.

Async Converters

Mirroring the synchronous classes, we have:

  • AsyncJsonConverter
  • AsyncYamlConverter
  • AsyncXmlConverter
  • AsyncCsvConverter

Usage is identical, just use await.

Encryption

Encryptor

Constructor: Encryptor(key=None, algorithm='fernet')

  • algorithm: 'fernet' (default), 'xor', 'base64'.
  • key: Required for Fernet/XOR.
  • encrypt(data), decrypt(data): Helper methods.

Utility Functions

from toon_parse import extract_json_from_string, extract_xml_from_string, extract_csv_from_string
# Direct access to extraction logic without conversion

๐Ÿ“„ License

MIT License

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

toon_parse-2.4.0.tar.gz (42.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

toon_parse-2.4.0-py3-none-any.whl (41.4 kB view details)

Uploaded Python 3

File details

Details for the file toon_parse-2.4.0.tar.gz.

File metadata

  • Download URL: toon_parse-2.4.0.tar.gz
  • Upload date:
  • Size: 42.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.17

File hashes

Hashes for toon_parse-2.4.0.tar.gz
Algorithm Hash digest
SHA256 6c3786acd9f7088a339829132a7a9492bd65d88c3bd3114b241f565ef1bed1bc
MD5 21e4d3917bde06ceeaa80438f62e786f
BLAKE2b-256 b2aaf1c439486cdee2a7ffa03c51674120eb088ddb75ac246ddb4ad8e8147093

See more details on using hashes here.

File details

Details for the file toon_parse-2.4.0-py3-none-any.whl.

File metadata

  • Download URL: toon_parse-2.4.0-py3-none-any.whl
  • Upload date:
  • Size: 41.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.17

File hashes

Hashes for toon_parse-2.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 029b6b7f01611a4f65d5afb702d3002f6115e0e829f2075fb8be7218fbc35671
MD5 8dc5bbbe0b681296600f5cec7fa162c1
BLAKE2b-256 beff43817bd02ec7f0f251192998e58eb6bac192c89cde8ddad6e0cf75ed17b2

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page