Arkadia Data Format (AK-DATA) - A versatile data serialization format optimized for AI applications.

Arkadia Data Format (AKD)

                                   ; i  :J                                      
                               U, .j..fraaM.  nl                                
                            b h.obWMkkWWMMWMCdkvz,k                             
                         ! .mQWM:o hiMoMW v.uaXMdohbi                           
                        hI,MMmaIao.Wo .IMkoh FCMwqoXa                           
                      ,.c.aWdM. d,aToW  .    Mb!. MopfQ.L                       
                       jhj.xoM :k    aCu F: w MpmqMvMMI,I                       
                      bzMhz:W    .Mw . o lYh ai M iMa pM.j                      
                     hzqWWM;    M;o.WMWWMkMX f.a aa bModpo.                     
                     ;tMbbv   xp oJMMWWWWMMMM iv  dLMXakM:T                     
                       mdh        MMWWWWWWWbQLCzurjktvMor                       
                      ,QFw ;M,b .MWWWWWWWMWMWd  xz   M,kd X                     
                      qjMIo IMTW.WWWWWMWWWM.o.I   rpULaMdi.                     
                       .mMM  uoWWWMWWWWWWp qM,,M l M;mMbrI                      
                        f nm  MMW MWWjMuMj  I  o   LbMac                        
                              WWdMWWWW Mv a.b..aauMhMwQf                        
                              MoWWW,WWtjonJMWtoMdoaoMI                          
                              MMMM Mi    xd:Mm tMwo Cr,                         
                             xMMc .otqokWMMMao:oio.                             
                             MW    .   C..MkTIo                                 
                            WW                                                  
                           QWM                                                  
                           WW                                                   
                          uMW                                                   
                          WW                                                    
                          MW

The High-Density, Token-Efficient Data Protocol for Large Language Models.

Arkadia Data Format (AKD) is a schema-first protocol designed specifically to optimize communication with LLMs. By stripping away redundant syntax (like repeated JSON keys) and enforcing strict typing, AKD offers up to 30% token savings, faster parsing, and a metadata layer invisible to your application logic but fully accessible to AI models.

This Python package includes the full core library and the akd CLI tool.


✨ Key Features

  • 📉 Token Efficiency: Reduces context window usage by replacing verbose JSON objects with dense Positional Records (Tuples).
  • 🛡️ Type Safety: Enforces types (int, float, bool, string) explicitly in the schema before data reaches the LLM.
  • 🧠 Metadata Injection: Use #tags and $attributes to pass context (e.g., source confidence, deprecation warnings) to the LLM without polluting your data structure.
  • 🖥️ Powerful CLI: Includes the akd terminal tool for encoding, decoding, and benchmarking files or streams.
  • ⚡ Zero Dependencies: Pure Python implementation, lightweight and fast.
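
To make the token-efficiency point concrete, the sketch below renders the same rows as compact JSON and as one schema header followed by positional tuples, then compares their lengths. It uses only the standard library. The helpers `schema_header`, `positional_record`, and `to_positional` are hypothetical names for this illustration, not the library's encoder; the one-record-per-line layout for several rows is an assumption, and character counts are only a rough stand-in for token counts.

```python
import json

# Mapping from Python types to the type names used in the examples in this README.
TYPE_NAMES = {bool: "bool", int: "number", float: "number", str: "string"}


def schema_header(row):
    """Build a header such as <id:number,name:string,active:bool> from one flat dict."""
    fields = ",".join(f"{key}:{TYPE_NAMES[type(value)]}" for key, value in row.items())
    return f"<{fields}>"


def positional_record(row, field_names):
    """Emit the values of `row` in schema order, without repeating the keys."""
    # json.dumps yields quoted strings, bare numbers and lowercase true/false.
    return "(" + ",".join(json.dumps(row[name]) for name in field_names) + ")"


def to_positional(rows):
    """Header once, then one tuple per row (multi-row layout is an assumption)."""
    field_names = list(rows[0])
    lines = [schema_header(rows[0])]
    lines += [positional_record(row, field_names) for row in rows]
    return "\n".join(lines)


ROWS = [
    {"id": 1, "name": "Alice", "active": True},
    {"id": 2, "name": "Bob", "active": False},
    {"id": 3, "name": "Carol", "active": True},
]

if __name__ == "__main__":
    as_json = json.dumps(ROWS, separators=(",", ":"))
    as_tuples = to_positional(ROWS)
    print(len(as_json), "characters as compact JSON")
    print(len(as_tuples), "characters as header plus tuples")
```

The keys are written once in the header instead of once per row, so the gap widens as the number of rows grows.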

📦 Installation

Install directly from PyPI:

pip install arkadia-data

🚀 Quick Start (Library)

Basic Usage

import arkadia.data as akd

# 1. Encode: Python Dict -> AKD String
data = { "id": 1, "name": "Alice", "active": True }
encoded = akd.encode(data)

print(encoded)
# Output: <id:number,name:string,active:bool>(1,"Alice",true)


# 2. Decode: AKD String -> Python Dict
input_str = '<score:number>(98.5)'
result = akd.decode(input_str)

if not result.errors:
    print(result.node.value) # 98.5
else:
    print("Errors:", result.errors)

🛠 CLI Usage

The Python package installs the akd (alias: ak-data) command globally.

USAGE:
   akd / ak-data <command> [flags]

COMMANDS:
   enc             [ENCODE] Convert JSON/YAML to AK Data format
   dec             [DECODE] Parse AK Data format back to JSON
   benchmark       [BENCHMARK] Run performance and token usage tests

Examples

1. Pipe JSON to AKD (Compact Mode):

echo '{ "data": 2}' | akd enc - -c
# Output: <data:number>(2)

2. Decode AKD file to JSON:

akd dec payload.akd -f json

3. Run Benchmarks on a directory:

akd benchmark ./data_samples
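
If you want to call the CLI from a Python script instead of a shell, a thin wrapper around `subprocess` is one option. The sketch below reuses only the `akd enc - -c` invocation from example 1. The function names are hypothetical, and it assumes the tool writes its result to stdout and exits with a non-zero status on failure, which this README does not state.

```python
import shutil
import subprocess


def build_enc_command(compact=True):
    """Build the argument list for `akd enc -`, which reads JSON from stdin."""
    command = ["akd", "enc", "-"]
    if compact:
        command.append("-c")  # the flag used in the "Compact Mode" example
    return command


def encode_with_cli(json_text):
    """Pipe JSON text through the akd CLI and return its stdout.

    Assumes the encoded result is written to stdout. Raises FileNotFoundError
    when akd is not on PATH, and subprocess.CalledProcessError when the
    command exits with a non-zero status.
    """
    if shutil.which("akd") is None:
        raise FileNotFoundError("akd executable not found on PATH")
    completed = subprocess.run(
        build_enc_command(),
        input=json_text,
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


if __name__ == "__main__":
    print(encode_with_cli('{ "data": 2}'))
```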

⚡ Benchmarks

Why switch? Because every token counts. In the summary below, AKCD (Arkadia Compressed Data) uses the fewest tokens of the four formats, although JSON remains the fastest in the timing column.

BENCHMARK SUMMARY:

   JSON  █████████████████████░░░░     6921 tok     0.15 ms
   AKCD  ████████████████░░░░░░░░░     5416 tok     4.40 ms
   AKD   ███████████████████░░░░░░     6488 tok     4.29 ms
   TOON  █████████████████████████     8198 tok     2.36 ms

   FORMAT     TOKENS       VS JSON
   ---------------------------------
   AKCD       5416         -21.7%
   AKD        6488         -6.3%
   JSON       6921         +0.0%
   TOON       8198         +18.5%

CONCLUSION: Switching to AKCD saves 1505 tokens (21.7%) compared to JSON.
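
The "VS JSON" column can be reproduced from the token counts alone. The short sketch below recomputes each percentage as the signed difference relative to the JSON baseline; the function names are illustrative and the numbers are copied from the summary above.

```python
# Token counts copied from the benchmark summary.
TOKENS = {"AKCD": 5416, "AKD": 6488, "JSON": 6921, "TOON": 8198}


def percent_vs_baseline(tokens, baseline):
    """Signed percentage difference of `tokens` relative to `baseline`."""
    return (tokens - baseline) / baseline * 100.0


def summarize(counts, baseline_name="JSON"):
    """Return (format, tokens, percent vs baseline) rows sorted by token count."""
    baseline = counts[baseline_name]
    ordered = sorted(counts.items(), key=lambda item: item[1])
    return [
        (name, tokens, round(percent_vs_baseline(tokens, baseline), 1))
        for name, tokens in ordered
    ]


if __name__ == "__main__":
    for name, tokens, delta in summarize(TOKENS):
        print(f"{name:<10} {tokens:<12} {delta:+.1f}%")
    print("Tokens saved by AKCD vs JSON:", TOKENS["JSON"] - TOKENS["AKCD"])
```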

📖 Syntax Specification

AKD separates structure (Schema) from content (Data).

1. Primitives

Primitive values are automatically typed. Strings are quoted; numbers, booleans, and null are bare.

   TYPE       INPUT      ENCODED OUTPUT
   --------------------------------------
   Integer    123        <number>123
   String     "hello"    <string>"hello"
   Boolean    true       <bool>true
   Null       null       <null>null
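
The mapping in the table can be mirrored in a few lines of Python. This is a minimal sketch, not the library's encoder: `encode_primitive` is a hypothetical helper, and it assumes JSON-style quoting and escaping for strings, which the table does not specify.

```python
import json


def encode_primitive(value):
    """Return a typed primitive such as <number>123, following the table above."""
    if value is None:
        return "<null>null"
    # bool must be tested before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "<bool>" + ("true" if value else "false")
    if isinstance(value, (int, float)):
        return f"<number>{value}"
    if isinstance(value, str):
        # Assumption: strings are quoted and escaped the way JSON does it.
        return "<string>" + json.dumps(value)
    raise TypeError(f"not a primitive: {type(value).__name__}")


if __name__ == "__main__":
    for sample in (123, "hello", True, None):
        print(encode_primitive(sample))
```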

2. Schema Definition (@Type)

Define the structure once to avoid repeating keys.

/* Define a User type */
@User <
  id: number, 
  name: string, 
  role: string 
>

3. Data Structures

Positional Records (Tuples)

The most efficient way to represent objects. Values must match the schema order.

/* Schema: <x:number, y:number> */
(10, 20)
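
Because a positional record carries no keys, the reader recovers them by pairing each value with the schema field at the same index. The sketch below shows that pairing for a flat schema. Both helpers are hypothetical, and the string splitting is deliberately naive: it does not handle nested types, metadata, or backtick-escaped names.

```python
def parse_schema_fields(schema):
    """Split a flat schema like '<x:number, y:number>' into (name, type) pairs."""
    text = schema.strip()
    if not (text.startswith("<") and text.endswith(">")):
        raise ValueError("schema must be wrapped in < >")
    pairs = []
    for part in text[1:-1].split(","):
        name, _, type_name = part.partition(":")
        pairs.append((name.strip(), type_name.strip()))
    return pairs


def bind_positional(schema, values):
    """Pair each value with the schema field at the same position."""
    fields = parse_schema_fields(schema)
    if len(fields) != len(values):
        raise ValueError(f"expected {len(fields)} values, got {len(values)}")
    return {name: value for (name, _), value in zip(fields, values)}


if __name__ == "__main__":
    print(bind_positional("<x:number, y:number>", (10, 20)))
```

Reordering the schema fields changes which key each value lands on, which is why the values must match the schema order.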

Named Records (Objects)

Flexible key-value pairs, similar to JSON, used when the schema is loose or the data is sparse.

{
  id: 1,
  name: "Admin"
}

Lists

Dense arrays. Can be homogeneous (list of strings) or mixed.

[ "active", "pending", "closed" ]

4. Metadata System

AKD allows you to inject metadata that is visible to the LLM but ignored by the parser when decoding back to your application.

Attributes ($key=value) & Tags (#flag)

@Product <
  $version="2.0"
  sku: string,
  
  /* Tagging a field as deprecated */
  #deprecated
  legacy_id: int
>

5. Escaped Identifiers (Backticks)

AK-Data allows the use of spaces, symbols, and special characters in names by wrapping them in backticks (`). This applies to schema names, field keys, and metadata attributes.

@`System User+` <
  // $`last-sync`="2024-05-10" //
  `Full Name`: string,
  `is-active?`: bool,
  $`Special ID*` id: number
>
{
  `Full Name`: "John Doe", 
  `is-active?`: true, 
  id: 101
}
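
When generating AK-Data text from arbitrary keys, a small helper can decide whether a name needs backticks. The sketch below assumes that a bare name consists of ASCII letters, digits, and underscores and does not start with a digit; that rule is a guess for illustration, not taken from the specification, and `quote_identifier` is a hypothetical name.

```python
import re

# Assumed shape of a name that can be written without backticks.
_BARE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_identifier(name):
    """Wrap `name` in backticks unless it already looks like a bare identifier."""
    if _BARE_NAME.fullmatch(name):
        return name
    if "`" in name:
        # How a literal backtick inside a name is escaped is not covered here.
        raise ValueError("names containing a backtick are not handled by this sketch")
    return f"`{name}`"


if __name__ == "__main__":
    for key in ("id", "Full Name", "is-active?"):
        print(quote_identifier(key))
```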

6. Prompt Output Mode (--prompt-output)

This mode is specifically designed for Large Language Models (LLMs). It transforms AK-Data into a Structural Blueprint, providing a perfect template for the AI to follow. Instead of raw data values, it renders a recursive, human-readable schema structure.

Key Features:

  • Full Structural Expansion: Anonymous nested types are fully expanded into braces {}.
  • Semantic Hinting: Field-level comments from the schema are injected directly into the template.
  • Representative Sampling: Lists show a single blueprint element followed by a continuation hint (...), saving tokens while maintaining clarity.

Example Usage:

# Generate a structural template for an LLM
echo '<[ /* id */ id: number, name: string, val: <id: string, num: number> ]>' | akd dec -f akd --prompt-output -

Output:

[
  {
    id: number /* id */,
    name: string,
    val: {
      id: string,
      num: number
    }
  },
  ... /* repeat pattern for additional items */
]
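
The shape of this output can be reproduced with a short recursive renderer, which may help when reasoning about what the mode emits. This is a sketch under stated assumptions: the `(name, type, comment)` tuples are a hypothetical data model invented for the example, not the library's schema objects, and comments on nested-record fields are ignored.

```python
def render_record(fields, indent=0):
    """Render fields as an indented { ... } block.

    Each field is (name, type, comment). `type` is either a type name such as
    "number" or a nested list of fields; `comment` may be None.
    """
    pad = " " * indent
    lines = [pad + "{"]
    for index, (name, field_type, comment) in enumerate(fields):
        comma = "," if index < len(fields) - 1 else ""
        if isinstance(field_type, list):
            # Anonymous nested type: expand it in place, one level deeper.
            nested = render_record(field_type, indent + 2).lstrip()
            lines.append(f"{pad}  {name}: {nested}{comma}")
        else:
            note = f" /* {comment} */" if comment else ""
            lines.append(f"{pad}  {name}: {field_type}{note}{comma}")
    lines.append(pad + "}")
    return "\n".join(lines)


def render_list_blueprint(fields):
    """Show one representative element followed by a continuation hint."""
    return "\n".join([
        "[",
        render_record(fields, indent=2) + ",",
        "  ... /* repeat pattern for additional items */",
        "]",
    ])


# The same structure as the schema piped to the CLI in the example above.
ITEM_FIELDS = [
    ("id", "number", "id"),
    ("name", "string", None),
    ("val", [("id", "string", None), ("num", "number", None)], None),
]

if __name__ == "__main__":
    print(render_list_blueprint(ITEM_FIELDS))
```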

Why use it?

  1. Reduce Hallucination: The LLM sees exactly what types and formats are expected for every field.
  2. Context Efficiency: By showing only one example in a list, you define the logic without wasting the context window on repetitive data.
  3. Implicit Instruction: The transition from positional () to named {} in prompt mode helps the AI differentiate between the "Instructions" and the final "Compact Output".

📄 License

This project is licensed under the MIT License.

Built by Arkadia Solutions. Engineering the kernel of distributed intelligence.
