
High-performance ASUN (Array-Schema Unified Notation) Python extension.


asun-py

C++ pybind11 extension for ASUN (Array-Schema Unified Notation).

Provides seven functions: encode, encodeTyped, encodePretty, encodePrettyTyped, decode, encodeBinary, and decodeBinary. The encoders infer the schema from the data, so no hand-written schema string is needed for encoding.

The wheel also ships asun.pyi and py.typed, so editors and static type checkers can understand the extension module without a separate stub package.

Chinese documentation


Requirements

Tool | Version
g++ | ≥ 11 (C++17)
python3-dev | any (provides Python.h)
Python | ≥ 3.8

pybind11 2.13.6 headers are vendored in vendor/pybind11/ — no separate installation needed.


Build

# Option A — shell script (auto-installs python3-dev via sudo if missing)
bash build.sh

# Option B — Makefile
make

# Option C — CMake
cmake -B build && cmake --build build
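
After any of the build options, a quick check that Python can locate the module may save a confusing ImportError later. The sketch below uses only the standard library. The module name asun is taken from the import statements in this README, and asun_is_importable is a hypothetical helper, not part of the package. Run it from the directory that contains the built extension, or with PYTHONPATH=. as in the benchmark command.

```python
import importlib.util


def asun_is_importable() -> bool:
    """Return True if a module named 'asun' can be found on sys.path."""
    # find_spec returns None for a missing top-level module instead of raising.
    return importlib.util.find_spec("asun") is not None


print("asun importable:", asun_is_importable())
```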

API

Type inference rules

Python value | Inferred ASUN type
bool | bool
int | int
float | float
str | str
None | optional (e.g. str?, int?)

Cross-row type merging for lists: When encoding a list, all rows are scanned to compute the final type:

  • A field that is non-None in row 0 but None in some later row is promoted to optional (e.g. str → str?, int → int?).
  • Type conflicts between non-None values (e.g. int in row 0, str in row 1) fall back to str.

This means encodeTyped is safe to use even when only some rows have None for a given field.
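
The merging rules above can be sketched in pure Python. This is an illustration of the documented rules only, not the extension's C++ implementation. The names scalar_type and merge_field_types are hypothetical, the sketch handles scalar fields only, and reporting a field that is None in every row as str? is an assumption the rules above do not cover.

```python
from typing import Any, Dict, List, Optional


def scalar_type(value: Any) -> Optional[str]:
    """Map one Python value to an ASUN scalar type name (None for a None value)."""
    if value is None:
        return None
    # bool must be tested before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "str"
    raise TypeError(f"unsupported value in this sketch: {value!r}")


def merge_field_types(rows: List[Dict[str, Any]]) -> Dict[str, str]:
    """Scan every row and return one merged type per field."""
    base: Dict[str, Optional[str]] = {}
    optional: Dict[str, bool] = {}
    for row in rows:
        for key, value in row.items():
            found = scalar_type(value)
            base.setdefault(key, None)
            optional.setdefault(key, False)
            if found is None:
                optional[key] = True      # None in any row: field becomes optional
            elif base[key] is None:
                base[key] = found         # first non-None value sets the type
            elif base[key] != found:
                base[key] = "str"         # conflicting types fall back to str
    # Assumption: a field that is None in every row is reported as "str?".
    return {k: (base[k] or "str") + ("?" if optional[k] else "") for k in base}


print(merge_field_types([{"id": 1, "tag": "hello"}, {"id": 2, "tag": None}]))
# {'id': 'int', 'tag': 'str?'}
```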

encode(obj) -> str — schema without scalar hints, inferred

asun.encode({"id": 1, "name": "Alice"})
# → '{id,name}:\n(1,Alice)\n'

asun.encode([{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}])
# → '[{id,name}]:\n(1,Alice),\n(2,Bob)\n'

Decode semantics without scalar hints: When decoded with decode(), terminal field values are returned as strings because the schema omits scalar type hints. Structural bindings such as @{} and @[] still remain in the schema. Use encodeTyped when you need a type-preserving round-trip.
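
If text without scalar hints has already been produced, the string values can be converted by hand after decoding. The sketch below starts from a literal with the shape documented for decode() on a schema without hints, so it runs without the extension. coerce_record and its casts argument are hypothetical names, not part of asun.

```python
from typing import Any, Callable, Dict


def coerce_record(record: Dict[str, str],
                  casts: Dict[str, Callable[[str], Any]]) -> Dict[str, Any]:
    """Apply a per-field converter; fields without one stay strings."""
    return {key: casts.get(key, str)(value) for key, value in record.items()}


# Shape returned by decode() for a schema without scalar hints.
untyped = {"id": "1", "name": "Alice"}

typed = coerce_record(untyped, {"id": int})
print(typed)  # {'id': 1, 'name': 'Alice'}
```

Note that bool("false") is True in Python, so a boolean field needs an explicit converter such as lambda s: s == "true".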

encodeTyped(obj) -> str — schema with scalar hints, inferred

Types are inferred from all rows, not just the first. A field that is None in any row becomes optional:

asun.encodeTyped({"id": 1, "name": "Alice", "active": True})
# → '{id@int,name@str,active@bool}:\n(1,Alice,true)\n'

# Optional field inferred from cross-row merging:
asun.encodeTyped([{"id": 1, "tag": "hello"}, {"id": 2, "tag": None}])
# → '[{id@int,tag@str?}]:\n(1,hello),\n(2,)\n'
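
To make the text layout in these outputs concrete, here is a pure-Python sketch that reproduces the two strings shown above for flat records. It is not the extension's encoder. render_value and render_typed are hypothetical names, the field types are passed in explicitly, and escaping of special characters, float formatting, and the spelling of False as false are assumptions that the examples above do not confirm.

```python
from typing import Any, Dict, List


def render_value(value: Any) -> str:
    if value is None:
        return ""                      # None is written as an empty slot
    if isinstance(value, bool):
        return "true" if value else "false"   # "false" is an assumption
    return str(value)                  # no escaping in this sketch


def render_typed(types: Dict[str, str], rows: List[Dict[str, Any]],
                 as_list: bool) -> str:
    """Render a schema header with scalar hints, then one tuple per row."""
    header = "{" + ",".join(f"{name}@{t}" for name, t in types.items()) + "}"
    if as_list:
        header = "[" + header + "]"
    body = ",\n".join(
        "(" + ",".join(render_value(row[name]) for name in types) + ")"
        for row in rows
    )
    return f"{header}:\n{body}\n"


single = render_typed({"id": "int", "name": "str", "active": "bool"},
                      [{"id": 1, "name": "Alice", "active": True}], as_list=False)
many = render_typed({"id": "int", "tag": "str?"},
                    [{"id": 1, "tag": "hello"}, {"id": 2, "tag": None}], as_list=True)
print(repr(single))  # '{id@int,name@str,active@bool}:\n(1,Alice,true)\n'
print(repr(many))    # '[{id@int,tag@str?}]:\n(1,hello),\n(2,)\n'
```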

encodePretty(obj) -> str — pretty + untyped, inferred

pretty = asun.encodePretty(rows)

encodePrettyTyped(obj) -> str — pretty + scalar hints, inferred

pretty = asun.encodePrettyTyped(rows)

decode(text) -> dict | list[dict]

Decodes text whose embedded schema either carries scalar hints or omits them:

# schema with scalar hints → values restored as Python types
rec  = asun.decode('{id@int, name@str}:\n(1,Alice)\n')    # {'id': 1, 'name': 'Alice'}
rows = asun.decode('[{id@int, name@str}]:\n(1,Alice),\n(2,Bob)\n')

# schema without scalar hints → scalar values returned as strings
rec2 = asun.decode('{id,name}:\n(1,Alice)\n')             # {'id': '1', 'name': 'Alice'}

Block comments are supported anywhere whitespace is allowed:

rec = asun.decode('/* top */ {id@int,name@str}: /* row */ (1, /* name */ Alice)')

encodeBinary(obj) -> bytes — schema inferred internally

data = asun.encodeBinary(rows)

decodeBinary(data, schema) -> dict | list[dict]

Schema is required because the binary wire format carries no embedded type information:

rows = asun.decodeBinary(data, "[{id@int, name@str}]")

Typing

asun-py includes inline typing support for the compiled extension:

from asun import decode

rows = decode("[{id@int, name@str}]:(1,Alice),(2,Bob)")

Type checkers will infer dict[str, Any] | list[dict[str, Any]] for decode results and validate function signatures from the bundled asun.pyi.
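
Because the result is a union, calling code usually narrows it before use. A minimal sketch of one way to do that with isinstance follows. The Decoded alias and the as_rows helper are hypothetical, and the literals stand in for decode() results so the example runs without the extension. typing.Dict and typing.List are used to stay compatible with Python 3.8.

```python
from typing import Any, Dict, List, Union

# Mirrors the union described above; the alias name is hypothetical.
Decoded = Union[Dict[str, Any], List[Dict[str, Any]]]


def as_rows(result: Decoded) -> List[Dict[str, Any]]:
    """Normalize a decode() result to a list of records."""
    if isinstance(result, list):
        return result          # type checkers narrow to the list branch here
    return [result]            # a single record becomes a one-element list


# Literals standing in for decode() results.
single: Decoded = {"id": 1, "name": "Alice"}
many: Decoded = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]

for row in as_rows(single) + as_rows(many):
    print(row["id"], row["name"])
```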


Binary format

Little-endian layout, identical to asun-rs and asun-go:

Type | Bytes
int | 8 (i64 LE)
uint | 8 (u64 LE)
float | 8 (f64 LE)
bool | 1
str | 4-byte length LE + UTF-8 bytes
optional | 1-byte tag (0=null, 1=present) + value
slice | 4-byte count LE + elements
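
The layout can be explored with Python's struct module. The sketch below packs and unpacks single fields following the table, and shows why decodeBinary needs a schema: the bytes hold values only, with no field names or types. It is an illustration, not the extension's codec. pack_field and unpack_field are hypothetical names. Treating the 4-byte length as unsigned, a record as its fields concatenated in schema order, and bool as a 0/1 byte are assumptions.

```python
import struct
from typing import Any, Dict, List, Tuple

# Fixed-width scalars from the table, as little-endian struct formats.
FIXED = {"int": "<q", "uint": "<Q", "float": "<d", "bool": "<?"}


def pack_field(type_name: str, value: Any) -> bytes:
    """Encode one value according to the layout table."""
    if type_name.endswith("?"):                  # optional: 1-byte tag + value
        if value is None:
            return b"\x00"
        return b"\x01" + pack_field(type_name[:-1], value)
    if type_name == "str":                       # 4-byte length + UTF-8 bytes
        raw = value.encode("utf-8")
        return struct.pack("<I", len(raw)) + raw
    return struct.pack(FIXED[type_name], value)


def unpack_field(type_name: str, buf: bytes, pos: int) -> Tuple[Any, int]:
    """Decode one value starting at buf[pos]; return (value, next position)."""
    if type_name.endswith("?"):
        if buf[pos] == 0:
            return None, pos + 1
        return unpack_field(type_name[:-1], buf, pos + 1)
    if type_name == "str":
        (length,) = struct.unpack_from("<I", buf, pos)
        start = pos + 4
        return buf[start:start + length].decode("utf-8"), start + length
    fmt = FIXED[type_name]
    (value,) = struct.unpack_from(fmt, buf, pos)
    return value, pos + struct.calcsize(fmt)


# Hypothetical schema as (field, type) pairs; the bytes carry values only.
schema: List[Tuple[str, str]] = [("id", "int"), ("name", "str"), ("score", "float?")]
record: Dict[str, Any] = {"id": 1, "name": "Alice", "score": None}

blob = b"".join(pack_field(t, record[name]) for name, t in schema)
print(len(blob))  # 18 = 8 (int) + 4 + 5 (str) + 1 (null tag)

decoded: Dict[str, Any] = {}
pos = 0
for name, t in schema:
    decoded[name], pos = unpack_field(t, blob, pos)
print(decoded == record)  # True
```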

Run tests

# after building:
python3 -m pytest tests/ -v

Example

import asun

users = [
    {"id": 1, "name": "Alice", "score": 9.5},
    {"id": 2, "name": "Bob",   "score": 7.2},
]

# Schema is inferred automatically—no schema string needed
text        = asun.encode(users)            # schema binding without scalar hints
textTyped   = asun.encodeTyped(users)       # schema binding with scalar hints
pretty      = asun.encodePrettyTyped(users) # pretty + scalar hints
blob        = asun.encodeBinary(users)     # binary (schema inferred internally)

assert asun.decode(textTyped)  == users    # round-trip with scalar hints
assert asun.decode(pretty)     == users
assert asun.decodeBinary(blob, "[{id@int, name@str, score@float}]") == users

Latest Benchmarks

Measured on the author's machine with:

bash build.sh
PYTHONPATH=. python3 examples/bench.py

Headline numbers:

  • Flat 1,000-record dataset: ASUN text serialize 118.98 ms vs JSON 403.32 ms, deserialize 221.21 ms vs JSON 441.89 ms
  • Flat 10,000-record dataset: ASUN text serialize 81.70 ms vs JSON 293.38 ms, deserialize 158.39 ms vs JSON 317.44 ms
  • Size summary for 1,000 flat records: JSON 137,674 B, ASUN text 57,761 B (58% smaller), ASUN binary 74,454 B (46% smaller vs JSON)
  • Throughput summary on 1,000 records: ASUN text was 3.58x faster than JSON for serialize and 2.01x faster for deserialize
  • Binary mode was even faster: 7.18x faster than JSON on serialization and 4.16x faster on deserialization in the benchmark summary
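
The "smaller" percentages in the size summary follow directly from the byte counts listed above. A short worked calculation, where percent_smaller is a hypothetical helper name:

```python
def percent_smaller(size: int, baseline: int) -> float:
    """Size reduction relative to the baseline, in percent."""
    return (1 - size / baseline) * 100


# Byte counts from the size summary for 1,000 flat records.
json_bytes, text_bytes, binary_bytes = 137_674, 57_761, 74_454

print(round(percent_smaller(text_bytes, json_bytes)))    # 58
print(round(percent_smaller(binary_bytes, json_bytes)))  # 46
```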

Download files


Source Distribution

asun-1.0.1.tar.gz (201.3 kB)

Built Distributions

All wheels target manylinux (glibc 2.26+ and glibc 2.28+) on x86-64.

File | Size | Python
asun-1.0.1-cp314-cp314-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl | 194.8 kB | CPython 3.14
asun-1.0.1-cp313-cp313-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl | 194.7 kB | CPython 3.13
asun-1.0.1-cp312-cp312-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl | 194.6 kB | CPython 3.12
asun-1.0.1-cp311-cp311-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl | 194.8 kB | CPython 3.11
asun-1.0.1-cp310-cp310-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl | 193.7 kB | CPython 3.10
asun-1.0.1-cp39-cp39-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl | 194.3 kB | CPython 3.9
asun-1.0.1-cp38-cp38-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl | 193.5 kB | CPython 3.8