eXPeditious Data Transfer
xpdt: eXPeditious Data Transfer
About
xpdt is (yet another) language for defining data-types and generating code for serializing and deserializing them. It aims to produce code with little or no overhead and is based on fixed-length representations, which allow for zero-copy deserialization and (at-most-)one-copy writes (source to buffer).
The generated C code, in particular, is highly optimized and often permits the elimination of data-copying for writes and enables optimizations such as loop-unrolling for fixed-length objects. This can lead to read speeds in excess of 500 million objects per second (~1.8 nsec per object).
Examples
The xpdt source language looks similar to C struct definitions:
    struct timestamp {
        u32 tv_sec;
        u32 tv_nsec;
    };

    struct point {
        i32 x;
        i32 y;
        i32 z;
    };

    struct line {
        timestamp time;
        point line_start;
        point line_end;
        bytes comment;
    };
Fixed-width integer types from 8 to 128 bits are supported, along with the bytes type, which is a variable-length sequence of bytes.
Target Languages
The following target languages are currently supported:
- C
- Python
The C code is very highly optimized.
The Python code is about as well optimized for CPython as I can make it. It uses typed NamedTuple for objects, which has some small overhead over regular tuples, and it uses struct.Struct to do the packing/unpacking. I have also code-golfed the generated bytecodes down to what I think is minimal given the design constraints. As a result, performance of the pure Python code is comparable to a JSON library implemented in C or Rust.
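As a rough illustration of the approach described above (a typed NamedTuple plus a precompiled struct.Struct), the timestamp type from the Examples section might map to something like the following. This is a hand-written sketch, not the output of xpdt: the class name, the method names, the module-level `_TIMESTAMP` codec and the little-endian byte order are all assumptions made for the example.

```python
import struct
from typing import NamedTuple

# Precompiled codec: two unsigned 32-bit integers, no padding.
# Little-endian byte order is an assumption for this sketch.
_TIMESTAMP = struct.Struct("<II")


class Timestamp(NamedTuple):
    """Hypothetical hand-written analogue of the xpdt `timestamp` struct."""

    tv_sec: int
    tv_nsec: int

    def pack(self) -> bytes:
        # A NamedTuple is a tuple, so it can be splatted straight into pack().
        return _TIMESTAMP.pack(*self)

    @classmethod
    def unpack(cls, buf) -> "Timestamp":
        # unpack_from reads directly from any bytes-like object.
        return cls._make(_TIMESTAMP.unpack_from(buf))
```

Compiling the format once at module level avoids re-parsing the format string on every call, which is the main reason to prefer struct.Struct over the module-level struct.pack and struct.unpack functions in a hot path.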
For better performance in Python, it may be desirable to develop a Cython target. In some instances CFFI structs may be more performant since they can avoid the creation/destruction of an object for each record.
Target languages are implemented purely as jinja2 templates.
Serialization format
The serialization format for fixed-length objects is simply a packed C struct.
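Because a fixed-length object is just a packed struct, a buffer holding several of them can be walked at a constant stride. The sketch below shows this for the point type using Python's struct module. The helper names and the little-endian byte order are assumptions for illustration and are not part of xpdt.

```python
import struct

# x, y, z as three signed 32-bit integers with no padding.
# Little-endian byte order is an assumption for this sketch.
POINT = struct.Struct("<iii")


def write_points(points):
    """Pack each (x, y, z) tuple directly into one preallocated buffer."""
    buf = bytearray(POINT.size * len(points))
    for i, p in enumerate(points):
        POINT.pack_into(buf, i * POINT.size, *p)
    return buf


def read_points(buf):
    """Decode every fixed-size record in buf without slicing it first."""
    return list(POINT.iter_unpack(buf))
```

Here pack_into writes each record once into its final position, and iter_unpack reads records in place from any bytes-like object, including a memoryview over a larger buffer.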
For any object which contains bytes type fields:
- a 32-bit unsigned record length is prepended to the struct
- all bytes type fields are converted to u32 and contain the length of the bytes
- all bytes contents are appended after the struct in the order in which they appear
For example, a line from the definitions above with the comment "Hello" would be serialized as:

    u32 tot_len          # = 41
    u32 time.tv_sec
    u32 time.tv_nsec
    i32 line_start.x
    i32 line_start.y
    i32 line_start.z
    i32 line_end.x
    i32 line_end.y
    i32 line_end.z
    u32 comment          # = 5
    u8 'H'
    u8 'e'
    u8 'l'
    u8 'l'
    u8 'o'
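The layout above can be reproduced by hand with Python's struct module. The following is a minimal sketch, not xpdt's generated code. It assumes little-endian byte order and assumes that the record length counts the bytes following the length field itself, which is consistent with the value 41 in the example (36 bytes of fixed fields plus 5 bytes of comment). The function names are hypothetical.

```python
import struct

# Assumed layout, little-endian, no padding:
#   u32 record length, then 2 x u32 (time), 6 x i32 (two points),
#   then a u32 holding the comment length, then the comment bytes.
HEADER = struct.Struct("<I")
BODY = struct.Struct("<II6iI")


def pack_line(tv_sec, tv_nsec, start, end, comment: bytes) -> bytes:
    body = BODY.pack(tv_sec, tv_nsec, *start, *end, len(comment))
    # Assumption: the length excludes the 4-byte length prefix itself.
    tot_len = len(body) + len(comment)
    return HEADER.pack(tot_len) + body + comment


def unpack_line(buf):
    (tot_len,) = HEADER.unpack_from(buf, 0)
    fields = BODY.unpack_from(buf, HEADER.size)
    tv_sec, tv_nsec = fields[0:2]
    start, end = fields[2:5], fields[5:8]
    comment_len = fields[8]
    # Variable-length contents follow the fixed-size part of the record.
    off = HEADER.size + BODY.size
    comment = bytes(buf[off:off + comment_len])
    return tot_len, tv_sec, tv_nsec, start, end, comment
```

With a single bytes field, the offset of its contents is simply the end of the fixed-size part. With several bytes fields, each one would start where the previous one ends, in declaration order.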
Features
The feature-set is, as of now, pretty slim.
There are no array / sequence / map types, and no keyed unions.
Support for such things may be added in future provided that suitable implementations exist. An implementation is suitable if:
- It admits a zero (or close to zero) overhead implementation
- It causes no overhead when the feature isn't being used
License
The compiler is released under the GPLv3.
The C support code/headers are released under the MIT license.
The generated code is yours.