Skip to main content

High-performance zero-copy tensor protocol

Project description

Tenso Banner

Tenso

Up to 35x faster than Apache Arrow on deserialization. 46x less CPU than SafeTensors.

Zero-copy, SIMD-aligned tensor protocol for high-performance ML infrastructure.

PyPI version Python 3.10+ License: Apache 2.0


Why Tenso?

Most serialization formats are designed for general data or disk storage. Tenso is focused on network tensor transmission where every microsecond matters.

The Problem

Traditional formats waste CPU cycles during deserialization:

  • SafeTensors: 37.1% CPU usage (great for disk, overkill for network)
  • Pickle: 40.9% CPU usage + security vulnerabilities
  • Arrow: Faster on serialization, but up to 32x slower on deserialization for large tensors

The Solution

Tenso achieves true zero-copy with:

  • Minimalist Header: Fixed 8-byte header eliminates JSON parsing overhead.
  • 64-byte Alignment: SIMD-ready padding ensures the data body is cache-line aligned.
  • Direct Memory Mapping: The CPU points directly to existing buffers without copying.

Result: 0.8% CPU usage vs >40% for SafeTensors/Pickle.


Benchmarks

System: Python 3.12.9, NumPy 2.3.5, 12 CPU cores, M4 Pro

1. In-Memory Serialization (LLM Layer - 64MB)

Format Size Serialize Deserialize Speedup (Deser)
Tenso 64.00 MB 3.51 ms 0.004 ms 1x
Arrow 64.00 MB 7.06 ms 0.011 ms 2.8x slower
SafeTensors 64.00 MB 8.14 ms 2.39 ms 597x slower
Pickle 64.00 MB 2.93 ms 2.71 ms 677x slower
MsgPack 64.00 MB 10.44 ms 3.05 ms 763x slower

Note: Tenso (Vect) variant is even faster with 0.000 ms deserialize time.

2. Disk I/O (256 MB Matrix)

Format Write Read
Tenso 29.41 ms 36.28 ms
NumPy .npy 24.83 ms 43.08 ms
Pickle 49.90 ms 24.24 ms

3. Stream Reading (95 MB Packet)

Method Time Throughput Speedup
Tenso read_stream 7.68 ms 12,417 MB/s 1x
Optimised Loop 13.89 ms 7,396 MB/s 1.9x slower

4. CPU Usage (Efficiency)

Format Serialize CPU% Deserialize CPU%
Tenso 117.3% 0.8%
Arrow 57.1% 1.0%
SafeTensors 67.1% 37.1%
Pickle 44.0% 40.9%

5. Arrow vs Tenso (Comparison)

Size Tenso Ser Arrow Ser Tenso Des Arrow Des Speedup
Small 0.130ms 0.056ms 0.009ms 0.035ms 4.1x
Medium 0.972ms 0.912ms 0.020ms 0.040ms 2.0x
Large 3.166ms 3.655ms 0.019ms 0.222ms 11.8x
XLarge 19.086ms 28.726ms 0.023ms 0.733ms 32.0x

6. Network Performance

  • Packet Throughput: 89,183 packets/sec (over localhost TCP)
  • Latency: 11.2 µs/packet
  • Async Write Throughput: 88,397 MB/s (1.4M tensors/sec)

Installation

pip install tenso

Optional extras:

pip install tenso[api]    # gRPC, FastAPI, Ray integration
pip install tenso[gpu]    # GPU acceleration (CuPy/PyTorch/JAX)

Quick Start

Basic Serialization

import numpy as np
import tenso

# Create tensor
data = np.random.rand(1024, 1024).astype(np.float32)

# Serialize
packet = tenso.dumps(data)

# Deserialize (Zero-copy view)
restored = tenso.loads(packet)

Async I/O

import asyncio
import tenso

async def handle_client(reader, writer):
    # Asynchronously read a tensor from the stream
    data = await tenso.aread_stream(reader)
    
    # Process and write back
    await tenso.awrite_stream(data * 2, writer)

FastAPI Integration

from fastapi import FastAPI
import numpy as np
from tenso.fastapi import TensoResponse

app = FastAPI()

@app.get("/tensor")
async def get_tensor():
    data = np.ones((1024, 1024), dtype=np.float32)
    return TensoResponse(data) # Zero-copy streaming response

Advanced Features

Ray Integration (Distributed Computing)

Replace pickle-based serialization in Ray with Tenso for 46x less CPU overhead on tensor operations. Works transparently with ray.put(), ray.get(), remote functions, and actors.

import ray
import numpy as np
from tenso.ray import register

ray.init()
register()  # Register Tenso as the serializer for numpy arrays

# All ray.put/get operations now use Tenso
ref = ray.put(np.zeros((1000, 1000)))
arr = ray.get(ref)  # Deserialized via Tenso

# Works transparently with remote functions
@ray.remote
def process(tensor):
    return tensor.mean()

ray.get(process.remote(np.random.randn(1000, 1000)))

Optional support for PyTorch and JAX tensors:

register(include_torch=True, include_jax=True)

Quantized Tensors (4-bit & 8-bit)

Native support for quantized representations to reduce memory footprint with minimal accuracy loss.

from tenso.quantize import QuantizedTensor
import numpy as np

data = np.random.randn(1024, 1024).astype(np.float32)

# Quantize to 8-bit (per-tensor scheme)
qt = QuantizedTensor.quantize(data, dtype="qint8", scheme="per_tensor")
print(qt)  # QuantizedTensor(dtype=qint8, shape=(1024, 1024), ...)

# Serialize/deserialize with Tenso
import tenso
packet = tenso.dumps(qt)
restored = tenso.loads(packet)

# Dequantize back to float32
result = restored.dequantize()

Supported dtypes: qint8, quint8, qint4, quint4 Supported schemes: per_tensor, per_channel, per_group

Inter-Process Communication (Shared Memory)

Transfer tensors between local processes with single-digit microsecond latency using Shared Memory. This avoids socket overhead entirely by passing memory handles.

from tenso import TensoShm
import numpy as np

# Process A: Write to Shared Memory
data = np.random.randn(1024, 1024).astype(np.float32)
# Automatically sizes and creates the SHM segment
with TensoShm.create_from("shared_tensor_01", data) as shm:
    print("Tensor is in SHM. Waiting for reader...")
    input() # Keep process alive

# Process B: Read from Shared Memory (Zero-Copy)
with TensoShm("shared_tensor_01") as shm:
    # Instant view of the data without copying
    array = shm.get()
    print(f"Received: {array.shape}")

GPU Acceleration (Direct Transfer)

Supports fast transfers between Tenso streams and device memory for CuPy, PyTorch, and JAX using pinned host memory.

import tenso.gpu as tgpu

# Read directly from a stream into a GPU tensor
torch_tensor = tgpu.read_to_device(stream, device_id=0)

bfloat16 Support

Native support for bfloat16 dtype, commonly used in ML training. Works with NumPy 2.1+ natively or falls back to ml_dtypes.

import numpy as np
import tenso

# Serialize bfloat16 tensors directly
data = np.ones((512, 512), dtype=np.float32)  # or bfloat16 if available
packet = tenso.dumps(data)

Sparse Formats & Bundling

Tenso natively supports complex data structures beyond simple dense arrays:

  • Sparse Matrices: Direct serialization for COO, CSR, and CSC formats.
  • Dictionary Bundling: Pack multiple tensors into a single nested dictionary packet.
  • LZ4 Compression: Optional high-speed compression for sparse or redundant data.

Data Integrity (XXH3)

Protect your tensors against network corruption with ultra-fast 64-bit checksums:

# Serialize with 64-bit checksum footer
packet = tenso.dumps(data, check_integrity=True)

# Verification is automatic during loads()
restored = tenso.loads(packet)

gRPC Integration

Tenso provides built-in support for gRPC, allowing you to pass tensors between services with minimal overhead.

from tenso.grpc import tenso_msg_pb2, tenso_msg_pb2_grpc
import tenso

# In your Servicer
def Predict(self, request, context):
    data = tenso.loads(request.tensor_packet)
    result = data * 2
    return tenso_msg_pb2.PredictResponse(
        result_packet=bytes(tenso.dumps(result))
    )

Protocol Design

Tenso uses a minimalist structure designed for direct memory access:

┌─────────────┬──────────────┬──────────────┬────────────────────────┬──────────────┐
│   HEADER    │    SHAPE     │   PADDING    │    BODY (Raw Data)     │    FOOTER    │
│   8 bytes   │  Variable    │   0-63 bytes │   C-Contiguous Array   │   8 bytes*   │
└─────────────┴──────────────┴──────────────┴────────────────────────┴──────────────┘
                                                                        (*Optional)

The padding ensures the body starts at a 64-byte boundary, enabling AVX-512 vectorization and zero-copy memory mapping.


Use Cases

  • Model Serving APIs: Up to 35x faster deserialization with 46x less CPU saves massive overhead on inference nodes.
  • Distributed Training: Efficiently pass gradients or activations between nodes with native Ray integration.
  • GPU-Direct Pipelines: Stream data from network cards to GPU memory with minimal host intervention.
  • Real-time Robotics: 10.2 µs latency for high-frequency sensor fusion (LIDAR, Radar).
  • High-Throughput Streaming: 89K packets/sec network transmission for real-time data pipelines.

Contributing

Contributions are welcome! We are currently looking for help with:

  • C++ / JavaScript Clients: Extending the protocol to other ecosystems.

License

Apache License 2.0 - see LICENSE file.

Citation

@software{tenso2025,
  author = {Khushiyant},
  title = {Tenso: High-Performance Zero-Copy Tensor Protocol},
  year = {2025},
  url = {https://github.com/Khushiyant/tenso}
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tenso-0.19.4.tar.gz (92.7 kB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

tenso-0.19.4-pp310-pypy310_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (405.2 kB view details)

Uploaded PyPymanylinux: glibc 2.17+ ARM64

tenso-0.19.4-pp39-pypy39_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (405.8 kB view details)

Uploaded PyPymanylinux: glibc 2.17+ ARM64

tenso-0.19.4-pp38-pypy38_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (405.6 kB view details)

Uploaded PyPymanylinux: glibc 2.17+ ARM64

tenso-0.19.4-pp37-pypy37_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (407.2 kB view details)

Uploaded PyPymanylinux: glibc 2.17+ ARM64

tenso-0.19.4-cp311-abi3-win_amd64.whl (246.1 kB view details)

Uploaded CPython 3.11+Windows x86-64

tenso-0.19.4-cp311-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (413.2 kB view details)

Uploaded CPython 3.11+manylinux: glibc 2.17+ x86-64

tenso-0.19.4-cp311-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (404.4 kB view details)

Uploaded CPython 3.11+manylinux: glibc 2.17+ ARM64

tenso-0.19.4-cp311-abi3-macosx_11_0_arm64.whl (360.9 kB view details)

Uploaded CPython 3.11+macOS 11.0+ ARM64

File details

Details for the file tenso-0.19.4.tar.gz.

File metadata

  • Download URL: tenso-0.19.4.tar.gz
  • Upload date:
  • Size: 92.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tenso-0.19.4.tar.gz
Algorithm Hash digest
SHA256 7da115b7c20dd4e226a2f00511869b88601f21c00e2aaf2244e5115fd71ef935
MD5 1fc8b461253b9d35785936bb79014067
BLAKE2b-256 acf7f9c579cf1b1eb88a8a8e4fbdb7861c2bed5a96f57d47eab7c5d46a443b7d

See more details on using hashes here.

Provenance

The following attestation bundles were made for tenso-0.19.4.tar.gz:

Publisher: release.yml on Khushiyant/tenso

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file tenso-0.19.4-pp310-pypy310_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for tenso-0.19.4-pp310-pypy310_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 ab5bda2cac1b20870b21d6e2b36c70e30ef0ad8baa63e9c9714d96fa276a00e2
MD5 f56f80c12cf5e3bd02037dc578a32959
BLAKE2b-256 1aa6557d59e1f4e0a4a67c29cad35a723e47e79e948c87c25b3a328072370fb8

See more details on using hashes here.

Provenance

The following attestation bundles were made for tenso-0.19.4-pp310-pypy310_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: release.yml on Khushiyant/tenso

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file tenso-0.19.4-pp39-pypy39_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for tenso-0.19.4-pp39-pypy39_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 e4a56d345a3d481717d6045d6cbd821c9e87a7a74d42f068090dd5e900ded96b
MD5 a7e2244578e8eec18c5184b7091385a3
BLAKE2b-256 0279fd4361604e244c7d00425b56712493ab6ea30407c8ddbe1c1f677c6738cd

See more details on using hashes here.

Provenance

The following attestation bundles were made for tenso-0.19.4-pp39-pypy39_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: release.yml on Khushiyant/tenso

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file tenso-0.19.4-pp38-pypy38_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for tenso-0.19.4-pp38-pypy38_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 3191bedb73890477475260893a38c6ea99609fb048cde6722d6b9ed3e6f639be
MD5 8f196a4a82b095f1aa6d85e913a17be8
BLAKE2b-256 d25dbd26ae4feb36879873d03800a99c7db764e7def7c2c98532c8a19258cdb2

See more details on using hashes here.

Provenance

The following attestation bundles were made for tenso-0.19.4-pp38-pypy38_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: release.yml on Khushiyant/tenso

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file tenso-0.19.4-pp37-pypy37_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for tenso-0.19.4-pp37-pypy37_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 d59a0d4b2f898da52fc51fb3104da699d0a2e9b13680c746923b7cf3226f0c01
MD5 97855da9007d2c574fa335c788e9df9f
BLAKE2b-256 d90e0b8eeb7a574db486a51544324b6c5329194c074ce77716d50a9c6e6f159e

See more details on using hashes here.

Provenance

The following attestation bundles were made for tenso-0.19.4-pp37-pypy37_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: release.yml on Khushiyant/tenso

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file tenso-0.19.4-cp311-abi3-win_amd64.whl.

File metadata

  • Download URL: tenso-0.19.4-cp311-abi3-win_amd64.whl
  • Upload date:
  • Size: 246.1 kB
  • Tags: CPython 3.11+, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tenso-0.19.4-cp311-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 2e6462d5db231e2d1b1b726ee463115f4003ec19415b61abdd4969bc789bde3b
MD5 b9a2023646df486020bd5b930fb37634
BLAKE2b-256 a527043e370eaad0e86c932e602f14063376ece604eeb0028dee61dffe38aa32

See more details on using hashes here.

Provenance

The following attestation bundles were made for tenso-0.19.4-cp311-abi3-win_amd64.whl:

Publisher: release.yml on Khushiyant/tenso

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file tenso-0.19.4-cp311-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for tenso-0.19.4-cp311-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 3bbcb6b5a3e2b723da0679ed547c56b99c186daf3815c2ae8dcafac48b3073df
MD5 e365b1e8e2dacb1df1d838a123b12133
BLAKE2b-256 c63cd27d08b90721b1664a6d582dd4715e7c11e34d8c251dca2521c10c6dc647

See more details on using hashes here.

Provenance

The following attestation bundles were made for tenso-0.19.4-cp311-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: release.yml on Khushiyant/tenso

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file tenso-0.19.4-cp311-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for tenso-0.19.4-cp311-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 ccce786159e827b6af06677e6ac6bddaf2d6b841eba0f89c33816b3db35b3bbb
MD5 0e7ceaf611b9960994667ca780d69d80
BLAKE2b-256 82bb58790dd539f1ba305d3a33a040aab4af5c905e9237d2e286152d62eafb2a

See more details on using hashes here.

Provenance

The following attestation bundles were made for tenso-0.19.4-cp311-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: release.yml on Khushiyant/tenso

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file tenso-0.19.4-cp311-abi3-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for tenso-0.19.4-cp311-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 03513e752deb986043e0def53bc1e0dc558ade5671d0cdb8bd68aa6143856ed0
MD5 bb188061bf55e8ee96cf36187d683f89
BLAKE2b-256 5e945c7dc4b6f787674d6a213c6970852bbe8da6bf67782312c1a5118478ac3b

See more details on using hashes here.

Provenance

The following attestation bundles were made for tenso-0.19.4-cp311-abi3-macosx_11_0_arm64.whl:

Publisher: release.yml on Khushiyant/tenso

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page