High-performance zero-copy tensor protocol

Tenso

Up to 32x faster than Apache Arrow on deserialization. 46x less deserialization CPU than SafeTensors.

Zero-copy, SIMD-aligned tensor protocol for high-performance ML infrastructure.

Python 3.10+ | License: Apache 2.0


Why Tenso?

Most serialization formats are designed for general data or disk storage. Tenso focuses on network transmission of tensors, where every microsecond matters.

The Problem

Traditional formats waste CPU cycles during deserialization:

  • SafeTensors: 37.1% CPU usage (great for disk, overkill for network)
  • Pickle: 40.9% CPU usage + security vulnerabilities
  • Arrow: Faster on serialization for small tensors, but up to 32x slower on deserialization for large tensors

The Solution

Tenso achieves true zero-copy with:

  • Minimalist Header: Fixed 8-byte header eliminates JSON parsing overhead.
  • 64-byte Alignment: SIMD-ready padding ensures the data body is cache-line aligned.
  • Direct Memory Mapping: Deserialization returns a view into the existing buffer instead of copying the data.

Result: 0.8% deserialization CPU usage vs 37.1% for SafeTensors and 40.9% for Pickle.


Benchmarks

System: Python 3.12.9, NumPy 2.3.5, 12 CPU cores, M4 Pro

1. In-Memory Serialization (LLM Layer - 64MB)

Format      | Size     | Serialize | Deserialize | Deserialize vs Tenso
Tenso       | 64.00 MB | 3.51 ms   | 0.004 ms    | 1x (baseline)
Arrow       | 64.00 MB | 7.06 ms   | 0.011 ms    | 2.8x slower
SafeTensors | 64.00 MB | 8.14 ms   | 2.39 ms     | 597x slower
Pickle      | 64.00 MB | 2.93 ms   | 2.71 ms     | 677x slower
MsgPack     | 64.00 MB | 10.44 ms  | 3.05 ms     | 763x slower

Note: The Tenso (Vect) variant deserializes even faster, below the 0.001 ms resolution of this table (reported as 0.000 ms).

2. Disk I/O (256 MB Matrix)

Format     | Write    | Read
Tenso      | 29.41 ms | 36.28 ms
NumPy .npy | 24.83 ms | 43.08 ms
Pickle     | 49.90 ms | 24.24 ms

3. Stream Reading (95 MB Packet)

Method            | Time     | Throughput  | Speedup
Tenso read_stream | 7.68 ms  | 12,417 MB/s | 1x (baseline)
Optimised Loop    | 13.89 ms | 7,396 MB/s  | 1.9x slower

4. CPU Usage (Efficiency)

Format      | Serialize CPU% | Deserialize CPU%
Tenso       | 117.3%         | 0.8%
Arrow       | 57.1%          | 1.0%
SafeTensors | 67.1%          | 37.1%
Pickle      | 44.0%          | 40.9%
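The "46x less CPU" headline is a ratio of deserialization CPU percentages. A minimal sketch of that arithmetic, where the dictionary simply restates the numbers from the table above:

```python
# Deserialization CPU% restated from the CPU Usage table
deserialize_cpu = {
    "Tenso": 0.8,
    "Arrow": 1.0,
    "SafeTensors": 37.1,
    "Pickle": 40.9,
}


def cpu_ratio(fmt: str, baseline: str = "Tenso") -> float:
    """How many times more deserialization CPU `fmt` used than `baseline`."""
    return deserialize_cpu[fmt] / deserialize_cpu[baseline]


for fmt in ("Arrow", "SafeTensors", "Pickle"):
    print(f"{fmt}: {cpu_ratio(fmt):.1f}x")
```

SafeTensors works out to about 46x and Pickle to about 51x, so the 46x figure refers specifically to the SafeTensors comparison.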

5. Arrow vs Tenso (Comparison)

Size   | Tenso Serialize | Arrow Serialize | Tenso Deserialize | Arrow Deserialize | Deserialize Speedup
Small  | 0.130 ms        | 0.056 ms        | 0.009 ms          | 0.035 ms          | 4.1x
Medium | 0.972 ms        | 0.912 ms        | 0.020 ms          | 0.040 ms          | 2.0x
Large  | 3.166 ms        | 3.655 ms        | 0.019 ms          | 0.222 ms          | 11.8x
XLarge | 19.086 ms       | 28.726 ms       | 0.023 ms          | 0.733 ms          | 32.0x

6. Network Performance

  • Packet Throughput: 89,183 packets/sec (over localhost TCP)
  • Latency: 11.2 µs/packet
  • Async Write Throughput: 88,397 MB/s (1.4M tensors/sec)

Installation

pip install tenso

Optional extras:

pip install "tenso[ray]"    # Ray integration
pip install "tenso[gpu]"    # GPU acceleration (CuPy/PyTorch/JAX)
pip install "tenso[grpc]"   # gRPC support

Quick Start

Basic Serialization

import numpy as np
import tenso

# Create tensor
data = np.random.rand(1024, 1024).astype(np.float32)

# Serialize
packet = tenso.dumps(data)

# Deserialize (Zero-copy view)
restored = tenso.loads(packet)

Async I/O

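import asyncio
import tenso

async def handle_client(reader, writer):
    # Asynchronously read a tensor from the stream
    data = await tenso.aread_stream(reader)

    # Process and write back
    await tenso.awrite_stream(data * 2, writer)

async def main():
    # Serve clients on localhost:8888 (example address and port)
    server = await asyncio.start_server(handle_client, "127.0.0.1", 8888)
    async with server:
        await server.serve_forever()

asyncio.run(main())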
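At the transport level, an async reader has to pull an exact number of bytes for the header and then for the body, because TCP may deliver one packet in several chunks. The sketch below shows that pattern with asyncio's `StreamReader.readexactly`, using a hypothetical frame (an 8-byte length prefix followed by 1-D float32 data). It is an illustration only, not Tenso's wire format or the implementation of `aread_stream`, and `encode_frame`/`read_frame` are hypothetical names.

```python
import asyncio
import struct

import numpy as np

# HYPOTHETICAL frame: 8-byte little-endian body length, then raw float32 bytes.
# This is NOT Tenso's wire format; it only illustrates exact-length reads.
LENGTH_PREFIX = struct.Struct("<Q")


def encode_frame(arr: np.ndarray) -> bytes:
    body = np.ascontiguousarray(arr, dtype=np.float32).tobytes()
    return LENGTH_PREFIX.pack(len(body)) + body


async def read_frame(reader: asyncio.StreamReader) -> np.ndarray:
    # readexactly raises IncompleteReadError if the stream ends early
    header = await reader.readexactly(LENGTH_PREFIX.size)
    (length,) = LENGTH_PREFIX.unpack(header)
    body = await reader.readexactly(length)
    # Read-only view over the received bytes; no extra copy is made here
    return np.frombuffer(body, dtype=np.float32)


async def demo() -> np.ndarray:
    # Feed an in-memory StreamReader instead of opening a real socket
    reader = asyncio.StreamReader()
    reader.feed_data(encode_frame(np.arange(5)))
    reader.feed_eof()
    return await read_frame(reader)


print(asyncio.run(demo()))
```

A real protocol also has to carry the shape and dtype, which is what the header and shape sections of the Tenso layout are for.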

FastAPI Integration

from fastapi import FastAPI
import numpy as np
from tenso.fastapi import TensoResponse

app = FastAPI()

@app.get("/tensor")
async def get_tensor():
    data = np.ones((1024, 1024), dtype=np.float32)
    return TensoResponse(data) # Zero-copy streaming response

Advanced Features

Ray Integration (Distributed Computing)

Replace Ray's pickle-based serialization of tensors with Tenso to cut deserialization CPU (0.8% vs 40.9% for Pickle in the CPU benchmark above). Works transparently with ray.put(), ray.get(), remote functions, and actors.

import ray
import numpy as np
from tenso.ray import register

ray.init()
register()  # Register Tenso as the serializer for numpy arrays

# All ray.put/get operations now use Tenso
ref = ray.put(np.zeros((1000, 1000)))
arr = ray.get(ref)  # Deserialized via Tenso

# Works transparently with remote functions
@ray.remote
def process(tensor):
    return tensor.mean()

ray.get(process.remote(np.random.randn(1000, 1000)))

Optional support for PyTorch and JAX tensors:

register(include_torch=True, include_jax=True)

Quantized Tensors (4-bit & 8-bit)

Native support for quantized representations to reduce memory footprint with minimal accuracy loss.

from tenso.quantize import QuantizedTensor
import numpy as np

data = np.random.randn(1024, 1024).astype(np.float32)

# Quantize to 8-bit (per-tensor scheme)
qt = QuantizedTensor.quantize(data, dtype="qint8", scheme="per_tensor")
print(qt)  # QuantizedTensor(dtype=qint8, shape=(1024, 1024), ...)

# Serialize/deserialize with Tenso
import tenso
packet = tenso.dumps(qt)
restored = tenso.loads(packet)

# Dequantize back to float32
result = restored.dequantize()

Supported dtypes: qint8, quint8, qint4, quint4
Supported schemes: per_tensor, per_channel, per_group
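To make the per_tensor and per_channel schemes concrete, here is a minimal NumPy sketch of symmetric int8 quantization. It uses a generic textbook scheme (scale = max |x| / 127, zero point 0) assumed for illustration; Tenso's `QuantizedTensor` may choose scales, zero points, or rounding differently, and the function names are hypothetical.

```python
import numpy as np


def quantize_symmetric_int8(x: np.ndarray, axis=None):
    """Symmetric int8 quantization (generic scheme, not Tenso's implementation).

    axis=None -> one scale for the whole tensor (per-tensor)
    axis=k    -> one scale per slice along axis k (per-channel)
    """
    if axis is None:
        max_abs = np.max(np.abs(x))
    else:
        axis = axis % x.ndim
        reduce_axes = tuple(i for i in range(x.ndim) if i != axis)
        max_abs = np.max(np.abs(x), axis=reduce_axes, keepdims=True)
    # Map the largest magnitude to 127; use scale 1.0 for all-zero slices
    scale = np.where(max_abs > 0, max_abs / 127.0, 1.0).astype(np.float32)
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    # Broadcasting applies the per-tensor or per-channel scale
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
weights = rng.standard_normal((4, 8)).astype(np.float32)
q, scale = quantize_symmetric_int8(weights, axis=0)
print(q.nbytes, weights.nbytes)  # int8 storage is a quarter of float32
```

Per-channel scales follow each row's own range, so rows with small values lose less precision than they would under a single per-tensor scale.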

Inter-Process Communication (Shared Memory)

Transfer tensors between local processes through shared memory with single-digit microsecond latency. This avoids socket overhead entirely: processes share a named memory segment instead of sending the tensor bytes.

from tenso import TensoShm
import numpy as np

# Process A: Write to Shared Memory
data = np.random.randn(1024, 1024).astype(np.float32)
# Automatically sizes and creates the SHM segment
with TensoShm.create_from("shared_tensor_01", data) as shm:
    print("Tensor is in SHM. Waiting for reader...")
    input() # Keep process alive

# Process B: Read from Shared Memory (Zero-Copy)
with TensoShm("shared_tensor_01") as shm:
    # Instant view of the data without copying
    array = shm.get()
    print(f"Received: {array.shape}")
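The same pattern can be reproduced with Python's standard `multiprocessing.shared_memory` module, which may help when reasoning about what `TensoShm` does. This is a stdlib analogue, not Tenso's implementation: here the reader must be told the shape and dtype out of band, whereas a Tenso segment presumably carries them in its header. The `publish` and `attach` names are hypothetical.

```python
import os
from multiprocessing import shared_memory

import numpy as np


def publish(name: str, arr: np.ndarray) -> shared_memory.SharedMemory:
    """Writer side (hypothetical helper): create a named segment, copy the array in once."""
    shm = shared_memory.SharedMemory(name=name, create=True, size=arr.nbytes)
    np.ndarray(arr.shape, dtype=arr.dtype, buffer=shm.buf)[...] = arr
    return shm


def attach(name: str, shape, dtype):
    """Reader side (hypothetical helper): map the segment and view it without copying."""
    shm = shared_memory.SharedMemory(name=name)
    view = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    return shm, view


# Demo in a single process; normally writer and reader are separate processes.
segment = f"shm_demo_{os.getpid()}"
data = np.arange(12, dtype=np.float32).reshape(3, 4)
writer = publish(segment, data)
reader, view = attach(segment, data.shape, data.dtype)
print(view.sum())
del view          # release the buffer export before closing the mapping
reader.close()
writer.close()
writer.unlink()   # the creator removes the segment when done
```

Any NumPy view over `shm.buf` must be dropped before `close()`, otherwise Python refuses to close the mapping while the buffer is still exported.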

GPU Acceleration (Direct Transfer)

Supports fast transfers between Tenso streams and device memory for CuPy, PyTorch, and JAX using pinned host memory.

import tenso.gpu as tgpu

# Read directly from a stream into a GPU tensor
# (`stream` must be an open Tenso stream; it is not defined in this snippet)
torch_tensor = tgpu.read_to_device(stream, device_id=0)

bfloat16 Support

Native support for bfloat16 dtype, commonly used in ML training. Works with NumPy 2.1+ natively or falls back to ml_dtypes.

import numpy as np
import ml_dtypes  # provides the bfloat16 dtype for NumPy
import tenso

# Serialize bfloat16 tensors directly
data = np.ones((512, 512), dtype=ml_dtypes.bfloat16)
packet = tenso.dumps(data)
restored = tenso.loads(packet)
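bfloat16 keeps float32's sign bit and 8 exponent bits but only the top 7 mantissa bits, so a bfloat16 value is the upper 16 bits of a float32. The NumPy sketch below shows that relationship by truncation, which is one way a serializer could store bfloat16 as 2-byte words without ml_dtypes. It is illustrative only: the helper names are hypothetical, and libraries such as ml_dtypes typically convert with round-to-nearest-even rather than truncation.

```python
import numpy as np


def float32_to_bf16_bits(x: np.ndarray) -> np.ndarray:
    """Keep the upper 16 bits of each float32 (truncation; rounds finite values toward zero)."""
    bits = np.ascontiguousarray(x, dtype=np.float32).view(np.uint32)
    return (bits >> 16).astype(np.uint16)


def bf16_bits_to_float32(b: np.ndarray) -> np.ndarray:
    """Widen bfloat16 bit patterns back to float32 by zero-filling the low 16 bits."""
    return (b.astype(np.uint32) << 16).view(np.float32)


x = np.array([1.0, -2.5, 3.14159], dtype=np.float32)
b = float32_to_bf16_bits(x)
print(b.nbytes, bf16_bits_to_float32(b))  # half the bytes; 3.14159 loses precision
```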

Sparse Formats & Bundling

Tenso natively supports complex data structures beyond simple dense arrays:

  • Sparse Matrices: Direct serialization for COO, CSR, and CSC formats.
  • Dictionary Bundling: Pack multiple tensors into a single nested dictionary packet.
  • LZ4 Compression: Optional high-speed compression for sparse or redundant data.
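One common way to carry a CSR matrix over a dense-tensor protocol is to send its three component arrays (`data`, `indices`, `indptr`, the same names SciPy uses) plus the shape, for example as a bundle of named tensors. The NumPy-only sketch below builds those components from a dense matrix and reconstructs it; the dictionary layout and function names are hypothetical and not Tenso's on-wire encoding.

```python
import numpy as np


def dense_to_csr(m: np.ndarray) -> dict:
    """Split a 2-D array into CSR components (hypothetical bundle layout)."""
    rows, cols = np.nonzero(m)  # row-major order, rows ascending
    counts = np.bincount(rows, minlength=m.shape[0])
    indptr = np.concatenate(([0], np.cumsum(counts))).astype(np.int64)
    return {
        "data": m[rows, cols],
        "indices": cols.astype(np.int64),
        "indptr": indptr,
        "shape": np.array(m.shape, dtype=np.int64),
    }


def csr_to_dense(bundle: dict) -> np.ndarray:
    n_rows, n_cols = (int(v) for v in bundle["shape"])
    out = np.zeros((n_rows, n_cols), dtype=bundle["data"].dtype)
    indptr = bundle["indptr"]
    for r in range(n_rows):
        # Row r's nonzeros live in data[indptr[r]:indptr[r + 1]]
        start, end = indptr[r], indptr[r + 1]
        out[r, bundle["indices"][start:end]] = bundle["data"][start:end]
    return out


m = np.array([[0, 2, 0], [0, 0, 0], [5, 0, 7]], dtype=np.float32)
print(dense_to_csr(m))
```

Because every component is an ordinary dense array, each can be aligned and viewed zero-copy like any other tensor; the size savings come from storing only the nonzeros.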

Data Integrity (XXH3)

Detect accidental corruption in transit with fast 64-bit XXH3 checksums (XXH3 is non-cryptographic, so this does not protect against deliberate tampering):

import numpy as np
import tenso

data = np.random.rand(1024, 1024).astype(np.float32)

# Serialize with 64-bit checksum footer
packet = tenso.dumps(data, check_integrity=True)

# Verification is automatic during loads()
restored = tenso.loads(packet)
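The footer mechanism can be sketched with the standard library. XXH3 itself needs the third-party `xxhash` package, so this sketch substitutes an 8-byte BLAKE2b digest from `hashlib` (slower, but available everywhere). The footer placement (digest appended after the payload) follows the protocol diagram; the function names and the `ValueError` are assumptions, not Tenso's API.

```python
import hashlib

FOOTER_SIZE = 8  # 64-bit checksum, matching the optional 8-byte footer


def checksum(payload) -> bytes:
    # Stand-in for XXH3-64: BLAKE2b truncated to an 8-byte digest
    return hashlib.blake2b(payload, digest_size=FOOTER_SIZE).digest()


def append_footer(payload: bytes) -> bytes:
    return payload + checksum(payload)


def verify_and_strip(packet: bytes) -> memoryview:
    """Return a zero-copy view of the payload, or raise if the footer does not match."""
    if len(packet) < FOOTER_SIZE:
        raise ValueError("packet too short to contain a checksum footer")
    view = memoryview(packet)
    payload, footer = view[:-FOOTER_SIZE], view[-FOOTER_SIZE:]
    if checksum(payload) != footer.tobytes():
        raise ValueError("checksum mismatch: packet corrupted in transit")
    return payload


pkt = append_footer(b"tensor bytes")
print(bytes(verify_and_strip(pkt)))
```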

gRPC Integration

Tenso provides built-in support for gRPC, allowing you to pass tensors between services with minimal overhead.

from tenso.grpc import tenso_msg_pb2, tenso_msg_pb2_grpc
import tenso

# In your Servicer
def Predict(self, request, context):
    data = tenso.loads(request.tensor_packet)
    result = data * 2
    return tenso_msg_pb2.PredictResponse(
        result_packet=bytes(tenso.dumps(result))
    )

Protocol Design

Tenso uses a minimalist structure designed for direct memory access:

┌─────────────┬──────────────┬──────────────┬────────────────────────┬──────────────┐
│   HEADER    │    SHAPE     │   PADDING    │    BODY (Raw Data)     │    FOOTER    │
│   8 bytes   │  Variable    │   0-63 bytes │   C-Contiguous Array   │   8 bytes*   │
└─────────────┴──────────────┴──────────────┴────────────────────────┴──────────────┘
                                                                        (*Optional)

The padding ensures the body starts at a 64-byte boundary, enabling AVX-512 vectorization and zero-copy memory mapping.
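The layout can be illustrated with a toy encoder and decoder. The header contents here are hypothetical: the diagram gives only the sizes, so this sketch invents an 8-byte header (4-byte magic, dtype code, ndim, 2 reserved bytes) and stores each dimension as a uint64. What it does show is the padding arithmetic and how a decoder can return a zero-copy view with `np.frombuffer(..., offset=...)`. The padding aligns the body relative to the start of the packet; the body is only 64-byte aligned in memory if the receive buffer itself starts on a 64-byte boundary.

```python
import struct

import numpy as np

ALIGN = 64
# HYPOTHETICAL 8-byte header: magic, dtype code, ndim, 2 reserved bytes
HEADER = struct.Struct("<4sBBxx")
MAGIC = b"TOY0"
DTYPES = {0: np.dtype("float32"), 1: np.dtype("float64"), 2: np.dtype("int64")}
CODES = {dt: code for code, dt in DTYPES.items()}


def padding_for(offset: int, align: int = ALIGN) -> int:
    """Bytes needed to move `offset` up to the next multiple of `align` (0..align-1)."""
    return (-offset) % align


def encode(arr: np.ndarray) -> bytes:
    arr = np.ascontiguousarray(arr)  # body must be C-contiguous
    header = HEADER.pack(MAGIC, CODES[arr.dtype], arr.ndim)
    shape = struct.pack(f"<{arr.ndim}Q", *arr.shape)
    prefix = header + shape
    return prefix + b"\x00" * padding_for(len(prefix)) + arr.tobytes()


def decode(packet: bytes) -> np.ndarray:
    magic, code, ndim = HEADER.unpack_from(packet, 0)
    if magic != MAGIC:
        raise ValueError("not a toy packet")
    shape = struct.unpack_from(f"<{ndim}Q", packet, HEADER.size)
    prefix_len = HEADER.size + 8 * ndim
    body_offset = prefix_len + padding_for(prefix_len)
    count = int(np.prod(shape))  # np.prod(()) == 1 for 0-d arrays
    # Zero-copy: the result is a read-only view into `packet`
    flat = np.frombuffer(packet, dtype=DTYPES[code], count=count, offset=body_offset)
    return flat.reshape(shape)


packet = encode(np.arange(12, dtype=np.float32).reshape(3, 4))
print(decode(packet))
```

Computing the padding as `(-offset) % align` yields 0 when the prefix already ends on a boundary, which matches the 0-63 byte range in the diagram.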


Use Cases

  • Model Serving APIs: Up to 32x faster deserialization than Arrow and 46x less deserialization CPU than SafeTensors cut overhead on inference nodes.
  • Distributed Training: Efficiently pass gradients or activations between nodes with native Ray integration.
  • GPU-Direct Pipelines: Stream data from network cards to GPU memory with minimal host intervention.
  • Real-time Robotics: 11.2 µs per-packet latency for high-frequency sensor fusion (LIDAR, Radar).
  • High-Throughput Streaming: 89K packets/sec network transmission for real-time data pipelines.

Contributing

Contributions are welcome! We are currently looking for help with:

  • C++ / JavaScript Clients: Extending the protocol to other ecosystems.

License

Apache License 2.0 - see LICENSE file.

Citation

@software{tenso2025,
  author = {Khushiyant},
  title = {Tenso: High-Performance Zero-Copy Tensor Protocol},
  year = {2025},
  url = {https://github.com/Khushiyant/tenso}
}

Download files

Source Distribution

tenso-0.19.2.tar.gz (92.6 kB)

Uploaded Source

Built Distributions

tenso-0.19.2-pp310-pypy310_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (405.2 kB)

Uploaded PyPy, manylinux: glibc 2.17+ ARM64

tenso-0.19.2-pp39-pypy39_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (405.8 kB)

Uploaded PyPy, manylinux: glibc 2.17+ ARM64

tenso-0.19.2-pp38-pypy38_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (405.7 kB)

Uploaded PyPy, manylinux: glibc 2.17+ ARM64

tenso-0.19.2-pp37-pypy37_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (407.2 kB)

Uploaded PyPy, manylinux: glibc 2.17+ ARM64

tenso-0.19.2-cp311-abi3-win_amd64.whl (246.1 kB)

Uploaded CPython 3.11+, Windows x86-64

tenso-0.19.2-cp311-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (414.1 kB)

Uploaded CPython 3.11+, manylinux: glibc 2.17+ x86-64

tenso-0.19.2-cp311-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (404.4 kB)

Uploaded CPython 3.11+, manylinux: glibc 2.17+ ARM64

tenso-0.19.2-cp311-abi3-macosx_11_0_arm64.whl (360.9 kB)

Uploaded CPython 3.11+, macOS 11.0+ ARM64

File details

Details for the file tenso-0.19.2.tar.gz.

File metadata

  • Download URL: tenso-0.19.2.tar.gz
  • Size: 92.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tenso-0.19.2.tar.gz
Algorithm Hash digest
SHA256 fde2d56f054bc9a27b428690fd91d001d3943a2b86e9bcbeeedbae8181b27237
MD5 524dd7ce1f036c70cb205ce8183b88da
BLAKE2b-256 39a32660abe353d002f50e98ccec53f51c9f8b4312fd1900fa183274c7b912d7

Provenance

The following attestation bundles were made for tenso-0.19.2.tar.gz:

Publisher: release.yml on Khushiyant/tenso

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file tenso-0.19.2-pp310-pypy310_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for tenso-0.19.2-pp310-pypy310_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 4716d010fac3b905cdc80c2543ca1dbe45f81310b306b36df5e165b5ef9d61d3
MD5 b9f87eb4aba585d7798caef3ebc68842
BLAKE2b-256 8e499c96b71a25ba6aaffde12a701abd97edf8d1425a75cef9ff584d79aded3e

Provenance

The following attestation bundles were made for tenso-0.19.2-pp310-pypy310_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: release.yml on Khushiyant/tenso

File details

Details for the file tenso-0.19.2-pp39-pypy39_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for tenso-0.19.2-pp39-pypy39_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 7c0d4fa633d4e94536ff4c76755514bd3a37167ab59e94bf0e655c0be532c3e4
MD5 cbb01f727fe9035d800d189f7af2d188
BLAKE2b-256 7fdea014669100cb975721aea2c9e5fcee978036d5da1b0b2c5fe34e1c73837a

Provenance

The following attestation bundles were made for tenso-0.19.2-pp39-pypy39_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: release.yml on Khushiyant/tenso

File details

Details for the file tenso-0.19.2-pp38-pypy38_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for tenso-0.19.2-pp38-pypy38_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 ac5ed8ad2a083b5d834cda61d7ea06e848fb4e010bf2bd2570f29a5e651918a3
MD5 8fb2da970067ee020f1a94ed54051a4d
BLAKE2b-256 06f0f54e317f65886e73ee4faa7570dbb524411f9cfe4a8c2705f7183bfbfdc5

Provenance

The following attestation bundles were made for tenso-0.19.2-pp38-pypy38_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: release.yml on Khushiyant/tenso

File details

Details for the file tenso-0.19.2-pp37-pypy37_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for tenso-0.19.2-pp37-pypy37_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 6b5577abc1be5856bcbfaa90e79a2032d8286edb79c9e333da1ac2a751a1bda1
MD5 87f718920f84e8f9dd31b92a86d63642
BLAKE2b-256 e5a478a8b1b3e1dc4994b89e8ab248bb1c61797621b94ee33c9797c1eb6c9bf1

Provenance

The following attestation bundles were made for tenso-0.19.2-pp37-pypy37_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: release.yml on Khushiyant/tenso

File details

Details for the file tenso-0.19.2-cp311-abi3-win_amd64.whl.

File metadata

  • Download URL: tenso-0.19.2-cp311-abi3-win_amd64.whl
  • Size: 246.1 kB
  • Tags: CPython 3.11+, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tenso-0.19.2-cp311-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 23154c5d64ecec0c1d69149b52b843f183599f44b29edebc1dc7dd55a5efa311
MD5 cb0bdabfb4e020765f36ded557908f21
BLAKE2b-256 3a725d9eb8937b29a71357c1415fc05ac92a56687103d8f5de1d209276324e29

Provenance

The following attestation bundles were made for tenso-0.19.2-cp311-abi3-win_amd64.whl:

Publisher: release.yml on Khushiyant/tenso

File details

Details for the file tenso-0.19.2-cp311-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for tenso-0.19.2-cp311-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 b5510525baca248b0ecaae38296ab60abdf57447510585b5daf39f984f531ceb
MD5 80a69f524593babfe47e8999a03f8a9a
BLAKE2b-256 a2d2f6623d97580ea6f3ebaa7b9bc0c4d6ebb136e6a375f0c8f779c337753308

Provenance

The following attestation bundles were made for tenso-0.19.2-cp311-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: release.yml on Khushiyant/tenso

File details

Details for the file tenso-0.19.2-cp311-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for tenso-0.19.2-cp311-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 2d1849af74fc3a5f089f51a75a7d76fc34251f63a8c28b1ad65dd9c0aa063497
MD5 a812a7681917532017d300ce7797c508
BLAKE2b-256 a9337ee7452e9f4a98c24c5f3ea78a5f3d5dcf696e5343e5772c340b4ba33daa

Provenance

The following attestation bundles were made for tenso-0.19.2-cp311-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: release.yml on Khushiyant/tenso

File details

Details for the file tenso-0.19.2-cp311-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for tenso-0.19.2-cp311-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 40cdfc604835ee0f42cc34c80cf8cb71cffd88d8f2662ae3572da797148a9550
MD5 5e4e09ec785c15d8db2b961cfda09af9
BLAKE2b-256 2833b86c7aab88ad59cfce904456a1eadeb4665fe49c9d6c35a73cd12d465654

Provenance

The following attestation bundles were made for tenso-0.19.2-cp311-abi3-macosx_11_0_arm64.whl:

Publisher: release.yml on Khushiyant/tenso
