
High-performance zero-copy tensor protocol


Tenso

Up to 32x faster than Apache Arrow on deserialization. 46x less CPU than SafeTensors.

Zero-copy, SIMD-aligned tensor protocol for high-performance ML infrastructure.

Python 3.10+ | License: Apache 2.0


Why Tenso?

Most serialization formats are designed for general data or disk storage. Tenso is focused on network tensor transmission where every microsecond matters.

The Problem

Traditional formats waste CPU cycles during deserialization:

  • SafeTensors: 37.1% CPU usage on deserialization (great for disk, overkill for network)
  • Pickle: 40.9% CPU usage on deserialization, plus security vulnerabilities
  • Arrow: Faster on serialization for small tensors, but up to 32x slower on deserialization for large tensors

The Solution

Tenso achieves true zero-copy with:

  • Minimalist Header: Fixed 8-byte header eliminates JSON parsing overhead.
  • 64-byte Alignment: SIMD-ready padding ensures the data body is cache-line aligned.
  • Direct Memory Mapping: The deserialized array is a view over the existing buffer, so no bytes are copied.

Result: 0.8% CPU usage during deserialization, versus 37.1% for SafeTensors and 40.9% for Pickle.
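
To illustrate what "zero-copy" means here, the following is a minimal sketch using only the Python standard library. It is not Tenso's implementation: the 64-byte body offset and the packet contents are assumptions made for the example. The idea is that the reader builds a typed view over bytes that already sit in a buffer, so no data is moved.

```python
import struct

# Hypothetical packet: 64 bytes standing in for header, shape and padding,
# followed by four float32 values in native byte order.
BODY_OFFSET = 64
packet = bytearray(BODY_OFFSET) + bytearray(struct.pack("4f", 1.0, 2.0, 3.0, 4.0))

# Zero-copy: a typed view over the existing buffer. No bytes are copied.
view = memoryview(packet)[BODY_OFFSET:].cast("f")

# For contrast, bytes() makes an independent copy of the body.
copied = bytes(packet[BODY_OFFSET:])

# Writing into the packet is visible through the view, but not through the copy.
struct.pack_into("f", packet, BODY_OFFSET, 9.0)
first_via_view = view[0]
first_via_copy = struct.unpack_from("f", copied, 0)[0]
```

After the write, `first_via_view` reflects the new value while `first_via_copy` still holds the old one, which shows that the view shares memory with the packet. With NumPy, `np.frombuffer` gives the same kind of view as an array.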


Benchmarks

System: Python 3.12.9, NumPy 2.3.5, 12 CPU cores, M4 Pro

1. In-Memory Serialization (LLM Layer - 64MB)

Format        Size       Serialize   Deserialize   Deserialize vs. Tenso
Tenso         64.00 MB    3.51 ms    0.004 ms      1x (baseline)
Arrow         64.00 MB    7.06 ms    0.011 ms      2.8x slower
SafeTensors   64.00 MB    8.14 ms    2.39 ms       597x slower
Pickle        64.00 MB    2.93 ms    2.71 ms       677x slower
MsgPack       64.00 MB   10.44 ms    3.05 ms       763x slower

Note: The Tenso (Vect) variant is faster still; its deserialize time rounds to 0.000 ms at this precision.

2. Disk I/O (256 MB Matrix)

Format       Write      Read
Tenso        29.41 ms   36.28 ms
NumPy .npy   24.83 ms   43.08 ms
Pickle       49.90 ms   24.24 ms

3. Stream Reading (95 MB Packet)

Method              Time       Throughput    Speedup
Tenso read_stream    7.68 ms   12,417 MB/s   1x (baseline)
Optimised Loop      13.89 ms    7,396 MB/s   1.9x slower

4. CPU Usage (Efficiency)

Format        Serialize CPU%   Deserialize CPU%
Tenso         117.3%            0.8%
Arrow          57.1%            1.0%
SafeTensors    67.1%           37.1%
Pickle         44.0%           40.9%

5. Arrow vs Tenso (Comparison)

Size     Tenso Ser   Arrow Ser   Tenso Des   Arrow Des   Speedup (Des)
Small     0.130 ms    0.056 ms   0.009 ms    0.035 ms     4.1x
Medium    0.972 ms    0.912 ms   0.020 ms    0.040 ms     2.0x
Large     3.166 ms    3.655 ms   0.019 ms    0.222 ms    11.8x
XLarge   19.086 ms   28.726 ms   0.023 ms    0.733 ms    32.0x
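
As a quick check on how the speedup column can be read, the sketch below recomputes it from the two deserialization columns. It assumes the speedup is the Arrow deserialize time divided by the Tenso deserialize time; the table does not state this explicitly. Because the published timings are rounded to three decimals, the recomputed ratios can differ slightly from the published column.

```python
# Deserialize times in milliseconds, copied from the table above:
# (Tenso Des, Arrow Des) per tensor size.
deserialize_ms = {
    "Small": (0.009, 0.035),
    "Medium": (0.020, 0.040),
    "Large": (0.019, 0.222),
    "XLarge": (0.023, 0.733),
}

# Assumed definition: speedup = Arrow time / Tenso time.
speedups = {
    size: arrow_ms / tenso_ms
    for size, (tenso_ms, arrow_ms) in deserialize_ms.items()
}

for size, ratio in speedups.items():
    print(f"{size:<7} {ratio:5.1f}x")
```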

6. Network Performance

  • Packet Throughput: 89,183 packets/sec (over localhost TCP)
  • Latency: 11.2 µs/packet
  • Async Write Throughput: 88,397 MB/s (1.4M tensors/sec)
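
The packet throughput and latency figures are consistent with each other if packets are sent one after another, in which case the per-packet latency is the reciprocal of the packet rate. The short calculation below assumes that relationship; it is an arithmetic check, not a description of how the benchmark was run.

```python
# Packet rate reported above for localhost TCP.
packets_per_second = 89_183

# Assuming strictly serial sends, latency per packet is 1 / rate.
latency_us = 1_000_000 / packets_per_second

print(f"{latency_us:.1f} µs/packet")
```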


Installation

pip install tenso

Optional extras:

pip install "tenso[api]"    # gRPC, FastAPI, Ray integration
pip install "tenso[gpu]"    # GPU acceleration (CuPy/PyTorch/JAX)

Quick Start

Basic Serialization

import numpy as np
import tenso

# Create tensor
data = np.random.rand(1024, 1024).astype(np.float32)

# Serialize
packet = tenso.dumps(data)

# Deserialize (Zero-copy view)
restored = tenso.loads(packet)

Async I/O

import asyncio
import tenso

async def handle_client(reader, writer):
    # Asynchronously read a tensor from the stream
    data = await tenso.aread_stream(reader)
    
    # Process and write back
    await tenso.awrite_stream(data * 2, writer)
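
For readers unfamiliar with reading binary messages from an asyncio stream, here is a minimal sketch of the general pattern: read a fixed-size header with `readexactly`, then read exactly as many body bytes as the header announces. The 8-byte length prefix and the helper names `encode_frame` and `read_frame` are hypothetical and chosen for illustration; they are not Tenso's wire format or API.

```python
import asyncio
import struct

LENGTH_PREFIX = struct.Struct("<Q")  # hypothetical 8-byte little-endian length


def encode_frame(payload: bytes) -> bytes:
    """Prefix the payload with its length so the reader knows how much to read."""
    return LENGTH_PREFIX.pack(len(payload)) + payload


async def read_frame(reader: asyncio.StreamReader) -> bytes:
    """Read one length-prefixed frame; raises IncompleteReadError on a short stream."""
    header = await reader.readexactly(LENGTH_PREFIX.size)
    (length,) = LENGTH_PREFIX.unpack(header)
    return await reader.readexactly(length)


async def demo() -> bytes:
    # Feed an in-memory StreamReader instead of opening a socket.
    reader = asyncio.StreamReader()
    reader.feed_data(encode_frame(b"\x01\x02\x03"))
    reader.feed_eof()
    return await read_frame(reader)


result = asyncio.run(demo())
```

Using `readexactly` avoids hand-written loops over partial reads, since it only returns once the requested number of bytes has arrived.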

FastAPI Integration

from fastapi import FastAPI
import numpy as np
from tenso.fastapi import TensoResponse

app = FastAPI()

@app.get("/tensor")
async def get_tensor():
    data = np.ones((1024, 1024), dtype=np.float32)
    return TensoResponse(data) # Zero-copy streaming response

Advanced Features

Ray Integration (Distributed Computing)

Replace pickle-based serialization in Ray with Tenso to cut CPU overhead on tensor deserialization (0.8% vs. 40.9% for Pickle in the CPU usage benchmark above). Works transparently with ray.put(), ray.get(), remote functions, and actors.

import ray
import numpy as np
from tenso.ray import register

ray.init()
register()  # Register Tenso as the serializer for numpy arrays

# All ray.put/get operations now use Tenso
ref = ray.put(np.zeros((1000, 1000)))
arr = ray.get(ref)  # Deserialized via Tenso

# Works transparently with remote functions
@ray.remote
def process(tensor):
    return tensor.mean()

ray.get(process.remote(np.random.randn(1000, 1000)))

Optional support for PyTorch and JAX tensors:

register(include_torch=True, include_jax=True)

Quantized Tensors (4-bit & 8-bit)

Native support for quantized representations to reduce memory footprint with minimal accuracy loss.

import numpy as np
import tenso
from tenso.quantize import QuantizedTensor

data = np.random.randn(1024, 1024).astype(np.float32)

# Quantize to 8-bit (per-tensor scheme)
qt = QuantizedTensor.quantize(data, dtype="qint8", scheme="per_tensor")
print(qt)  # QuantizedTensor(dtype=qint8, shape=(1024, 1024), ...)

# Serialize/deserialize with Tenso
packet = tenso.dumps(qt)
restored = tenso.loads(packet)

# Dequantize back to float32
result = restored.dequantize()

Supported dtypes: qint8, quint8, qint4, quint4
Supported schemes: per_tensor, per_channel, per_group
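
To make the per_tensor scheme concrete, the following is a minimal sketch of symmetric per-tensor 8-bit quantization in plain Python. It is an assumption about what such a scheme typically does (one scale for the whole tensor), not a description of Tenso's internals; `quantize_per_tensor` and `dequantize` are hypothetical helper names.

```python
def quantize_per_tensor(values, num_bits=8):
    """Map floats to signed integers using one scale for the whole tensor."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the representable range.
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the integer levels."""
    return [q * scale for q in quantized]


values = [-1.5, -0.2, 0.0, 0.7, 1.5]
quantized, scale = quantize_per_tensor(values)
restored = dequantize(quantized, scale)

# The round-trip error per element is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(values, restored))
```

A per_channel or per_group scheme would follow the same idea but keep a separate scale for each channel or group, which usually lowers the error when value ranges differ across the tensor.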

Inter-Process Communication (Shared Memory)

Transfer tensors between local processes with single-digit microsecond latency using Shared Memory. This avoids socket overhead entirely by passing memory handles.

from tenso import TensoShm
import numpy as np

# Process A: Write to Shared Memory
data = np.random.randn(1024, 1024).astype(np.float32)
# Automatically sizes and creates the SHM segment
with TensoShm.create_from("shared_tensor_01", data) as shm:
    print("Tensor is in SHM. Waiting for reader...")
    input() # Keep process alive

# Process B: Read from Shared Memory (Zero-Copy)
with TensoShm("shared_tensor_01") as shm:
    # Instant view of the data without copying
    array = shm.get()
    print(f"Received: {array.shape}")
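
For background on the mechanism, here is a minimal sketch of named shared memory using the standard library's `multiprocessing.shared_memory`. It is not TensoShm's implementation, and it runs the writer and reader in one process only to stay self-contained; in practice the reader would attach by name from a second process.

```python
import struct
from multiprocessing import shared_memory

# Four float32 values standing in for a tensor body.
payload = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)

# Writer side: create a named segment and place the bytes in it.
writer = shared_memory.SharedMemory(create=True, size=len(payload))
try:
    writer.buf[: len(payload)] = payload

    # Reader side: attach to the same segment by name.
    reader = shared_memory.SharedMemory(name=writer.name)
    try:
        # Decode straight from the shared buffer.
        values = struct.unpack_from("<4f", reader.buf, 0)
    finally:
        reader.close()
finally:
    writer.close()
    writer.unlink()  # Free the segment once no process needs it.
```

Only the segment name has to travel between processes, which is why this path can avoid socket overhead.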

GPU Acceleration (Direct Transfer)

Supports fast transfers between Tenso streams and device memory for CuPy, PyTorch, and JAX using pinned host memory.

import tenso.gpu as tgpu

# `stream` is assumed to be an already-open readable stream carrying a Tenso packet
# Read directly from a stream into a GPU tensor
torch_tensor = tgpu.read_to_device(stream, device_id=0)

bfloat16 Support

Native support for bfloat16 dtype, commonly used in ML training. Works with NumPy 2.1+ natively or falls back to ml_dtypes.

import ml_dtypes  # provides a NumPy-compatible bfloat16 dtype
import numpy as np
import tenso

# Serialize bfloat16 tensors directly
data = np.ones((512, 512), dtype=ml_dtypes.bfloat16)
packet = tenso.dumps(data)

Sparse Formats & Bundling

Tenso natively supports complex data structures beyond simple dense arrays:

  • Sparse Matrices: Direct serialization for COO, CSR, and CSC formats.
  • Dictionary Bundling: Pack multiple tensors into a single nested dictionary packet.
  • LZ4 Compression: Optional high-speed compression for sparse or redundant data.

Data Integrity (XXH3)

Protect your tensors against network corruption with ultra-fast 64-bit checksums:

# Serialize with 64-bit checksum footer
packet = tenso.dumps(data, check_integrity=True)

# Verification is automatic during loads()
restored = tenso.loads(packet)
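
The checksum-footer idea can be sketched with the standard library alone. XXH3 is not available in the standard library, so this example swaps in an 8-byte BLAKE2b digest as a stand-in; the helper names `add_footer` and `verify_and_strip` are hypothetical and do not reflect Tenso's API or footer layout.

```python
import hashlib

FOOTER_SIZE = 8  # 64-bit footer, matching the size described above


def _digest(body: bytes) -> bytes:
    # Stand-in for XXH3-64: an 8-byte BLAKE2b digest of the body.
    return hashlib.blake2b(body, digest_size=FOOTER_SIZE).digest()


def add_footer(body: bytes) -> bytes:
    """Append the checksum of the body as a fixed-size footer."""
    return body + _digest(body)


def verify_and_strip(packet: bytes) -> bytes:
    """Return the body if the footer matches; raise ValueError otherwise."""
    if len(packet) < FOOTER_SIZE:
        raise ValueError("packet shorter than checksum footer")
    body, footer = packet[:-FOOTER_SIZE], packet[-FOOTER_SIZE:]
    if _digest(body) != footer:
        raise ValueError("checksum mismatch: packet is corrupted")
    return body


packet = add_footer(b"tensor bytes")
body = verify_and_strip(packet)
```

Flipping any bit of the body or footer makes `verify_and_strip` raise, which is the behaviour a receiver wants before handing the data to downstream code.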

gRPC Integration

Tenso provides built-in support for gRPC, allowing you to pass tensors between services with minimal overhead.

from tenso.grpc import tenso_msg_pb2, tenso_msg_pb2_grpc
import tenso

# In your Servicer
def Predict(self, request, context):
    data = tenso.loads(request.tensor_packet)
    result = data * 2
    return tenso_msg_pb2.PredictResponse(
        result_packet=bytes(tenso.dumps(result))
    )

Protocol Design

Tenso uses a minimalist structure designed for direct memory access:

┌─────────────┬──────────────┬──────────────┬────────────────────────┬──────────────┐
│   HEADER    │    SHAPE     │   PADDING    │    BODY (Raw Data)     │    FOOTER    │
│   8 bytes   │  Variable    │   0-63 bytes │   C-Contiguous Array   │   8 bytes*   │
└─────────────┴──────────────┴──────────────┴────────────────────────┴──────────────┘
                                                                        (*Optional)

The padding ensures the body starts at a 64-byte boundary, enabling AVX-512 vectorization and zero-copy memory mapping.
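
The padding rule can be written out in a few lines. The sketch below assumes the fixed 8-byte header from the diagram and, purely for illustration, 4 bytes per shape dimension; the actual shape encoding is not specified here, so `bytes_per_dim` is a hypothetical parameter.

```python
ALIGNMENT = 64    # body must start on a 64-byte boundary
HEADER_SIZE = 8   # fixed header size from the diagram above


def padding_for(offset: int, alignment: int = ALIGNMENT) -> int:
    """Bytes needed to advance `offset` to the next multiple of `alignment`."""
    return (-offset) % alignment


def body_offset(ndim: int, bytes_per_dim: int = 4) -> int:
    """Offset of the body for a tensor with `ndim` dimensions.

    bytes_per_dim is an assumption made for this example.
    """
    shape_end = HEADER_SIZE + ndim * bytes_per_dim
    return shape_end + padding_for(shape_end)


# A 2-D tensor: 8 header bytes + 8 shape bytes = 16, so 48 padding bytes
# place the body at offset 64.
offset_2d = body_offset(2)
padding_2d = padding_for(HEADER_SIZE + 2 * 4)
```

The result is always between 0 and 63 bytes of padding, matching the diagram. Note that this aligns the body relative to the start of the packet; whether the body is aligned in memory also depends on the address of the buffer that holds the packet.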


Use Cases

  • Model Serving APIs: Up to 32x faster deserialization than Arrow and 46x less CPU than SafeTensors cut overhead on inference nodes.
  • Distributed Training: Efficiently pass gradients or activations between nodes with native Ray integration.
  • GPU-Direct Pipelines: Stream data from network cards to GPU memory with minimal host intervention.
  • Real-time Robotics: 11.2 µs latency for high-frequency sensor fusion (LIDAR, Radar).
  • High-Throughput Streaming: 89K packets/sec network transmission for real-time data pipelines.

Contributing

Contributions are welcome! We are currently looking for help with:

  • C++ / JavaScript Clients: Extending the protocol to other ecosystems.

License

Apache License 2.0 - see LICENSE file.

Citation

@software{tenso2025,
  author = {Khushiyant},
  title = {Tenso: High-Performance Zero-Copy Tensor Protocol},
  year = {2025},
  url = {https://github.com/Khushiyant/tenso}
}
