High-performance zero-copy tensor protocol

Tenso

Up to 32x faster than Apache Arrow on deserialization. 46x less deserialization CPU than SafeTensors.

Zero-copy, SIMD-aligned tensor protocol for high-performance ML infrastructure.

Requires Python 3.10+. License: Apache 2.0.


Why Tenso?

Most serialization formats are designed for general data or disk storage. Tenso is focused on network tensor transmission where every microsecond matters.

The Problem

Traditional formats waste CPU cycles during deserialization:

  • SafeTensors: 37.1% CPU usage during deserialization (great for disk, overkill for network)
  • Pickle: 40.9% CPU usage during deserialization, plus security vulnerabilities
  • Arrow: Faster on serialization for small and medium tensors, but up to 32x slower on deserialization for large tensors

The Solution

Tenso achieves true zero-copy with:

  • Minimalist Header: Fixed 8-byte header eliminates JSON parsing overhead.
  • 64-byte Alignment: SIMD-ready padding ensures the data body is cache-line aligned.
  • Direct Memory Mapping: The deserialized array is a view over the existing buffer, so no data is copied.

Result: 0.8% CPU usage during deserialization, vs. 37.1% for SafeTensors and 40.9% for Pickle.
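The difference between a zero-copy view and a copy can be seen with plain NumPy. This is a minimal sketch, not Tenso's implementation: the `payload` bytearray is a hypothetical stand-in for the body of a received packet.

```python
import numpy as np

# Hypothetical stand-in for the raw body of a received packet.
payload = bytearray(np.arange(8, dtype=np.float32).tobytes())

# Zero-copy: the array is a view over the bytes already held in `payload`.
view = np.frombuffer(payload, dtype=np.float32)

# Copy: a new allocation plus a pass over every byte.
copy = view.copy()

# Overwrite the first element directly in the underlying buffer.
payload[0:4] = np.float32(42.0).tobytes()

first_from_view = float(view[0])  # sees the change, memory is shared
first_from_copy = float(copy[0])  # unchanged, the copy owns its memory
```

Because the view shares memory with the buffer, its cost does not grow with tensor size, whereas the copy scales with the number of bytes.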


Benchmarks

System: Python 3.12.9, NumPy 2.3.5, 12 CPU cores, M4 Pro

1. In-Memory Serialization (LLM Layer - 64MB)

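| Format      | Size     | Serialize | Deserialize | Deserialize vs. Tenso |
|-------------|----------|-----------|-------------|-----------------------|
| Tenso       | 64.00 MB | 3.51 ms   | 0.004 ms    | 1x (baseline)         |
| Arrow       | 64.00 MB | 7.06 ms   | 0.011 ms    | 2.8x slower           |
| SafeTensors | 64.00 MB | 8.14 ms   | 2.39 ms     | 597x slower           |
| Pickle      | 64.00 MB | 2.93 ms   | 2.71 ms     | 677x slower           |
| MsgPack     | 64.00 MB | 10.44 ms  | 3.05 ms     | 763x slower           |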

Note: The Tenso (Vect) variant is faster still; its deserialize time rounds to 0.000 ms at this precision.

2. Disk I/O (256 MB Matrix)

| Format     | Write    | Read     |
|------------|----------|----------|
| Tenso      | 29.41 ms | 36.28 ms |
| NumPy .npy | 24.83 ms | 43.08 ms |
| Pickle     | 49.90 ms | 24.24 ms |

3. Stream Reading (95 MB Packet)

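| Method            | Time     | Throughput  | Relative to Tenso |
|-------------------|----------|-------------|-------------------|
| Tenso read_stream | 7.68 ms  | 12,417 MB/s | 1x (baseline)     |
| Optimised Loop    | 13.89 ms | 7,396 MB/s  | 1.9x slower       |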
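The source does not say how `read_stream` or the "Optimised Loop" baseline is implemented. One common way to avoid extra copies when reading a large packet is to fill a single preallocated buffer with `readinto` instead of concatenating chunks. The helper `read_exact_into` below is a hypothetical illustration of that idea, not Tenso code.

```python
import io


def read_exact_into(stream, size):
    """Read exactly `size` bytes from `stream` into one preallocated buffer."""
    buffer = bytearray(size)
    view = memoryview(buffer)
    filled = 0
    while filled < size:
        # readinto writes straight into the unfilled tail of the buffer.
        count = stream.readinto(view[filled:])
        if not count:
            raise EOFError(f"stream ended after {filled} of {size} bytes")
        filled += count
    return buffer


# Usage with an in-memory stream standing in for a socket or file.
packet = read_exact_into(io.BytesIO(b"\x01" * 1024), 1024)
```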

4. CPU Usage (Efficiency)

| Format      | Serialize CPU% | Deserialize CPU% |
|-------------|----------------|------------------|
| Tenso       | 117.3%         | 0.8%             |
| Arrow       | 57.1%          | 1.0%             |
| SafeTensors | 67.1%          | 37.1%            |
| Pickle      | 44.0%          | 40.9%            |
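The headline "46x less CPU than SafeTensors" figure appears to be the ratio of the deserialize CPU percentages above. A short check, using only the numbers from this table:

```python
# Deserialize CPU% figures copied from the table above.
deserialize_cpu = {
    "Tenso": 0.8,
    "Arrow": 1.0,
    "SafeTensors": 37.1,
    "Pickle": 40.9,
}

baseline = deserialize_cpu["Tenso"]
ratios = {name: pct / baseline for name, pct in deserialize_cpu.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x the deserialize CPU of Tenso")
```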

5. Arrow vs Tenso (Comparison)

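| Size   | Tenso Ser | Arrow Ser | Tenso Des | Arrow Des | Deserialize speedup (Tenso vs. Arrow) |
|--------|-----------|-----------|-----------|-----------|---------------------------------------|
| Small  | 0.130 ms  | 0.056 ms  | 0.009 ms  | 0.035 ms  | 4.1x                                  |
| Medium | 0.972 ms  | 0.912 ms  | 0.020 ms  | 0.040 ms  | 2.0x                                  |
| Large  | 3.166 ms  | 3.655 ms  | 0.019 ms  | 0.222 ms  | 11.8x                                 |
| XLarge | 19.086 ms | 28.726 ms | 0.023 ms  | 0.733 ms  | 32.0x                                 |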

6. Network Performance

  • Packet Throughput: 89,183 packets/sec (over localhost TCP)
  • Latency: 11.2 µs/packet
  • Async Write Throughput: 88,397 MB/s (1.4M tensors/sec)
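The per-packet latency is consistent with the packet throughput if packets are sent back to back. The check below assumes the latency figure is the reciprocal of the throughput rather than a separate measurement.

```python
# Packet throughput from the list above.
packets_per_second = 89_183

# Time spent per packet, in microseconds, when packets are sent back to back.
latency_us = 1_000_000 / packets_per_second

print(f"{latency_us:.1f} µs/packet")
```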


Installation

pip install tenso

Optional extras:

pip install "tenso[api]"    # gRPC, FastAPI, Ray integration
pip install "tenso[gpu]"    # GPU acceleration (CuPy/PyTorch/JAX)

Quick Start

Basic Serialization

import numpy as np
import tenso

# Create tensor
data = np.random.rand(1024, 1024).astype(np.float32)

# Serialize
packet = tenso.dumps(data)

# Deserialize (Zero-copy view)
restored = tenso.loads(packet)

Async I/O

import asyncio
import tenso

async def handle_client(reader, writer):
    # Asynchronously read a tensor from the stream
    data = await tenso.aread_stream(reader)
    
    # Process and write back
    await tenso.awrite_stream(data * 2, writer)
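A stream reader has to know where one message ends and the next begins. The sketch below shows the general technique of length-prefixed framing over `asyncio` streams. The 4-byte little-endian length prefix and the helper names `frame` and `read_frame` are assumptions made for illustration; they are not Tenso's wire format or API.

```python
import asyncio
import struct


def frame(payload: bytes) -> bytes:
    """Prefix the payload with its length as a 4-byte little-endian integer."""
    return struct.pack("<I", len(payload)) + payload


async def read_frame(reader: asyncio.StreamReader) -> bytes:
    """Read one length-prefixed message from the stream."""
    header = await reader.readexactly(4)
    (length,) = struct.unpack("<I", header)
    return await reader.readexactly(length)


async def demo() -> bytes:
    # Feed a StreamReader by hand so the example runs without a socket.
    reader = asyncio.StreamReader()
    reader.feed_data(frame(b"tensor-bytes"))
    reader.feed_eof()
    return await read_frame(reader)


result = asyncio.run(demo())
```

`readexactly` raises `asyncio.IncompleteReadError` if the connection closes mid-message, which is usually the behaviour a handler wants.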

FastAPI Integration

from fastapi import FastAPI
import numpy as np
from tenso.fastapi import TensoResponse

app = FastAPI()

@app.get("/tensor")
async def get_tensor():
    data = np.ones((1024, 1024), dtype=np.float32)
    return TensoResponse(data) # Zero-copy streaming response

Advanced Features

Ray Integration (Distributed Computing)

Replace pickle-based serialization in Ray with Tenso to cut deserialization CPU overhead on tensor operations (0.8% for Tenso vs. 40.9% for Pickle in the CPU usage benchmark above). Works transparently with ray.put(), ray.get(), remote functions, and actors.

import ray
import numpy as np
from tenso.ray import register

ray.init()
register()  # Register Tenso as the serializer for numpy arrays

# All ray.put/get operations now use Tenso
ref = ray.put(np.zeros((1000, 1000)))
arr = ray.get(ref)  # Deserialized via Tenso

# Works transparently with remote functions
@ray.remote
def process(tensor):
    return tensor.mean()

ray.get(process.remote(np.random.randn(1000, 1000)))

Optional support for PyTorch and JAX tensors:

register(include_torch=True, include_jax=True)

Quantized Tensors (4-bit & 8-bit)

Native support for quantized representations to reduce memory footprint with minimal accuracy loss.

import numpy as np
import tenso
from tenso.quantize import QuantizedTensor

data = np.random.randn(1024, 1024).astype(np.float32)

# Quantize to 8-bit (per-tensor scheme)
qt = QuantizedTensor.quantize(data, dtype="qint8", scheme="per_tensor")
print(qt)  # QuantizedTensor(dtype=qint8, shape=(1024, 1024), ...)

# Serialize/deserialize with Tenso
packet = tenso.dumps(qt)
restored = tenso.loads(packet)

# Dequantize back to float32
result = restored.dequantize()

Supported dtypes: qint8, quint8, qint4, quint4
Supported schemes: per_tensor, per_channel, per_group
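To make the per-tensor scheme concrete, here is a minimal sketch of symmetric 8-bit quantization in plain NumPy. It illustrates the general technique only; the functions `quantize_per_tensor` and `dequantize` are hypothetical and may differ from what `QuantizedTensor` does internally (for example in how the scale or zero point is chosen).

```python
import numpy as np


def quantize_per_tensor(x):
    """Map float values to int8 using one scale for the whole tensor."""
    # Guard against an all-zero tensor, which would give a scale of 0.
    scale = float(np.max(np.abs(x))) / 127.0 or 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Recover approximate float32 values from the int8 codes."""
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 4)).astype(np.float32)

q, scale = quantize_per_tensor(x)
restored = dequantize(q, scale)

# Rounding to the nearest code bounds the error by half a quantization step.
max_error = float(np.max(np.abs(restored - x)))
```

The int8 codes take a quarter of the float32 storage, and the reconstruction error stays within about half of `scale`.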

Inter-Process Communication (Shared Memory)

Transfer tensors between local processes with single-digit microsecond latency using Shared Memory. This avoids socket overhead entirely by passing memory handles.

from tenso import TensoShm
import numpy as np

# Process A: Write to Shared Memory
data = np.random.randn(1024, 1024).astype(np.float32)
# Automatically sizes and creates the SHM segment
with TensoShm.create_from("shared_tensor_01", data) as shm:
    print("Tensor is in SHM. Waiting for reader...")
    input() # Keep process alive

# Process B: Read from Shared Memory (Zero-Copy)
with TensoShm("shared_tensor_01") as shm:
    # Instant view of the data without copying
    array = shm.get()
    print(f"Received: {array.shape}")
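The underlying mechanism can be sketched with the standard library's `multiprocessing.shared_memory`. Whether `TensoShm` is built on this module is not stated in the source, so treat the following as an illustration of named shared-memory segments in general. For brevity, writer and reader run in one process here; normally they would be separate processes that agree on the segment name.

```python
import struct
from multiprocessing import shared_memory

# Writer side: create a named segment and copy the payload in once.
payload = struct.pack("4f", 1.0, 2.0, 3.0, 4.0)
writer = shared_memory.SharedMemory(create=True, size=len(payload))
writer.buf[:len(payload)] = payload

# Reader side: attach by name and view the bytes in place as float32.
reader = shared_memory.SharedMemory(name=writer.name)
view = reader.buf[:len(payload)].cast("f")
received = list(view)

# Views must be released before the segment can be closed.
view.release()
reader.close()
writer.close()
writer.unlink()  # the creator removes the segment when done
```

The slice to `len(payload)` matters because some platforms round the segment size up to a whole page.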

GPU Acceleration (Direct Transfer)

Supports fast transfers between Tenso streams and device memory for CuPy, PyTorch, and JAX using pinned host memory.

import tenso.gpu as tgpu

# `stream` is assumed to be an already-open readable stream carrying a Tenso packet
# Read directly from a stream into a GPU tensor
torch_tensor = tgpu.read_to_device(stream, device_id=0)

bfloat16 Support

Native support for bfloat16 dtype, commonly used in ML training. Works with NumPy 2.1+ natively or falls back to ml_dtypes.

import ml_dtypes  # provides a NumPy-compatible bfloat16 dtype
import numpy as np
import tenso

# Serialize bfloat16 tensors directly
data = np.ones((512, 512), dtype=ml_dtypes.bfloat16)
packet = tenso.dumps(data)

Sparse Formats & Bundling

Tenso natively supports complex data structures beyond simple dense arrays:

  • Sparse Matrices: Direct serialization for COO, CSR, and CSC formats.
  • Dictionary Bundling: Pack multiple tensors into a single nested dictionary packet.
  • LZ4 Compression: Optional high-speed compression for sparse or redundant data.
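A sparse matrix can travel over a dense-tensor protocol because formats such as CSR are just a few flat arrays. The sketch below builds the three CSR arrays from a small dense matrix in pure Python. The helper `dense_to_csr` is hypothetical and only illustrates the format; it says nothing about how Tenso lays these arrays out in a packet.

```python
def dense_to_csr(rows):
    """Return the (data, indices, indptr) arrays of the CSR format."""
    data = []      # non-zero values, row by row
    indices = []   # column index of each non-zero value
    indptr = [0]   # indptr[i]:indptr[i + 1] slices out row i
    for row in rows:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)
                indices.append(col)
        indptr.append(len(data))
    return data, indices, indptr


dense = [
    [0, 5, 0],
    [0, 0, 0],
    [7, 0, 9],
]
data, indices, indptr = dense_to_csr(dense)
# data    -> [5, 7, 9]
# indices -> [1, 0, 2]
# indptr  -> [0, 1, 1, 3]
```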

Data Integrity (XXH3)

Detect network corruption of your tensors with fast 64-bit XXH3 checksums:

import numpy as np
import tenso

data = np.ones((1024, 1024), dtype=np.float32)

# Serialize with 64-bit checksum footer
packet = tenso.dumps(data, check_integrity=True)

# Verification is automatic during loads()
restored = tenso.loads(packet)
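The footer idea can be sketched with the standard library. XXH3 is not in the standard library, so this example swaps in an 8-byte BLAKE2b digest as a stand-in; the helpers `seal` and `verify` are hypothetical and are not part of Tenso's API.

```python
import hashlib

FOOTER_SIZE = 8  # 64-bit footer, matching the size described above


def digest64(body: bytes) -> bytes:
    """64-bit digest of the body (BLAKE2b here, standing in for XXH3)."""
    return hashlib.blake2b(body, digest_size=FOOTER_SIZE).digest()


def seal(body: bytes) -> bytes:
    """Append the checksum footer to the body."""
    return body + digest64(body)


def verify(packet: bytes) -> bytes:
    """Return the body if the footer matches, otherwise raise ValueError."""
    if len(packet) < FOOTER_SIZE:
        raise ValueError("packet shorter than checksum footer")
    body, footer = packet[:-FOOTER_SIZE], packet[-FOOTER_SIZE:]
    if digest64(body) != footer:
        raise ValueError("checksum mismatch")
    return body


packet = seal(b"tensor-bytes")
body = verify(packet)

# Flipping the bits of one byte in transit is detected on verification.
corrupted = bytes([packet[0] ^ 0xFF]) + packet[1:]
```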

gRPC Integration

Tenso provides built-in support for gRPC, allowing you to pass tensors between services with minimal overhead.

from tenso.grpc import tenso_msg_pb2, tenso_msg_pb2_grpc
import tenso

# In your Servicer
def Predict(self, request, context):
    data = tenso.loads(request.tensor_packet)
    result = data * 2
    return tenso_msg_pb2.PredictResponse(
        result_packet=bytes(tenso.dumps(result))
    )

Protocol Design

Tenso uses a minimalist structure designed for direct memory access:

┌─────────────┬──────────────┬──────────────┬────────────────────────┬──────────────┐
│   HEADER    │    SHAPE     │   PADDING    │    BODY (Raw Data)     │    FOOTER    │
│   8 bytes   │  Variable    │   0-63 bytes │   C-Contiguous Array   │   8 bytes*   │
└─────────────┴──────────────┴──────────────┴────────────────────────┴──────────────┘
                                                                        (*Optional)

The padding ensures the body starts at a 64-byte boundary, enabling AVX-512 vectorization and zero-copy memory mapping.
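The padding length follows from the offset at which the body would otherwise start. The sketch below works through that arithmetic. The 8-byte header and 64-byte alignment come from the layout above; the 4 bytes per shape dimension is an assumption made purely for illustration, since the source only says the shape section is variable.

```python
ALIGNMENT = 64   # body must start on a 64-byte boundary
HEADER_SIZE = 8  # fixed header size from the layout above


def padding_for(offset, alignment=ALIGNMENT):
    """Bytes needed to round `offset` up to the next multiple of `alignment`."""
    return -offset % alignment


def body_offset(ndim, bytes_per_dim=4):
    """Offset of the body; `bytes_per_dim` is a hypothetical shape encoding."""
    unpadded = HEADER_SIZE + ndim * bytes_per_dim
    return unpadded + padding_for(unpadded)


# A 2-D tensor under these assumptions: 8 + 2 * 4 = 16 bytes before padding.
pad_2d = padding_for(16)      # 48 bytes of padding
offset_2d = body_offset(2)    # body starts at byte 64
```

The result is always between 0 and 63, which matches the padding range shown in the diagram.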


Use Cases

  • Model Serving APIs: Up to 32x faster deserialization than Arrow and 46x less deserialization CPU than SafeTensors reduce overhead on inference nodes.
  • Distributed Training: Efficiently pass gradients or activations between nodes with native Ray integration.
  • GPU-Direct Pipelines: Stream data from network cards to GPU memory with minimal host intervention.
  • Real-time Robotics: 11.2 µs per-packet latency for high-frequency sensor fusion (LIDAR, Radar).
  • High-Throughput Streaming: 89K packets/sec network transmission for real-time data pipelines.

Contributing

Contributions are welcome! We are currently looking for help with:

  • C++ / JavaScript Clients: Extending the protocol to other ecosystems.

License

Apache License 2.0 - see LICENSE file.

Citation

@software{tenso2025,
  author = {Khushiyant},
  title = {Tenso: High-Performance Zero-Copy Tensor Protocol},
  year = {2025},
  url = {https://github.com/Khushiyant/tenso}
}
