
High-performance zero-copy tensor protocol

Tenso

Up to 32x faster than Apache Arrow on deserialization. 46x less CPU than SafeTensors.

Zero-copy, SIMD-aligned tensor protocol for high-performance ML infrastructure.

PyPI version Python 3.10+ License: Apache 2.0


Why Tenso?

Most serialization formats are designed for general data or disk storage. Tenso is focused on network tensor transmission where every microsecond matters.

The Problem

Traditional formats waste CPU cycles during deserialization:

  • SafeTensors: 37.1% CPU usage (great for disk, overkill for network)
  • Pickle: 40.9% CPU usage + security vulnerabilities
  • Arrow: Faster on serialization for small and medium tensors, but up to 32x slower on deserialization for large tensors

The Solution

Tenso achieves true zero-copy with:

  • Minimalist Header: Fixed 8-byte header eliminates JSON parsing overhead.
  • 64-byte Alignment: SIMD-ready padding ensures the data body is cache-line aligned.
  • Direct Memory Mapping: The deserialized array is a view over the existing buffer, so no data is copied.

Result: 0.8% CPU usage on deserialization, versus 37.1% for SafeTensors and 40.9% for Pickle.
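
To make the zero-copy idea concrete, here is a minimal sketch using only the Python standard library. It is not Tenso's implementation: the 64-byte body offset and the variable names are illustrative assumptions. It reinterprets the body of a buffer as float32 values through a `memoryview` cast and contrasts that with a copying decode.

```python
import struct

# Hypothetical packet: 64 bytes of header/shape/padding, then four float32 values.
# (The fixed 64-byte offset is an assumption for illustration only.)
BODY_OFFSET = 64
packet = bytearray(BODY_OFFSET) + bytearray(struct.pack("=4f", 1.0, 2.0, 3.0, 4.0))

# Zero-copy path: slice and cast reinterpret the existing bytes in place.
view = memoryview(packet)[BODY_OFFSET:].cast("f")

# Copying path for comparison: unpack_from builds new Python objects.
copied = struct.unpack_from("=4f", packet, BODY_OFFSET)

# A write to the underlying buffer is visible through the view, not the copy.
struct.pack_into("=f", packet, BODY_OFFSET, 42.0)
first_via_view = view[0]
first_via_copy = copied[0]
```

Because the view shares memory with the packet, its cost does not grow with the size of the tensor, which is the property the deserialization benchmarks measure.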


Benchmarks

System: Python 3.12.9, NumPy 2.3.5, 12 CPU cores, M4 Pro

1. In-Memory Serialization (LLM Layer - 64MB)

Format       Size      Serialize  Deserialize  Deserialize vs Tenso
Tenso        64.00 MB  3.51 ms    0.004 ms     1x (baseline)
Arrow        64.00 MB  7.06 ms    0.011 ms     2.8x slower
SafeTensors  64.00 MB  8.14 ms    2.39 ms      597x slower
Pickle       64.00 MB  2.93 ms    2.71 ms      677x slower
MsgPack      64.00 MB  10.44 ms   3.05 ms      763x slower

Note: The Tenso (Vect) variant deserializes faster still; its time rounds to 0.000 ms at this precision.

2. Disk I/O (256 MB Matrix)

Format      Write     Read
Tenso       29.41 ms  36.28 ms
NumPy .npy  24.83 ms  43.08 ms
Pickle      49.90 ms  24.24 ms

3. Stream Reading (95 MB Packet)

Method             Time      Throughput   Speedup
Tenso read_stream  7.68 ms   12,417 MB/s  1x (baseline)
Optimised Loop     13.89 ms  7,396 MB/s   1.9x slower

4. CPU Usage (Efficiency)

Format       Serialize CPU%  Deserialize CPU%
Tenso        117.3%          0.8%
Arrow        57.1%           1.0%
SafeTensors  67.1%           37.1%
Pickle       44.0%           40.9%

5. Arrow vs Tenso (Comparison)

Size    Tenso Ser  Arrow Ser  Tenso Des  Arrow Des  Speedup (Des)
Small   0.130 ms   0.056 ms   0.009 ms   0.035 ms   4.1x
Medium  0.972 ms   0.912 ms   0.020 ms   0.040 ms   2.0x
Large   3.166 ms   3.655 ms   0.019 ms   0.222 ms   11.8x
XLarge  19.086 ms  28.726 ms  0.023 ms   0.733 ms   32.0x
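
The speedup column is the ratio of Arrow's deserialization time to Tenso's. The short sketch below recomputes it from the rounded timings in the table. The results land close to the published column but not exactly on it, presumably because the published ratios were computed from unrounded measurements.

```python
# Deserialization timings in ms, copied from the Arrow vs Tenso table above
# as (Tenso, Arrow) pairs.
timings = {
    "Small": (0.009, 0.035),
    "Medium": (0.020, 0.040),
    "Large": (0.019, 0.222),
    "XLarge": (0.023, 0.733),
}

# Speedup = Arrow deserialize time / Tenso deserialize time.
speedups = {size: arrow / tenso for size, (tenso, arrow) in timings.items()}

for size, ratio in speedups.items():
    print(f"{size}: {ratio:.1f}x")
```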

6. Network Performance

  • Packet Throughput: 89,183 packets/sec (over localhost TCP)
  • Latency: 11.2 µs/packet
  • Async Write Throughput: 88,397 MB/s (1.4M tensors/sec)

Installation

pip install tenso

Optional extras:

pip install tenso[ray]    # Ray integration
pip install tenso[gpu]    # GPU acceleration (CuPy/PyTorch/JAX)
pip install tenso[grpc]   # gRPC support

Quick Start

Basic Serialization

import numpy as np
import tenso

# Create tensor
data = np.random.rand(1024, 1024).astype(np.float32)

# Serialize
packet = tenso.dumps(data)

# Deserialize (Zero-copy view)
restored = tenso.loads(packet)

Async I/O

import asyncio
import tenso

async def handle_client(reader, writer):
    # Asynchronously read a tensor from the stream
    data = await tenso.aread_stream(reader)
    
    # Process and write back
    await tenso.awrite_stream(data * 2, writer)
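
A stream reader has to recover message boundaries, because TCP may split or merge writes. The sketch below shows one common way to do that with plain `asyncio`: a length prefix followed by `readexactly`. The helpers `frame` and `read_frame` and the 8-byte length field are hypothetical and are not Tenso's wire format or API.

```python
import asyncio
import struct

LENGTH_PREFIX = struct.Struct("<Q")  # hypothetical 8-byte little-endian length field


def frame(payload: bytes) -> bytes:
    """Prefix a payload with its length so the receiver knows where it ends."""
    return LENGTH_PREFIX.pack(len(payload)) + payload


async def read_frame(reader: asyncio.StreamReader) -> bytes:
    """Read exactly one framed payload from the stream."""
    (size,) = LENGTH_PREFIX.unpack(await reader.readexactly(LENGTH_PREFIX.size))
    return await reader.readexactly(size)


async def demo() -> list:
    # Feed two frames into an in-memory StreamReader in awkward chunks,
    # the way a socket may deliver them.
    reader = asyncio.StreamReader()
    data = frame(b"first tensor bytes") + frame(b"second")
    reader.feed_data(data[:5])
    reader.feed_data(data[5:])
    reader.feed_eof()
    return [await read_frame(reader), await read_frame(reader)]


frames = asyncio.run(demo())
```

`readexactly` waits until the requested number of bytes has arrived, so the handler never sees a partial payload.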

FastAPI Integration

from fastapi import FastAPI
import numpy as np
from tenso.fastapi import TensoResponse

app = FastAPI()

@app.get("/tensor")
async def get_tensor():
    data = np.ones((1024, 1024), dtype=np.float32)
    return TensoResponse(data) # Zero-copy streaming response

Advanced Features

Ray Integration (Distributed Computing)

Replace pickle-based serialization in Ray with Tenso to cut CPU overhead on tensor deserialization (0.8% for Tenso versus 40.9% for Pickle in the CPU usage benchmark above). Works transparently with ray.put(), ray.get(), remote functions, and actors.

import ray
import numpy as np
from tenso.ray import register

ray.init()
register()  # Register Tenso as the serializer for numpy arrays

# All ray.put/get operations now use Tenso
ref = ray.put(np.zeros((1000, 1000)))
arr = ray.get(ref)  # Deserialized via Tenso

# Works transparently with remote functions
@ray.remote
def process(tensor):
    return tensor.mean()

ray.get(process.remote(np.random.randn(1000, 1000)))

Optional support for PyTorch and JAX tensors:

register(include_torch=True, include_jax=True)

Quantized Tensors (4-bit & 8-bit)

Native support for quantized representations to reduce memory footprint with minimal accuracy loss.

import numpy as np
import tenso
from tenso.quantize import QuantizedTensor

data = np.random.randn(1024, 1024).astype(np.float32)

# Quantize to 8-bit (per-tensor scheme)
qt = QuantizedTensor.quantize(data, dtype="qint8", scheme="per_tensor")
print(qt)  # QuantizedTensor(dtype=qint8, shape=(1024, 1024), ...)

# Serialize/deserialize with Tenso
packet = tenso.dumps(qt)
restored = tenso.loads(packet)

# Dequantize back to float32
result = restored.dequantize()

Supported dtypes: qint8, quint8, qint4, quint4

Supported schemes: per_tensor, per_channel, per_group
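
For readers unfamiliar with the per-tensor scheme, here is a minimal pure-Python sketch of symmetric quantization, where one scale is shared by every value. It illustrates the general technique only. Tenso's actual rounding, zero-point handling, and storage layout are not described here and may differ, and the function names are hypothetical.

```python
def quantize_per_tensor(values, num_bits=8):
    """Symmetric per-tensor quantization: one scale shared by all values."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs else 1.0
    # Round to the nearest integer code and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floats."""
    return [c * scale for c in codes]


values = [-1.0, -0.25, 0.0, 0.5, 1.0]
codes, scale = quantize_per_tensor(values)
restored = dequantize(codes, scale)
max_error = max(abs(a - b) for a, b in zip(values, restored))
```

The reconstruction error per value is bounded by half the scale. The per_channel and per_group schemes shrink that bound by computing a separate scale for each channel or group.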

Inter-Process Communication (Shared Memory)

Transfer tensors between local processes with single-digit microsecond latency using Shared Memory. This avoids socket overhead entirely by passing memory handles.

from tenso import TensoShm
import numpy as np

# Process A: Write to Shared Memory
data = np.random.randn(1024, 1024).astype(np.float32)
# Automatically sizes and creates the SHM segment
with TensoShm.create_from("shared_tensor_01", data) as shm:
    print("Tensor is in SHM. Waiting for reader...")
    input() # Keep process alive

# Process B: Read from Shared Memory (Zero-Copy)
with TensoShm("shared_tensor_01") as shm:
    # Instant view of the data without copying
    array = shm.get()
    print(f"Received: {array.shape}")
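
The mechanism underneath can be sketched with the standard library's `multiprocessing.shared_memory`. This is an illustration of named shared-memory segments in general, not of how `TensoShm` is implemented. For brevity both "processes" run in one interpreter here.

```python
import struct
from multiprocessing import shared_memory

values = (1.0, 2.0, 3.0, 4.0)
payload = struct.pack("=4f", *values)

# "Process A": create a named segment and copy the tensor bytes in once.
writer = shared_memory.SharedMemory(create=True, size=len(payload))
writer.buf[:len(payload)] = payload

# "Process B": attach by name. The segment may be rounded up to a page,
# so slice to the payload length before casting to float32.
reader = shared_memory.SharedMemory(name=writer.name)
view = reader.buf[:len(payload)].cast("f")
received = list(view)

# Views must be released before the segment can be closed.
view.release()
reader.close()
writer.close()
writer.unlink()  # the creating side removes the segment
```

Only the segment name crosses the process boundary, which is why this path avoids socket overhead.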

GPU Acceleration (Direct Transfer)

Supports fast transfers between Tenso streams and device memory for CuPy, PyTorch, and JAX using pinned host memory.

import tenso.gpu as tgpu

# `stream` is assumed to be an already-open stream carrying a Tenso packet.
# Read directly from the stream into a GPU tensor
torch_tensor = tgpu.read_to_device(stream, device_id=0)

bfloat16 Support

Native support for bfloat16 dtype, commonly used in ML training. Works with NumPy 2.1+ natively or falls back to ml_dtypes.

import numpy as np
import ml_dtypes  # provides a bfloat16 dtype for NumPy; assumes ml_dtypes is installed
import tenso

# Serialize bfloat16 tensors directly
data = np.ones((512, 512), dtype=ml_dtypes.bfloat16)
packet = tenso.dumps(data)
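
bfloat16 keeps float32's sign bit and 8-bit exponent and shortens the mantissa to 7 bits, so a bfloat16 value is the upper 16 bits of a float32. The standard-library sketch below shows that relationship using truncation. Real libraries typically round to nearest rather than truncate, and the function names are hypothetical.

```python
import struct


def float32_to_bfloat16_bits(value: float) -> int:
    """Keep the upper 16 bits of the float32 encoding (truncation, no rounding)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    return bits >> 16


def bfloat16_bits_to_float32(bits: int) -> float:
    """Restore a float32 by filling the dropped mantissa bits with zeros."""
    (value,) = struct.unpack("<f", struct.pack("<I", bits << 16))
    return value


one = float32_to_bfloat16_bits(1.0)
approx_pi = bfloat16_bits_to_float32(float32_to_bfloat16_bits(3.14159))
```

The halved width is what makes bfloat16 attractive on the wire: the same tensor takes half the bytes of its float32 form while keeping the same dynamic range.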

Sparse Formats & Bundling

Tenso natively supports complex data structures beyond simple dense arrays:

  • Sparse Matrices: Direct serialization for COO, CSR, and CSC formats.
  • Dictionary Bundling: Pack multiple tensors into a single nested dictionary packet.
  • LZ4 Compression: Optional high-speed compression for sparse or redundant data.
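
As background on the sparse formats named above, this pure-Python sketch converts COO triplets into the three CSR arrays. It shows what the formats store, not how Tenso arranges them inside a packet, which is not described here. The function name is hypothetical.

```python
def coo_to_csr(num_rows, rows, cols, values):
    """Convert COO triplets to CSR arrays (indptr, indices, data)."""
    # Sort entries by row, then column, as CSR expects.
    order = sorted(range(len(values)), key=lambda i: (rows[i], cols[i]))

    # Count entries per row, then turn counts into cumulative row offsets.
    indptr = [0] * (num_rows + 1)
    for r in rows:
        indptr[r + 1] += 1
    for r in range(num_rows):
        indptr[r + 1] += indptr[r]

    indices = [cols[i] for i in order]
    data = [values[i] for i in order]
    return indptr, indices, data


# 3x4 matrix with three non-zeros: (0,1)=5.0, (2,0)=7.0, (2,3)=9.0
indptr, indices, data = coo_to_csr(
    3, rows=[2, 0, 2], cols=[3, 1, 0], values=[9.0, 5.0, 7.0]
)
```

Only these three arrays need to be transmitted, rather than all twelve cells of the dense matrix.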

Data Integrity (XXH3)

Protect your tensors against network corruption with ultra-fast 64-bit checksums:

import numpy as np
import tenso

data = np.random.rand(1024, 1024).astype(np.float32)

# Serialize with 64-bit checksum footer
packet = tenso.dumps(data, check_integrity=True)

# Verification is automatic during loads()
restored = tenso.loads(packet)
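
The footer mechanism can be sketched with the standard library. Since XXH3 is not in the standard library, this example substitutes `zlib.crc32` as a stand-in and stores it in an 8-byte footer. The helper names and footer encoding are assumptions for illustration, not Tenso's format.

```python
import struct
import zlib

FOOTER = struct.Struct("<Q")  # 8-byte footer; the encoding is an assumption


def add_checksum(body: bytes) -> bytes:
    """Append a checksum footer (CRC32 here as a stand-in for XXH3)."""
    return body + FOOTER.pack(zlib.crc32(body))


def verify_and_strip(packet: bytes) -> bytes:
    """Return the body if the footer matches, otherwise raise ValueError."""
    body, footer = packet[:-FOOTER.size], packet[-FOOTER.size:]
    (expected,) = FOOTER.unpack(footer)
    if zlib.crc32(body) != expected:
        raise ValueError("checksum mismatch: packet is corrupted")
    return body


packet = add_checksum(b"tensor bytes")
corrupted = bytes([packet[0] ^ 0x01]) + packet[1:]  # flip one bit in the body
```

Calling `verify_and_strip(packet)` returns the original body, while `verify_and_strip(corrupted)` raises `ValueError`.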

gRPC Integration

Tenso provides built-in support for gRPC, allowing you to pass tensors between services with minimal overhead.

from tenso.grpc import tenso_msg_pb2, tenso_msg_pb2_grpc
import tenso

# In your Servicer
def Predict(self, request, context):
    data = tenso.loads(request.tensor_packet)
    result = data * 2
    return tenso_msg_pb2.PredictResponse(
        result_packet=bytes(tenso.dumps(result))
    )

Protocol Design

Tenso uses a minimalist structure designed for direct memory access:

┌─────────────┬──────────────┬──────────────┬────────────────────────┬──────────────┐
│   HEADER    │    SHAPE     │   PADDING    │    BODY (Raw Data)     │    FOOTER    │
│   8 bytes   │  Variable    │   0-63 bytes │   C-Contiguous Array   │   8 bytes*   │
└─────────────┴──────────────┴──────────────┴────────────────────────┴──────────────┘
                                                                        (*Optional)

The padding ensures the body starts at a 64-byte boundary, enabling AVX-512 vectorization and zero-copy memory mapping.
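
The padding length follows from the sizes of the sections before the body. The sketch below computes it under two assumptions that the layout above does not state: each dimension in the shape section takes 4 bytes, and alignment is measured from the start of the packet (so the receiving buffer itself must start on a 64-byte boundary).

```python
ALIGNMENT = 64
HEADER_SIZE = 8


def padding_for(shape_bytes: int) -> int:
    """Bytes of padding so the body starts on a 64-byte boundary."""
    offset = HEADER_SIZE + shape_bytes
    return (-offset) % ALIGNMENT


# Assumption for illustration: each dimension is stored as a 4-byte integer.
for ndim in (1, 2, 4, 14):
    shape_bytes = 4 * ndim
    pad = padding_for(shape_bytes)
    body_offset = HEADER_SIZE + shape_bytes + pad
    print(f"ndim={ndim}: padding={pad}, body starts at byte {body_offset}")
```

The modulo form keeps the result in the 0-63 range shown in the diagram, and yields zero padding when the header and shape already end on a boundary.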


Use Cases

  • Model Serving APIs: Up to 32x faster deserialization than Arrow and 46x less CPU than SafeTensors reduce overhead on inference nodes.
  • Distributed Training: Efficiently pass gradients or activations between nodes with native Ray integration.
  • GPU-Direct Pipelines: Stream data from network cards to GPU memory with minimal host intervention.
  • Real-time Robotics: 11.2 µs per-packet latency for high-frequency sensor fusion (LIDAR, Radar).
  • High-Throughput Streaming: 89K packets/sec network transmission for real-time data pipelines.

Contributing

Contributions are welcome! We are currently looking for help with:

  • C++ / JavaScript Clients: Extending the protocol to other ecosystems.

License

Apache License 2.0 - see LICENSE file.

Citation

@software{tenso2025,
  author = {Khushiyant},
  title = {Tenso: High-Performance Zero-Copy Tensor Protocol},
  year = {2025},
  url = {https://github.com/Khushiyant/tenso}
}

