High-performance zero-copy tensor protocol
Project description
Tenso
Up to 12.6x faster than Apache Arrow. 32x less CPU than SafeTensors.
Zero-copy, SIMD-aligned tensor protocol for high-performance ML infrastructure.
Why Tenso?
Most serialization formats are designed for general data or disk storage. Tenso is focused on network tensor transmission where every microsecond matters.
The Problem
Traditional formats waste CPU cycles during deserialization:
- SafeTensors: 41.3% CPU usage (great for disk, overkill for network)
- Pickle: 43.3% CPU usage + security vulnerabilities
- Arrow: Fast, but 12.6x slower than Tenso for large tensors
The Solution
Tenso achieves true zero-copy with:
- Minimalist Header: Fixed 8-byte header eliminates JSON parsing overhead.
- 64-byte Alignment: SIMD-ready padding ensures the data body is cache-line aligned.
- Direct Memory Mapping: The CPU points directly to existing buffers without copying.
Result: ~1.3% CPU usage vs >40% for SafeTensors/Pickle.
Benchmarks
System: Python 3.12.9, NumPy 2.3.5, 12 CPU cores, macOS
Deserialization Speed (8192×8192 Float32 Matrix)
| Format | Time | CPU Usage | Speedup |
|---|---|---|---|
| Tenso | 0.064ms | 1.3% | 1x |
| Arrow | 0.810ms | 1.2% | 12.6x slower |
| SafeTensors | 2.792ms | 41.3% | 43x slower |
| Pickle | 3.031ms | 43.3% | 47x slower |
**
Stream Reading Performance (95MB Packet)
| Method | Time | Throughput | Speedup |
|---|---|---|---|
| Tenso read_stream | 7.05ms | 13,534 MB/s | 1x |
| Naive loop | 7,399.7ms | 12.8 MB/s | 1,050x slower |
**
Installation
pip install tenso
Quick Start (v0.10.1)
Basic Serialization
import numpy as np
import tenso
# Create tensor
data = np.random.rand(1024, 1024).astype(np.float32)
# Serialize
packet = tenso.dumps(data)
# Deserialize (Zero-copy view)
restored = tenso.loads(packet)
Async I/O
import asyncio
import tenso
async def handle_client(reader, writer):
# Asynchronously read a tensor from the stream
data = await tenso.aread_stream(reader)
# Process and write back
await tenso.awrite_stream(data * 2, writer)
**
FastAPI Integration
from fastapi import FastAPI
import numpy as np
from tenso.fastapi import TensoResponse
app = FastAPI()
@app.get("/tensor")
async def get_tensor():
data = np.ones((1024, 1024), dtype=np.float32)
return TensoResponse(data) # Zero-copy streaming response
**
Advanced Features
GPU Acceleration (Direct Transfer)
Supports fast transfers between Tenso streams and device memory for CuPy, PyTorch, and JAX using pinned host memory.
import tenso.gpu as tgpu
# Read directly from a stream into a GPU tensor
torch_tensor = tgpu.read_to_device(stream, device_id=0)
Sparse Formats & Bundling
Tenso natively supports complex data structures beyond simple dense arrays:
- Sparse Matrices: Direct serialization for COO, CSR, and CSC formats.
- Dictionary Bundling: Pack multiple tensors into a single nested dictionary packet.
- LZ4 Compression: Optional high-speed compression for sparse or redundant data.
Data Integrity (XXH3)
Protect your tensors against network corruption with ultra-fast 64-bit checksums:
# Serialize with 64-bit checksum footer
packet = tenso.dumps(data, check_integrity=True)
# Verification is automatic during loads()
restored = tenso.loads(packet)
gRPC Integration
Tenso provides built-in support for gRPC, allowing you to pass tensors between services with minimal overhead.
from tenso.grpc import tenso_msg_pb2, tenso_msg_pb2_grpc
import tenso
# In your Servicer
def Predict(self, request, context):
data = tenso.loads(request.tensor_packet)
result = data * 2
return tenso_msg_pb2.PredictResponse(
result_packet=bytes(tenso.dumps(result))
)
**
Protocol Design
Tenso uses a minimalist structure designed for direct memory access:
┌─────────────┬──────────────┬──────────────┬────────────────────────┬──────────────┐
│ HEADER │ SHAPE │ PADDING │ BODY (Raw Data) │ FOOTER │
│ 8 bytes │ Variable │ 0-63 bytes │ C-Contiguous Array │ 8 bytes* │
└─────────────┴──────────────┴──────────────┴────────────────────────┴──────────────┘
(*Optional)
The padding ensures the body starts at a 64-byte boundary, enabling AVX-512 vectorization and zero-copy memory mapping.
Use Cases
- Model Serving APIs: 12.6x faster deserialization saves massive CPU overhead on inference nodes.
- Distributed Training: Efficiently pass gradients or activations between nodes (Ray, Spark).
- GPU-Direct Pipelines: Stream data from network cards to GPU memory with minimal host intervention.
- Real-time Robotics: Sub-millisecond latency for high-frequency sensor fusion (LIDAR, Radar).
Contributing
Contributions are welcome! We are currently looking for help with:
- Rust Core: Porting serialization logic to Rust for even lower overhead.
- C++ / JavaScript Clients: Extending the protocol to other ecosystems.
License
Apache License 2.0 - see LICENSE file.
Citation
@software{tenso2025,
author = {Khushiyant},
title = {Tenso: High-Performance Zero-Copy Tensor Protocol},
year = {2025},
version = {0.10.1},
url = {[https://github.com/Khushiyant/tenso](https://github.com/Khushiyant/tenso)}
}
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file tenso-0.11.0.tar.gz.
File metadata
- Download URL: tenso-0.11.0.tar.gz
- Upload date:
- Size: 12.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c5c13b2cf1e33eb1cc69cad699b43f821a7ae42d135a71aed5f755dc4edfb428
|
|
| MD5 |
6640a4bd5146919cde13113573a6bea3
|
|
| BLAKE2b-256 |
f0766badc9ef5080927d1b0f8f2d2cc7253a6171fb21c16654c2a7bd58d388b8
|
Provenance
The following attestation bundles were made for tenso-0.11.0.tar.gz:
Publisher:
release.yml on Khushiyant/tenso
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
tenso-0.11.0.tar.gz -
Subject digest:
c5c13b2cf1e33eb1cc69cad699b43f821a7ae42d135a71aed5f755dc4edfb428 - Sigstore transparency entry: 781573150
- Sigstore integration time:
-
Permalink:
Khushiyant/tenso@e89c94dc60f3f44ff512e8be36bc807573962221 -
Branch / Tag:
refs/heads/main - Owner: https://github.com/Khushiyant
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@e89c94dc60f3f44ff512e8be36bc807573962221 -
Trigger Event:
push
-
Statement type:
File details
Details for the file tenso-0.11.0-py3-none-any.whl.
File metadata
- Download URL: tenso-0.11.0-py3-none-any.whl
- Upload date:
- Size: 16.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
6cf6f7b502c69cc0526ea3c42da4e9a0edb17da0679b8b511ee2dd08199ad960
|
|
| MD5 |
fefe9e0ff4be8fe776bad1b650f94671
|
|
| BLAKE2b-256 |
275f85664025f5dbdb4b3ba8133e8d5b5b5da0374836c70d4494f3b228cbbd39
|
Provenance
The following attestation bundles were made for tenso-0.11.0-py3-none-any.whl:
Publisher:
release.yml on Khushiyant/tenso
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
tenso-0.11.0-py3-none-any.whl -
Subject digest:
6cf6f7b502c69cc0526ea3c42da4e9a0edb17da0679b8b511ee2dd08199ad960 - Sigstore transparency entry: 781573157
- Sigstore integration time:
-
Permalink:
Khushiyant/tenso@e89c94dc60f3f44ff512e8be36bc807573962221 -
Branch / Tag:
refs/heads/main - Owner: https://github.com/Khushiyant
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@e89c94dc60f3f44ff512e8be36bc807573962221 -
Trigger Event:
push
-
Statement type: