
Tenso

High-performance zero-copy tensor protocol

Up to 23.8x faster than Apache Arrow. 61x less CPU than SafeTensors.

Zero-copy, SIMD-aligned tensor protocol for high-performance ML infrastructure.



Why Tenso?

Most serialization formats are designed for general-purpose data or disk storage. Tenso focuses on network tensor transmission, where every microsecond matters.

The Problem

Traditional formats waste CPU cycles:

  • SafeTensors: 36.7% CPU usage during deserialization (great for disk, overkill for the network)
  • Pickle: 41.7% CPU usage, plus arbitrary-code-execution risk
  • Arrow: fast, but 23.8x slower than Tenso for large tensors

The Solution

Tenso achieves true zero-copy with:

  • Fixed 8-byte header (no JSON parsing overhead)
  • 64-byte memory alignment (SIMD-ready)
  • Direct memory mapping (CPU just points, never copies)

Result: 0.6% CPU usage vs 36.7% for SafeTensors
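The "just points, never copies" idea can be sketched with NumPy alone: interpreting an existing buffer as an array creates a view, not a copy. This is illustrative only, not Tenso's internals:

```python
import numpy as np

# Pretend `payload` is the body of a received packet.
payload = np.arange(16, dtype=np.float32).tobytes()

# frombuffer interprets the existing bytes in place: a view, not a copy.
arr = np.frombuffer(payload, dtype=np.float32).reshape(4, 4)
```

Because no bytes move, the cost is constant regardless of tensor size; the CPU only sets up a pointer, a dtype, and a shape.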


Benchmarks

System: Python 3.12.9, NumPy 2.3.5, 12 CPU cores, macOS

Deserialization Speed (8192×8192 Float32 Matrix)

Format       Time      CPU Usage  Speedup
Tenso        0.034 ms  0.6%       1x (baseline)
Arrow        0.805 ms  1.1%       23.8x slower
SafeTensors  2.621 ms  36.7%      77x slower
Pickle       3.293 ms  41.7%      97x slower

Stream Reading Performance (95MB Packet)

Method             Time      Throughput  Speedup
Tenso read_stream  21 ms     4,500 MB/s  1x (baseline)
Naive recv loop    7,870 ms  12 MB/s     371x slower
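The gap comes from buffer management: a naive loop concatenates bytes objects (repeated copying), while a readinto-style read fills one preallocated buffer. A minimal sketch of the buffered approach; the function name is illustrative, and this is not Tenso's actual implementation:

```python
import socket

def read_exact(sock: socket.socket, n: int) -> memoryview:
    """Read exactly n bytes by filling a single preallocated buffer."""
    buf = bytearray(n)
    view = memoryview(buf)
    got = 0
    while got < n:
        # recv_into writes directly into the buffer: no per-chunk
        # bytes objects, no concatenation.
        r = sock.recv_into(view[got:])
        if r == 0:
            raise ConnectionError("socket closed before n bytes arrived")
        got += r
    return view
```

Each recv_into call copies the payload out of the kernel exactly once, directly into its final location.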

Network Latency (1KB Tensor over TCP)

Metric      Value
Throughput  182,940 packets/sec
Latency     5.5 μs/packet

Real-World Impact

Scenario: Inference API serving 10,000 req/sec with 64MB tensors

Format       CPU Cores Used  Monthly Cost*
SafeTensors  367 cores       ~$15,000
Tenso        6 cores         ~$245

*Based on typical cloud compute pricing


Installation

pip install tenso

Quick Start

Basic Serialization

import numpy as np
import tenso

# Create tensor
data = np.random.rand(1024, 1024).astype(np.float32)

# Serialize (8.5ms for 64MB)
packet = tenso.dumps(data)

# Deserialize (0.034ms for 64MB)
restored = tenso.loads(packet)

Network Communication

import socket

import numpy as np
import tenso

# Server and client below run in separate processes.

# Server: receive a tensor
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('0.0.0.0', 9999))
server.listen(1)
conn, addr = server.accept()

# Zero-copy read with automatic buffering
tensor = tenso.read_stream(conn)  # Uses readinto() internally

# Client: send a tensor
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(('localhost', 9999))

data = np.random.rand(256, 256).astype(np.float32)
tenso.write_stream(data, client)  # Atomic write with os.writev

File I/O with Memory Mapping

import numpy as np
import tenso

large_tensor = np.random.rand(4096, 4096).astype(np.float32)

# Write to disk
with open("model_weights.tenso", "wb") as f:
    tenso.dump(large_tensor, f)

# Instant load, no matter the size
with open("model_weights.tenso", "rb") as f:
    weights = tenso.load(f, mmap_mode=True)  # Memory-mapped, not loaded into RAM
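The mmap_mode=True path can be understood with Python's standard mmap module: the file is mapped into the address space and pages are faulted in on demand, so "load" time is independent of file size. A rough sketch of the idea, not Tenso's implementation:

```python
import mmap
import tempfile

import numpy as np

# Write a raw float32 array to a temporary file.
data = np.arange(1024, dtype=np.float32)
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(data.tobytes())
    path = f.name

# Map the file and view it as an array: no read() into RAM happens here.
# Note: the mapping (mm) must outlive any arrays viewing it.
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
weights = np.frombuffer(mm, dtype=np.float32)
```

The OS pages in only the parts of the file actually touched, which is why a multi-gigabyte file "loads" in microseconds.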

Use Cases

Perfect For

  • Model Serving APIs - 23.8x faster deserialization saves CPU cores
  • Distributed Training - Efficient gradient/activation passing (Ray, Spark)
  • Real-time Robotics - Sub-millisecond latency sensor fusion
  • High-Frequency Trading - Microsecond-precision data exchange
  • Microservices - Fast tensor exchange between services
  • Edge Devices - Minimal dependencies, pure Python

Consider Alternatives For

  • Long-term Model Storage - Use SafeTensors (better ecosystem, HuggingFace integration)
  • Multi-column Dataframes - Use Arrow (designed for tabular data)
  • Arbitrary Python Objects - Use Pickle (if you trust the source)

Protocol Design

Tenso uses a minimalist 4-part structure:

┌─────────────┬──────────────┬──────────────┬────────────────────────┐
│   HEADER    │    SHAPE     │   PADDING    │    BODY (Raw Data)     │
│   8 bytes   │  Variable    │   0-63 bytes │   C-Contiguous Array   │
└─────────────┴──────────────┴──────────────┴────────────────────────┘

Header (8 bytes)

[4 bytes: Magic "TNSO"]
[1 byte:  Protocol Version (2)]
[1 byte:  Flags (alignment, etc.)]
[1 byte:  Dtype Code]
[1 byte:  Number of Dimensions]
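Parsing such a header is a single struct.unpack call. A sketch based on the layout above; the field names and the example dtype code are ours, and since every non-magic field is one byte, byte order does not matter:

```python
import struct

def parse_header(buf: bytes) -> dict:
    # 4-byte magic, then four single-byte fields, per the layout above.
    magic, version, flags, dtype_code, ndim = struct.unpack("<4s4B", buf[:8])
    if magic != b"TNSO":
        raise ValueError("not a Tenso packet")
    return {"version": version, "flags": flags,
            "dtype_code": dtype_code, "ndim": ndim}

# Example: a version-2 header for a 2-D tensor (dtype code is made up).
hdr = parse_header(b"TNSO" + bytes([2, 0, 1, 2]))
```

There is no JSON to tokenize and no schema to validate, which is where the fixed-header design wins.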

Why This Is Fast

  • SafeTensors: JSON header parsing - 3.67 ms overhead
  • Arrow: complex IPC format with schema validation - 0.805 ms overhead
  • Tenso: fixed 8-byte struct - 0.034 ms (just unpack and memory-map)

The padding ensures the data body starts at a 64-byte boundary, enabling:

  • AVX-512 vectorization
  • Zero-copy memory mapping
  • Cache-line alignment
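The padding length follows directly from the offset where header and shape end: it is whatever brings the body to the next multiple of 64 (hence 0-63 bytes). In code:

```python
def padding_for(offset: int) -> int:
    """Bytes of padding so the body starts on a 64-byte boundary."""
    return (-offset) % 64

# For example, an 8-byte header followed by a 16-byte shape block (the
# shape size here is an assumed value for illustration) leaves the body
# at offset 24, so 40 bytes of padding move it to offset 64.
```

An already-aligned offset gets zero padding, so the worst case is 63 wasted bytes per packet.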

Advanced Features

Strict Mode

Prevents accidental memory copies:

import numpy as np
import tenso

fortran_array = np.asfortranarray(np.random.rand(64, 64).astype(np.float32))

# Force C-contiguous check
try:
    packet = tenso.dumps(fortran_array, strict=True)
except ValueError:
    print("Array must be C-contiguous!")
    fortran_array = np.ascontiguousarray(fortran_array)
    packet = tenso.dumps(fortran_array, strict=True)  # retry after conversion

Packet Introspection

Inspect metadata without deserializing:

info = tenso.get_packet_info(packet)
print(f"Shape: {info['shape']}")
print(f"Dtype: {info['dtype']}")
print(f"Size: {info['data_size_bytes']} bytes")

Supported Dtypes

All NumPy numeric types including:

  • Floats: float16, float32, float64
  • Integers: int8, int16, int32, int64, uint8, uint16, uint32, uint64
  • Complex: complex64, complex128
  • Boolean: bool

Comparison Table

Feature                   Tenso        Arrow            SafeTensors    Pickle
Deserialize Speed (64MB)  0.034 ms     0.805 ms         2.621 ms       3.293 ms
CPU Usage                 0.6%         1.1%             36.7%          41.7%
Memory Overhead           0.00%        0.00%            0.00%          0.00%
Security                  Safe         Safe             Safe           RCE risk
Dependencies              NumPy only   PyArrow (large)  Rust bindings  Python stdlib
Best For                  Network/IPC  Dataframes       Disk storage   Python objects
SIMD Aligned              64-byte      64-byte          No             No

Performance Deep-Dive

Read the full story: Breaking the Speed Limit: Optimizing Python Tensor Serialization to 5 GB/s

Key insights:

  • Why JSON headers kill performance
  • How memory alignment enables zero-copy
  • Why Tenso beats Arrow for single tensors
  • Real-world cost savings ($15k/month at scale)

Development

# Clone repository
git clone https://github.com/Khushiyant/tenso.git
cd tenso

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run comprehensive benchmarks
python benchmark.py all

# Quick benchmark (serialization + Arrow comparison)
python benchmark.py quick

Requirements

  • Python >= 3.10
  • NumPy >= 1.20

Optional (for benchmarks):

  • pyarrow - Compare with Apache Arrow
  • safetensors - Compare with SafeTensors
  • msgpack - Compare with MessagePack
  • psutil - Monitor CPU/memory usage

Contributing

Contributions welcome. Areas we'd love help with:

  • Async support (async def aread_stream())
  • Compression integration (zstd, lz4)
  • gRPC/FastAPI integration examples
  • Rust bindings for even faster serialization
  • JavaScript/WASM client for browser ML
  • CUDA support for GPU-direct transfers

License

MIT License - see LICENSE file.


Citation

If you use Tenso in research, please cite:

@software{tenso2025,
  author = {Khushiyant},
  title = {Tenso: High-Performance Zero-Copy Tensor Protocol},
  year = {2025},
  url = {https://github.com/Khushiyant/tenso}
}

Acknowledgments

Inspired by the need for faster ML inference infrastructure. Built with care for the ML community.

Star this repo if Tenso saved you CPU cycles.

Download files

Download the file for your platform.

Source Distribution

tenso-0.6.0.tar.gz (10.7 kB)


Built Distribution


tenso-0.6.0-py3-none-any.whl (13.6 kB)


File details

Details for the file tenso-0.6.0.tar.gz.

File metadata

  • Download URL: tenso-0.6.0.tar.gz
  • Size: 10.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tenso-0.6.0.tar.gz

Algorithm    Hash digest
SHA256       9d90151eeb72eb2087548afb08cb0320336d0c2bf0ecd0cfde58fd5ec9432e27
MD5          3eeb25279e98de9a1c3be99fde2de093
BLAKE2b-256  d2c6f81522a284dc5078e9652c24fd72e135cceb45f74ee1670d469cd755fa85


Provenance

The following attestation bundles were made for tenso-0.6.0.tar.gz:

Publisher: release.yml on Khushiyant/tenso

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file tenso-0.6.0-py3-none-any.whl.

File metadata

  • Download URL: tenso-0.6.0-py3-none-any.whl
  • Size: 13.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tenso-0.6.0-py3-none-any.whl

Algorithm    Hash digest
SHA256       87d369871849e6b08989368ccdf012ee56efdd6f4a1395defbfb00bd111f9ba1
MD5          409d14d404f6694cf5c18efcd7af168d
BLAKE2b-256  01a08abb16295133e11f7af2ce117e135771742da90941f8d3bdc4d28fb9145d


Provenance

The following attestation bundles were made for tenso-0.6.0-py3-none-any.whl:

Publisher: release.yml on Khushiyant/tenso

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
