
Tenso

High-performance zero-copy tensor protocol

Up to 23.8x faster than Apache Arrow. 61x less CPU than SafeTensors.

Zero-copy, SIMD-aligned tensor protocol for high-performance ML infrastructure.



Why Tenso?

Most serialization formats are designed for general-purpose data or disk storage. Tenso focuses on network tensor transmission, where every microsecond matters.

The Problem

Traditional formats waste CPU cycles:

  • SafeTensors: 36.7% CPU usage during deserialization (great for disk, overkill for the network)
  • Pickle: 41.7% CPU usage, plus arbitrary-code-execution risk
  • Arrow: fast, but 23.8x slower than Tenso for large tensors

The Solution

Tenso achieves true zero-copy with:

  • Fixed 8-byte header (no JSON parsing overhead)
  • 64-byte memory alignment (SIMD-ready)
  • Direct memory mapping (CPU just points, never copies)

Result: 0.6% CPU usage vs 36.7% for SafeTensors
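The "just points, never copies" idea can be sketched with NumPy alone: interpreting an existing buffer as an array creates a view, not a copy. This is illustrative only, not Tenso's internals:

```python
import numpy as np

# Pretend `payload` is the body of a received packet.
payload = np.arange(16, dtype=np.float32).tobytes()

# frombuffer interprets the existing bytes in place: a view, not a copy.
arr = np.frombuffer(payload, dtype=np.float32).reshape(4, 4)
```

Because no bytes move, the cost is constant regardless of tensor size; the CPU only sets up a pointer, a dtype, and a shape.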


Benchmarks

System: Python 3.12.9, NumPy 2.3.5, 12 CPU cores, macOS

Deserialization Speed (8192×8192 Float32 Matrix)

Format       Time      CPU Usage  Speedup
Tenso        0.034 ms  0.6%       1x (baseline)
Arrow        0.805 ms  1.1%       23.8x slower
SafeTensors  2.621 ms  36.7%      77x slower
Pickle       3.293 ms  41.7%      97x slower

Stream Reading Performance (95MB Packet)

Method             Time      Throughput  Speedup
Tenso read_stream  21 ms     4,500 MB/s  1x (baseline)
Naive recv loop    7,870 ms  12 MB/s     371x slower
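The gap comes from buffer management: a naive loop concatenates bytes objects (repeated copying), while a readinto-style read fills one preallocated buffer. A minimal sketch of the buffered approach; the function name is illustrative, and this is not Tenso's actual implementation:

```python
import socket

def read_exact(sock: socket.socket, n: int) -> memoryview:
    """Read exactly n bytes by filling a single preallocated buffer."""
    buf = bytearray(n)
    view = memoryview(buf)
    got = 0
    while got < n:
        # recv_into writes directly into the buffer: no per-chunk
        # bytes objects, no concatenation.
        r = sock.recv_into(view[got:])
        if r == 0:
            raise ConnectionError("socket closed before n bytes arrived")
        got += r
    return view
```

Each recv_into call copies the payload out of the kernel exactly once, directly into its final location.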

Network Latency (1KB Tensor over TCP)

Metric      Value
Throughput  182,940 packets/sec
Latency     5.5 μs/packet

Real-World Impact

Scenario: Inference API serving 10,000 req/sec with 64MB tensors

Format       CPU Cores Used  Monthly Cost*
SafeTensors  367 cores       ~$15,000
Tenso        6 cores         ~$245

*Based on typical cloud compute pricing


Installation

pip install tenso

Quick Start

Basic Serialization

import numpy as np
import tenso

# Create tensor
data = np.random.rand(1024, 1024).astype(np.float32)

# Serialize (8.5ms for 64MB)
packet = tenso.dumps(data)

# Deserialize (0.034ms for 64MB)
restored = tenso.loads(packet)

Network Communication

import socket

import numpy as np
import tenso

# Server and client below run in separate processes.

# Server: receive a tensor
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('0.0.0.0', 9999))
server.listen(1)
conn, addr = server.accept()

# Zero-copy read with automatic buffering
tensor = tenso.read_stream(conn)  # Uses readinto() internally

# Client: send a tensor
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(('localhost', 9999))

data = np.random.rand(256, 256).astype(np.float32)
tenso.write_stream(data, client)  # Atomic write with os.writev

File I/O with Memory Mapping

import numpy as np
import tenso

large_tensor = np.random.rand(4096, 4096).astype(np.float32)

# Write to disk
with open("model_weights.tenso", "wb") as f:
    tenso.dump(large_tensor, f)

# Instant load, no matter the size
with open("model_weights.tenso", "rb") as f:
    weights = tenso.load(f, mmap_mode=True)  # Memory-mapped, not loaded into RAM
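The mmap_mode=True path can be understood with Python's standard mmap module: the file is mapped into the address space and pages are faulted in on demand, so "load" time is independent of file size. A rough sketch of the idea, not Tenso's implementation:

```python
import mmap
import tempfile

import numpy as np

# Write a raw float32 array to a temporary file.
data = np.arange(1024, dtype=np.float32)
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(data.tobytes())
    path = f.name

# Map the file and view it as an array: no read() into RAM happens here.
# Note: the mapping (mm) must outlive any arrays viewing it.
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
weights = np.frombuffer(mm, dtype=np.float32)
```

The OS pages in only the parts of the file actually touched, which is why a multi-gigabyte file "loads" in microseconds.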

Use Cases

Perfect For

  • Model Serving APIs - 23.8x faster deserialization saves CPU cores
  • Distributed Training - Efficient gradient/activation passing (Ray, Spark)
  • Real-time Robotics - Sub-millisecond latency sensor fusion
  • High-Frequency Trading - Microsecond-precision data exchange
  • Microservices - Fast tensor exchange between services
  • Edge Devices - Minimal dependencies, pure Python

Consider Alternatives For

  • Long-term Model Storage - Use SafeTensors (better ecosystem, HuggingFace integration)
  • Multi-column Dataframes - Use Arrow (designed for tabular data)
  • Arbitrary Python Objects - Use Pickle (if you trust the source)

Protocol Design

Tenso uses a minimalist 4-part structure:

┌─────────────┬──────────────┬──────────────┬────────────────────────┐
│   HEADER    │    SHAPE     │   PADDING    │    BODY (Raw Data)     │
│   8 bytes   │  Variable    │   0-63 bytes │   C-Contiguous Array   │
└─────────────┴──────────────┴──────────────┴────────────────────────┘

Header (8 bytes)

[4 bytes: Magic "TNSO"]
[1 byte:  Protocol Version (2)]
[1 byte:  Flags (alignment, etc.)]
[1 byte:  Dtype Code]
[1 byte:  Number of Dimensions]
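Parsing such a header is a single struct.unpack call. A sketch based on the layout above; the field names and the example dtype code are ours, and since every non-magic field is one byte, byte order does not matter:

```python
import struct

def parse_header(buf: bytes) -> dict:
    # 4-byte magic, then four single-byte fields, per the layout above.
    magic, version, flags, dtype_code, ndim = struct.unpack("<4s4B", buf[:8])
    if magic != b"TNSO":
        raise ValueError("not a Tenso packet")
    return {"version": version, "flags": flags,
            "dtype_code": dtype_code, "ndim": ndim}

# Example: a version-2 header for a 2-D tensor (dtype code is made up).
hdr = parse_header(b"TNSO" + bytes([2, 0, 1, 2]))
```

There is no JSON to tokenize and no schema to validate, which is where the fixed-header design wins.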

Why This Is Fast

  • SafeTensors: JSON header parsing - 3.67 ms overhead
  • Arrow: complex IPC format with schema validation - 0.805 ms overhead
  • Tenso: fixed 8-byte struct - 0.034 ms (just unpack and memory-map)

The padding ensures the data body starts at a 64-byte boundary, enabling:

  • AVX-512 vectorization
  • Zero-copy memory mapping
  • Cache-line alignment
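The padding length follows directly from the offset where header and shape end: it is whatever brings the body to the next multiple of 64 (hence 0-63 bytes). In code:

```python
def padding_for(offset: int) -> int:
    """Bytes of padding so the body starts on a 64-byte boundary."""
    return (-offset) % 64

# For example, an 8-byte header followed by a 16-byte shape block (the
# shape size here is an assumed value for illustration) leaves the body
# at offset 24, so 40 bytes of padding move it to offset 64.
```

An already-aligned offset gets zero padding, so the worst case is 63 wasted bytes per packet.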

Advanced Features

Strict Mode

Prevents accidental memory copies:

import numpy as np
import tenso

fortran_array = np.asfortranarray(np.random.rand(64, 64).astype(np.float32))

# Force C-contiguous check
try:
    packet = tenso.dumps(fortran_array, strict=True)
except ValueError:
    print("Array must be C-contiguous!")
    fortran_array = np.ascontiguousarray(fortran_array)
    packet = tenso.dumps(fortran_array, strict=True)  # retry after conversion

Packet Introspection

Inspect metadata without deserializing:

info = tenso.get_packet_info(packet)
print(f"Shape: {info['shape']}")
print(f"Dtype: {info['dtype']}")
print(f"Size: {info['data_size_bytes']} bytes")

Supported Dtypes

All NumPy numeric types including:

  • Floats: float16, float32, float64
  • Integers: int8, int16, int32, int64, uint8, uint16, uint32, uint64
  • Complex: complex64, complex128
  • Boolean: bool

Comparison Table

Feature                   Tenso        Arrow            SafeTensors    Pickle
Deserialize Speed (64MB)  0.034 ms     0.805 ms         2.621 ms       3.293 ms
CPU Usage                 0.6%         1.1%             36.7%          41.7%
Memory Overhead           0.00%        0.00%            0.00%          0.00%
Security                  Safe         Safe             Safe           RCE risk
Dependencies              NumPy only   PyArrow (large)  Rust bindings  Python stdlib
Best For                  Network/IPC  Dataframes       Disk storage   Python objects
SIMD Aligned              64-byte      64-byte          No             No

Performance Deep-Dive

Read the full story: Breaking the Speed Limit: Optimizing Python Tensor Serialization to 5 GB/s

Key insights:

  • Why JSON headers kill performance
  • How memory alignment enables zero-copy
  • Why Tenso beats Arrow for single tensors
  • Real-world cost savings ($15k/month at scale)

Development

# Clone repository
git clone https://github.com/Khushiyant/tenso.git
cd tenso

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run comprehensive benchmarks
python benchmark.py all

# Quick benchmark (serialization + Arrow comparison)
python benchmark.py quick

Requirements

  • Python >= 3.10
  • NumPy >= 1.20

Optional (for benchmarks):

  • pyarrow - Compare with Apache Arrow
  • safetensors - Compare with SafeTensors
  • msgpack - Compare with MessagePack
  • psutil - Monitor CPU/memory usage

Contributing

Contributions welcome. Areas we'd love help with:

  • Async support (async def aread_stream())
  • Compression integration (zstd, lz4)
  • gRPC/FastAPI integration examples
  • Rust bindings for even faster serialization
  • JavaScript/WASM client for browser ML
  • CUDA support for GPU-direct transfers

License

MIT License - see LICENSE file.


Citation

If you use Tenso in research, please cite:

@software{tenso2025,
  author = {Khushiyant},
  title = {Tenso: High-Performance Zero-Copy Tensor Protocol},
  year = {2025},
  url = {https://github.com/Khushiyant/tenso}
}

Acknowledgments

Inspired by the need for faster ML inference infrastructure. Built with care for the ML community.

Star this repo if Tenso saved you CPU cycles.

Download files

Download the file for your platform.

Source Distribution

tenso-0.6.0.tar.gz (10.7 kB)


Built Distribution


tenso-0.6.0-py3-none-any.whl (13.6 kB)


File details

Details for the file tenso-0.6.0.tar.gz.

File metadata

  • Download URL: tenso-0.6.0.tar.gz
  • Size: 10.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tenso-0.6.0.tar.gz

Algorithm    Hash digest
SHA256       9d90151eeb72eb2087548afb08cb0320336d0c2bf0ecd0cfde58fd5ec9432e27
MD5          3eeb25279e98de9a1c3be99fde2de093
BLAKE2b-256  d2c6f81522a284dc5078e9652c24fd72e135cceb45f74ee1670d469cd755fa85


Provenance

The following attestation bundles were made for tenso-0.6.0.tar.gz:

Publisher: release.yml on Khushiyant/tenso

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file tenso-0.6.0-py3-none-any.whl.

File metadata

  • Download URL: tenso-0.6.0-py3-none-any.whl
  • Size: 13.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tenso-0.6.0-py3-none-any.whl

Algorithm    Hash digest
SHA256       87d369871849e6b08989368ccdf012ee56efdd6f4a1395defbfb00bd111f9ba1
MD5          409d14d404f6694cf5c18efcd7af168d
BLAKE2b-256  01a08abb16295133e11f7af2ce117e135771742da90941f8d3bdc4d28fb9145d


Provenance

The following attestation bundles were made for tenso-0.6.0-py3-none-any.whl:

Publisher: release.yml on Khushiyant/tenso

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
