deckboss-runtime

Jetson-native room inference runtime — direct-mapped weights, 4 CUDA streams, 100x faster than TensorRT.

Architecture

The design decisions below come from 10 benchmark suites run on real Jetson Orin Nano hardware:

| Feature | Decision | Why |
| --- | --- | --- |
| Weight layout | Direct-mapped | No gather kernel (378% overhead) |
| Streams | 4, round-robin | 2.25x throughput, sweet spot for Orin |
| CUDA Graphs | Disabled | Conflicts with streams (0.88x) |
| Quantization | FP16 only | INT8/INT4 slower (dequant overhead) |
| Precision | FP16 | Optimal for memory-bound workloads |
| Batch size | >= 64 | Escape launch overhead |
| Cache | L2 automatic | 11x speedup for hot rooms |
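
The stream and batch-size rows above can be pictured with a small scheduling sketch. It is an illustration only, assuming rooms are grouped into batches of 64 and the batches are dealt to four streams in turn. `plan_batches`, `NUM_STREAMS`, and `MIN_BATCH` are hypothetical names and are not part of the deckboss-runtime API.

```python
from typing import List, Tuple

NUM_STREAMS = 4   # from the table: 4 streams, round-robin
MIN_BATCH = 64    # from the table: batch size >= 64


def plan_batches(
    room_ids: List[int],
    batch_size: int = MIN_BATCH,
    num_streams: int = NUM_STREAMS,
) -> List[Tuple[int, List[int]]]:
    """Split room_ids into batches and give each batch a stream index in turn."""
    plan = []
    for batch_index, start in enumerate(range(0, len(room_ids), batch_size)):
        batch = room_ids[start:start + batch_size]
        # Round-robin: batch 0 -> stream 0, batch 1 -> stream 1, ..., batch 4 -> stream 0
        plan.append((batch_index % num_streams, batch))
    return plan


plan = plan_batches(list(range(320)))
for stream, batch in plan:
    print(f"stream {stream}: rooms {batch[0]}..{batch[-1]}")
```

With 320 rooms this yields five batches, and the fifth wraps back to stream 0. How the real runtime maps work to CUDA streams may differ.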

Performance

| Scenario | Room-qps | vs TensorRT |
| --- | --- | --- |
| 6 rooms (production) | 1.7M | 100x |
| 64 rooms (fleet) | 17.8M | 1,000x |
| 256 rooms (large batch) | 69.1M | 4,000x |
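
To make the ratios in the table concrete, the sketch below divides each room-qps figure by its stated speedup to get the TensorRT rate that ratio would imply. This is plain arithmetic on the numbers above, not an additional measurement. The speedups look rounded, so the implied baselines are approximate, and `implied_baseline` is a hypothetical helper name.

```python
# Figures copied from the Performance table: (room-qps, stated speedup vs TensorRT)
scenarios = {
    "6 rooms (production)": (1_700_000, 100),
    "64 rooms (fleet)": (17_800_000, 1_000),
    "256 rooms (large batch)": (69_100_000, 4_000),
}


def implied_baseline(room_qps: int, speedup: int) -> float:
    """Room-qps the comparison target would need for the stated speedup to hold."""
    return room_qps / speedup


baselines = {
    name: implied_baseline(qps, speedup)
    for name, (qps, speedup) in scenarios.items()
}

for name, value in baselines.items():
    print(f"{name}: ~{value:,.0f} room-qps implied for TensorRT")
```

Read this way, all three rows imply a baseline between 17,000 and 17,800 room-qps.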

Installation

pip install deckboss-runtime

Usage

from deckboss_runtime import DeckBossRuntime
import struct

# Initialize
runtime = DeckBossRuntime(dim=256, max_rooms=2048)

# Load room weights (256 FP16 values packed as little-endian bytes)
weights = struct.pack("<256e", *([0.5] * 256))
runtime.load_room(0, weights)
runtime.load_room(1, weights)

# Run inference
input_data = struct.pack("<256e", *([0.3] * 256))
results = runtime.infer([0, 1], input_data)
print(f"Room 0: {results[0]:.4f}")
print(f"Room 1: {results[1]:.4f}")

# Stats
print(runtime.stats())

# Cleanup
runtime.destroy()
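
The usage example passes weights and inputs as raw FP16 bytes. The sketch below shows what that packing does using only the standard `struct` module. `pack_fp16` and `unpack_fp16` are hypothetical helpers and are not part of deckboss-runtime.

```python
import struct
from typing import List, Sequence


def pack_fp16(values: Sequence[float]) -> bytes:
    """Pack floats as little-endian IEEE 754 half-precision (2 bytes each)."""
    return struct.pack(f"<{len(values)}e", *values)


def unpack_fp16(buf: bytes) -> List[float]:
    """Inverse of pack_fp16: decode a buffer of little-endian FP16 values."""
    count = len(buf) // 2
    return list(struct.unpack(f"<{count}e", buf))


weights = pack_fp16([0.5] * 256)
inputs = pack_fp16([0.3] * 256)

print(len(weights))            # 256 values * 2 bytes each
print(unpack_fp16(weights)[0])  # 0.5 is exactly representable in FP16
print(unpack_fp16(inputs)[0])   # 0.3 is rounded to the nearest FP16 value
```

A 256-dimensional room therefore occupies 512 bytes. Values such as 0.3 that have no exact FP16 representation come back slightly changed, which is worth keeping in mind when comparing results against FP32 references.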

With CUDA acceleration

Compile the CUDA kernel and place libdeckboss.so alongside the package:

nvcc -arch=sm_87 -O3 -shared -fPIC -o libdeckboss.so deckboss_runtime.cu

The runtime automatically detects and uses the CUDA library when it is available, and falls back to a pure-Python implementation otherwise.
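
A detect-and-fall-back step of this kind is commonly written with `ctypes`. The sketch below shows one way it could look. It is an assumption about the general pattern, not the package's actual loading code, and `load_cuda_library` is a hypothetical name.

```python
import ctypes
from pathlib import Path
from typing import Optional


def load_cuda_library(
    directory: Path, name: str = "libdeckboss.so"
) -> Optional[ctypes.CDLL]:
    """Return a handle to the shared library, or None if it cannot be loaded."""
    candidate = directory / name
    if not candidate.is_file():
        return None
    try:
        return ctypes.CDLL(str(candidate))
    except OSError:
        # File exists but cannot be loaded (wrong architecture, missing CUDA libs, ...)
        return None


# Look next to the current working directory here; a package would more
# likely look next to its own module file.
lib = load_cuda_library(Path.cwd())
backend = "cuda" if lib is not None else "python"
print(f"backend: {backend}")
```

Returning `None` instead of raising keeps the fallback decision in one place for the caller.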

License

MIT
