device-router

Heterogeneous compute router — auto-detect CUDA, iGPU, CPU, NPU and route ML workloads optimally.

Modern laptops and workstations have multiple compute units: a discrete GPU (CUDA), an integrated GPU (iGPU/DirectML), a Neural Processing Unit (NPU), and the CPU. Most ML frameworks pick one device and stick with it. That's wasteful.

device-router detects what's available and routes each workload to the best device automatically.

Why it matters

Workload	Best device	Why
Single embedding	CPU	No GPU transfer overhead (~9μs)
Small model (int8)	CPU (VNNI)	CPU has dedicated VNNI instructions
Medium model batched	iGPU	Good compute, low power
Large model training	CUDA GPU	Parallelism + AMP
ONNX inference	CPU	ONNX Runtime uses its CPU execution provider by default

Install

pip install device-router

Optional dependencies:

pip install device-router[cuda]      # CUDA GPU detection via torch
pip install device-router[directml]  # iGPU detection via torch-directml
pip install device-router[all]       # Everything
pip install device-router[dev]       # pytest + numpy for development

Quick start

from device_router import DeviceRouter, RoutingStrategy

router = DeviceRouter()
router.detect()  # Finds CUDA, DirectML, CPU features, NPU

# Route a workload
decision = router.route(
    model_size=1_000_000,  # parameters
    batch_size=32,
    precision="fp32",      # or "fp16", "bf16", "int8"
    strategy=RoutingStrategy.AUTO,
)
print(f"Use {decision.device} ({decision.reason})")
# → Use cuda (Medium/large model (1,000,000 params) — GPU recommended)

# System overview
overview = router.overview()
# Returns: {cuda: {...}, cpu: {...}, igpu: {...}, npu: {...}}

Routing strategies

Strategy	Description	Use case
`AUTO`	Best guess based on model size & batch	Default
`LATENCY`	Optimize for single-sample speed	Real-time inference
`THROUGHPUT`	Optimize for batch processing	Batch jobs
`POWER`	Prefer CPU/iGPU for efficiency	Laptops, mobile

How it works

Without any dependencies

device-router runs pure CPU detection:

CPU architecture, core count, frequency
Instruction set features (AVX, AVX2, AVX-512, VNNI, AMX, NEON, SSE4)
This is enough to route small models optimally
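
As a rough illustration of dependency-free detection, the sketch below gathers architecture, core count, and instruction-set features using only the standard library. It is not device-router's actual implementation: `parse_cpu_flags`, `detect_cpu`, and the `FLAG_NAMES` mapping are hypothetical names, feature detection here only works on Linux (it reads `/proc/cpuinfo`), and frequency is left out.

```python
import os
import platform

# Flags as they appear in /proc/cpuinfo on Linux, mapped to the feature
# names used in this README. The mapping is an illustrative assumption.
FLAG_NAMES = {
    "avx": "AVX",
    "avx2": "AVX2",
    "avx512f": "AVX-512",
    "avx512_vnni": "VNNI",
    "avx_vnni": "VNNI",
    "amx_tile": "AMX",
    "sse4_1": "SSE4",
    "sse4_2": "SSE4",
    "neon": "NEON",
    "asimd": "NEON",
}


def parse_cpu_flags(cpuinfo_text):
    """Return the set of known feature names found in /proc/cpuinfo text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # x86 kernels label the line "flags"; ARM kernels label it "Features".
        if key.strip().lower() in ("flags", "features"):
            for flag in value.split():
                if flag in FLAG_NAMES:
                    found.add(FLAG_NAMES[flag])
    return found


def detect_cpu():
    """Collect basic CPU facts using only the standard library."""
    try:
        with open("/proc/cpuinfo", encoding="utf-8") as handle:
            features = parse_cpu_flags(handle.read())
    except OSError:
        features = set()  # Not Linux, or /proc is unavailable.
    return {
        "arch": platform.machine(),
        "cores": os.cpu_count() or 1,
        "features": sorted(features),
    }
```

Calling `detect_cpu()` returns a plain dict, so the result can be logged or compared in tests without any ML framework installed.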

With `torch` installed

Adds CUDA detection:

GPU count, name, VRAM, compute capability
CUDA/cuDNN version
Enables AMP and GPU benchmarking
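
The items above map onto a handful of public PyTorch calls. The sketch below shows one way to collect them; `detect_cuda` and the shape of the returned dict are assumptions made for illustration, not device-router's own code. It reports CUDA as unavailable when `torch` is not installed.

```python
def detect_cuda():
    """Summarize CUDA devices via PyTorch; report unavailable if torch is missing."""
    try:
        import torch
    except ImportError:
        return {"available": False, "devices": []}

    if not torch.cuda.is_available():
        return {"available": False, "devices": []}

    devices = []
    for index in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(index)
        major, minor = torch.cuda.get_device_capability(index)
        devices.append({
            "name": torch.cuda.get_device_name(index),
            "vram_gb": round(props.total_memory / 1024**3, 1),
            "compute_capability": f"{major}.{minor}",
        })

    return {
        "available": True,
        "devices": devices,
        "cuda_version": torch.version.cuda,
        "cudnn_version": torch.backends.cudnn.version(),
    }
```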

With `torch-directml` installed

Adds iGPU detection:

DirectML device availability
Enables iGPU offloading for medium workloads
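
A minimal sketch of the DirectML check, assuming the `torch_directml` module exposes `is_available()` and `device_count()` as it does in current releases. `detect_directml` is a hypothetical helper name, and the function falls back to "not available" when the package cannot be imported.

```python
def detect_directml():
    """Report DirectML devices if the torch-directml package can be imported."""
    try:
        import torch_directml
    except (ImportError, OSError):
        # Package missing, or its native library failed to load.
        return {"available": False, "device_count": 0}

    if not torch_directml.is_available():
        return {"available": False, "device_count": 0}

    return {"available": True, "device_count": torch_directml.device_count()}
```

When a device is reported, `torch_directml.device()` returns an object that can be passed to `.to(...)` on tensors and modules.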

Routing decision logic

ONNX model → CPU (always)
Training → CUDA (if available) or CPU
Small model (<100K params) → CPU
  + int8 + VNNI → CPU with VNNI optimization
Medium model (100K-10M) → CUDA > DirectML > CPU
Large model (>10M) + batched → CUDA with AMP
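
The rules above can be read as a short decision function. The sketch below is one interpretation, not the package's source: the `Decision` class, the `route` signature, and the boolean capability flags are hypothetical. It also assumes that "batched" means `batch_size > 1`, that models of exactly 100K or 10M parameters count as medium, and that a large model which is not batched follows the medium ordering.

```python
from dataclasses import dataclass

SMALL_LIMIT = 100_000      # "<100K params" in the rules above
LARGE_LIMIT = 10_000_000   # ">10M" in the rules above


@dataclass
class Decision:
    device: str
    reason: str
    use_amp: bool = False


def route(model_size, batch_size=1, precision="fp32", *, training=False,
          is_onnx=False, has_cuda=False, has_directml=False, has_vnni=False):
    """Apply the routing rules top to bottom; the first match wins."""
    if is_onnx:
        return Decision("cpu", "ONNX model")
    if training:
        return Decision("cuda" if has_cuda else "cpu", "training")
    if model_size < SMALL_LIMIT:
        if precision == "int8" and has_vnni:
            return Decision("cpu", "small int8 model, VNNI available")
        return Decision("cpu", "small model")
    if model_size > LARGE_LIMIT and batch_size > 1 and has_cuda:
        return Decision("cuda", "large batched model", use_amp=True)
    # Medium models, plus large cases not matched above (assumed ordering).
    for device, present in (("cuda", has_cuda), ("directml", has_directml)):
        if present:
            return Decision(device, "accelerator preferred for medium/large model")
    return Decision("cpu", "no accelerator available")
```

For example, `route(1_000_000, 32, has_directml=True)` selects `"directml"` because no CUDA device is reported, and the same call with no accelerators falls back to `"cpu"`.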

API

`DeviceRouter`

router = DeviceRouter()
router.detect()                    # Scan for devices
router.overview()                   # Get system overview
router.route(model_size, batch_size, precision, strategy)  # Route workload
router.assign("cuda")               # Get torch.device for device string

`RoutingDecision`

decision.device       # "cuda", "cpu", "directml", "npu"
decision.reason       # Human-readable explanation
decision.precision    # Recommended precision
decision.use_amp      # Whether to use mixed precision
decision.confidence   # Confidence (0-1)
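
To show how these fields might be consumed, the sketch below turns a decision into settings for `torch.autocast`. `DecisionSketch` is a hypothetical stand-in with the same five fields, and `autocast_settings` is an illustrative helper, not part of the package. It assumes mixed precision is only enabled on CUDA, and it returns the dtype as an attribute name so the example runs without `torch`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionSketch:
    """Stand-in mirroring the RoutingDecision fields listed above."""
    device: str
    reason: str
    precision: str
    use_amp: bool
    confidence: float


def autocast_settings(decision):
    """Map a decision to the arguments torch.autocast would need."""
    # Only reduced-precision float types are meaningful for autocast.
    dtype_name = {"fp16": "float16", "bf16": "bfloat16"}.get(decision.precision)
    enabled = (
        decision.use_amp
        and decision.device == "cuda"
        and dtype_name is not None
    )
    return {
        "device_type": "cuda" if decision.device == "cuda" else "cpu",
        "dtype_name": dtype_name,
        "enabled": enabled,
    }
```

With `torch` installed and `dtype_name` set, the result can be passed on as `torch.autocast(device_type=s["device_type"], dtype=getattr(torch, s["dtype_name"]), enabled=s["enabled"])`.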

SuperInstance Mesh integration

# entry_point: superinstance.plugins
def register_device_router(registry):
    from device_router import DeviceRouter
    registry.register("devices", "router", DeviceRouter)
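
For context, the sketch below shows how a host application could discover and call hooks like the one above through Python entry points. The group name `superinstance.plugins` comes from the comment in the snippet; `SketchRegistry` and `load_plugins` are hypothetical stand-ins, and the real mesh registry in plato-core may work differently.

```python
from importlib.metadata import entry_points


class SketchRegistry:
    """Stand-in registry that stores objects under (category, name) keys."""

    def __init__(self):
        self.items = {}

    def register(self, category, name, obj):
        self.items[(category, name)] = obj


def load_plugins(registry, group="superinstance.plugins"):
    """Call every hook published under the group; return how many ran."""
    eps = entry_points()
    # Python 3.10+ exposes .select(); older versions return a dict keyed by group.
    if hasattr(eps, "select"):
        selected = eps.select(group=group)
    else:
        selected = eps.get(group, [])
    count = 0
    for entry_point in selected:
        hook = entry_point.load()  # e.g. register_device_router
        hook(registry)
        count += 1
    return count
```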

Running tests

pip install -e ".[dev]"
pytest tests/ -v

License

MIT

Ecosystem

Part of the SuperInstance ecosystem:

Package	Description
plato-core	Base types + mesh registry
tensor-spline	SplineLinear neural compression
eisenstein-embed	5-layer matching cascade
plato-training	Training monolith
device-router	Heterogeneous compute routing
triplet-miner	Git-powered contrastive data
micro-onnx	ONNX export + benchmark

Version 0.1.0, released May 23, 2026
