
onnx-dl

ONNX Dynamic Loading - A transparent proxy wrapper for onnxruntime.InferenceSession that provides automatic model lifecycle management based on configurable time-to-live (TTL) policies.

The "dl" stands for dynamic loading: the proxy automatically loads and unloads ONNX models from memory based on how they are used, which helps keep memory usage down in applications that serve multiple models or have uneven workloads.

Features

  • Transparent API: Drop-in replacement for onnxruntime.InferenceSession - no code changes needed in your model classes
  • Flexible TTL Policies: Configure model lifecycle with three modes:
    • ttl=-1: Persistent (model stays loaded indefinitely)
    • ttl=0: Immediate unload (minimal memory footprint)
    • ttl>0: Timed unload (auto-cleanup after N seconds of inactivity)
  • Smart TTL Reset: Automatically resets the timer on each access, so actively used models are never unloaded (sketched below, after this list)
  • Context Manager Support: Explicit lifecycle control for batch processing with proxy.session()
  • Reference Counting: Properly handles nested context managers
  • Thread-Safe: Safe concurrent access from multiple threads
  • Lazy Loading: Models load on first use, not at construction time
  • Shared Proxies: Multiple instances can share a single proxy for memory efficiency
  • Lifecycle Monitoring: Built-in logging for tracking load/unload events
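
Lazy loading, the is_loaded property, and the smart TTL reset are easiest to see together. The sketch below follows the behavior described above; exactly when the background cleanup flips is_loaded back to False is an assumption about the implementation:

import time
from onnx_dl import InferenceSessionProxy

proxy = InferenceSessionProxy("model.onnx", ttl=5)
print(proxy.is_loaded)    # False: nothing is loaded at construction time

proxy.get_inputs()        # first access triggers the actual load
print(proxy.is_loaded)    # True

# Each access resets the 5-second timer, so a model touched every
# 2 seconds stays loaded even after 5+ seconds of wall time.
for _ in range(3):
    time.sleep(2)
    proxy.get_inputs()
print(proxy.is_loaded)    # True: the timer kept being reset

time.sleep(6)             # let the TTL expire with no access
print(proxy.is_loaded)    # False: the model was unloaded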

Installation

Using uv

uv add onnx-dl

Using pip

pip install onnx-dl

Usage

Quick Start

from onnx_dl import InferenceSessionProxy
import numpy as np

# Create a proxy with 60-second TTL
proxy = InferenceSessionProxy("model.onnx", ttl=60)

# Use exactly like onnxruntime.InferenceSession
inputs = proxy.get_inputs()
outputs = proxy.get_outputs()

# Prepare an input matching the model's expected shape and dtype
# (the shape here is only an illustrative placeholder)
your_data = np.zeros((1, 3, 224, 224), dtype=np.float32)

# Run inference
result = proxy.run(
    [outputs[0].name],
    {inputs[0].name: your_data},
)

# Model automatically unloads after 60 seconds of inactivity

Different TTL Strategies

# Frequently used model - keep in memory
vad_proxy = InferenceSessionProxy("vad.onnx", ttl=-1)

# Occasionally used model - auto-unload when idle
seg_proxy = InferenceSessionProxy("segmentation.onnx", ttl=60)

# Rarely used model - unload immediately
emb_proxy = InferenceSessionProxy("embedding.onnx", ttl=0)

Context Manager for Batch Processing

proxy = InferenceSessionProxy("model.onnx", ttl=30)

# Keep model loaded for entire batch
with proxy.session() as sess:
    for data in batch:
        result = sess.run([output_name], {input_name: data})
        process(result)
# TTL timer starts after exiting context
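
Per the Features list, nested session() contexts are reference-counted, so the model is released only when the outermost context exits. A minimal sketch of that behavior, reusing the placeholder names from the example above:

proxy = InferenceSessionProxy("model.onnx", ttl=0)  # ttl=0: unload when idle

with proxy.session():        # refcount 1: model loads here
    with proxy.session():    # refcount 2: no reload, same underlying session
        proxy.run([output_name], {input_name: data})
    # inner exit: refcount drops to 1, model stays loaded
# outer exit: refcount 0, and with ttl=0 the model unloads immediately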

Drop-in Replacement

class EmbeddingModel:
    def __init__(self, session):
        """Works with both InferenceSession and InferenceSessionProxy"""
        self.session = session

    def extract(self, segments):
        input_name = self.session.get_inputs()[0].name
        output_name = self.session.get_outputs()[0].name
        result = self.session.run([output_name], {input_name: segments})
        return result[0]

# Use with proxy for automatic lifecycle management
proxy = InferenceSessionProxy("model.onnx", ttl=60)
model = EmbeddingModel(proxy)
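
Usage is identical whether session is a real InferenceSession or the proxy; segments below is just a placeholder for the input batch your model expects:

embeddings = model.extract(segments)  # model loads transparently on first call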

Shared Proxy Pattern

# A hypothetical consumer class that just stores a session-like object
class AudioProcessor:
    def __init__(self, session):
        self.session = session

# Share a single proxy across multiple instances for memory efficiency
shared_proxy = InferenceSessionProxy("embedding.onnx", ttl=300)

processor1 = AudioProcessor(shared_proxy)
processor2 = AudioProcessor(shared_proxy)
processor3 = AudioProcessor(shared_proxy)

# All processors use the same model instance in memory

API Reference

Constructor

InferenceSessionProxy(
    model_path: str | Path,
    sess_options: ort.SessionOptions | None = None,
    providers: list[str] | None = None,
    ttl: int = -1
)

Parameters:

  • model_path: Path to the ONNX model file
  • sess_options: ONNX Runtime session options (optional)
  • providers: List of execution providers (optional)
  • ttl: Time-to-live in seconds (-1 = persistent, 0 = immediate, >0 = seconds)
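
A construction example passing the optional arguments; the session options and provider shown are ordinary onnxruntime values, chosen here purely for illustration:

import onnxruntime as ort
from onnx_dl import InferenceSessionProxy

opts = ort.SessionOptions()
opts.intra_op_num_threads = 2   # standard onnxruntime setting

proxy = InferenceSessionProxy(
    "model.onnx",
    sess_options=opts,
    providers=["CPUExecutionProvider"],
    ttl=120,   # unload after 2 minutes of inactivity
)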

Methods

  • run(output_names, input_feed, run_options=None) - Run inference
  • get_inputs() - Get input metadata
  • get_outputs() - Get output metadata
  • get_providers() - Get execution providers
  • session() - Context manager for explicit lifecycle control (use in a with statement)
  • unload() - Manually unload the model immediately
  • is_loaded - Property to check if model is currently loaded

All other onnxruntime.InferenceSession methods are automatically proxied.
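
A short sketch of manual lifecycle control with unload() and is_loaded, assuming unload() applies regardless of the TTL mode:

proxy = InferenceSessionProxy("model.onnx", ttl=-1)  # persistent

proxy.get_inputs()          # first use loads the model
assert proxy.is_loaded

proxy.unload()              # reclaim memory right away
assert not proxy.is_loaded  # the next access will reload lazily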

More Examples

See the examples/ directory for comprehensive examples:

  • 01_basic_usage.py - TTL modes and basic operations
  • 02_context_manager.py - Context manager patterns
  • 03_drop_in_replacement.py - Using as transparent replacement
  • 04_shared_proxy.py - Sharing proxies across instances

Contributing

Bug reports, feature requests, and contributions are welcome! Please open an issue or pull request on the project repository.

License

MIT
