# onnx-dl

ONNX Dynamic Loading - a transparent proxy wrapper for `onnxruntime.InferenceSession` that provides automatic model lifecycle management based on configurable time-to-live (TTL) policies.

The "dl" stands for dynamic loading, referring to the proxy's ability to automatically load and unload ONNX models from memory based on usage patterns, helping optimize memory usage in applications with multiple models or varying workload patterns.
## Features

- Transparent API: Drop-in replacement for `onnxruntime.InferenceSession` - no code changes needed in your model classes
- Flexible TTL Policies: Configure the model lifecycle with three modes:
  - `ttl=-1`: Persistent (model stays loaded indefinitely)
  - `ttl=0`: Immediate unload (minimal memory footprint)
  - `ttl>0`: Timed unload (auto-cleanup after N seconds of inactivity)
- Smart TTL Reset: Automatically resets the timer on each access, preventing unloading of actively used models
- Context Manager Support: Explicit lifecycle control for batch processing with `proxy.session()`
- Reference Counting: Properly handles nested context managers
- Thread-Safe: Safe concurrent access from multiple threads
- Lazy Loading: Models load on first use, not at construction time (see the sketch after this list)
- Shared Proxies: Multiple instances can share a single proxy for memory efficiency
- Lifecycle Monitoring: Built-in logging for tracking load/unload events
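A quick sketch of the lazy-loading and TTL-reset behavior described above, using the documented `is_loaded` property (assuming a local `model.onnx` exists):

```python
from onnx_dl import InferenceSessionProxy

proxy = InferenceSessionProxy("model.onnx", ttl=60)

# Lazy loading: constructing the proxy does not load the model
assert not proxy.is_loaded

# The first access triggers the load...
inputs = proxy.get_inputs()
assert proxy.is_loaded

# ...and every later access resets the 60-second idle timer,
# so an actively used model is never unloaded mid-workload
```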
## Installation

### Using uv

```bash
uv add onnx-dl
```

### Using pip

```bash
pip install onnx-dl
```
## Usage

### Quick Start

```python
from onnx_dl import InferenceSessionProxy
import numpy as np

# Create a proxy with a 60-second TTL
proxy = InferenceSessionProxy("model.onnx", ttl=60)

# Use it exactly like onnxruntime.InferenceSession
inputs = proxy.get_inputs()
outputs = proxy.get_outputs()

# Run inference (your_data: a numpy array matching the model's input shape)
result = proxy.run(
    [outputs[0].name],
    {inputs[0].name: your_data},
)

# The model automatically unloads after 60 seconds of inactivity
```
### Different TTL Strategies

```python
# Frequently used model - keep in memory
vad_proxy = InferenceSessionProxy("vad.onnx", ttl=-1)

# Occasionally used model - auto-unload when idle
seg_proxy = InferenceSessionProxy("segmentation.onnx", ttl=60)

# Rarely used model - unload immediately after each use
emb_proxy = InferenceSessionProxy("embedding.onnx", ttl=0)
```
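With `ttl=0`, the tradeoff is extra load latency on each call in exchange for a minimal resident footprint; a small sketch of the expected behavior (`output_name`, `input_name`, and `data` are placeholders):

```python
# Loads the model, runs inference, then unloads it again
emb_proxy.run([output_name], {input_name: data})
assert not emb_proxy.is_loaded  # already unloaded
```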
### Context Manager for Batch Processing

```python
proxy = InferenceSessionProxy("model.onnx", ttl=30)

# Keep the model loaded for the entire batch
with proxy.session() as sess:
    for data in batch:
        result = sess.run([output_name], {input_name: data})
        process(result)
# The TTL timer starts after exiting the context
```
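Because contexts are reference-counted, nesting `session()` blocks is safe: the model stays loaded until the outermost context exits. A minimal sketch reusing the placeholders above:

```python
def run_one(data):
    # Entering session() inside an active context only bumps the
    # reference count; it does not reload or unload the model
    with proxy.session() as inner:
        return inner.run([output_name], {input_name: data})

with proxy.session():
    results = [run_one(d) for d in batch]
# The model is released (and the TTL timer starts) only here,
# when the outermost context exits
```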
### Drop-in Replacement

```python
class EmbeddingModel:
    def __init__(self, session):
        """Works with both InferenceSession and InferenceSessionProxy."""
        self.session = session

    def extract(self, segments):
        input_name = self.session.get_inputs()[0].name
        output_name = self.session.get_outputs()[0].name
        result = self.session.run([output_name], {input_name: segments})
        return result[0]

# Use with a proxy for automatic lifecycle management
proxy = InferenceSessionProxy("model.onnx", ttl=60)
model = EmbeddingModel(proxy)
```
### Shared Proxy Pattern

```python
# Share a single proxy across multiple instances for memory efficiency
shared_proxy = InferenceSessionProxy("embedding.onnx", ttl=300)

processor1 = AudioProcessor(shared_proxy)
processor2 = AudioProcessor(shared_proxy)
processor3 = AudioProcessor(shared_proxy)

# All processors use the same model instance in memory
```
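`AudioProcessor` is not part of onnx-dl; a hypothetical minimal version, to make the example self-contained, could be:

```python
class AudioProcessor:
    """Hypothetical consumer - any class that talks to the standard
    InferenceSession API works unchanged with a shared proxy."""

    def __init__(self, session):
        self.session = session

    def process(self, audio):
        input_name = self.session.get_inputs()[0].name
        output_name = self.session.get_outputs()[0].name
        return self.session.run([output_name], {input_name: audio})[0]
```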
## API Reference

### Constructor

```python
InferenceSessionProxy(
    model_path: str | Path,
    sess_options: ort.SessionOptions | None = None,
    providers: list[str] | None = None,
    ttl: int = -1,
)
```
Parameters:

- `model_path`: Path to the ONNX model file
- `sess_options`: ONNX Runtime session options (optional)
- `providers`: List of execution providers (optional)
- `ttl`: Time-to-live in seconds (`-1` = persistent, `0` = immediate unload, `>0` = unload after that many idle seconds)
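For example, to pin the session to the CPU provider and cap ONNX Runtime's threading (both standard `onnxruntime` options, not onnx-dl-specific):

```python
import onnxruntime as ort
from onnx_dl import InferenceSessionProxy

opts = ort.SessionOptions()
opts.intra_op_num_threads = 2  # standard onnxruntime option

proxy = InferenceSessionProxy(
    "model.onnx",
    sess_options=opts,
    providers=["CPUExecutionProvider"],
    ttl=120,  # unload after two idle minutes
)
```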
### Methods

- `run(output_names, input_feed, run_options=None)` - Run inference
- `get_inputs()` - Get input metadata
- `get_outputs()` - Get output metadata
- `get_providers()` - Get execution providers
- `session()` - Context manager for explicit lifecycle control (use with a `with` statement)
- `unload()` - Manually unload the model immediately
- `is_loaded` - Property to check whether the model is currently loaded

All other `onnxruntime.InferenceSession` methods are automatically proxied.
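`unload()` and `is_loaded` allow manual control when you know a model won't be needed again soon; a short sketch (`output_name`, `input_name`, and `data` are placeholders):

```python
proxy = InferenceSessionProxy("model.onnx", ttl=300)

result = proxy.run([output_name], {input_name: data})

# Free the memory now instead of waiting out the 300-second TTL
proxy.unload()
assert not proxy.is_loaded

# The next call transparently reloads the model
result = proxy.run([output_name], {input_name: data})
```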
## More Examples

See the `examples/` directory for comprehensive examples:

- `01_basic_usage.py` - TTL modes and basic operations
- `02_context_manager.py` - Context manager patterns
- `03_drop_in_replacement.py` - Using the proxy as a transparent replacement
- `04_shared_proxy.py` - Sharing proxies across instances
## Contributing
Bug reports, feature requests, and contributions are welcome! Please open an issue or pull request on the project repository.
## License