LightInfer
LightInfer is a lightweight, high-performance bridge for serving synchronous model inference code (PyTorch, TensorFlow, etc.) via an asynchronous FastAPI server.
It solves the "Blocking Loop" problem by efficiently isolating heavy computation in dedicated worker threads while maintaining a fully asynchronous, high-concurrency web frontend.
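LightInfer's internal bridge isn't shown here, but the general pattern this describes, awaiting a synchronous call from an async handler without blocking the event loop, can be sketched with the standard library alone (`slow_infer`, `handle_request`, and the pool size below are illustrative, not LightInfer's actual API):

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

# A stand-in for a synchronous model call (e.g. a PyTorch forward pass).
def slow_infer(prompt: str) -> str:
    time.sleep(0.1)
    return f"Hello, {prompt}!"

executor = ThreadPoolExecutor(max_workers=2)

async def handle_request(prompt: str) -> str:
    # Hand the blocking call to a worker thread; the event loop stays
    # free to accept other requests while the model runs.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, slow_infer, prompt)

async def main() -> list:
    # Two concurrent "requests" overlap in the thread pool instead of
    # queuing behind each other on the event loop.
    return await asyncio.gather(*(handle_request(p) for p in ("a", "b")))

print(asyncio.run(main()))  # ['Hello, a!', 'Hello, b!']
```

With a pool of two workers, both calls finish in roughly 0.1 s rather than 0.2 s; a framework-level bridge like LightInfer's adds request routing and streaming on top of this idea.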
Features
- Zero-Blocking Architecture: Async Web Frontend + Sync Worker Threads.
- Efficient Bridge: Uses `AsyncResponseBridge` for zero-thread-overhead waiting.
- Streaming Support:
  - Native Server-Sent Events (SSE) for text streaming.
  - Binary Streaming for audio/video generation (with chunk buffering).
- Easy Integration: Wrap any Python class with an `infer` method.
- Context Isolation: Each worker runs in its own thread, ensuring safety for libraries like PyTorch.
Installation
```
pip install lightinfer
```
Quick Start
1. Define your Model
LightInfer wraps any class with an `infer` method. The arguments to `infer` are automatically mapped from the JSON request.
```python
import time

class MyModel:
    def infer(self, prompt: str = "world"):
        # Simulate heavy work
        time.sleep(1)
        return {"message": f"Hello, {prompt}!"}
```
2. Start the Server
```python
from lightinfer.server import LightServer

# Create your model instance
model = MyModel()

# Start server (you can pass a list of models to run multiple worker threads)
server = LightServer([model])
server.start(port=8000)
```
3. Make Requests
Standard Request:
```python
import requests

# 'args' in JSON maps to positional arguments of infer()
# 'kwargs' in JSON maps to keyword arguments of infer()
resp = requests.post(
    "http://localhost:8000/api/v1/infer",
    json={"args": ["LightInfer"]},
)
print(resp.json())
# Output: {'message': 'Hello, LightInfer!'}
```
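The mapping between the JSON body and `infer()` can be illustrated without a running server. The `dispatch` helper below is a hypothetical sketch of that mapping, not LightInfer's actual internals:

```python
# Hypothetical sketch of the documented JSON-to-infer() mapping:
# "args" becomes positional arguments, "kwargs" becomes keyword arguments.
def dispatch(model, payload: dict):
    return model.infer(*payload.get("args", []), **payload.get("kwargs", {}))

class MyModel:
    def infer(self, prompt: str = "world"):
        return {"message": f"Hello, {prompt}!"}

print(dispatch(MyModel(), {"args": ["LightInfer"]}))
# {'message': 'Hello, LightInfer!'}
print(dispatch(MyModel(), {"kwargs": {"prompt": "LightInfer"}}))
# {'message': 'Hello, LightInfer!'}
```

Either form reaches the same parameter, and omitting both falls back to `infer()`'s own defaults.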
Streaming Request:
If your model returns a generator, you can use streaming:
```python
import time

class StreamingModel:
    def infer(self, prompt: str):
        yield "Part 1"
        time.sleep(0.5)
        yield "Part 2"
```
Client side:
```python
resp = requests.post(
    "http://localhost:8000/api/v1/infer",
    json={"args": ["test"], "stream": True},
    stream=True,
)
for line in resp.iter_lines():
    if line:
        print(line.decode("utf-8"))
```
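The exact wire format of the streamed chunks isn't specified above. Assuming standard Server-Sent Events `data:` lines, which the Features section lists for text streaming, the client loop can extract the payloads like this (the sample bytes are illustrative):

```python
def parse_sse_lines(lines) -> list:
    # Collect payloads from "data: ..." lines, ignoring blank
    # keep-alive lines and any other SSE fields.
    chunks = []
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        if line.startswith("data:"):
            chunks.append(line[len("data:"):].strip())
    return chunks

# What resp.iter_lines() might yield for the StreamingModel above:
sample = [b"data: Part 1", b"", b"data: Part 2"]
print(parse_sse_lines(sample))  # ['Part 1', 'Part 2']
```

For production use, a dedicated SSE client library handles multi-line `data:` fields and reconnection for you.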
Examples
Check the examples/ directory for ready-to-run scenarios:
- Simple LLM: Text-to-Text generation with SSE streaming.
- Streaming TTS: Text-to-Audio generation with binary chunk streaming.
CLI Usage
You can serve any model class directly from the terminal.
Format: `lightinfer <module>:<Class>`
Given a file `my_model.py`:
```python
class MyModel:
    def infer(self, prompt: str):
        return f"Echo: {prompt}"
```
Run:
```
lightinfer my_model:MyModel --port 8000 --workers 2
```
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
MIT