
LightInfer

LightInfer is a lightweight, high-performance bridge for serving synchronous model inference code (PyTorch, TensorFlow, etc.) via an asynchronous FastAPI server.

It solves the "Blocking Loop" problem by efficiently isolating heavy computation in dedicated worker threads while maintaining a fully asynchronous, high-concurrency web frontend.
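The pattern is easy to see in miniature. This is not LightInfer's internal implementation, just a minimal sketch of the same idea using asyncio's standard thread offloading: the blocking call runs in a worker thread while the event loop stays free to serve other requests.

```python
import asyncio
import time

def blocking_infer(prompt: str) -> str:
    # Stands in for a heavy synchronous model call.
    time.sleep(0.1)
    return f"Hello, {prompt}!"

async def handle_request(prompt: str) -> str:
    # Offload the blocking call to a worker thread so the
    # event loop is never blocked by the computation.
    return await asyncio.to_thread(blocking_infer, prompt)

async def main() -> list[str]:
    # Two concurrent requests overlap their sync work in
    # separate threads instead of queuing behind each other.
    return await asyncio.gather(handle_request("A"), handle_request("B"))

results = asyncio.run(main())
```

LightInfer builds on the same principle but adds dedicated per-model worker threads and a result bridge, rather than a shared ad-hoc thread pool.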

Features

  • Zero-Blocking Architecture: Async Web Frontend + Sync Worker Threads.
  • Efficient Bridge: AsyncResponseBridge lets pending requests await results without each holding a thread.
  • Streaming Support:
    • Native Server-Sent Events (SSE) for text streaming.
    • Binary Streaming for audio/video generation (with chunk buffering).
  • Easy Integration: Wrap any Python class with an infer method.
  • Context Isolation: Each worker runs in its own thread, ensuring safety for libraries like PyTorch.

Installation

pip install lightinfer

Quick Start

1. Define your Model

LightInfer wraps any class with an infer method. The arguments to infer are automatically mapped from the JSON request.

import time

class MyModel:
    def infer(self, prompt: str = "world"):
        # Simulate heavy work
        time.sleep(1)
        return {"message": f"Hello, {prompt}!"}

2. Start the Server

from lightinfer.server import LightServer

# Create your model instance
model = MyModel()

# Start server (you can pass a list of models to run multiple worker threads)
server = LightServer([model])
server.start(port=8000)

3. Make Requests

Standard Request:

import requests

# 'args' in JSON maps to positional arguments of infer()
# 'kwargs' in JSON maps to keyword arguments of infer()
resp = requests.post("http://localhost:8000/api/v1/infer", 
                     json={"args": ["LightInfer"]})
print(resp.json())
# Output: {'message': 'Hello, LightInfer!'}
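The args/kwargs mapping can be illustrated locally. The `dispatch` helper below is hypothetical (a sketch of the JSON contract, not LightInfer's actual dispatcher): `"args"` becomes positional arguments and `"kwargs"` becomes keyword arguments of `infer`.

```python
class MyModel:
    def infer(self, prompt: str = "world"):
        return {"message": f"Hello, {prompt}!"}

def dispatch(model, body: dict):
    # Mirrors the request contract: "args" -> positional,
    # "kwargs" -> keyword arguments of infer().
    args = body.get("args", [])
    kwargs = body.get("kwargs", {})
    return model.infer(*args, **kwargs)

model = MyModel()
r1 = dispatch(model, {"args": ["LightInfer"]})
r2 = dispatch(model, {"kwargs": {"prompt": "world"}})
```

Both calls reach the same `infer` method; only the argument-binding style differs.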

Streaming Request:

If your model returns a generator, you can use streaming:

import time

class StreamingModel:
    def infer(self, prompt: str):
        yield "Part 1"
        time.sleep(0.5)
        yield "Part 2"

Client side:

resp = requests.post("http://localhost:8000/api/v1/infer", 
                     json={"args": ["test"], "stream": True}, stream=True)

for line in resp.iter_lines():
    if line:
        print(line.decode('utf-8'))
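Conceptually, the stream the client sees is just the generator's output: assuming the server forwards each yielded value as one response chunk, you can reason about the stream by iterating the generator directly.

```python
import time

class StreamingModel:
    def infer(self, prompt: str):
        yield "Part 1"
        time.sleep(0.05)  # shortened delay for the sketch
        yield "Part 2"

# Each yield becomes one chunk on the wire, in order.
chunks = list(StreamingModel().infer("test"))
```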

Examples

Check the examples/ directory for ready-to-run scenarios:

  • Simple LLM: Text-to-Text generation with SSE streaming.
  • Streaming TTS: Text-to-Audio generation with binary chunk streaming.

CLI Usage

You can serve any model class directly from the terminal.

Format: lightinfer <module>:<Class>

Given a file my_model.py:

class MyModel:
    def infer(self, prompt: str):
        return f"Echo: {prompt}"

Run:

lightinfer my_model:MyModel --port 8000 --workers 2
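A `<module>:<Class>` spec like this is typically resolved with a dynamic import. The `resolve` helper below is illustrative, not LightInfer's source; the example registers a throwaway `my_model` module so it is self-contained.

```python
import importlib
import sys
import types

def resolve(spec: str):
    # Split "module:Class" and import the class dynamically.
    module_name, _, class_name = spec.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)

# Register a throwaway module standing in for my_model.py.
mod = types.ModuleType("my_model")
exec(
    "class MyModel:\n"
    "    def infer(self, prompt: str):\n"
    "        return f'Echo: {prompt}'",
    mod.__dict__,
)
sys.modules["my_model"] = mod

cls = resolve("my_model:MyModel")
out = cls().infer("hi")
```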

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT
