
Universal AI runtime for local and remote inference.

Project description


ai-track is a universal AI runtime library for local and remote inference. It chooses the best available execution tier automatically, keeps the core package lightweight, and exposes an OpenAI-style client surface so application code can stay backend-agnostic.

What it does

  • Routes requests through local or remote inference automatically.
  • Supports macOS MLX backends for on-device inference.
  • Supports CUDA backends for GPU inference with vLLM and Hugging Face models.
  • Falls back to a remote OpenAI-compatible client when no local backend fits.
  • Exposes a familiar client surface for chat, embeddings, images, audio, and transcription.

Architecture

The codebase is split into two major layers:

  • track.inference contains the runtime primitives and backend implementations.
  • track.hub contains the public routing layer that decides whether a model should use local inference or a remote client.

The runtime is centered around LocalAI, which can manage:

  • chat generation
  • embeddings
  • image generation
  • text-to-speech
  • speech-to-text transcription

Runtime selection

The runtime chooses a backend automatically when you do not pass one explicitly:

  • macOS resolves to the MLX backend
  • CUDA-capable Linux systems resolve to the CUDA backend
  • everything else stays available through the remote OpenAI-compatible path

You can still force a backend explicitly when you need to.
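The selection order above can be sketched as a small decision function. This is an illustration only, with the platform and CUDA checks passed in as plain arguments; it is not ai-track's internal detection code:

```python
def select_backend(system: str, cuda_available: bool) -> str:
    """Sketch of automatic backend choice: MLX on macOS, CUDA on
    CUDA-capable Linux, and the remote OpenAI-compatible path otherwise."""
    if system == "Darwin":
        return "mlx"
    if system == "Linux" and cuda_available:
        return "cuda"
    return "remote"
```

In real code you would not pass these flags yourself; the transcription example below shows the explicit override form, LocalAI(backend="cuda").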

Local-first routing

Routing is local-first:

  1. The hub checks whether the selected model is local.
  2. If the runtime can serve it locally, the request stays on-device.
  3. Otherwise the hub falls back to a remote OpenAI-compatible client.

This keeps local inference fast and private when available while preserving a reliable remote fallback.
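The three steps reduce to one decision. The sketch below is hedged: the model is represented as a plain dict and the readiness flag is an assumption, not the hub's real attribute names:

```python
def resolve_route(model: dict, local_ready: bool) -> str:
    """Local-first routing sketch: a local model served by a ready
    runtime stays on-device; everything else goes to the remote client."""
    if model.get("location") == "local" and local_ready:
        return "local"
    return "remote"
```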

Public API

The main entrypoints are:

from track.hub import Hub
from track.inference import LocalAI

LocalAI exposes the local runtime directly and can also return an OpenAI-style client.

Hub resolves a final client for a selected model and is the preferred way to route requests from application code.

OpenAI-style client

The local compatibility layer mirrors the shape of the OpenAI Python client. It supports:

  • client.chat.completions.create(...)
  • client.embeddings.create(...)
  • client.images.generate(...)
  • client.audio.speech.create(...)
  • client.audio.transcriptions.create(...)
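That nested attribute shape can be mirrored with plain namespaces. The following is a generic sketch of the pattern, not ai-track's implementation; the generate callable and its signature are assumptions:

```python
from types import SimpleNamespace


def make_client(generate):
    """Build an object whose call surface matches
    client.chat.completions.create(...) by nesting namespaces."""
    def create(model, messages, **kwargs):
        # Delegate to whatever backend callable was supplied.
        return generate(model, messages)

    return SimpleNamespace(
        chat=SimpleNamespace(completions=SimpleNamespace(create=create))
    )
```

A stub backend makes the shape visible: make_client(lambda model, messages: messages[-1]["content"]) gives you an object you can call exactly like the OpenAI client's chat surface.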

Example: chat

from track.hub import Hub
from track.inference import AiModel, InferenceConfig, LocalAI

chat_model = AiModel(
    default=True,
    location="local",
    type="llm",
    status="available",
    model="mlx-community/qwen2",
    alias="Qwen2",
    inference_config=InferenceConfig(max_tokens=256, temperature=0.2),
)

runtime = LocalAI(
    chat_config=chat_model,
    remote_api_key="sk-example",
    remote_base_url="https://openrouter.ai/api/v1",
)

hub = Hub(local_ai=runtime)
client = hub.get_client(chat_model)

response = client.chat.completions.create(
    model=chat_model.model,
    messages=[
        {"role": "user", "content": "Summarize this architecture."},
    ],
)

print(response.choices[0].message.content)

Example: transcription

from track.inference import LocalAI, TranscriptionModelConfig

runtime = LocalAI(
    backend="cuda",
    transcription_config=TranscriptionModelConfig(
        model_id="openai/whisper-small",
        alias="Whisper Small",
    ),
)

result = runtime.transcribe("sample.wav")
print(result.text)

Example: OpenAI-style transcription

client = runtime.get_client()
result = client.audio.transcriptions.create(
    model="openai/whisper-small",
    file="sample.wav",
)
print(result.text)

Installation

The core package is intentionally small and works without the optional local backends.

Core install

uv sync

If you want to install from PyPI with pip, use:

pip install ai-track

macOS MLX extras

uv sync --extra macos

For pip:

pip install "ai-track[macos]"

CUDA extras

uv sync --extra cuda

For pip:

pip install "ai-track[cuda]"

The CUDA extra brings in the GPU-oriented runtime stack, including vLLM, Transformers, Diffusers, and PyTorch-based helpers.

Testing

Run the full unit suite with:

uv run pytest -q tests

The tests focus on:

  • hub routing decisions
  • backend selection
  • OpenAI-style client compatibility
  • multimodal cleanup behavior
  • transcription support
  • CUDA factory selection

Development notes

  • Prefer track.hub for routing decisions.
  • Keep optional imports lazy so the core package stays importable without MLX or CUDA dependencies.
  • Add docstrings and type hints to new helpers and edited functions.
  • Reuse shared helpers where both MLX and CUDA backends need the same logic.
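One common way to keep optional backends lazy is a guarded import helper. This is a generic pattern, not ai-track's exact helper:

```python
import importlib


def optional_import(name: str):
    """Import a backend module if its extra is installed; return None
    otherwise, so the core package stays importable without it."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None
```

A backend factory can then check the result and raise a clear error only when a user actually selects that backend, rather than at import time.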
