Universal AI runtime for local and remote inference.
ai-track is a universal AI runtime library for local and remote inference.
It chooses the best available execution tier automatically, keeps the core
package lightweight, and exposes an OpenAI-style client surface so application
code can stay backend-agnostic.
What it does
- Routes requests through local or remote inference automatically.
- Supports macOS MLX backends for on-device inference.
- Supports CUDA backends for GPU inference with vLLM and Hugging Face models.
- Falls back to a remote OpenAI-compatible client when no local backend fits.
- Exposes a familiar client surface for chat, embeddings, images, audio, and transcription.
Architecture
The codebase is split into four top-level packages:
- `track.inference` contains the runtime primitives and backend implementations.
- `track.hub` contains the public routing layer that decides whether a model should use local inference or a remote client.
- `track.contracts` contains shared dataclasses, protocols, and base interfaces.
- `track.utils` contains shared helper functions for devices, storage, audio, chat message handling, and transcription input prep.
The runtime is centered around LocalAI, which can manage:
- chat generation
- embeddings
- image generation
- text-to-speech
- speech-to-text transcription
Runtime selection
The runtime chooses a backend automatically when you do not pass one explicitly:
- macOS resolves to the MLX backend
- CUDA-capable Linux systems resolve to the CUDA backend
- everything else stays available through the remote OpenAI-compatible path
You can still force a backend explicitly when you need to.
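The selection order above can be sketched as a small resolver. This is a hedged illustration, not the library's actual code: the function name and the `nvidia-smi` probe are our assumptions, and the real runtime presumably also checks which optional extras are installed.

```python
import platform
import shutil

def resolve_backend() -> str:
    """Pick an execution tier, mirroring the documented selection order.

    Hypothetical sketch: macOS -> MLX, CUDA-capable Linux -> CUDA,
    everything else -> the remote OpenAI-compatible path.
    """
    if platform.system() == "Darwin":
        return "mlx"  # macOS resolves to the MLX backend
    if platform.system() == "Linux" and shutil.which("nvidia-smi"):
        return "cuda"  # CUDA-capable Linux resolves to the CUDA backend
    return "remote"  # fall back to the OpenAI-compatible client

print(resolve_backend())
```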
Local-first routing
Routing is local-first:
- The hub checks whether the selected model is local.
- If the runtime can serve it locally, the request stays on-device.
- Otherwise the hub falls back to a remote OpenAI-compatible client.
This keeps local inference fast and private when available while preserving a reliable remote fallback.
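The decision above boils down to a single check; a minimal sketch in plain Python (the `Model` dataclass and `route` helper are hypothetical stand-ins for the hub's internals):

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    location: str  # "local" or "remote"

def route(model: Model, local_models: set[str]) -> str:
    """Local-first routing: stay on-device when the runtime can serve
    the model, otherwise fall back to the remote client."""
    if model.location == "local" and model.name in local_models:
        return "local"
    return "remote"

# A local model the runtime can serve stays on-device...
print(route(Model("qwen2", "local"), {"qwen2"}))    # -> local
# ...while anything else goes to the remote OpenAI-compatible client.
print(route(Model("gpt-4o", "remote"), {"qwen2"}))  # -> remote
```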
Public API
The main entrypoints are:
```python
from track import hub, inference
from track.hub import AiHub
from track.inference import LocalAI
```
LocalAI exposes the local runtime directly and can also return an
OpenAI-style client.
AiHub resolves a final client for a selected model and is the preferred way to
route requests from application code.
OpenAI-style client
The local compatibility layer mirrors the shape of the OpenAI Python client. It supports:
- `client.chat.completions.create(...)`
- `client.embeddings.create(...)`
- `client.images.generate(...)`
- `client.audio.speech.create(...)`
- `client.audio.transcriptions.create(...)`
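Because the surface matches, application code can be written against the client shape alone. A small illustration: the `summarize` helper works with any client that exposes this surface, and the fake client below is our stand-in used only to run the sketch, not part of the library.

```python
from types import SimpleNamespace

def summarize(client, model: str, text: str) -> str:
    """Call any OpenAI-style client; only the client shape matters."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
    )
    return resp.choices[0].message["content"]

# Stand-in client with the same shape, used here only to run the sketch.
class _FakeCompletions:
    def create(self, model, messages):
        return SimpleNamespace(
            choices=[SimpleNamespace(message={"content": f"[{model}] ok"})]
        )

fake = SimpleNamespace(chat=SimpleNamespace(completions=_FakeCompletions()))
print(summarize(fake, "qwen2", "hello"))  # -> [qwen2] ok
```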
Example: chat
```python
from track.hub import AiHub
from track.inference import AiModel, InferenceConfig, LocalAI

chat_model = AiModel(
    default=True,
    location="local",
    type="llm",
    status="available",
    model="mlx-community/qwen2",
    alias="Qwen2",
    inference_config=InferenceConfig(max_tokens=256, temperature=0.2),
)

runtime = LocalAI(
    chat_config=chat_model,
    remote_api_key="sk-example",
    remote_base_url="https://openrouter.ai/api/v1",
)

hub = AiHub(local_ai=runtime)
client = hub.get_client(chat_model)

response = client.chat.completions.create(
    model=chat_model.model,
    messages=[
        {"role": "user", "content": "Summarize this architecture."},
    ],
)
print(response.choices[0].message["content"])
```
Example: transcription
```python
from track.inference import LocalAI, TranscriptionModelConfig

runtime = LocalAI(
    backend="cuda",
    transcription_config=TranscriptionModelConfig(
        model_id="openai/whisper-small",
        alias="Whisper Small",
    ),
)

result = runtime.transcribe("sample.wav")
print(result.text)
```
Example: OpenAI-style transcription
```python
client = runtime.get_client()
result = client.audio.transcriptions.create(
    model="openai/whisper-small",
    file="sample.wav",
)
print(result.text)
```
Installation
The core package is intentionally small and works without the optional local backends.
Core install
```shell
uv sync
```
If you want to install from PyPI with pip, use:
```shell
pip install ai-track
```
For the latest pre-release published from the main branch, use:
```shell
pip install --pre ai-track
```
macOS MLX extras
```shell
uv sync --extra macos
```
For pip:
```shell
pip install "ai-track[macos]"
```
CUDA extras
```shell
uv sync --extra cuda
```
For pip:
```shell
pip install "ai-track[cuda]"
```
The CUDA extra brings in the GPU-oriented runtime stack, including vLLM, Transformers, Diffusers, and PyTorch-based helpers.
Testing
Run the full unit suite with:
```shell
uv run pytest -q tests
```
The tests focus on:
- hub routing decisions
- backend selection
- OpenAI-style client compatibility
- multimodal cleanup behavior
- transcription support
- CUDA factory selection
Development notes
- Prefer `track.hub` for routing decisions.
- Keep optional imports lazy so the core package stays importable without MLX or CUDA dependencies.
- Add docstrings and type hints to new helpers and edited functions.
- Reuse shared helpers where both MLX and CUDA backends need the same logic.
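The lazy-import guidance can be sketched like this. The helper name and error message are hypothetical, not code from the package; the point is that the optional dependency is imported inside the function, so `import track` itself never requires it.

```python
def load_mlx_backend():
    """Import the optional MLX dependency only when the backend is
    actually requested, keeping the core package importable without it."""
    try:
        import mlx.core  # optional extra; only present with ai-track[macos]
    except ImportError as exc:
        raise RuntimeError(
            "MLX backend requested but the 'macos' extra is not installed; "
            "run: pip install 'ai-track[macos]'"
        ) from exc
    return mlx.core
```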
File details
Details for the file ai_track-0.1.5.tar.gz.
File metadata
- Download URL: ai_track-0.1.5.tar.gz
- Upload date:
- Size: 407.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bff750bd552c24d5da1189e82a78b13ac26d688fb8afbebcf8db3c54df192f7b |
| MD5 | a6526fdd9c14998b83b26959b0b4a8a1 |
| BLAKE2b-256 | f2d1017cebedc88972cc7b40917f857ce291ccbc4ab035b33d0dd63babb336ae |
Provenance
The following attestation bundles were made for ai_track-0.1.5.tar.gz:
- Publisher: publish.yml on langelabs/ai-track
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ai_track-0.1.5.tar.gz
- Subject digest: bff750bd552c24d5da1189e82a78b13ac26d688fb8afbebcf8db3c54df192f7b
- Sigstore transparency entry: 1454363846
- Permalink: langelabs/ai-track@e4a6df1589b82dfa53c28a18beb87990b4117ed4
- Branch / Tag: refs/heads/main
- Owner: https://github.com/langelabs
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@e4a6df1589b82dfa53c28a18beb87990b4117ed4
- Trigger Event: push
File details
Details for the file ai_track-0.1.5-py3-none-any.whl.
File metadata
- Download URL: ai_track-0.1.5-py3-none-any.whl
- Upload date:
- Size: 54.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9a68f318ed48b4c00dd919ab96d1f055b613ab2be8091285442dc65f11780f4f |
| MD5 | feaeb77ce8ba9a7efe531ace314b2351 |
| BLAKE2b-256 | 589a80e794f8c2876d8bf4561c71747458a9c232a4368f9e4925d5a537a16796 |
Provenance
The following attestation bundles were made for ai_track-0.1.5-py3-none-any.whl:
- Publisher: publish.yml on langelabs/ai-track
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ai_track-0.1.5-py3-none-any.whl
- Subject digest: 9a68f318ed48b4c00dd919ab96d1f055b613ab2be8091285442dc65f11780f4f
- Sigstore transparency entry: 1454363912
- Permalink: langelabs/ai-track@e4a6df1589b82dfa53c28a18beb87990b4117ed4
- Branch / Tag: refs/heads/main
- Owner: https://github.com/langelabs
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@e4a6df1589b82dfa53c28a18beb87990b4117ed4
- Trigger Event: push