Universal AI runtime for local and remote inference.

ai-track is a universal AI runtime library for local and remote inference. It chooses the best available execution tier automatically, keeps the core package lightweight, and exposes an OpenAI-style client surface so application code can stay backend-agnostic.

What it does

  • Routes requests through local or remote inference automatically.
  • Supports macOS MLX backends for on-device inference.
  • Supports CUDA backends for GPU inference with vLLM and Hugging Face models.
  • Falls back to a remote OpenAI-compatible client when no local backend fits.
  • Exposes a familiar client surface for chat, embeddings, images, audio, and transcription.

Architecture

The codebase is split into four top-level layers:

  • track.inference contains the runtime primitives and backend implementations.
  • track.hub contains the public routing layer that decides whether a model should use local inference or a remote client.
  • track.contracts contains shared dataclasses, protocols, and base interfaces.
  • track.utils contains shared helper functions for devices, storage, audio, chat message handling, and transcription input prep.

The runtime is centered around LocalAI, which can manage:

  • chat generation
  • embeddings
  • image generation
  • text-to-speech
  • speech-to-text transcription

Runtime selection

The runtime chooses a backend automatically when you do not pass one explicitly:

  • macOS resolves to the MLX backend
  • CUDA-capable Linux systems resolve to the CUDA backend
  • everything else stays available through the remote OpenAI-compatible path

You can still force a backend explicitly when you need to.
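
For example, a minimal sketch that pins the runtime to a specific backend. The backend="mlx" value is an assumption here, by analogy with the backend="cuda" keyword used in the transcription example below:

from track.inference import LocalAI

# Force a backend instead of relying on auto-detection.
# backend="mlx" is assumed by analogy with backend="cuda" below.
runtime = LocalAI(backend="mlx")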

Local-first routing

Routing is local-first:

  1. The hub checks whether the selected model is local.
  2. If the runtime can serve it locally, the request stays on-device.
  3. Otherwise the hub falls back to a remote OpenAI-compatible client.

This keeps local inference fast and private when available while preserving a reliable remote fallback.
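
Schematically, that decision looks like the sketch below. It is illustrative pseudologic only; supports() and remote_client are hypothetical names, not the hub's real API:

def resolve_client(hub, model):
    # Illustrative only: the real implementation lives in track.hub.
    # supports() and remote_client are hypothetical names.
    if model.location == "local" and hub.local_ai.supports(model):
        return hub.local_ai.get_client()  # request stays on-device
    return hub.remote_client              # OpenAI-compatible fallback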

Public API

The main entrypoints are:

from track import hub, inference
from track.hub import AiHub
from track.inference import LocalAI

LocalAI exposes the local runtime directly and can also return an OpenAI-style client.

AiHub resolves a final client for a selected model and is the preferred way to route requests from application code.
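
Concretely, the two entrypoints compare like this (a minimal sketch; the constructor arguments follow the chat example below, and get_client on the runtime is shown in the transcription examples):

from track.hub import AiHub
from track.inference import LocalAI

runtime = LocalAI(
    remote_api_key="sk-example",
    remote_base_url="https://openrouter.ai/api/v1",
)
direct_client = runtime.get_client()    # OpenAI-style client straight from the runtime

hub = AiHub(local_ai=runtime)
# routed_client = hub.get_client(model)  # preferred: let the hub pick local vs remote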

OpenAI-style client

The local compatibility layer mirrors the shape of the OpenAI Python client. It supports:

  • client.chat.completions.create(...)
  • client.embeddings.create(...)
  • client.images.generate(...)
  • client.audio.speech.create(...)
  • client.audio.transcriptions.create(...)

Example: chat

from track.hub import AiHub
from track.inference import AiModel, InferenceConfig, LocalAI

chat_model = AiModel(
    default=True,
    location="local",
    type="llm",
    status="available",
    model="mlx-community/qwen2",
    alias="Qwen2",
    inference_config=InferenceConfig(max_tokens=256, temperature=0.2),
)

runtime = LocalAI(
    chat_config=chat_model,
    remote_api_key="sk-example",
    remote_base_url="https://openrouter.ai/api/v1",
)

hub = AiHub(local_ai=runtime)
client = hub.get_client(chat_model)

response = client.chat.completions.create(
    model=chat_model.model,
    messages=[
        {"role": "user", "content": "Summarize this architecture."},
    ],
)

print(response.choices[0].message["content"])

Example: transcription

from track.inference import LocalAI, TranscriptionModelConfig

runtime = LocalAI(
    backend="cuda",
    transcription_config=TranscriptionModelConfig(
        model_id="openai/whisper-small",
        alias="Whisper Small",
    ),
)

result = runtime.transcribe("sample.wav")
print(result.text)

Example: OpenAI-style transcription

client = runtime.get_client()
result = client.audio.transcriptions.create(
    model="openai/whisper-small",
    file="sample.wav",
)
print(result.text)
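
Example: embeddings

A hedged sketch of the embeddings surface listed above, assuming the client mirrors the OpenAI response shape (data[0].embedding); the model id is a placeholder:

client = runtime.get_client()
result = client.embeddings.create(
    model="your-org/your-embedding-model",
    input=["local-first inference"],
)
print(len(result.data[0].embedding))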

Installation

The core package is intentionally small and works without the optional local backends.

Core install

From a repository checkout:

uv sync

If you want to install from PyPI with pip, use:

pip install ai-track

To include pre-releases published from the main branch:

pip install --pre ai-track

macOS MLX extras

uv sync --extra macos

For pip:

pip install "ai-track[macos]"

The macOS extra installs the full MLX runtime stack used by local inference, including the base mlx package alongside mlx-embeddings, mlx-lm, mlx-vlm, mlx-audio, and mflux.

MLX chat support is limited to model architectures that the installed mlx_vlm package can actually load. If mlx_vlm does not support a model's chat architecture, register that model only for the modalities you intend to use instead of advertising generic text/chat support.

For embeddings, MLX checkpoints that expose a native .embed() method are used directly. Embedding-focused MLX checkpoints that rely on the mlx-embeddings loader are also supported. Generic MLX checkpoints can be used for embeddings through hidden-state fallback pooling when the full MLX stack is installed.

For local embedding-only models, declare explicit capabilities so downstream apps do not boot the MLX chat backend unnecessarily:

from track.contracts import AiModel, AiModelCapabilities

embedding_model = AiModel(
    provider="local",
    model_id="your-org/your-embedding-model",
    alias="embedding-model",
    capabilities=AiModelCapabilities(
        embedding_input=True,
        embedding_output=True,
    ),
)
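
One plausible way to hand this model to the runtime, assuming an embedding_config keyword analogous to chat_config and transcription_config (the keyword name is an assumption, not confirmed API):

from track.inference import LocalAI

# embedding_config is an assumed keyword, named by analogy with the
# chat_config and transcription_config keywords shown earlier; check
# the LocalAI signature before relying on it.
runtime = LocalAI(embedding_config=embedding_model)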

CUDA extras

uv sync --extra cuda

For pip:

pip install "ai-track[cuda]"

The CUDA extra brings in the GPU-oriented runtime stack, including vLLM, Transformers, Diffusers, and PyTorch-based helpers.

ai-track validates CUDA support against the pinned vllm 0.20.x minor line. Future vllm releases are not treated as automatically compatible just because they satisfy an open-ended lower bound.
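
If you manage pins yourself, one way to encode that expectation as an explicit constraint (the exact bounds illustrate the 0.20.x line mentioned above, not an official requirement):

pip install "ai-track[cuda]" "vllm>=0.20,<0.21"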

For local embedding-only models, declare explicit capabilities so downstream apps do not register unrelated modalities and accidentally boot unused CUDA backends:

from track.contracts import AiModel, AiModelCapabilities

embedding_model = AiModel(
    provider="local",
    model_id="Qwen/Qwen3-Embedding-0.6B",
    alias="qwen-embedding",
    capabilities=AiModelCapabilities(
        embedding_input=True,
        embedding_output=True,
    ),
)

Testing

Run the full unit suite with:

uv run pytest -q tests

The tests focus on:

  • hub routing decisions
  • backend selection
  • OpenAI-style client compatibility
  • multimodal cleanup behavior
  • transcription support
  • CUDA factory selection

Development notes

  • Prefer track.hub for routing decisions.
  • Keep optional imports lazy so the core package stays importable without MLX or CUDA dependencies (see the sketch after this list).
  • Add docstrings and type hints to new helpers and edited functions.
  • Reuse shared helpers where both MLX and CUDA backends need the same logic.
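
A minimal sketch of that lazy-import pattern (the helper name is illustrative, not ai-track's actual internals):

def get_mlx():
    """Import mlx lazily; illustrative of the pattern, not ai-track's code."""
    try:
        import mlx.core as mx
    except ImportError as exc:  # core package stays importable without mlx
        raise RuntimeError(
            "MLX backend requested but mlx is not installed; "
            'install the extra with: pip install "ai-track[macos]"'
        ) from exc
    return mx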
