Foundry Local Python SDK

The Foundry Local Python SDK provides a Python interface for interacting with local AI models via the Foundry Local Core native library. It allows you to discover, download, load, and run inference on models directly on your local machine — no cloud required.

Features

  • Model Discovery – browse and search the model catalog
  • Model Management – download, cache, load, and unload models
  • Chat Completions – OpenAI-compatible chat API (non-streaming and streaming)
  • Tool Calling – function-calling support with chat completions
  • Embeddings – generate text embeddings via OpenAI-compatible API
  • Audio Transcription – Whisper-based speech-to-text (non-streaming and streaming)
  • Built-in Web Service – optional HTTP endpoint for multi-process scenarios
  • Native Performance – ctypes FFI to AOT-compiled Foundry Local Core

Installation

Two package variants are published — choose the one that matches your target hardware:

Variant                      Package                    Native backends
Standard (cross-platform)    foundry-local-sdk          CPU / WebGPU / CUDA
WinML (Windows only)         foundry-local-sdk-winml    Windows ML + all standard backends

# Standard (cross-platform — Linux, macOS, Windows)
pip install foundry-local-sdk

# WinML (Windows only)
pip install foundry-local-sdk-winml

Each package installs the correct native binaries (foundry-local-core, onnxruntime-core, onnxruntime-genai-core) as wheel dependencies. They are mutually exclusive — install only one per environment. WinML is auto-detected at runtime: if the WinML package is installed, the SDK automatically enables the Windows App Runtime Bootstrap.
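If you are unsure which variant a given environment already has, a quick check with the standard library is enough. This snippet is purely illustrative and not part of the SDK:

from importlib.metadata import version, PackageNotFoundError

# Report which Foundry Local SDK variant (if any) is installed in this environment
for pkg in ("foundry-local-sdk", "foundry-local-sdk-winml"):
    try:
        print(f"{pkg} {version(pkg)} is installed")
    except PackageNotFoundError:
        pass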

Building from source

cd sdk/python

# Standard wheel
python -m build --wheel

# WinML wheel (uses the build_backend.py shim)
python -m build --wheel -C winml=true

For editable installs during development (native packages installed separately via foundry-local-install):

pip install -e .

Installing native binaries for development / CI

When working from source, the native packages are not pulled in automatically. Use the foundry-local-install CLI to install them:

# Standard
foundry-local-install

# WinML (Windows only)
foundry-local-install --winml

Add --verbose to print the resolved binary paths after installation:

foundry-local-install --verbose
foundry-local-install --winml --verbose

Note: The standard and WinML native packages use different PyPI package names (foundry-local-core vs foundry-local-core-winml) so they can coexist in the same pip index, but they should not be installed in the same Python environment simultaneously.

Explicit EP Management

You can explicitly discover and download execution providers (EPs):

# Discover available EPs and registration status
eps = manager.discover_eps()
for ep in eps:
    print(f"{ep.name} - registered: {ep.is_registered}")

# Download and register all available EPs
result = manager.download_and_register_eps()
print(f"Success: {result.success}, Status: {result.status}")

# Download only specific EPs
result2 = manager.download_and_register_eps([eps[0].name])

Per-EP download progress

Pass a progress_callback to receive (ep_name, percent) updates as each EP downloads (percent is 0–100):

current_ep = ""

def on_progress(ep_name: str, percent: float) -> None:
    global current_ep
    if ep_name != current_ep:
        if current_ep:
            print()
        current_ep = ep_name
    print(f"\r  {ep_name}  {percent:5.1f}%", end="", flush=True)

manager.download_and_register_eps(progress_callback=on_progress)
print()

Catalog access does not block on EP downloads. Call download_and_register_eps() when you need hardware-accelerated execution providers.
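For example, the two calls above can be combined to download only what is missing (this sketch uses only the API shown in this section):

# Download and register only the EPs that are not yet registered
eps = manager.discover_eps()
missing = [ep.name for ep in eps if not ep.is_registered]
if missing:
    result = manager.download_and_register_eps(missing)
    print(f"Registered: {result.registered_eps}, Failed: {result.failed_eps}")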

Quick Start

from foundry_local_sdk import Configuration, FoundryLocalManager

# 1. Initialize
config = Configuration(app_name="MyApp")
FoundryLocalManager.initialize(config)
manager = FoundryLocalManager.instance

# 2. Discover models
catalog = manager.catalog
models = catalog.list_models()
for m in models:
    print(f"  {m.alias}")

# 3. Load a model
model = catalog.get_model("phi-3.5-mini")
model.load()

# 4. Chat
client = model.get_chat_client()
response = client.complete_chat([
    {"role": "user", "content": "Why is the sky blue?"}
])
print(response.choices[0].message.content)

# 5. Cleanup
model.unload()
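
In application code it is often worth guarding the load/unload pair so the model is released even if inference raises. A minimal pattern using only the calls above:

model = catalog.get_model("phi-3.5-mini")
model.load()
try:
    client = model.get_chat_client()
    response = client.complete_chat([{"role": "user", "content": "Hello"}])
    print(response.choices[0].message.content)
finally:
    # Always release the model, even if complete_chat() raised
    model.unload()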

Usage

Initialization

Create a Configuration and initialize the singleton FoundryLocalManager.

from foundry_local_sdk import Configuration, FoundryLocalManager
from foundry_local_sdk.configuration import LogLevel

config = Configuration(
    app_name="MyApp",
    model_cache_dir="/path/to/cache",     # optional
    log_level=LogLevel.INFORMATION,        # optional (default: Warning)
    additional_settings={"Bootstrap": "false"},  # optional
)
FoundryLocalManager.initialize(config)
manager = FoundryLocalManager.instance

Discovering Models

catalog = manager.catalog

# List all models in the catalog
models = catalog.list_models()

# Get a specific model by alias
model = catalog.get_model("qwen2.5-0.5b")

# Get a specific variant by ID
variant = catalog.get_model_variant("qwen2.5-0.5b-instruct-generic-cpu:4")

# List locally cached models
cached = catalog.get_cached_models()

# List currently loaded models
loaded = catalog.get_loaded_models()

Inspecting Model Metadata

IModel exposes metadata properties from the catalog:

model = catalog.get_model("phi-3.5-mini")

# Identity
print(model.id)             # e.g. "phi-3.5-mini-instruct-generic-gpu:3"
print(model.alias)          # e.g. "phi-3.5-mini"

# Context and token limits
print(model.context_length) # e.g. 131072 (tokens), or None if unknown

# Modalities and capabilities
print(model.input_modalities)   # e.g. "text" or "text,image"
print(model.output_modalities)  # e.g. "text"
print(model.capabilities)       # e.g. "chat,completion"
print(model.supports_tool_calling)  # True, False, or None

# Cache / load state
print(model.is_cached)
print(model.is_loaded)
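
These properties make it easy to filter the catalog programmatically. For example, to find chat-capable models that report tool-calling support (illustrative, using only the properties shown above):

# Filter the catalog by capability metadata
tool_ready = [
    m for m in catalog.list_models()
    if m.supports_tool_calling and "chat" in (m.capabilities or "")
]
for m in tool_ready:
    print(f"{m.alias}: context={m.context_length}, modalities={m.input_modalities}")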

Loading and Running a Model

model = catalog.get_model("qwen2.5-0.5b")

# Select a specific variant (optional – defaults to highest-priority cached variant)
cached = catalog.get_cached_models()
variant = next(v for v in cached if v.alias == "qwen2.5-0.5b")
model.select_variant(variant)

# Load into memory
model.load()

# Non-streaming chat
client = model.get_chat_client()
client.settings.temperature = 0.0
client.settings.max_tokens = 500

result = client.complete_chat([
    {"role": "user", "content": "What is 7 multiplied by 6?"}
])
print(result.choices[0].message.content)  # "42"

# Streaming chat
messages = [{"role": "user", "content": "Tell me a joke"}]

for chunk in client.complete_streaming_chat(messages):
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

# Unload when done
model.unload()
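
The chat client also supports tool calling (see Features and the ChatClient entry in the API Reference). The exact parameter names are not documented in this README, so the following is only a sketch that assumes an OpenAI-style tools argument on complete_chat; verify the authoritative signature against the SDK's own examples before relying on it.

# Illustrative only: assumes complete_chat() accepts an OpenAI-style `tools` list.
# Run this while the model is still loaded.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",                      # hypothetical tool
        "description": "Get the weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

result = client.complete_chat(
    [{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,                                    # assumed keyword – confirm in the SDK examples
)
tool_calls = result.choices[0].message.tool_calls   # OpenAI-style tool call objects (assumed)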

Embeddings

Generate text embeddings using the EmbeddingClient:

embedding_client = model.get_embedding_client()

# Single input
response = embedding_client.generate_embedding(
    "The quick brown fox jumps over the lazy dog"
)
embedding = response.data[0].embedding  # List[float]
print(f"Dimensions: {len(embedding)}")

# Batch input
batch_response = embedding_client.generate_embeddings([
    "The quick brown fox",
    "The capital of France is Paris"
])
# batch_response.data[0].embedding, batch_response.data[1].embedding
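
The returned vectors are plain lists of Python floats, so downstream math needs no extra dependencies. For example, cosine similarity between the two batch results above:

import math

a = batch_response.data[0].embedding
b = batch_response.data[1].embedding

# Cosine similarity = dot(a, b) / (|a| * |b|)
cosine = sum(x * y for x, y in zip(a, b)) / (
    math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
)
print(f"Cosine similarity: {cosine:.4f}")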

Web Service (Optional)

Start a built-in HTTP server for multi-process access.

manager.start_web_service()
print(f"Listening on: {manager.urls}")

# ... use the service ...

manager.stop_web_service()
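
Because the chat and embedding APIs are OpenAI-compatible, a second process can typically point a standard OpenAI client at the running service. The /v1 route prefix and the URL below are assumptions, not something this README documents – use the address reported by manager.urls and confirm the route scheme for your setup.

# Hypothetical sketch: consuming the web service from another process with the `openai` package
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5272/v1",   # placeholder – substitute the URL printed by manager.urls
    api_key="not-needed",                  # local service; key typically unused (assumed)
)

resp = client.chat.completions.create(
    model="phi-3.5-mini",                  # catalog alias
    messages=[{"role": "user", "content": "Hello from another process"}],
)
print(resp.choices[0].message.content)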

API Reference

Core Classes

Class                  Description
Configuration          SDK configuration (app name, cache dir, log level, web service settings)
FoundryLocalManager    Singleton entry point – initialization, catalog access, web service
EpInfo                 Discoverable execution provider info (name, is_registered)
EpDownloadResult       Result of EP download/registration (success, status, registered_eps, failed_eps)
Catalog                Model discovery – listing, lookup by alias/ID, cached/loaded queries
IModel                 Abstract interface for models – identity, metadata, lifecycle, client creation, variant selection

OpenAI Clients

Class             Description
ChatClient        Chat completions (non-streaming and streaming) with tool calling
EmbeddingClient   Text embedding generation via OpenAI-compatible API
AudioClient       Audio transcription (non-streaming and streaming)

Internal / Detail

Class              Description
Model              Alias-level IModel implementation used by Catalog.get_model() (implementation detail)
ModelVariant       Specific model variant (implementation detail – implements IModel)
CoreInterop        ctypes FFI layer to the native Foundry Local Core library
ModelLoadManager   Load/unload via core interop or external web service
ModelInfo          Pydantic model for catalog entries

CLI entry point

Function                                               CLI name                Description
foundry_local_sdk.detail.utils.foundry_local_install   foundry-local-install   Install and verify native binaries (--winml for WinML variant)

Migration note: The function was previously named verify_native_install. The public CLI name (foundry-local-install) and its behaviour are unchanged; only the Python function name in foundry_local_sdk.detail.utils was updated to foundry_local_install for consistency.
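
If you imported the function directly rather than using the CLI, only the import needs to change:

# Before: from foundry_local_sdk.detail.utils import verify_native_install
from foundry_local_sdk.detail.utils import foundry_local_install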

Running Tests

pip install -r requirements-dev.txt
python -m pytest test/ -v

See test/README.md for detailed test setup and structure.

Running Examples

python examples/chat_completion.py
