Foundry Local Python SDK

The Foundry Local Python SDK provides a Python interface for interacting with local AI models via the Foundry Local Core native library. It allows you to discover, download, load, and run inference on models directly on your local machine — no cloud required.

Features

  • Model Discovery – browse and search the model catalog
  • Model Management – download, cache, load, and unload models
  • Chat Completions – OpenAI-compatible chat API (non-streaming and streaming)
  • Tool Calling – function-calling support with chat completions
  • Embeddings – generate text embeddings via OpenAI-compatible API
  • Audio Transcription – Whisper-based speech-to-text (non-streaming and streaming)
  • Built-in Web Service – optional HTTP endpoint for multi-process scenarios
  • Native Performance – ctypes FFI to AOT-compiled Foundry Local Core

Installation

Two package variants are published — choose the one that matches your target hardware:

Variant                      Package                    Native backends
Standard (cross-platform)    foundry-local-sdk          CPU / WebGPU / CUDA
WinML (Windows only)         foundry-local-sdk-winml    Windows ML + all standard backends

# Standard (cross-platform — Linux, macOS, Windows)
pip install foundry-local-sdk

# WinML (Windows only)
pip install foundry-local-sdk-winml

Each package installs the correct native binaries (foundry-local-core, onnxruntime-core, onnxruntime-genai-core) as wheel dependencies. They are mutually exclusive — install only one per environment. WinML is auto-detected at runtime: if the WinML package is installed, the SDK automatically enables the Windows App Runtime Bootstrap.
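If you are unsure which variant a given environment already has, a quick check with the standard library is enough. This snippet is purely illustrative and not part of the SDK:

from importlib.metadata import version, PackageNotFoundError

# Report which Foundry Local SDK variant (if any) is installed in this environment
for pkg in ("foundry-local-sdk", "foundry-local-sdk-winml"):
    try:
        print(f"{pkg} {version(pkg)} is installed")
    except PackageNotFoundError:
        pass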

Building from source

cd sdk/python

# Standard wheel
python -m build --wheel

# WinML wheel (uses the build_backend.py shim)
python -m build --wheel -C winml=true

For editable installs during development (native packages installed separately via foundry-local-install):

pip install -e .

Installing native binaries for development / CI

When working from source, the native packages are not pulled in automatically. Use the foundry-local-install CLI to install them:

# Standard
foundry-local-install

# WinML (Windows only)
foundry-local-install --winml

Add --verbose to print the resolved binary paths after installation:

foundry-local-install --verbose
foundry-local-install --winml --verbose

Note: The standard and WinML native packages use different PyPI package names (foundry-local-core vs foundry-local-core-winml) so they can coexist in the same pip index, but they should not be installed in the same Python environment simultaneously.

Explicit EP Management

You can explicitly discover and download execution providers (EPs):

# Discover available EPs and registration status
eps = manager.discover_eps()
for ep in eps:
    print(f"{ep.name} - registered: {ep.is_registered}")

# Download and register all available EPs
result = manager.download_and_register_eps()
print(f"Success: {result.success}, Status: {result.status}")

# Download only specific EPs
result2 = manager.download_and_register_eps([eps[0].name])

Per-EP download progress

Pass a progress_callback to receive (ep_name, percent) updates as each EP downloads (percent is 0–100):

current_ep = ""

def on_progress(ep_name: str, percent: float) -> None:
    global current_ep
    if ep_name != current_ep:
        if current_ep:
            print()
        current_ep = ep_name
    print(f"\r  {ep_name}  {percent:5.1f}%", end="", flush=True)

manager.download_and_register_eps(progress_callback=on_progress)
print()

Catalog access does not block on EP downloads. Call download_and_register_eps() when you need hardware-accelerated execution providers.
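For example, the two calls above can be combined to download only what is missing (this sketch uses only the API shown in this section):

# Download and register only the EPs that are not yet registered
eps = manager.discover_eps()
missing = [ep.name for ep in eps if not ep.is_registered]
if missing:
    result = manager.download_and_register_eps(missing)
    print(f"Registered: {result.registered_eps}, Failed: {result.failed_eps}")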

Quick Start

from foundry_local_sdk import Configuration, FoundryLocalManager

# 1. Initialize
config = Configuration(app_name="MyApp")
FoundryLocalManager.initialize(config)
manager = FoundryLocalManager.instance

# 2. Discover models
catalog = manager.catalog
models = catalog.list_models()
for m in models:
    print(f"  {m.alias}")

# 3. Load a model
model = catalog.get_model("phi-3.5-mini")
model.load()

# 4. Chat
client = model.get_chat_client()
response = client.complete_chat([
    {"role": "user", "content": "Why is the sky blue?"}
])
print(response.choices[0].message.content)

# 5. Cleanup
model.unload()
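
In application code it is often worth guarding the load/unload pair so the model is released even if inference raises. A minimal pattern using only the calls above:

model = catalog.get_model("phi-3.5-mini")
model.load()
try:
    client = model.get_chat_client()
    response = client.complete_chat([{"role": "user", "content": "Hello"}])
    print(response.choices[0].message.content)
finally:
    # Always release the model, even if complete_chat() raised
    model.unload()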

Usage

Initialization

Create a Configuration and initialize the singleton FoundryLocalManager.

from foundry_local_sdk import Configuration, FoundryLocalManager
from foundry_local_sdk.configuration import LogLevel

config = Configuration(
    app_name="MyApp",
    model_cache_dir="/path/to/cache",     # optional
    log_level=LogLevel.INFORMATION,        # optional (default: Warning)
    additional_settings={"Bootstrap": "false"},  # optional
)
FoundryLocalManager.initialize(config)
manager = FoundryLocalManager.instance

Discovering Models

catalog = manager.catalog

# List all models in the catalog
models = catalog.list_models()

# Get a specific model by alias
model = catalog.get_model("qwen2.5-0.5b")

# Get a specific variant by ID
variant = catalog.get_model_variant("qwen2.5-0.5b-instruct-generic-cpu:4")

# List locally cached models
cached = catalog.get_cached_models()

# List currently loaded models
loaded = catalog.get_loaded_models()

Inspecting Model Metadata

IModel exposes metadata properties from the catalog:

model = catalog.get_model("phi-3.5-mini")

# Identity
print(model.id)             # e.g. "phi-3.5-mini-instruct-generic-gpu:3"
print(model.alias)          # e.g. "phi-3.5-mini"

# Context and token limits
print(model.context_length) # e.g. 131072 (tokens), or None if unknown

# Modalities and capabilities
print(model.input_modalities)   # e.g. "text" or "text,image"
print(model.output_modalities)  # e.g. "text"
print(model.capabilities)       # e.g. "chat,completion"
print(model.supports_tool_calling)  # True, False, or None

# Cache / load state
print(model.is_cached)
print(model.is_loaded)
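
These properties make it easy to filter the catalog programmatically. For example, to find chat-capable models that report tool-calling support (illustrative, using only the properties shown above):

# Filter the catalog by capability metadata
tool_ready = [
    m for m in catalog.list_models()
    if m.supports_tool_calling and "chat" in (m.capabilities or "")
]
for m in tool_ready:
    print(f"{m.alias}: context={m.context_length}, modalities={m.input_modalities}")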

Loading and Running a Model

model = catalog.get_model("qwen2.5-0.5b")

# Select a specific variant (optional – defaults to highest-priority cached variant)
cached = catalog.get_cached_models()
variant = next(v for v in cached if v.alias == "qwen2.5-0.5b")
model.select_variant(variant)

# Load into memory
model.load()

# Non-streaming chat
client = model.get_chat_client()
client.settings.temperature = 0.0
client.settings.max_tokens = 500

result = client.complete_chat([
    {"role": "user", "content": "What is 7 multiplied by 6?"}
])
print(result.choices[0].message.content)  # "42"

# Streaming chat
messages = [{"role": "user", "content": "Tell me a joke"}]

for chunk in client.complete_streaming_chat(messages):
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

# Unload when done
model.unload()
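
The chat client also supports tool calling (see Features and the ChatClient entry in the API Reference). The exact parameter names are not documented in this README, so the following is only a sketch that assumes an OpenAI-style tools argument on complete_chat; verify the authoritative signature against the SDK's own examples before relying on it.

# Illustrative only: assumes complete_chat() accepts an OpenAI-style `tools` list.
# Run this while the model is still loaded.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",                      # hypothetical tool
        "description": "Get the weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

result = client.complete_chat(
    [{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,                                    # assumed keyword – confirm in the SDK examples
)
tool_calls = result.choices[0].message.tool_calls   # OpenAI-style tool call objects (assumed)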

Embeddings

Generate text embeddings using the EmbeddingClient:

embedding_client = model.get_embedding_client()

# Single input
response = embedding_client.generate_embedding(
    "The quick brown fox jumps over the lazy dog"
)
embedding = response.data[0].embedding  # List[float]
print(f"Dimensions: {len(embedding)}")

# Batch input
batch_response = embedding_client.generate_embeddings([
    "The quick brown fox",
    "The capital of France is Paris"
])
# batch_response.data[0].embedding, batch_response.data[1].embedding
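
The returned vectors are plain lists of Python floats, so downstream math needs no extra dependencies. For example, cosine similarity between the two batch results above:

import math

a = batch_response.data[0].embedding
b = batch_response.data[1].embedding

# Cosine similarity = dot(a, b) / (|a| * |b|)
cosine = sum(x * y for x, y in zip(a, b)) / (
    math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
)
print(f"Cosine similarity: {cosine:.4f}")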

Web Service (Optional)

Start a built-in HTTP server for multi-process access.

manager.start_web_service()
print(f"Listening on: {manager.urls}")

# ... use the service ...

manager.stop_web_service()
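
Because the chat and embedding APIs are OpenAI-compatible, a second process can typically point a standard OpenAI client at the running service. The /v1 route prefix and the URL below are assumptions, not something this README documents – use the address reported by manager.urls and confirm the route scheme for your setup.

# Hypothetical sketch: consuming the web service from another process with the `openai` package
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5272/v1",   # placeholder – substitute the URL printed by manager.urls
    api_key="not-needed",                  # local service; key typically unused (assumed)
)

resp = client.chat.completions.create(
    model="phi-3.5-mini",                  # catalog alias
    messages=[{"role": "user", "content": "Hello from another process"}],
)
print(resp.choices[0].message.content)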

API Reference

Core Classes

Class                  Description
Configuration          SDK configuration (app name, cache dir, log level, web service settings)
FoundryLocalManager    Singleton entry point – initialization, catalog access, web service
EpInfo                 Discoverable execution provider info (name, is_registered)
EpDownloadResult       Result of EP download/registration (success, status, registered_eps, failed_eps)
Catalog                Model discovery – listing, lookup by alias/ID, cached/loaded queries
IModel                 Abstract interface for models – identity, metadata, lifecycle, client creation, variant selection

OpenAI Clients

Class             Description
ChatClient        Chat completions (non-streaming and streaming) with tool calling
EmbeddingClient   Text embedding generation via OpenAI-compatible API
AudioClient       Audio transcription (non-streaming and streaming)

Internal / Detail

Class              Description
Model              Alias-level IModel implementation used by Catalog.get_model() (implementation detail)
ModelVariant       Specific model variant (implementation detail – implements IModel)
CoreInterop        ctypes FFI layer to the native Foundry Local Core library
ModelLoadManager   Load/unload via core interop or external web service
ModelInfo          Pydantic model for catalog entries

CLI entry point

Function                                               CLI name                Description
foundry_local_sdk.detail.utils.foundry_local_install   foundry-local-install   Install and verify native binaries (--winml for WinML variant)

Migration note: The function was previously named verify_native_install. The public CLI name (foundry-local-install) and its behaviour are unchanged; only the Python function name in foundry_local_sdk.detail.utils was updated to foundry_local_install for consistency.
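
If you imported the function directly rather than using the CLI, only the import needs to change:

# Before: from foundry_local_sdk.detail.utils import verify_native_install
from foundry_local_sdk.detail.utils import foundry_local_install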

Running Tests

pip install -r requirements-dev.txt
python -m pytest test/ -v

See test/README.md for detailed test setup and structure.

Running Examples

python examples/chat_completion.py
