Llama Stack
Quick Start | Documentation | OpenAI API Compatibility | Discord
Open-source agentic API server for building AI applications. OpenAI-compatible. Any model, any infrastructure.
Llama Stack is a drop-in replacement for the OpenAI API that you can run anywhere — your laptop, your datacenter, or the cloud. Use any OpenAI-compatible client or agentic framework. Swap between Llama, GPT, Gemini, Mistral, or any model without changing your application code.
from openai import OpenAI

# Point any OpenAI-compatible client at the Llama Stack server
client = OpenAI(base_url="http://localhost:8321/v1", api_key="fake")

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Hello"}],
)
What you get
- Chat Completions & Embeddings — standard /v1/chat/completions, /v1/completions, and /v1/embeddings endpoints, compatible with any OpenAI client
- Responses API — server-side agentic orchestration with tool calling, MCP server integration, and built-in file search (RAG) in a single API call (learn more)
- Vector Stores & Files — /v1/vector_stores and /v1/files for managed document storage and search
- Batches — /v1/batches for offline batch processing
- Open Responses conformant — the Responses API implementation passes the Open Responses conformance test suite
Use any model, use any infrastructure
Llama Stack has a pluggable provider architecture. Develop locally with Ollama, deploy to production with vLLM, or connect to a managed service — the API stays the same.
See the provider documentation for the full list.
Get started
Install and run a Llama Stack server:
# One-line install
curl -LsSf https://github.com/llamastack/llama-stack/raw/main/scripts/install.sh | bash
# Or install via uv
uv pip install llama-stack
# Start the server (uses the starter distribution with Ollama)
llama stack run
Then connect with any OpenAI client — Python, TypeScript, curl, or any framework that speaks the OpenAI API.
See the Quick Start guide for detailed setup.
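Because the endpoints are plain HTTP, even a dependency-free client works. A sketch using only Python's standard library; the URL and model name are placeholders, and the final request is left commented out since it needs a running server:

```python
import json
import urllib.request

# Standard OpenAI-style chat completions request against a local server.
# The URL and model name are placeholders for your own deployment.
req = urllib.request.Request(
    "http://localhost:8321/v1/chat/completions",
    data=json.dumps({
        "model": "llama-3.3-70b",
        "messages": [{"role": "user", "content": "Hello"}],
    }).encode("utf-8"),
    headers={"Content-Type": "application/json", "Authorization": "Bearer fake"},
)
# with urllib.request.urlopen(req) as resp:  # requires a running server
#     print(json.load(resp)["choices"][0]["message"]["content"])
```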
Resources
- Documentation — full reference
- OpenAI API Compatibility — endpoint coverage and provider matrix
- Getting Started Notebook — text and vision inference walkthrough
- Contributing — how to contribute
Client SDKs:
| Language | SDK |
|---|---|
| Python | llama-stack-client-python |
| TypeScript | llama-stack-client-typescript |
Community
We hold regular community calls every Thursday at 09:00 AM PST — see the Community Event on Discord for details.
Thanks to all our amazing contributors!
File details
Details for the file llama_stack-0.7.0.tar.gz.
File metadata
- Download URL: llama_stack-0.7.0.tar.gz
- Size: 15.8 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f61e477be46594a3ecaa5c52bdbe0ac0d5ad0c77a1671dcc227b7380e94e6d78 |
| MD5 | 836cef5cee2e801b8db96b1af5d23ce6 |
| BLAKE2b-256 | 13ce0682a276d444b5e701b1115d5cacfff0c71ea42b5762d97bd3f02fd3c8ba |
Provenance
The following attestation bundles were made for llama_stack-0.7.0.tar.gz:

Publisher: pypi.yml on llamastack/llama-stack
- Permalink: llamastack/llama-stack@abf9236d19e263243eb06d7e08cd708d4460a50a
- Branch / Tag: refs/tags/v0.7.0
- Owner: https://github.com/llamastack
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi.yml@abf9236d19e263243eb06d7e08cd708d4460a50a
- Trigger Event: release

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llama_stack-0.7.0.tar.gz
- Subject digest: f61e477be46594a3ecaa5c52bdbe0ac0d5ad0c77a1671dcc227b7380e94e6d78
- Sigstore transparency entry: 1208540185
File details
Details for the file llama_stack-0.7.0-py3-none-any.whl.
File metadata
- Download URL: llama_stack-0.7.0-py3-none-any.whl
- Size: 782.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c1e4189490e5ca496e06f69a4a3e936d4b9b160dc58b34c239911e07c7d4ec24 |
| MD5 | 6dece22b9525aa77c13919a3c25764a4 |
| BLAKE2b-256 | 3ddf3baeb607d936fdb0fcc2c3104644f0661f716415ff9847ed5e16675d39fc |
Provenance
The following attestation bundles were made for llama_stack-0.7.0-py3-none-any.whl:

Publisher: pypi.yml on llamastack/llama-stack
- Permalink: llamastack/llama-stack@abf9236d19e263243eb06d7e08cd708d4460a50a
- Branch / Tag: refs/tags/v0.7.0
- Owner: https://github.com/llamastack
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi.yml@abf9236d19e263243eb06d7e08cd708d4460a50a
- Trigger Event: release

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llama_stack-0.7.0-py3-none-any.whl
- Subject digest: c1e4189490e5ca496e06f69a4a3e936d4b9b160dc58b34c239911e07c7d4ec24
- Sigstore transparency entry: 1208540270