
Open-source, OpenAI-compatible API server with pluggable providers for any model and any infrastructure

Project description

Llama Stack


Quick Start | Documentation | OpenAI API Compatibility | Discord

Open-source agentic API server for building AI applications. OpenAI-compatible. Any model, any infrastructure.

[Llama Stack architecture diagram]

Llama Stack is a drop-in replacement for the OpenAI API that you can run anywhere — your laptop, your datacenter, or the cloud. Use any OpenAI-compatible client or agentic framework. Swap between Llama, GPT, Gemini, Mistral, or any model without changing your application code.

from openai import OpenAI

# Point any OpenAI client at the Llama Stack server; the API key is unused locally.
client = OpenAI(base_url="http://localhost:8321/v1", api_key="fake")
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)

What you get

  • Chat Completions & Embeddings — standard /v1/chat/completions, /v1/completions, and /v1/embeddings endpoints, compatible with any OpenAI client
  • Responses API — server-side agentic orchestration with tool calling, MCP server integration, and built-in file search (RAG) in a single API call (learn more)
  • Vector Stores & Files — /v1/vector_stores and /v1/files for managed document storage and search
  • Batches — /v1/batches for offline batch processing
  • Open Responses conformant — the Responses API implementation passes the Open Responses conformance test suite
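To make the Responses API concrete, here is a minimal sketch of a request body for POST /v1/responses with the built-in file search (RAG) tool enabled. The model name and vector store ID are placeholders; field names follow the OpenAI Responses schema.

```python
import json

def build_responses_request(question: str, vector_store_id: str) -> dict:
    """Build a request body for POST /v1/responses with file search enabled."""
    return {
        "model": "llama-3.3-70b",  # placeholder model name
        "input": question,
        # The server runs the tool loop itself, so retrieval + generation
        # happen in a single API call.
        "tools": [{"type": "file_search", "vector_store_ids": [vector_store_id]}],
    }

body = build_responses_request("What does the design doc say?", "vs_123")
print(json.dumps(body, indent=2))
```

The same body can be sent with any HTTP client, or via `client.responses.create(...)` in the OpenAI Python SDK.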

Use any model, use any infrastructure

Llama Stack has a pluggable provider architecture. Develop locally with Ollama, deploy to production with vLLM, or connect to a managed service — the API stays the same.
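Because only the endpoint and model name change between providers, switching is purely a configuration change. A sketch, where all URLs and model names are illustrative:

```python
# Provider-independent configuration: the application code never changes,
# only which deployment entry is selected (all values are illustrative).
DEPLOYMENTS = {
    "dev": {"base_url": "http://localhost:8321/v1", "model": "llama-3.3-70b"},  # e.g. Ollama locally
    "prod": {"base_url": "https://llama.internal.example/v1", "model": "llama-3.3-70b"},  # e.g. vLLM
}

def chat_request(env: str, prompt: str) -> tuple[str, dict]:
    """Return (url, body) for a chat completion against the chosen deployment."""
    cfg = DEPLOYMENTS[env]
    url = cfg["base_url"] + "/chat/completions"
    body = {"model": cfg["model"], "messages": [{"role": "user", "content": prompt}]}
    return url, body

url, body = chat_request("dev", "Hello")
print(url)
```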

See the provider documentation for the full list.

Get started

Install and run a Llama Stack server:

# One-line install
curl -LsSf https://github.com/llamastack/llama-stack/raw/main/scripts/install.sh | bash

# Or install via uv
uv pip install llama-stack

# Start the server (uses the starter distribution with Ollama)
llama stack run

Then connect with any OpenAI client — Python, TypeScript, curl, or any framework that speaks the OpenAI API.

See the Quick Start guide for detailed setup.

Resources

Client SDKs:

  • Python — llama-stack-client-python
  • TypeScript — llama-stack-client-typescript

Community

We hold regular community calls every Thursday at 09:00 AM PST — see the Community Event on Discord for details.


Thanks to all our amazing contributors!


Project details


Release history

This version

0.7.1

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llama_stack-0.7.1.tar.gz (15.9 MB) — Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

llama_stack-0.7.1-py3-none-any.whl (782.7 kB) — Python 3

File details

Details for the file llama_stack-0.7.1.tar.gz.

File metadata

  • Download URL: llama_stack-0.7.1.tar.gz
  • Size: 15.9 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for llama_stack-0.7.1.tar.gz:

  • SHA256: d88dc8430abe1d26f3908ab506f0f6ccd4f3c16ba06ee46ac662f4c896c52da8
  • MD5: a6d45e1193de4adeda8500c88b8405d9
  • BLAKE2b-256: 849ba8577f0761b02be4d45d625cf5870ab64859dbbe219088d6c8a39e9cb79d

See more details on using hashes here.
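As a sketch of how you might check a downloaded artifact against the published digest before installing, using only the standard library:

```python
import hashlib

def sha256_hex(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare against the SHA256 value published above, e.g.:
# sha256_hex("llama_stack-0.7.1.tar.gz") should equal
# "d88dc8430abe1d26f3908ab506f0f6ccd4f3c16ba06ee46ac662f4c896c52da8"
```

pip can also enforce this automatically via a requirements file with `--require-hashes`.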

Provenance

The following attestation bundles were made for llama_stack-0.7.1.tar.gz:

Publisher: pypi.yml on llamastack/llama-stack

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file llama_stack-0.7.1-py3-none-any.whl.

File metadata

  • Download URL: llama_stack-0.7.1-py3-none-any.whl
  • Size: 782.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for llama_stack-0.7.1-py3-none-any.whl:

  • SHA256: 65b9c827989011af3879ad051db9134163f8e6b3bc3685340328f21f5d07cf5f
  • MD5: 94999741d0921de745061b8e89db9cad
  • BLAKE2b-256: be41f57aedebba533bf5e8b7c5f3f2ee1edb88409e1aa5c83fa7ddc5f85b3c48

See more details on using hashes here.

Provenance

The following attestation bundles were made for llama_stack-0.7.1-py3-none-any.whl:

Publisher: pypi.yml on llamastack/llama-stack

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
