OpenLLM: Self-hosting LLMs Made Easy.

Maintainers

aar0npham parano ssheng

Project description

🦾 OpenLLM: Self-Hosting LLMs Made Easy

OpenLLM allows developers to run any open-source LLM (Llama 3.3, Qwen2.5, Phi3, and more) or custom model as an OpenAI-compatible API with a single command. It features a built-in chat UI, state-of-the-art inference backends, and a simplified workflow for creating enterprise-grade cloud deployments with Docker, Kubernetes, and BentoCloud.

Understand the design philosophy of OpenLLM.

Get Started

Run the following commands to install OpenLLM and explore it interactively.

pip install openllm  # or pip3 install openllm
openllm hello

Supported models

OpenLLM supports a wide range of state-of-the-art open-source LLMs. You can also add a model repository to run custom models with OpenLLM.

Model	Parameters	Required GPU	Start a Server
deepseek	r1-671b	80Gx16	`openllm serve deepseek:r1-671b`
gemma2	2b	12G	`openllm serve gemma2:2b`
gemma3	3b	12G	`openllm serve gemma3:3b`
jamba1.5	mini-ff0a	80Gx2	`openllm serve jamba1.5:mini-ff0a`
llama3.1	8b	24G	`openllm serve llama3.1:8b`
llama3.2	1b	24G	`openllm serve llama3.2:1b`
llama3.3	70b	80Gx2	`openllm serve llama3.3:70b`
llama4	17b16e	80Gx8	`openllm serve llama4:17b16e`
mistral	8b-2410	24G	`openllm serve mistral:8b-2410`
mistral-large	123b-2407	80Gx4	`openllm serve mistral-large:123b-2407`
phi4	14b	80G	`openllm serve phi4:14b`
pixtral	12b-2409	80G	`openllm serve pixtral:12b-2409`
qwen2.5	7b	24G	`openllm serve qwen2.5:7b`
qwen2.5-coder	3b	24G	`openllm serve qwen2.5-coder:3b`
qwq	32b	80G	`openllm serve qwq:32b`

For the full model list, see the OpenLLM models repository.
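The Required GPU column uses a compact notation. Assuming `80Gx16` means sixteen GPUs with 80 GB of memory each and a bare `24G` means a single GPU (an interpretation of the table, not something stated here explicitly), a small helper can turn the notation into numbers. The names `parse_gpu_requirement` and `total_gpu_memory_gb` are hypothetical and not part of OpenLLM.

```python
import re

# Hypothetical helper: interprets the table's "Required GPU" notation.
# Assumption: "80Gx16" = 16 GPUs x 80 GB each; "24G" = one 24 GB GPU.
_GPU_PATTERN = re.compile(r"^(\d+)G(?:x(\d+))?$")


def parse_gpu_requirement(spec: str) -> tuple[int, int]:
    """Return (memory_per_gpu_gb, gpu_count) for a spec such as '80Gx2'."""
    match = _GPU_PATTERN.match(spec.strip())
    if match is None:
        raise ValueError(f"unrecognized GPU spec: {spec!r}")
    memory_gb = int(match.group(1))
    # A missing "xN" suffix is treated as a single GPU.
    count = int(match.group(2)) if match.group(2) else 1
    return memory_gb, count


def total_gpu_memory_gb(spec: str) -> int:
    """Total memory across all GPUs implied by the spec."""
    memory_gb, count = parse_gpu_requirement(spec)
    return memory_gb * count


if __name__ == "__main__":
    for spec in ("12G", "80Gx2", "80Gx16"):
        print(spec, parse_gpu_requirement(spec), total_gpu_memory_gb(spec))
```

A check like this can help decide whether a machine is large enough before downloading weights.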

Start an LLM server

To start an LLM server locally, use the openllm serve command and specify the model version.

Note: OpenLLM does not store model weights. A Hugging Face token (HF_TOKEN) is required for gated models.

1. Create your Hugging Face token here.
2. Request access to the gated model, such as meta-llama/Llama-3.2-1B-Instruct.
3. Set your token as an environment variable by running:

   export HF_TOKEN=<your token>

openllm serve llama3.2:1b
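The token check and the serve command above can be wrapped in a small launcher script. This is a minimal sketch, not part of OpenLLM: `check_hf_token` and `build_serve_command` are hypothetical helper names, and the only CLI invocation relied on is `openllm serve <model>:<version>` as shown above. It assumes the model is always written as `name:version`, as in the supported models table.

```python
import os
import subprocess


def check_hf_token(env) -> str:
    """Return the Hugging Face token from a mapping such as os.environ.

    Raises RuntimeError when HF_TOKEN is missing or blank, since gated
    models cannot be downloaded without it.
    """
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; gated models need a Hugging Face token")
    return token


def build_serve_command(model_tag: str) -> list[str]:
    """Build the argument list for `openllm serve <name>:<version>`."""
    name, sep, version = model_tag.partition(":")
    if not (name and sep and version):
        raise ValueError(f"expected '<name>:<version>', got {model_tag!r}")
    return ["openllm", "serve", model_tag]


if __name__ == "__main__":
    check_hf_token(os.environ)  # fail early with a clear message
    subprocess.run(build_serve_command("llama3.2:1b"), check=True)
```

Failing before the server starts gives a clearer error than a download failure partway through startup.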

The server will be accessible at http://localhost:3000, providing OpenAI-compatible APIs for interaction. You can call the endpoints with different frameworks and tools that support OpenAI-compatible APIs. Typically, you may need to specify the following:

The API host address: By default, the LLM is hosted at http://localhost:3000.
The model name: The name can be different depending on the tool you use.
The API key: The API key used for client authentication. This is optional.

Here are some examples:

OpenAI Python client

from openai import OpenAI

client = OpenAI(base_url='http://localhost:3000/v1', api_key='na')

# Uncomment the following lines to list the available models
# model_list = client.models.list()
# print(model_list)

chat_completion = client.chat.completions.create(
    model="meta-llama/Llama-3.2-1B-Instruct",
    messages=[
        {
            "role": "user",
            "content": "Explain superconductors like I'm five years old"
        }
    ],
    stream=True,
)
for chunk in chat_completion:
    print(chunk.choices[0].delta.content or "", end="")
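When the streamed text is needed as a single string rather than printed piece by piece, the chunks can be joined. A minimal sketch: `collect_stream` is a hypothetical helper, and it assumes each chunk has the `choices[0].delta.content` shape used in the loop above. The stand-in chunks built by `fake_chunk` only mimic that shape so the helper can be tried without a running server.

```python
from types import SimpleNamespace


def collect_stream(chunks) -> str:
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            # Skip chunks that carry no choices.
            continue
        # Some chunks carry no text (content is None); treat them as empty.
        parts.append(chunk.choices[0].delta.content or "")
    return "".join(parts)


def fake_chunk(text):
    """Stand-in object with the same attribute shape as a streamed chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


if __name__ == "__main__":
    demo = [fake_chunk("Hello"), fake_chunk(None), fake_chunk(", world")]
    print(collect_stream(demo))
```

With a live server, the `chat_completion` iterator from the example above could be passed to `collect_stream` in place of the demo list.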

LlamaIndex

from llama_index.llms.openai import OpenAI

llm = OpenAI(api_base="http://localhost:3000/v1", model="meta-llama/Llama-3.2-1B-Instruct", api_key="dummy")
...
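The three settings listed earlier (host address, model name, optional API key) are all a client needs. As a minimal sketch using only the Python standard library, the helper below assembles a request for the `/v1/chat/completions` route that the OpenAI client calls. `build_chat_request` is a hypothetical name, and the `Authorization: Bearer` header follows the OpenAI convention; whether a given server checks it is an assumption.

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt, api_key=None):
    """Assemble a POST request for an OpenAI-compatible chat endpoint."""
    url = base_url.rstrip("/") + "/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    headers = {"Content-Type": "application/json"}
    if api_key:
        # The API key is optional; send it only when one is provided.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    # Requires a server started with `openllm serve llama3.2:1b`.
    request = build_chat_request(
        "http://localhost:3000/v1",
        "meta-llama/Llama-3.2-1B-Instruct",
        "Explain superconductors like I'm five years old",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    print(body["choices"][0]["message"]["content"])
```

Separating request construction from sending keeps the helper easy to inspect without a live server.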

Chat UI

OpenLLM provides a chat UI at the /chat endpoint of the launched LLM server. By default, it is available at http://localhost:3000/chat.

Chat with a model in the CLI

To start a chat conversation in the CLI, use the openllm run command and specify the model version.

openllm run llama3:8b

Model repository

A model repository in OpenLLM represents a catalog of available LLMs that you can run. OpenLLM provides a default model repository that includes the latest open-source LLMs like Llama 3, Mistral, and Qwen2, hosted at this GitHub repository. To see all available models from the default repository and any repositories you have added, use:

openllm model list

To ensure your local list of models is synchronized with the latest updates from all connected repositories, run:

openllm repo update

To review a model’s information, run:

openllm model get llama3.2:1b

Add a model to the default model repository

You can contribute to the default model repository by adding new models that others can use. This involves creating and submitting a Bento of the LLM. For more information, check out this example pull request.

Set up a custom repository

You can add your own repository to OpenLLM with custom models. To do so, follow the format in the default OpenLLM model repository with a bentos directory to store custom LLMs. You need to build your Bentos with BentoML and submit them to your model repository.

First, prepare your custom models in a bentos directory following the guidelines provided by BentoML to build Bentos. Check out the default model repository for an example and read the Developer Guide for details.

Then, register your custom model repository with OpenLLM:

openllm repo add <repo-name> <repo-url>

Note: Currently, OpenLLM only supports adding public repositories.

Deploy to BentoCloud

OpenLLM supports LLM cloud deployment via BentoML, the unified model serving framework, and BentoCloud, an AI inference platform for enterprise AI teams. BentoCloud provides fully managed infrastructure optimized for LLM inference with autoscaling, model orchestration, observability, and more, allowing you to run any AI model in the cloud.

openllm deploy llama3.2:1b --env HF_TOKEN

Note: If you are deploying a gated model, make sure HF_TOKEN is set in your environment variables.

Once the deployment is complete, you can run model inference on the BentoCloud console.

Community

OpenLLM is actively maintained by the BentoML team. Feel free to reach out and join us in our pursuit to make LLMs more accessible and easy to use 👉 Join our Slack community!

Contributing

As an open-source project, we welcome contributions of all kinds, such as new features, bug fixes, and documentation. Here are some of the ways to contribute:

Report a bug by creating a GitHub issue.
Submit a pull request or help review other developers’ pull requests.
Add an LLM to the OpenLLM default model repository so that other users can run your model. See the pull request template.
Check out the Developer Guide to learn more.

Acknowledgements

This project uses the following open-source projects:

bentoml/bentoml for production-level model serving
vllm-project/vllm for the production-level LLM backend
blrchen/chatgpt-lite for a fancy web chat UI
astral-sh/uv for blazing-fast installation of model requirements

We are grateful to the developers and contributors of these projects for their hard work and dedication.

Release history

This version

0.6.30

Apr 21, 2025

0.6.29

Apr 16, 2025

0.6.28

Apr 16, 2025

0.6.27

Apr 16, 2025

0.6.26

Apr 16, 2025

0.6.25

Apr 11, 2025

0.6.24

Apr 10, 2025

0.6.23

Apr 2, 2025

0.6.22

Apr 1, 2025

0.6.21

Apr 1, 2025

0.6.20

Mar 12, 2025

0.6.19

Feb 15, 2025

0.6.18

Feb 7, 2025

0.6.17

Jan 13, 2025

0.6.16

Dec 19, 2024

0.6.15

Dec 3, 2024

0.6.14

Oct 29, 2024

0.6.13

Oct 17, 2024

0.6.12

Oct 17, 2024

0.6.11

Sep 30, 2024

0.6.10

Aug 19, 2024

0.6.9

Aug 12, 2024

0.6.8

Aug 12, 2024

0.6.7

Aug 2, 2024

0.6.6

Aug 1, 2024

0.6.5

Jul 15, 2024

0.6.4

Jul 12, 2024

0.6.3

Jul 11, 2024

0.6.2

Jul 11, 2024

0.6.1

Jul 10, 2024

0.6.0

Jul 10, 2024

0.5.7

Jun 14, 2024

0.5.6

Jun 11, 2024

0.5.5

Jun 3, 2024

0.5.4

Jun 1, 2024

0.5.3

May 30, 2024

0.5.2

May 29, 2024

0.5.1

May 29, 2024

0.5.0 yanked

May 27, 2024

Reason this release was yanked:

bug with prompt_token_ids

0.5.0a15 pre-release

May 27, 2024

0.5.0a14 pre-release

May 23, 2024

0.5.0a13 pre-release

May 22, 2024

0.5.0a12 pre-release

May 14, 2024

0.5.0a11 pre-release

May 12, 2024

0.5.0a10 pre-release

May 9, 2024

0.5.0a9 pre-release

May 9, 2024

0.5.0a8 pre-release

May 9, 2024

0.5.0a7 pre-release

May 9, 2024

0.5.0a6 pre-release

May 9, 2024

0.5.0a5 pre-release

May 8, 2024

0.5.0a4 pre-release

May 8, 2024

0.5.0a3 pre-release

Apr 2, 2024

0.5.0a2 pre-release

Apr 2, 2024

0.5.0a1 pre-release

Mar 21, 2024

0.5.0a0 pre-release

Mar 15, 2024

0.4.44

Feb 6, 2024

0.4.43

Feb 5, 2024

0.4.42

Feb 2, 2024

0.4.41

Dec 18, 2023

0.4.40

Dec 15, 2023

0.4.39

Dec 14, 2023

0.4.38

Dec 13, 2023

0.4.37

Dec 13, 2023

0.4.36

Dec 12, 2023

0.4.35

Dec 7, 2023

0.4.34

Nov 30, 2023

0.4.33

Nov 29, 2023

0.4.32

Nov 29, 2023

0.4.31

Nov 26, 2023

0.4.30

Nov 26, 2023

0.4.29

Nov 26, 2023

0.4.28

Nov 24, 2023

0.4.27

Nov 24, 2023

0.4.26

Nov 22, 2023

0.4.25

Nov 22, 2023

0.4.24

Nov 22, 2023

0.4.23

Nov 22, 2023

0.4.22

Nov 21, 2023

0.4.21

Nov 20, 2023

0.4.20

Nov 20, 2023

0.4.19

Nov 20, 2023

0.4.18

Nov 20, 2023

0.4.17

Nov 20, 2023

0.4.16

Nov 19, 2023

0.4.15

Nov 19, 2023

0.4.14

Nov 17, 2023

0.4.13

Nov 17, 2023

0.4.12

Nov 17, 2023

0.4.11

Nov 17, 2023

0.4.10

Nov 17, 2023

0.4.9

Nov 15, 2023

0.4.8

Nov 15, 2023

0.4.7

Nov 15, 2023

0.4.6

Nov 14, 2023

0.4.5

Nov 13, 2023

0.4.4

Nov 12, 2023

0.4.3

Nov 12, 2023

0.4.2

Nov 12, 2023

0.4.1

Nov 8, 2023

0.4.0

Nov 7, 2023

0.3.14

Nov 4, 2023

0.3.13

Oct 31, 2023

0.3.12

Oct 30, 2023

0.3.10

Oct 30, 2023

0.3.9

Oct 17, 2023

0.3.8

Oct 16, 2023

0.3.7

Oct 12, 2023

0.3.6

Sep 19, 2023

0.3.5

Sep 18, 2023

0.3.4

Sep 14, 2023

0.3.3

Sep 7, 2023

0.3.2

Sep 6, 2023

0.3.1

Sep 6, 2023

0.3.0

Sep 4, 2023

0.2.27

Aug 25, 2023

0.2.26

Aug 17, 2023

0.2.25

Aug 16, 2023

0.2.24

Aug 15, 2023

0.2.23

Aug 15, 2023

0.2.22

Aug 11, 2023

0.2.21 yanked

Aug 11, 2023

Reason this release was yanked:

broken client

0.2.20

Aug 10, 2023

0.2.19 yanked

Aug 10, 2023

Reason this release was yanked:

broken imports from compiled init

0.2.18

Aug 9, 2023

0.2.17

Aug 8, 2023

0.2.16

Aug 4, 2023

0.2.15 yanked

Aug 4, 2023

Reason this release was yanked:

include a regression with vllm

0.2.14 yanked

Aug 4, 2023

Reason this release was yanked:

include a regression with vllm

0.2.13

Aug 3, 2023

0.2.12

Aug 1, 2023

0.2.11

Jul 28, 2023

0.2.10

Jul 25, 2023

0.2.9

Jul 24, 2023

0.2.8

Jul 24, 2023

0.2.7

Jul 23, 2023

0.2.6

Jul 22, 2023

0.2.5

Jul 21, 2023

0.2.4

Jul 21, 2023

0.2.3

Jul 21, 2023

0.2.2

Jul 21, 2023

0.2.1 yanked

Jul 20, 2023

Reason this release was yanked:

Broken installation with openllm[llama]

0.2.0

Jul 20, 2023

0.1.20

Jul 5, 2023

0.1.19

Jun 29, 2023

0.1.18

Jun 29, 2023

0.1.17

Jun 27, 2023

0.1.16

Jun 27, 2023

0.1.15

Jun 26, 2023

0.1.14

Jun 25, 2023

0.1.13

Jun 24, 2023

0.1.12

Jun 24, 2023

0.1.11

Jun 23, 2023

0.1.10

Jun 21, 2023

0.1.9

Jun 21, 2023

0.1.8

Jun 19, 2023

0.1.7

Jun 19, 2023

0.1.6

Jun 17, 2023

0.1.5

Jun 15, 2023

0.1.4

Jun 14, 2023

0.1.3

Jun 14, 2023

0.1.2

Jun 13, 2023

0.1.1

Jun 12, 2023

0.1.0

Jun 12, 2023

0.0.34

Jun 11, 2023

0.0.33

Jun 10, 2023

0.0.32

Jun 9, 2023

0.0.31

Jun 8, 2023

0.0.30

Jun 8, 2023

0.0.29

Jun 8, 2023

0.0.28

Jun 8, 2023

0.0.27

Jun 8, 2023

0.0.26

Jun 7, 2023

0.0.25

Jun 6, 2023

0.0.24

Jun 6, 2023

0.0.23

Jun 6, 2023

0.0.22

Jun 6, 2023

0.0.21

Jun 4, 2023

0.0.19

Jun 4, 2023

0.0.18

Jun 4, 2023

0.0.17

Jun 4, 2023

0.0.16

Jun 2, 2023

0.0.15

Jun 1, 2023

0.0.14

May 31, 2023

0.0.13

May 30, 2023

0.0.12

May 30, 2023

0.0.11

May 30, 2023

0.0.10

May 29, 2023

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

openllm-0.6.30.tar.gz (175.4 kB)

Uploaded Apr 21, 2025 Source

Built Distribution

openllm-0.6.30-py3-none-any.whl (31.8 kB)

Uploaded Apr 21, 2025 Python 3

File details

Details for the file openllm-0.6.30.tar.gz.

File metadata

Download URL: openllm-0.6.30.tar.gz
Upload date: Apr 21, 2025
Size: 175.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for openllm-0.6.30.tar.gz
Algorithm	Hash digest
SHA256	`50f521ba49e50ea5c8d4092abcb651e52305128f0e3f3c012cf0dbb97429f1da`
MD5	`30ef1667d6efe4496e1215a0eff018fd`
BLAKE2b-256	`4684ffab34e1fb4045001a007d1bfcd3917f73b6f2c769f3780525829ff54fb2`

See more details on using hashes here.
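A downloaded file can be checked against the SHA256 digest listed above before installing it. A minimal sketch using Python's `hashlib`: `sha256_of` and `verify_download` are hypothetical helper names, and the local file path is an assumption about where the archive was saved.

```python
import hashlib
import hmac
from pathlib import Path

# SHA256 digest published above for openllm-0.6.30.tar.gz.
EXPECTED_SHA256 = "50f521ba49e50ea5c8d4092abcb651e52305128f0e3f3c012cf0dbb97429f1da"


def sha256_of(path, chunk_size=1 << 20) -> str:
    """Hash a file in fixed-size blocks so large archives need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_download(path, expected_hex) -> bool:
    """Return True when the file's SHA256 matches the expected hex digest."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


if __name__ == "__main__":
    archive = Path("openllm-0.6.30.tar.gz")  # assumed download location
    if archive.exists():
        print("match" if verify_download(archive, EXPECTED_SHA256) else "MISMATCH")
    else:
        print(f"{archive} not found")
```

SHA256 is the digest to prefer of the three listed; MD5 is not suitable for integrity checks against tampering.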

Provenance

The following attestation bundles were made for openllm-0.6.30.tar.gz:

Publisher: create-releases.yml on bentoml/OpenLLM

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: openllm-0.6.30.tar.gz
- Subject digest: 50f521ba49e50ea5c8d4092abcb651e52305128f0e3f3c012cf0dbb97429f1da
- Sigstore transparency entry: 199991827
- Sigstore integration time: Apr 21, 2025
Source repository:
- Permalink: bentoml/OpenLLM@f96ea77c4536efce6d1d39ebbe2da53ad75ae9e5
- Branch / Tag: refs/tags/v0.6.30
- Owner: https://github.com/bentoml
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: create-releases.yml@f96ea77c4536efce6d1d39ebbe2da53ad75ae9e5
- Trigger Event: push

File details

Details for the file openllm-0.6.30-py3-none-any.whl.

File metadata

Download URL: openllm-0.6.30-py3-none-any.whl
Upload date: Apr 21, 2025
Size: 31.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for openllm-0.6.30-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5efbbcb19e23c4e3016737216f10ae0e15320b7e8db6967b2ff5f0dac87e8e38`
MD5	`d187170c10059ca4aa1484045289883e`
BLAKE2b-256	`760f433247fc698b7a1fd28d31f7c31882d4210c064f3fa3ad82fb0ee6854ff4`

See more details on using hashes here.

Provenance

The following attestation bundles were made for openllm-0.6.30-py3-none-any.whl:

Publisher: create-releases.yml on bentoml/OpenLLM

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: openllm-0.6.30-py3-none-any.whl
- Subject digest: 5efbbcb19e23c4e3016737216f10ae0e15320b7e8db6967b2ff5f0dac87e8e38
- Sigstore transparency entry: 199991830
- Sigstore integration time: Apr 21, 2025
Source repository:
- Permalink: bentoml/OpenLLM@f96ea77c4536efce6d1d39ebbe2da53ad75ae9e5
- Branch / Tag: refs/tags/v0.6.30
- Owner: https://github.com/bentoml
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: create-releases.yml@f96ea77c4536efce6d1d39ebbe2da53ad75ae9e5
- Trigger Event: push
