
🦾 OpenLLM: Self-Hosting LLMs Made Easy

License: Apache-2.0

OpenLLM allows developers to run any open-source LLM (Llama 3.2, Qwen2.5, Phi3, and more) or custom model as an OpenAI-compatible API with a single command. It features a built-in chat UI, state-of-the-art inference backends, and a simplified workflow for creating enterprise-grade cloud deployments with Docker, Kubernetes, and BentoCloud.

Understand the design philosophy of OpenLLM.

Get Started

Run the following commands to install OpenLLM and explore it interactively.

pip install openllm  # or pip3 install openllm
openllm hello


Supported models

OpenLLM supports a wide range of state-of-the-art open-source LLMs. You can also add a model repository to run custom models with OpenLLM.

Model | Parameters | Quantization | Required GPU | Start a Server
----- | ---------- | ------------ | ------------ | --------------
Llama 3.1 | 8B | - | 24G | openllm serve llama3.1:8b
Llama 3.1 | 8B | AWQ 4bit | 12G | openllm serve llama3.1:8b-4bit
Llama 3.1 | 70B | AWQ 4bit | 80G | openllm serve llama3.1:70b-4bit
Llama 3.2 | 1B | - | 12G | openllm serve llama3.2:1b
Llama 3.2 | 3B | - | 12G | openllm serve llama3.2:3b
Llama 3.2 Vision | 11B | - | 80G | openllm serve llama3.2:11b-vision
Mistral | 7B | - | 24G | openllm serve mistral:7b
Qwen 2.5 | 1.5B | - | 12G | openllm serve qwen2.5:1.5b
Gemma 2 | 9B | - | 24G | openllm serve gemma2:9b
Phi3 | 3.8B | - | 12G | openllm serve phi3:3.8b

...

For the full model list, see the OpenLLM models repository.

Start an LLM server

To start an LLM server locally, use the openllm serve command and specify the model version.

[!NOTE] OpenLLM does not store model weights. A Hugging Face token (HF_TOKEN) is required for gated models.

  1. Create a Hugging Face token in your Hugging Face account settings.
  2. Request access to the gated model, such as meta-llama/Meta-Llama-3-8B.
  3. Set your token as an environment variable by running:
    export HF_TOKEN=<your token>
    
openllm serve llama3:8b

The server will be accessible at http://localhost:3000, providing OpenAI-compatible APIs. You can call the endpoints with any framework or tool that supports OpenAI-compatible APIs. Typically, you may need to specify the following:

  • The API host address: By default, the LLM is hosted at http://localhost:3000.
  • The model name: The name can be different depending on the tool you use.
  • The API key: The API key used for client authentication. This is optional.
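
If you are unsure which model name to pass, you can first query the server's OpenAI-compatible model listing endpoint. A minimal sketch, assuming the server is running at the default address:

curl http://localhost:3000/v1/models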

Here are some examples:

OpenAI Python client
from openai import OpenAI

client = OpenAI(base_url='http://localhost:3000/v1', api_key='na')

# Use the following call to list the available models
# model_list = client.models.list()
# print(model_list)

chat_completion = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[
        {
            "role": "user",
            "content": "Explain superconductors like I'm five years old"
        }
    ],
    stream=True,
)
for chunk in chat_completion:
    print(chunk.choices[0].delta.content or "", end="")
LlamaIndex
from llama_index.llms.openai import OpenAI

llm = OpenAI(api_base="http://localhost:3000/v1", model="meta-llama/Meta-Llama-3-8B-Instruct", api_key="dummy")
...
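
You can also call the endpoint directly over HTTP without a client library. A minimal sketch, assuming the server is running at the default address and serving the same Llama 3 model as above:

curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "messages": [{"role": "user", "content": "Explain superconductors like I am five years old"}]
  }'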

Chat UI

OpenLLM provides a chat UI at the /chat endpoint of the launched LLM server, available at http://localhost:3000/chat by default.


Chat with a model in the CLI

To start a chat conversation in the CLI, use the openllm run command and specify the model version.

openllm run llama3:8b

Model repository

A model repository in OpenLLM represents a catalog of available LLMs that you can run. OpenLLM provides a default model repository that includes the latest open-source LLMs like Llama 3, Mistral, and Qwen2, hosted at this GitHub repository. To see all available models from the default and any added repository, use:

openllm model list

To ensure your local list of models is synchronized with the latest updates from all connected repositories, run:

openllm repo update

To review a model’s information, run:

openllm model get llama3:8b

Add a model to the default model repository

You can contribute to the default model repository by adding new models that others can use. This involves creating and submitting a Bento of the LLM. For more information, check out this example pull request.

Set up a custom repository

You can add your own repository to OpenLLM with custom models. To do so, follow the format in the default OpenLLM model repository with a bentos directory to store custom LLMs. You need to build your Bentos with BentoML and submit them to your model repository.

First, prepare your custom models in a bentos directory following the guidelines provided by BentoML to build Bentos. Check out the default model repository for an example and read the Developer Guide for details.

Then, register your custom model repository with OpenLLM:

openllm repo add <repo-name> <repo-url>

Note: Currently, OpenLLM only supports adding public repositories.
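
For example, to register a hypothetical public repository (the repository name and URL below are placeholders, not a real repository):

openllm repo add my-models https://github.com/your-org/openllm-models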

Deploy to BentoCloud

OpenLLM supports LLM cloud deployment via BentoML, the unified model serving framework, and BentoCloud, an AI inference platform for enterprise AI teams. BentoCloud provides fully managed infrastructure optimized for LLM inference with autoscaling, model orchestration, observability, and more, allowing you to run any AI model in the cloud.

Sign up for BentoCloud for free and log in. Then, run openllm deploy to deploy a model to BentoCloud:

openllm deploy llama3:8b

[!NOTE] If you are deploying a gated model, make sure to set HF_TOKEN in your environment variables.
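
For example, a minimal sketch that sets the token for a single deployment command (the token value is a placeholder):

HF_TOKEN=<your token> openllm deploy llama3:8b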

Once the deployment is complete, you can run model inference on the BentoCloud console:


Community

OpenLLM is actively maintained by the BentoML team. Feel free to reach out and join us in our pursuit to make LLMs more accessible and easy to use 👉 Join our Slack community!

Contributing

As an open-source project, we welcome contributions of all kinds, such as new features, bug fixes, and documentation.

Acknowledgements

This project builds on a number of open-source projects, and we are grateful to their developers and contributors for their hard work and dedication.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

openllm-0.6.14.tar.gz (35.8 kB)


Built Distribution

openllm-0.6.14-py3-none-any.whl (29.2 kB)


File details

Details for the file openllm-0.6.14.tar.gz.

File metadata

  • Download URL: openllm-0.6.14.tar.gz
  • Upload date:
  • Size: 35.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for openllm-0.6.14.tar.gz
  • SHA256: a05f475589954c3a4df89174526892e11f7fd6637fe5a553b130ec4511b14a3a
  • MD5: b0c4d71ad556d47467702183a2e6dd78
  • BLAKE2b-256: af90344962d2b3ecd17ab104c0f6ededa990ccf59b5f15b4a678264ef4eadbc7

See more details on using hashes here.

File details

Details for the file openllm-0.6.14-py3-none-any.whl.

File metadata

  • Download URL: openllm-0.6.14-py3-none-any.whl
  • Upload date:
  • Size: 29.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for openllm-0.6.14-py3-none-any.whl
  • SHA256: 9c42fbd117f67a5bf2573cddf30fa4d60ecbd802f9d14b2d69c870756f4f65c0
  • MD5: 18bfe733a98a6e5c91ec531e680b5839
  • BLAKE2b-256: fd759937e8422984e7203a7a6bb9d9d90d668768f142855b0891702c207caf75

See more details on using hashes here.
