
OpenLLM

Build, fine-tune, serve, and deploy Large Language Models, including popular ones like StableLM, Llama, Dolly, Flan-T5, and Vicuna, or even your own custom LLMs.
Powered by BentoML 🍱

📖 Introduction

With OpenLLM, you can easily run inference with any open-source large language model (LLM) and build production-ready LLM apps, powered by BentoML. Here are some key features:

🚂 SOTA LLMs: With a single command, access support for state-of-the-art LLMs, including StableLM, Llama, Alpaca, Dolly, Flan-T5, ChatGLM, Falcon, and more.

🔥 Easy-to-use APIs: We provide intuitive interfaces by integrating with popular tools like BentoML, HuggingFace, LangChain, and more.

📦 Fine-tuning your own LLM: Customize any LLM to suit your needs with LLM.tuning(). (Work In Progress)

⛓️ Interoperability: First-class support for LangChain and BentoML's runner architecture allows easy chaining of LLMs across multiple GPUs/nodes. (Work In Progress)

🎯 Streamlined Production Deployment: Seamlessly package your model into a Bento with openllm build, containerize it into an OCI image, and deploy it with a single click using ☁️ BentoCloud.

๐Ÿƒโ€ Getting Started

To use OpenLLM, you need Python 3.8 (or newer) and pip installed on your system. We highly recommend using a virtual environment to prevent package conflicts.

You can install OpenLLM using pip as follows:

pip install openllm
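
If you want to follow the virtual-environment recommendation above, the standard venv module works fine (shown here for a POSIX shell; on Windows, activate with .venv\Scripts\activate instead):

python -m venv .venv
source .venv/bin/activate
pip install openllm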

To verify that it installed correctly, run:

openllm -h

If the installation succeeded, you should see output like:

Usage: openllm [OPTIONS] COMMAND [ARGS]...

   ██████╗ ██████╗ ███████╗███╗   ██╗██╗     ██╗     ███╗   ███╗
  ██╔═══██╗██╔══██╗██╔════╝████╗  ██║██║     ██║     ████╗ ████║
  ██║   ██║██████╔╝█████╗  ██╔██╗ ██║██║     ██║     ██╔████╔██║
  ██║   ██║██╔═══╝ ██╔══╝  ██║╚██╗██║██║     ██║     ██║╚██╔╝██║
  ╚██████╔╝██║     ███████╗██║ ╚████║███████╗███████╗██║ ╚═╝ ██║
   ╚═════╝ ╚═╝     ╚══════╝╚═╝  ╚═══╝╚══════╝╚══════╝╚═╝     ╚═╝

  OpenLLM: Your one stop-and-go-solution for serving any Open Large-Language Model

      - StableLM, Falcon, ChatGLM, Dolly, Flan-T5, and more

      - Powered by BentoML 🍱

Starting an LLM Server

To start an LLM server, use openllm start. For example, to start a dolly-v2 server:

openllm start dolly-v2

Following this, a Swagger UI will be accessible at http://0.0.0.0:3000, where you can experiment with the endpoints and sample prompts.
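
You can also call the server over plain HTTP from any language. Below is a minimal Python sketch using the requests library; the /v1/generate route and JSON payload shape are assumptions here, so confirm the exact endpoint and schema in the Swagger UI before relying on them:

import requests

# Send a prompt to the running OpenLLM server. The route and payload below
# are assumptions -- check the Swagger UI at http://0.0.0.0:3000 for the
# exact schema exposed by your version of OpenLLM.
response = requests.post(
    "http://localhost:3000/v1/generate",
    json={"prompt": "What is a bento box?"},
    timeout=60,
)
response.raise_for_status()
print(response.json())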

OpenLLM provides a built-in Python client, allowing you to interact with the model. In a different terminal window or a Jupyter notebook, create a client to start interacting with the model:

>>> import openllm
>>> client = openllm.client.HTTPClient('http://localhost:3000')
>>> client.query('Explain to me the difference between "further" and "farther"')

You can also use the openllm query command to query the model from the terminal:

openllm query --endpoint http://localhost:3000 'Explain to me the difference between "further" and "farther"'

🚀 Deploying to Production

To deploy your LLMs into production:

  1. Building a Bento: With OpenLLM, you can easily build a Bento for a specific model, such as dolly-v2, using the build command:

    openllm build dolly-v2
    

    A Bento, in BentoML, is the unit of distribution. It packages your program's source code, models, files, artifacts, and dependencies.

    NOTE: If you wish to build OpenLLM from the git source, set OPENLLM_DEV_BUILD=True to include the generated wheels in the bundle.

  2. Containerize your Bento

    bentoml containerize <name:version>
    

    BentoML offers a comprehensive set of options for deploying and hosting online ML services in production. To learn more, check out the Deploying a Bento guide.
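
    Once containerized, you can run the image locally with Docker before deploying it. A typical invocation looks like the following (the serve arguments may vary across BentoML versions; 3000 is BentoML's default serving port):

    docker run -it --rm -p 3000:3000 <name:version> serve --production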

🧩 Models and Dependencies

OpenLLM currently supports the following:

Model      CPU  GPU  Optional dependencies
flan-t5    ✅    ✅    pip install openllm[flan-t5]
dolly-v2   ✅    ✅    👾 (not needed)
chatglm    ❌    ✅    pip install openllm[chatglm]
starcoder  ❌    ✅    pip install openllm[starcoder]
falcon     ❌    ✅    pip install openllm[falcon]
stablelm   ✅    ✅    👾 (not needed)

NOTE: We respect users' system disk space, so OpenLLM doesn't force you to install the dependencies for every model. To use any of the models above, install its optional dependencies as listed.

Runtime Implementations

Different LLMs may have multiple runtime implementations, for instance PyTorch (pt), TensorFlow (tf), or Flax (flax).

If you wish to specify a particular runtime for a model, you can do so by setting the OPENLLM_{MODEL_NAME}_FRAMEWORK={runtime} environment variable before running openllm start.

For example, if you want to use the TensorFlow (tf) implementation for the flan-t5 model, you can use the following command:

OPENLLM_FLAN_T5_FRAMEWORK=tf openllm start flan-t5
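
Equivalently, if you launch the server from Python, you can set the variable on the child process's environment. A minimal sketch using only the standard library; the only requirement is that the variable is present in the environment of the openllm start process:

import os
import subprocess

# Run `openllm start flan-t5` with the TensorFlow implementation selected.
env = {**os.environ, "OPENLLM_FLAN_T5_FRAMEWORK": "tf"}
subprocess.run(["openllm", "start", "flan-t5"], env=env, check=True)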

Integrating a New Model

OpenLLM encourages contributions and welcomes users to incorporate their custom LLMs into the ecosystem. Check out the Adding a New Model guide to see how you can do it yourself.

⚙️ Integrations

OpenLLM is not just a standalone product; it's a building block designed to easily integrate with other powerful tools. We currently offer integration with BentoML and LangChain.

BentoML

OpenLLM models can be integrated as a Runner in your BentoML service. These runners have a generate method that takes a string prompt and returns the corresponding output string. This allows you to plug and play any OpenLLM model with your existing ML workflow.

import bentoml
import openllm
from bentoml.io import Text  # IO descriptor used by the endpoint below

model = "dolly-v2"

# Build a config and a runner for the chosen model.
llm_config = openllm.AutoConfig.for_model(model)
llm_runner = openllm.Runner(model, llm_config=llm_config)

svc = bentoml.Service(
    name="llm-dolly-v2-service", runners=[llm_runner]
)

@svc.api(input=Text(), output=Text())
async def prompt(input_text: str) -> str:
    # Forward the prompt to the runner and return the generated text.
    answer = await llm_runner.generate(input_text)
    return answer
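
To try the service locally, save the snippet above as service.py (a file name chosen here for illustration) and serve it with the BentoML CLI:

bentoml serve service:svc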

LangChain (⏳ Coming Soon!)

In future LangChain releases, you'll be able to effortlessly invoke OpenLLM models, like so:

from langchain.llms import OpenLLM
llm = OpenLLM.for_model(model_name='flan-t5')
llm("What is the difference between a duck and a goose?")

If you have an OpenLLM server deployed elsewhere, you can connect to it by specifying its URL:

from langchain.llms import OpenLLM
llm = OpenLLM.for_model(server_url='http://localhost:8000', server_type='http')
llm("What is the difference between a duck and a goose?")

๐Ÿ‡ Telemetry

OpenLLM collects usage data to enhance user experience and improve the product. We only report OpenLLM's internal API calls and ensure maximum privacy by excluding sensitive information. We will never collect user code, model data, or stack traces. For details on usage tracking, check out the code.

You can opt-out of usage tracking by using the --do-not-track CLI option:

openllm [command] --do-not-track

Or by setting the environment variable OPENLLM_DO_NOT_TRACK=True:

export OPENLLM_DO_NOT_TRACK=True

👥 Community

Engage with like-minded individuals passionate about LLMs, AI, and more on our Discord!

OpenLLM is actively maintained by the BentoML team. Feel free to reach out and join us in our pursuit to make LLMs more accessible and easy to use 👉 Join our Slack community!

🎁 Contributing

We welcome contributions! If you're interested in enhancing OpenLLM's capabilities or have any questions, don't hesitate to reach out in our Discord channel.

Check out our Developer Guide if you wish to contribute to OpenLLM's codebase.
