
OpenLLM

Build, fine-tune, serve, and deploy Large Language Models, including popular ones like StableLM, Llama, Dolly, Flan-T5, and Vicuna, or even your own custom LLMs.
Powered by BentoML 🍱

📖 Introduction

With OpenLLM, you can easily run inference with any open-source large language model (LLM) and build production-ready LLM apps, powered by BentoML. Here are some key features:

🚂 SOTA LLMs: With a single command, access support for state-of-the-art LLMs, including StableLM, Llama, Alpaca, Dolly, Flan-T5, ChatGLM, Falcon, and more.

🔥 Easy-to-use APIs: We provide intuitive interfaces by integrating with popular tools like BentoML, HuggingFace, LangChain, and more.

📦 Fine-tuning your own LLM: Customize any LLM to suit your needs with LLM.tuning(). (Work In Progress)

⛓️ Interoperability: First-class support for LangChain and BentoML's runner architecture allows easy chaining of LLMs across multiple GPUs/nodes. (Work In Progress)

🎯 Streamlined Production Deployment: Seamlessly package your model into a Bento with openllm build, containerize it into an OCI image, and deploy it with a single click using ☁️ BentoCloud.

๐Ÿƒโ€ Getting Started

To use OpenLLM, you need Python 3.8 (or newer) and pip installed on your system. We highly recommend using a virtual environment to prevent package conflicts.

You can install OpenLLM using pip as follows:

pip install openllm
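
If you want to follow the virtual-environment recommendation above, the standard venv module works fine (shown here for a POSIX shell; on Windows, activate with .venv\Scripts\activate instead):

python -m venv .venv
source .venv/bin/activate
pip install openllm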

To verify that it installed correctly, run:

openllm -h

If the installation succeeded, you should see output like:

Usage: openllm [OPTIONS] COMMAND [ARGS]...

   ██████╗ ██████╗ ███████╗███╗   ██╗██╗     ██╗     ███╗   ███╗
  ██╔═══██╗██╔══██╗██╔════╝████╗  ██║██║     ██║     ████╗ ████║
  ██║   ██║██████╔╝█████╗  ██╔██╗ ██║██║     ██║     ██╔████╔██║
  ██║   ██║██╔═══╝ ██╔══╝  ██║╚██╗██║██║     ██║     ██║╚██╔╝██║
  ╚██████╔╝██║     ███████╗██║ ╚████║███████╗███████╗██║ ╚═╝ ██║
   ╚═════╝ ╚═╝     ╚══════╝╚═╝  ╚═══╝╚══════╝╚══════╝╚═╝     ╚═╝

  OpenLLM: Your one stop-and-go-solution for serving any Open Large-Language Model

      - StableLM, Falcon, ChatGLM, Dolly, Flan-T5, and more

      - Powered by BentoML 🍱

Starting an LLM Server

To start an LLM server, use openllm start. For example, to start a dolly-v2 server:

openllm start dolly-v2

Following this, a Swagger UI will be accessible at http://0.0.0.0:3000, where you can experiment with the endpoints and sample prompts.
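
You can also call the server over plain HTTP from any language. Below is a minimal Python sketch using the requests library; the /v1/generate route and JSON payload shape are assumptions here, so confirm the exact endpoint and schema in the Swagger UI before relying on them:

import requests

# Send a prompt to the running OpenLLM server. The route and payload below
# are assumptions -- check the Swagger UI at http://0.0.0.0:3000 for the
# exact schema exposed by your version of OpenLLM.
response = requests.post(
    "http://localhost:3000/v1/generate",
    json={"prompt": "What is a bento box?"},
    timeout=60,
)
response.raise_for_status()
print(response.json())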

OpenLLM provides a built-in Python client, allowing you to interact with the model. In a different terminal window or a Jupyter notebook, create a client to start interacting with the model:

>>> import openllm
>>> client = openllm.client.HTTPClient('http://localhost:3000')
>>> client.query('Explain to me the difference between "further" and "farther"')

You can also use the openllm query command to query the model from the terminal:

openllm query --endpoint http://localhost:3000 'Explain to me the difference between "further" and "farther"'

🚀 Deploying to Production

To deploy your LLMs into production:

  1. Building a Bento: With OpenLLM, you can easily build a Bento for a specific model, such as dolly-v2, using the build command:

    openllm build dolly-v2
    

    A Bento, in BentoML, is the unit of distribution. It packages your program's source code, models, files, artifacts, and dependencies.

    NOTE: If you wish to build OpenLLM from the git source, set OPENLLM_DEV_BUILD=True to include the generated wheels in the bundle.

  2. Containerize your Bento

    bentoml containerize <name:version>
    

    BentoML offers a comprehensive set of options for deploying and hosting online ML services in production. To learn more, check out the Deploying a Bento guide.
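
    Once containerized, you can run the image locally with Docker before deploying it. A typical invocation looks like the following (the serve arguments may vary across BentoML versions; 3000 is BentoML's default serving port):

    docker run -it --rm -p 3000:3000 <name:version> serve --production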

🧩 Models and Dependencies

OpenLLM currently supports the following:

Model      CPU  GPU  Optional dependencies
flan-t5    ✅    ✅    pip install openllm[flan-t5]
dolly-v2   ✅    ✅    👾 (not needed)
chatglm    ❌    ✅    pip install openllm[chatglm]
starcoder  ❌    ✅    pip install openllm[starcoder]
falcon     ❌    ✅    pip install openllm[falcon]
stablelm   ✅    ✅    👾 (not needed)

NOTE: We respect users' system disk space, so OpenLLM doesn't force you to install the dependencies for every model. To use any of the models above, install its optional dependencies as listed.

Runtime Implementations

Different LLMs may have multiple runtime implementations, for instance PyTorch (pt), TensorFlow (tf), or Flax (flax).

If you wish to specify a particular runtime for a model, you can do so by setting the OPENLLM_{MODEL_NAME}_FRAMEWORK={runtime} environment variable before running openllm start.

For example, if you want to use the TensorFlow (tf) implementation for the flan-t5 model, you can use the following command:

OPENLLM_FLAN_T5_FRAMEWORK=tf openllm start flan-t5
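
Equivalently, if you launch the server from Python, you can set the variable on the child process's environment. A minimal sketch using only the standard library; the only requirement is that the variable is present in the environment of the openllm start process:

import os
import subprocess

# Run `openllm start flan-t5` with the TensorFlow implementation selected.
env = {**os.environ, "OPENLLM_FLAN_T5_FRAMEWORK": "tf"}
subprocess.run(["openllm", "start", "flan-t5"], env=env, check=True)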

Integrating a New Model

OpenLLM encourages contributions and welcomes users to incorporate their custom LLMs into the ecosystem. Check out the Adding a New Model guide to see how you can do it yourself.

⚙️ Integrations

OpenLLM is not just a standalone product; it's a building block designed to easily integrate with other powerful tools. We currently offer integration with BentoML and LangChain.

BentoML

OpenLLM models can be integrated as a Runner in your BentoML service. These runners have a generate method that takes a string prompt and returns the corresponding output string. This allows you to plug and play any OpenLLM model with your existing ML workflow.

import bentoml
import openllm
from bentoml.io import Text  # IO descriptor used by the endpoint below

model = "dolly-v2"

# Build a config and a runner for the chosen model.
llm_config = openllm.AutoConfig.for_model(model)
llm_runner = openllm.Runner(model, llm_config=llm_config)

svc = bentoml.Service(
    name="llm-dolly-v2-service", runners=[llm_runner]
)

@svc.api(input=Text(), output=Text())
async def prompt(input_text: str) -> str:
    # Forward the prompt to the runner and return the generated text.
    answer = await llm_runner.generate(input_text)
    return answer
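
To try the service locally, save the snippet above as service.py (a file name chosen here for illustration) and serve it with the BentoML CLI:

bentoml serve service:svc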

LangChain (⏳ Coming Soon!)

In future LangChain releases, you'll be able to effortlessly invoke OpenLLM models, like so:

from langchain.llms import OpenLLM
llm = OpenLLM.for_model(model_name='flan-t5')
llm("What is the difference between a duck and a goose?")

If you have an OpenLLM server deployed elsewhere, you can connect to it by specifying its URL:

from langchain.llms import OpenLLM
llm = OpenLLM.for_model(server_url='http://localhost:8000', server_type='http')
llm("What is the difference between a duck and a goose?")

๐Ÿ‡ Telemetry

OpenLLM collects usage data to enhance user experience and improve the product. We only report OpenLLM's internal API calls and ensure maximum privacy by excluding sensitive information. We will never collect user code, model data, or stack traces. For details on usage tracking, check out the code.

You can opt-out of usage tracking by using the --do-not-track CLI option:

openllm [command] --do-not-track

Or by setting the environment variable OPENLLM_DO_NOT_TRACK=True:

export OPENLLM_DO_NOT_TRACK=True

👥 Community

Engage with like-minded individuals passionate about LLMs, AI, and more on our Discord!

OpenLLM is actively maintained by the BentoML team. Feel free to reach out and join us in our pursuit to make LLMs more accessible and easy to use 👉 Join our Slack community!

🎁 Contributing

We welcome contributions! If you're interested in enhancing OpenLLM's capabilities or have any questions, don't hesitate to reach out in our Discord channel.

Check out our Developer Guide if you wish to contribute to OpenLLM's codebase.
