Skip to main content

A streaming chat toolkit for pre-trained large language models(LLM)

Project description

ChatStream

English | 日本語

ChatStream is a chat toolkit for pre-trained large language models.

It can be embedded in FastAPI/Starlette based web applications/web APIs to perform sequential sentence generation with pre-trained language models under load control.

Installation

pip install chatstream

Quick Start

Install required packages

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
pip install transformers
pip install "uvicorn[standard]" gunicorn 

Implementing a ChatStream server

Implement a streaming chat server for pre-trained models.

import torch
from fastapi import FastAPI, Request
from fastsession import FastSessionMiddleware, MemoryStore
from transformers import AutoTokenizer, AutoModelForCausalLM

from chatstream import ChatStream, ChatPromptTogetherRedPajamaINCITEChat as ChatPrompt

model_path = "togethercomputer/RedPajama-INCITE-Chat-3B-v1"
device = "cuda"  # "cuda" / "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16)
model.to(device)

chat_stream = ChatStream(
    num_of_concurrent_executions=2,  # max_concurrent_executions for sentence generation
    max_queue_size=5,  # size of queue
    model=model,
    tokenizer=tokenizer,
    device=device,
    chat_prompt_clazz=ChatPrompt,
)

app = FastAPI()

# Specify session middleware to keep per-user ChatPrompt in the HTTP session
app.add_middleware(FastSessionMiddleware,
                   secret_key="your-session-secret-key",
                   store=MemoryStore(),
                   http_only=True,
                   secure=False,
                   )


@app.post("/chat_stream")
async def stream_api(request: Request):
    # Just pass a FastAPI Request object to `handle_chat_stream_request` to automatically queue and control concurrency
    response = await chat_stream.handle_chat_stream_request(request)
    return response


@app.on_event("startup")
async def startup():
    # start the queueing system by doing `start_queue_worker` at the same time the web server starts up
    await chat_stream.start_queue_worker()

Table of Contents

License

LICENSE.md

Citing ChatStream

@software{chatstream,
  title = {{ChatStream: A streaming chat toolkit for pre-trained large language models(LLM)}},
  author = {Qualiteg Inc.(https://qualiteg.com) },
  url = {https://github.com/qualiteg/ChatStream}
  month = {5},
  year = {2023},
  version = {0.15},
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

chatstream-0.3.0.tar.gz (75.7 kB view details)

Uploaded Source

Built Distribution

chatstream-0.3.0-py3-none-any.whl (116.7 kB view details)

Uploaded Python 3

File details

Details for the file chatstream-0.3.0.tar.gz.

File metadata

  • Download URL: chatstream-0.3.0.tar.gz
  • Upload date:
  • Size: 75.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.4

File hashes

Hashes for chatstream-0.3.0.tar.gz
Algorithm Hash digest
SHA256 ec56ec1a1d836d3136c2d78a4ff4fdb744033c6b6cc8a9a5080b02f984adec6b
MD5 d6d16e7f1a9fb7f56a3ea3d30aab434e
BLAKE2b-256 ed3f2c6438bd99a8c43fae65a8503d942f67b523382b5dc255cb919d95609a0a

See more details on using hashes here.

File details

Details for the file chatstream-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: chatstream-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 116.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.4

File hashes

Hashes for chatstream-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 ddf90126b4d2627bd3ac91939b76192db4b9db75fb7dd1b6b2d5c23b597d79d4
MD5 51758ed9ba5223a8ea4804d6d080d668
BLAKE2b-256 51760c77b1235255911aafa2f2a98035ecf7c9df1a6cb080707e9d07342ff7db

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page