A streaming chat toolkit for pre-trained large language models (LLMs)
ChatStream
ChatStream is a chat toolkit for pre-trained large language models.
It can be embedded in FastAPI/Starlette-based web applications and Web APIs to perform streaming sentence generation with pre-trained language models under load control.
Installation
pip install chatstream
Quick Start
Install required packages
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
pip install transformers
pip install "uvicorn[standard]" gunicorn
Implementing a ChatStream server
Implement a streaming chat server for pre-trained models.
import torch
from fastapi import FastAPI, Request
from fastsession import FastSessionMiddleware, MemoryStore
from transformers import AutoTokenizer, AutoModelForCausalLM

from chatstream import ChatStream, ChatPromptTogetherRedPajamaINCITEChat as ChatPrompt

model_path = "togethercomputer/RedPajama-INCITE-Chat-3B-v1"
device = "cuda"  # "cuda" / "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16)
model.to(device)

chat_stream = ChatStream(
    num_of_concurrent_executions=2,  # max number of concurrent sentence-generation tasks
    max_queue_size=5,                # size of the waiting queue
    model=model,
    tokenizer=tokenizer,
    device=device,
    chat_prompt_clazz=ChatPrompt,
)

app = FastAPI()

# Add session middleware to keep a per-user ChatPrompt in the HTTP session
app.add_middleware(FastSessionMiddleware,
                   secret_key="your-session-secret-key",
                   store=MemoryStore(),
                   http_only=True,
                   secure=False,
                   )


@app.post("/chat_stream")
async def stream_api(request: Request):
    # Pass the FastAPI Request object to `handle_chat_stream_request`;
    # queueing and concurrency control are handled automatically
    response = await chat_stream.handle_chat_stream_request(request)
    return response


@app.on_event("startup")
async def startup():
    # Start the queueing system with `start_queue_worker` when the web server starts up
    await chat_stream.start_queue_worker()
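To run the server, start an ASGI server as usual. Assuming the code above is saved as example.py (the module name is an assumption used here for illustration), a single-worker launch looks like:

uvicorn example:app --host 0.0.0.0 --port 8000

Once the server is running, the /chat_stream endpoint can be consumed from any HTTP client that reads the response incrementally. The sketch below is a minimal illustration, not the project's official client: the request field name "user_input" and the plain-text streaming format are assumptions.

import requests

# Minimal streaming-client sketch for the /chat_stream endpoint.
# The field name "user_input" and the plain-text response framing are assumptions;
# consult the ChatStream documentation for the actual request/response contract.
with requests.post(
    "http://localhost:8000/chat_stream",
    data={"user_input": "Hello, who are you?"},  # field name is an assumption
    stream=True,                                  # read the body as it is generated
) as resp:
    resp.raise_for_status()
    for chunk in resp.iter_content(chunk_size=None, decode_unicode=True):
        print(chunk, end="", flush=True)          # print text as it arrives

Because the server stores the per-user ChatPrompt in the HTTP session, a multi-turn client would use a requests.Session() so the session cookie is preserved across requests.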
Table of Contents
- Implementation of Web API Endpoints
- Queueing System and Concurrency Limit
- Start the Web server (ASGI server)
- Console chat implementation
- Configuration during development
- Advanced Settings
  - Chat History Persistence
  - Configuration for large scale access
  - Interfacing with login authentication using OAuth
  - Load Balancing on Multi-GPU
  - Load Balancing with Multi-GPU Server
License
Citing ChatStream
@software{chatstream,
  title = {{ChatStream: A streaming chat toolkit for pre-trained large language models (LLM)}},
  author = {Qualiteg Inc. (https://qualiteg.com)},
  url = {https://github.com/qualiteg/ChatStream},
  month = {5},
  year = {2023},
  version = {0.15},
}