ChatStream
A streaming chat toolkit for pre-trained large language models (LLM)
Installation
pip install chatstream
What can it do?
1. Simplifies the construction of streaming chats
You can easily build streaming chats on top of HuggingFace-based pre-trained large language models.
What is a Streaming Chat?
When generating text with a large language model, there are two approaches: generate the entire response from the input prompt (and the conversation history so far) before returning anything, or emit the response incrementally, one token at a time. The latter approach is referred to as 'streaming'. In this package, tokens are generated one at a time and streamed to the client as soon as they are available, which makes for a better user experience than waiting for the whole response to be generated.
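For illustration, here is a minimal sketch of token-by-token generation using the HuggingFace Transformers TextIteratorStreamer. This is not ChatStream's internal implementation; it only shows the idea behind streaming: each chunk of text becomes available as soon as it is generated and can be forwarded to the client immediately.

# Illustrative sketch only: token-by-token generation with HuggingFace Transformers.
# This is NOT ChatStream's internal implementation.
from threading import Thread

from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

model_name = "togethercomputer/RedPajama-INCITE-Chat-3B-v1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "<human>: Who is Alan Turing?\n<bot>:"
inputs = tokenizer(prompt, return_tensors="pt")

# TextIteratorStreamer yields decoded text piece by piece while generation runs
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
Thread(target=model.generate,
       kwargs=dict(**inputs, streamer=streamer, max_new_tokens=128)).start()

for text_chunk in streamer:
    # each chunk could be sent to the client immediately instead of printed
    print(text_chunk, end="", flush=True)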
2. Automatically maintains conversation history and context
By default, the HTTP session feature keeps the conversation history between the user and the language model in memory on the server side. The session duration is configurable; by default it lasts as long as the browser session is open. This makes multi-round web chats with continued context possible.
3. Control of concurrent access from multiple users
It is designed with concurrent access from multiple clients in mind, and concurrency is controlled by the following constructor parameters:
- num_of_concurrent_executions: int ... The number of sentence generation tasks that can run concurrently on the pre-trained language model.
- max_queue_size: int ... The size of the waiting queue for sentence generation requests. When the number of concurrently running generation tasks falls below num_of_concurrent_executions, queued requests are taken from this queue and their generation starts.
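Conceptually, this behaves like a fixed pool of generation slots plus a waiting queue for requests that cannot start immediately. The sketch below only illustrates that idea with plain asyncio primitives; it is not ChatStream's actual implementation, and the names are reused purely for readability.

# Conceptual sketch only (not ChatStream's implementation):
# limit concurrent generation tasks and allow a bounded number of waiting requests.
import asyncio

NUM_OF_CONCURRENT_EXECUTIONS = 2   # generation tasks allowed to run at the same time
MAX_QUEUE_SIZE = 5                 # requests allowed to wait for a free generation slot
MAX_PENDING = NUM_OF_CONCURRENT_EXECUTIONS + MAX_QUEUE_SIZE

generation_slots = asyncio.Semaphore(NUM_OF_CONCURRENT_EXECUTIONS)
pending_requests = 0               # requests currently running or waiting for a slot

async def generate(prompt: str) -> str:
    await asyncio.sleep(1)         # stand-in for the actual sentence generation
    return f"response to: {prompt}"

async def handle_request(prompt: str) -> str:
    global pending_requests
    if pending_requests >= MAX_PENDING:
        return "The server is busy. Please try again later."
    pending_requests += 1
    try:
        async with generation_slots:   # waits here until a generation slot is free
            return await generate(prompt)
    finally:
        pending_requests -= 1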
How to use
The following is an example of a streaming chat using togethercomputer/RedPajama-INCITE-Chat-3B-v1 as the model.
Create a chat prompt class (chat_prompt_for_redpajama_incite.py) that generates prompts for the model, and a server.py that acts as the streaming server.
Required Packages for the Demo Code
- PyTorch
If you're using CUDA, install the CUDA-compatible build of PyTorch (a quick sanity check is shown after this list).
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
- HuggingFace Transformers library
pip install transformers
- Web server (uvicorn or gunicorn)
pip install "uvicorn[standard]" gunicorn
chat_prompt_for_redpajama_incite.py
from chatstream.chat_prompt import AbstractChatPrompt


class ChatPromptRedpajamaIncite(AbstractChatPrompt):
    def __init__(self):
        super().__init__()  # Call the initialization of the base class
        self.set_requester("<human>")
        self.set_responder("<bot>")

    def get_stop_strs(self):
        """
        Returns special stop strings that tell the model to finish generating the sentence
        :return:
        """
        if not self.chat_mode:
            return None
        return [
            '<|endoftext|>',
            '\n<'
            # Safety stop in case the model starts generating the human side of the conversation as well
        ]

    def create_prompt(self):
        """
        Build the prompt according to the characteristics of each language model
        :return:
        """
        if not self.chat_mode:
            return self.get_requester_last_msg()

        ret = self.system
        for chat_content in self.chat_contents:
            chat_content_role = chat_content.get_role()
            chat_content_message = chat_content.get_message()
            if chat_content_role:
                if chat_content_message:
                    merged_message = chat_content_role + ": " + chat_content_message + "\n"
                else:
                    merged_message = chat_content_role + ":"
                ret += merged_message
        return ret


# portable UT
if False:
    chat_prompt = ChatPromptRedpajamaIncite()
    chat_prompt.add_requester_msg("Who is Alan Turing")
    chat_prompt.add_responder_msg(None)

    assert """<human>: Who is Alan Turing
<bot>:""" == chat_prompt.create_prompt()
server.py
import os

import torch
import uvicorn
from fastapi import FastAPI, Request
from fastapi.staticfiles import StaticFiles
from fastsession import FastSessionMiddleware, MemoryStore
from transformers import AutoTokenizer, AutoModelForCausalLM

from chat_prompt_for_redpajama_incite import ChatPromptRedpajamaIncite as ChatPrompt
from chatstream import ChatStream

MAX_CONCURRENT_CONNECTIONS = 2
MAX_QUEUE_SIZE = 5

model_path = "togethercomputer/RedPajama-INCITE-Chat-3B-v1"
device = "cuda"  # "cuda" / "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16)

if device == "cuda":
    model.to(device)

chat_stream = ChatStream(
    num_of_concurrent_executions=MAX_CONCURRENT_CONNECTIONS,
    max_queue_size=MAX_QUEUE_SIZE,
    model=model,
    tokenizer=tokenizer,
    device=device,
    chat_prompt_clazz=ChatPrompt,
)

app = FastAPI()

# Add session middleware to keep the conversation context
app.add_middleware(FastSessionMiddleware,
                   secret_key="your-session-secret-key",  # Key for cookie signature
                   store=MemoryStore(),  # Store for session saving
                   http_only=True,  # True: Cookie cannot be accessed from client-side scripts such as JavaScript
                   secure=False,  # False: For local development env. True: For production. Requires HTTPS
                   )


@app.post("/chat_stream")
async def stream_api(request: Request):
    response = await chat_stream.handle_starlette_request(request)
    return response


@app.get("/stats")
async def stats_api():
    return chat_stream.get_stats()


# absolute path of the directory containing the html files
def get_html_dir():
    return os.path.join(os.path.dirname(os.path.realpath(__file__)), 'html')


app.mount("/", StaticFiles(directory=get_html_dir(), html=True), name="html")


@app.on_event("startup")
async def startup():
    await chat_stream.start_queue_worker()


def start_server():
    uvicorn.run(app, host='localhost', port=18080)


def main():
    start_server()


if __name__ == "__main__":
    main()
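A minimal client sketch for trying the server from Python is shown below. The JSON field name user_input is only an assumption for illustration (check the ChatStream documentation for the actual request format); requests.Session keeps the session cookie, so the server can continue the conversation context across turns.

# Hypothetical client sketch: the "user_input" field name is an assumption,
# not a documented part of the ChatStream API.
import requests

session = requests.Session()  # keeps the session cookie so context is preserved across turns

with session.post("http://localhost:18080/chat_stream",
                  json={"user_input": "Who is Alan Turing"},  # assumed request format
                  stream=True) as response:
    for chunk in response.iter_content(chunk_size=None, decode_unicode=True):
        print(chunk, end="", flush=True)  # print each streamed chunk as it arrives

print(session.get("http://localhost:18080/stats").json())  # server-side statistics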
License
All code in this repository was developed by Tom Misawa except where otherwise noted. Copyright (c) 2023, Tom Misawa. All rights reserved. The code is licensed under the Apache 2.0 license.
Copyright 2023 Tom Misawa (riversun.org@gmail.com)
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Citing ChatStream
@software{chatstream,
  title = {{ChatStream: A streaming chat toolkit for pre-trained large language models (LLM)}},
  author = {Tom Misawa (riversun.org@gmail.com)},
  url = {https://github.com/riversun/ChatStream},
  month = {5},
  year = {2023},
  version = {0.15},
}
File details
Details for the file chatstream-0.1.5.tar.gz.
File metadata
- Download URL: chatstream-0.1.5.tar.gz
- Upload date:
- Size: 31.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | a58a4811f8205cac1dfe1f99a33c9b45bc59694d0878a3097ce6e975152a5a86
MD5 | a5e00e8bf502afe447e9d63cd16d302f
BLAKE2b-256 | 61ec4b9241d2d25715cd963a120b9f730b32ef7c5507a8ad6efce8102db9c970
File details
Details for the file chatstream-0.1.5-py3-none-any.whl.
File metadata
- Download URL: chatstream-0.1.5-py3-none-any.whl
- Upload date:
- Size: 36.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8eee64c0ac26238bdddc5cc15b449aea29f36de5cce04b4c23e007bde0c8f1d1
MD5 | 6ed7338331232cc3d9412622da6bef81
BLAKE2b-256 | 4fa8a7f1dd5aefc9e670923107e4b01a733510fd6a97af28877eb02f867e00d2