LiteChat 🚀
Automated HuggingChat-backed inference behind an OpenAI-style FastAPI server.
LiteChat is a lightweight, OpenAI-compatible interface for running local LLM inference servers. It provides seamless integration with various open-source models while maintaining OpenAI-style API compatibility.
Features ✨
- 🔄 OpenAI API compatibility
- 🌐 Web search integration
- 💬 Conversation memory
- 🔄 Streaming responses
- 🛠️ Easy integration with HuggingFace models
- 📦 Compatible with both litellm and OpenAI clients
- 🎯 Type-safe model selection
Installation 🛠️
```bash
pip install litechat playwright
playwright install
```

The `playwright install` step downloads the browser binaries LiteChat uses to automate HuggingChat under the hood.
Available Models 🤖
LiteChat supports the following models:
- `Qwen/Qwen2.5-Coder-32B-Instruct`: Specialized coding model
- `Qwen/Qwen2.5-72B-Instruct`: Large general-purpose model
- `meta-llama/Llama-3.3-70B-Instruct`: Latest Llama 3 model
- `CohereForAI/c4ai-command-r-plus-08-2024`: Cohere's Command model
- `Qwen/QwQ-32B-Preview`: Preview version of QwQ
- `nvidia/Llama-3.1-Nemotron-70B-Instruct-HF`: NVIDIA's Nemotron model
- `meta-llama/Llama-3.2-11B-Vision-Instruct`: Vision-capable Llama model
- `NousResearch/Hermes-3-Llama-3.1-8B`: Lightweight Hermes model
- `mistralai/Mistral-Nemo-Instruct-2407`: Mistral's instruction-tuned model
- `microsoft/Phi-3.5-mini-instruct`: Microsoft's compact Phi model
Model Selection Helpers 🎯
LiteChat provides helper functions for type-safe model selection:
```python
from litechat import litechat_model, litellm_model

# For use with LiteChat's native client
model = litechat_model("Qwen/Qwen2.5-72B-Instruct")

# For use with LiteLLM
model = litellm_model("Qwen/Qwen2.5-72B-Instruct")  # Returns "openai/Qwen/Qwen2.5-72B-Instruct"
```
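The helpers are simple to reason about. Here is a hypothetical sketch of their behavior (not the library's actual source), assuming `litechat_model` validates the name against the supported list and `litellm_model` adds the `openai/` prefix LiteLLM uses for OpenAI-compatible endpoints:

```python
# Illustrative re-implementation only; the real helpers ship with litechat.
SUPPORTED_MODELS = {
    "Qwen/Qwen2.5-Coder-32B-Instruct",
    "Qwen/Qwen2.5-72B-Instruct",
    # ...plus the rest of the models listed above
}

def litechat_model_sketch(name: str) -> str:
    # Fail fast on typos instead of erroring at request time.
    if name not in SUPPORTED_MODELS:
        raise ValueError(f"Unsupported model: {name}")
    return name

def litellm_model_sketch(name: str) -> str:
    # LiteLLM routes requests to OpenAI-compatible servers via "openai/".
    return f"openai/{litechat_model_sketch(name)}"
```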
Quick Start 🚀
Starting the Server
You can start the LiteChat server in two ways:
- Using the CLI:
```bash
litechat_server
```
- Programmatically:
```python
from litechat import litechat_server

if __name__ == "__main__":
    litechat_server(host="0.0.0.0", port=11437)
```
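Once the server is running, a quick way to confirm it is reachable is to query the models endpoint; the `/v1/models` route below is an assumption based on the server's OpenAI API compatibility:

```python
# Smoke test for a running LiteChat server (route assumed from
# OpenAI API compatibility).
import requests

resp = requests.get("http://localhost:11437/v1/models")
resp.raise_for_status()
print(resp.json())  # should enumerate the available models
```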
Using with OpenAI Client
```python
import os

from openai import OpenAI
from litechat import litechat_model  # needed for the model helper below

os.environ['OPENAI_BASE_URL'] = "http://localhost:11437/v1"
os.environ['OPENAI_API_KEY'] = "key123"  # required by the client, but not checked by the server

client = OpenAI()
response = client.chat.completions.create(
    model=litechat_model("NousResearch/Hermes-3-Llama-3.1-8B"),
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
print(response.choices[0].message.content)
```
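Streaming works through the same client. A short sketch, continuing from the snippet above and assuming the server honors the OpenAI API's standard `stream=True` flag (streaming responses are listed among the features):

```python
# Token-by-token streaming via the stock OpenAI client.
stream = client.chat.completions.create(
    model=litechat_model("NousResearch/Hermes-3-Llama-3.1-8B"),
    messages=[{"role": "user", "content": "Write a haiku about Paris"}],
    stream=True,
)
for chunk in stream:
    # The final chunk may carry no content, so guard against None.
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```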
Using with LiteLLM
```python
import os

from litellm import completion
from litechat import OPENAI_COMPATIBLE_BASE_URL, litellm_model

os.environ["OPENAI_API_KEY"] = "key123"  # placeholder; not validated by the server

response = completion(
    model=litellm_model("NousResearch/Hermes-3-Llama-3.1-8B"),
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    api_base=OPENAI_COMPATIBLE_BASE_URL
)
print(response)
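```

LiteLLM's usual streaming flag should behave the same way against this endpoint. A sketch, continuing from the snippet above:

```python
# Streaming with LiteLLM: stream=True yields OpenAI-style chunks.
for chunk in completion(
    model=litellm_model("NousResearch/Hermes-3-Llama-3.1-8B"),
    messages=[{"content": "Tell me a short story.", "role": "user"}],
    api_base=OPENAI_COMPATIBLE_BASE_URL,
    stream=True,
):
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```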
Using LiteChat's Native Client
```python
from litechat import completion, genai, pp_completion, litechat_model

# Basic completion
response = completion(
    prompt="What is quantum computing?",
    model="nvidia/Llama-3.1-Nemotron-70B-Instruct-HF",
    web_search=True  # Enable web search
)

# Stream with pretty printing
pp_completion(
    prompt="Explain the theory of relativity",
    model="Qwen/Qwen2.5-72B-Instruct",
    conversation_id="physics_chat"  # Enable conversation memory
)

# Get direct response
result = genai(
    prompt="Write a poem about spring",
    model="meta-llama/Llama-3.3-70B-Instruct",
    system_prompt="You are a creative poet"
)
```
Advanced Features 🔧
Web Search Integration
Enable web search to get up-to-date information:
```python
from litechat import completion

response = completion(
    prompt="What are the latest developments in AI?",
    web_search=True
)
```
Conversation Memory
Maintain context across multiple interactions:
```python
from litechat import completion

response = completion(
    prompt="Tell me more about that",
    conversation_id="unique_conversation_id"
)
```
Streaming Responses
Get token-by-token streaming:
```python
from litechat import completion

for chunk in completion(
    prompt="Write a long story",
    stream=True
):
    # The final chunk may carry no content, so guard against None.
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
API Reference 📚
LiteAI Client
```python
from litechat import LiteAI, litechat_model

client = LiteAI(
    api_key="optional-key",  # Optional API key
    base_url="http://localhost:11437",  # Server URL
    system_prompt="You are a helpful assistant",  # Default system prompt
    web_search=False,  # Enable/disable web search by default
    model=litechat_model("nvidia/Llama-3.1-Nemotron-70B-Instruct-HF")  # Default model
)
```
Completion Function Parameters
- `messages`: List of conversation messages, or a direct prompt string
- `model`: HuggingFace model identifier (use `litechat_model()` for type safety)
- `system_prompt`: System instruction for the model
- `temperature`: Controls randomness (0.0 to 1.0)
- `stream`: Enable streaming responses
- `web_search`: Enable web search
- `conversation_id`: Enable conversation memory
- `max_tokens`: Maximum tokens in the response
- `tools`: List of available tools/functions
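Putting several of these parameters together in one call (the specific values here are only illustrative):

```python
from litechat import completion, litechat_model

response = completion(
    messages=[
        {"role": "user", "content": "Summarize the plot of Hamlet in two sentences."}
    ],
    model=litechat_model("Qwen/Qwen2.5-72B-Instruct"),
    system_prompt="You are a concise literary assistant.",
    temperature=0.3,   # low randomness for a factual summary
    max_tokens=120,    # cap the response length
    stream=False,
    web_search=False,
)
print(response)
```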
Contributing 🤝
Contributions are welcome! Please feel free to submit a Pull Request.
License 📄
This project is licensed under the MIT License - see the LICENSE file for details.
Support 💬
For support, please open an issue on the GitHub repository or reach out to the maintainers.