automated huggingchat openai style fastapi inference

LiteChat 🚀

LiteChat is a lightweight, OpenAI-compatible interface for running a local LLM inference server. It exposes the open-source models listed under Available Models through an OpenAI-style API, so OpenAI and LiteLLM clients can connect to it with no changes beyond the base URL.

Features ✨

🔄 OpenAI API compatibility
🌐 Web search integration
💬 Conversation memory
🔄 Streaming responses
🛠️ Easy integration with HuggingFace models
📦 Compatible with both LiteLLM and OpenAI clients
🎯 Type-safe model selection

Installation 🛠️

pip install litechat playwright
playwright install

Available Models 🤖

LiteChat supports the following models:

Qwen/Qwen2.5-Coder-32B-Instruct: Specialized coding model
Qwen/Qwen2.5-72B-Instruct: Large general-purpose model
meta-llama/Llama-3.3-70B-Instruct: Latest Llama 3 model
CohereForAI/c4ai-command-r-plus-08-2024: Cohere's command model
Qwen/QwQ-32B-Preview: Preview version of QwQ
nvidia/Llama-3.1-Nemotron-70B-Instruct-HF: NVIDIA's Nemotron model
meta-llama/Llama-3.2-11B-Vision-Instruct: Vision-capable Llama model
NousResearch/Hermes-3-Llama-3.1-8B: Lightweight Hermes model
mistralai/Mistral-Nemo-Instruct-2407: Mistral's instruction model
microsoft/Phi-3.5-mini-instruct: Microsoft's compact Phi model

Model Selection Helpers 🎯

LiteChat provides helper functions for type-safe model selection:

from litechat import litechat_model, litellm_model

# For use with LiteChat native client
model = litechat_model("Qwen/Qwen2.5-72B-Instruct")

# For use with LiteLLM
model = litellm_model("Qwen/Qwen2.5-72B-Instruct")  # Returns "openai/Qwen/Qwen2.5-72B-Instruct"
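
As a rough illustration of what "type-safe" can mean here, the sketch below restricts model names with typing.Literal so that a static type checker flags typos. This is not LiteChat's actual implementation: ModelName, litechat_model_sketch, and litellm_model_sketch are hypothetical names, only three of the models are listed for brevity, and returning the identifier unchanged from the first helper is an assumption. The "openai/" prefix matches the return value shown above.

```python
from typing import Literal, get_args

# Hypothetical subset of the supported model identifiers.
ModelName = Literal[
    "Qwen/Qwen2.5-72B-Instruct",
    "meta-llama/Llama-3.3-70B-Instruct",
    "NousResearch/Hermes-3-Llama-3.1-8B",
]


def litechat_model_sketch(name: ModelName) -> str:
    """Return the identifier unchanged after a runtime membership check (assumed behavior)."""
    if name not in get_args(ModelName):
        raise ValueError(f"unsupported model: {name!r}")
    return name


def litellm_model_sketch(name: ModelName) -> str:
    """Prefix the identifier with 'openai/', the form LiteLLM expects for OpenAI-compatible servers."""
    return "openai/" + litechat_model_sketch(name)
```

With annotations like these, a checker such as mypy can reject a misspelled model name before the request is ever sent, and the runtime check catches the same mistake in untyped code.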

Quick Start 🚀

Starting the Server

You can start the LiteChat server in two ways:

Using the CLI:

litechat_server

Programmatically:

from litechat import litechat_server

if __name__ == "__main__":
    litechat_server(host="0.0.0.0", port=11437)
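
When the server is launched from a script or a test, it can help to wait until the port accepts connections before sending requests. The helper below is a minimal sketch using only the standard library; wait_for_port is a hypothetical name and not part of LiteChat, and the port 11437 is taken from the example above.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 10.0) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.2)  # not up yet; retry shortly
    return False
```

For example, wait_for_port("localhost", 11437) returns True once the server is listening and False if nothing answers within ten seconds. It only checks that the port is open, not that the API is healthy.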

Using with OpenAI Client

import os

from openai import OpenAI
from litechat import litechat_model  # needed for the model= argument below

os.environ['OPENAI_BASE_URL'] = "http://localhost:11437/v1"
os.environ['OPENAI_API_KEY'] = "key123" # required, but not used

client = OpenAI()
response = client.chat.completions.create(
    model=litechat_model("NousResearch/Hermes-3-Llama-3.1-8B"),
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
print(response.choices[0].message.content)
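
For readers who want to see what the client sends over the wire, the sketch below builds the equivalent HTTP request with the standard library. It assumes the server exposes the standard OpenAI chat completions route under the /v1 base URL used above, which the source implies but does not spell out; build_chat_request is a hypothetical helper, not a LiteChat function.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str, messages: list) -> urllib.request.Request:
    """Build (but do not send) a POST to the OpenAI-style chat completions route."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # required by clients, not checked per the note above
        },
        method="POST",
    )


request = build_chat_request(
    "http://localhost:11437/v1",
    "key123",
    "NousResearch/Hermes-3-Llama-3.1-8B",
    [{"role": "user", "content": "What is the capital of France?"}],
)

# To send it against a running server:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```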

Using with LiteLLM

import os

from litellm import completion
from litechat import OPENAI_COMPATIBLE_BASE_URL, litellm_model

os.environ["OPENAI_API_KEY"] = "key123"

response = completion(
    model=litellm_model("NousResearch/Hermes-3-Llama-3.1-8B"),
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    api_base=OPENAI_COMPATIBLE_BASE_URL
)
print(response)

Using LiteChat's Native Client

from litechat import completion, genai, pp_completion
from litechat import litechat_model

# Basic completion
response = completion(
    prompt="What is quantum computing?",
    model="nvidia/Llama-3.1-Nemotron-70B-Instruct-HF",
    web_search=True  # Enable web search
)

# Stream with pretty printing
pp_completion(
    prompt="Explain the theory of relativity",
    model="Qwen/Qwen2.5-72B-Instruct",
    conversation_id="physics_chat"  # Enable conversation memory
)

# Get direct response
result = genai(
    prompt="Write a poem about spring",
    model="meta-llama/Llama-3.3-70B-Instruct",
    system_prompt="You are a creative poet"
)

Advanced Features 🔧

Web Search Integration

Enable web search to get up-to-date information:

response = completion(
    prompt="What are the latest developments in AI?",
    web_search=True
)

Conversation Memory

Maintain context across multiple interactions:

response = completion(
    prompt="Tell me more about that",
    conversation_id="unique_conversation_id"
)
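
Conceptually, conversation memory means that earlier turns are stored under the conversation ID and sent along with each new prompt. The sketch below illustrates that idea with an in-memory dictionary. It is an assumption about the general technique, not a description of how LiteChat stores conversations; ConversationStore and its methods are hypothetical.

```python
from collections import defaultdict


class ConversationStore:
    """Hypothetical in-memory store mapping conversation_id -> message history."""

    def __init__(self) -> None:
        self._histories = defaultdict(list)

    def build_messages(self, conversation_id: str, prompt: str) -> list:
        """Append the new user prompt and return a copy of the full history to send."""
        history = self._histories[conversation_id]
        history.append({"role": "user", "content": prompt})
        return list(history)

    def record_reply(self, conversation_id: str, reply: str) -> None:
        """Store the assistant's reply so later prompts can refer back to it."""
        self._histories[conversation_id].append({"role": "assistant", "content": reply})


store = ConversationStore()
store.build_messages("physics_chat", "Explain the theory of relativity")
store.record_reply("physics_chat", "Relativity describes how space and time are linked.")
# The follow-up now carries both earlier turns, so "that" has something to refer to.
follow_up = store.build_messages("physics_chat", "Tell me more about that")
```

A prompt such as "Tell me more about that" only makes sense with this history attached, which is why reusing the same conversation_id across calls matters.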

Streaming Responses

Get token-by-token streaming:

for chunk in completion(
    prompt="Write a long story",
    stream=True
):
    # delta.content can be None (for example on the final chunk), so fall back to ""
    print(chunk.choices[0].delta.content or "", end="", flush=True)
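
If the full text is needed after streaming finishes, the chunks can be accumulated instead of printed. The sketch below assumes the chunks follow the OpenAI chunk shape used in the loop above (choices[0].delta.content, possibly None); collect_stream and fake_chunk are hypothetical helpers, and the fake chunks stand in for a real stream so the example runs without a server.

```python
from types import SimpleNamespace


def collect_stream(chunks) -> str:
    """Concatenate delta.content from OpenAI-style chunks, skipping empty deltas."""
    parts = []
    for chunk in chunks:
        content = chunk.choices[0].delta.content
        if content:  # None or "" carries no text
            parts.append(content)
    return "".join(parts)


def fake_chunk(text):
    """Stand-in for a streamed chunk, used only for this illustration."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


story = collect_stream([fake_chunk("Once "), fake_chunk("upon a time"), fake_chunk(None)])
print(story)
```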

API Reference 📚

LiteAI Client

from litechat import LiteAI, litechat_model

client = LiteAI(
    api_key="optional-key",  # Optional API key
    base_url="http://localhost:11437",  # Server URL
    system_prompt="You are a helpful assistant",  # Default system prompt
    web_search=False,  # Enable/disable web search by default
    model=litechat_model("nvidia/Llama-3.1-Nemotron-70B-Instruct-HF")  # Default model
)

Completion Function Parameters

messages: List of conversation messages or direct prompt string
model: HuggingFace model identifier (use litechat_model() for type safety)
system_prompt: System instruction for the model
temperature: Control randomness (0.0 to 1.0)
stream: Enable streaming responses
web_search: Enable web search
conversation_id: Enable conversation memory
max_tokens: Maximum tokens in response
tools: List of available tools/functions
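
Because messages accepts either a message list or a plain prompt string, a caller-side helper that normalizes both forms can make the two inputs interchangeable. The sketch below shows one plausible way to do this and to apply system_prompt. It is an assumption for illustration, not LiteChat's internal logic, and normalize_messages is a hypothetical name.

```python
from typing import Optional, Union


def normalize_messages(messages: Union[str, list], system_prompt: Optional[str] = None) -> list:
    """Turn a prompt string or message list into a message list with an optional system message."""
    if isinstance(messages, str):
        result = [{"role": "user", "content": messages}]
    else:
        result = [dict(m) for m in messages]  # copy so the caller's list is untouched
    has_system = any(m.get("role") == "system" for m in result)
    if system_prompt and not has_system:
        # Only prepend when the caller has not supplied a system message already.
        result.insert(0, {"role": "system", "content": system_prompt})
    return result
```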

Contributing 🤝

Contributions are welcome! Please feel free to submit a Pull Request.

License 📄

This project is licensed under the MIT License - see the LICENSE file for details.

Support 💬

For support, please open an issue on the GitHub repository or reach out to the maintainers.

Release history

0.0.69: Feb 2, 2025
0.0.68: Feb 1, 2025
0.0.67: Feb 1, 2025
0.0.66: Feb 1, 2025
0.0.65: Feb 1, 2025
0.0.64: Feb 1, 2025
0.0.63: Feb 1, 2025
0.0.62: Feb 1, 2025
0.0.61: Feb 1, 2025
0.0.60: Jan 29, 2025
0.0.59: Jan 24, 2025
0.0.58: Jan 18, 2025
0.0.57: Jan 18, 2025
0.0.56: Jan 18, 2025
0.0.55: Jan 17, 2025
0.0.54: Jan 14, 2025
0.0.53: Jan 14, 2025
0.0.52: Jan 14, 2025
0.0.51: Jan 13, 2025
0.0.50: Jan 11, 2025
0.0.49: Jan 11, 2025
0.0.47: Jan 10, 2025
0.0.46: Jan 10, 2025
0.0.45: Jan 10, 2025
0.0.44: Jan 10, 2025
0.0.43: Jan 10, 2025
0.0.42: Jan 10, 2025
0.0.41: Jan 9, 2025
0.0.40: Jan 8, 2025 (this version)
0.0.39: Jan 8, 2025
0.0.38: Jan 8, 2025
0.0.37: Jan 8, 2025
0.0.36: Jan 7, 2025
0.0.35: Jan 7, 2025
0.0.34: Jan 6, 2025
0.0.33: Jan 6, 2025
0.0.32: Jan 6, 2025
0.0.31: Jan 6, 2025
0.0.30: Jan 6, 2025
0.0.29: Jan 6, 2025
0.0.28: Jan 6, 2025
0.0.27: Jan 6, 2025
0.0.26: Jan 6, 2025
0.0.25: Jan 6, 2025
0.0.24: Jan 6, 2025
0.0.23: Jan 6, 2025
0.0.22: Jan 6, 2025
0.0.21: Jan 5, 2025
0.0.16: Jan 5, 2025
0.0.15: Jan 5, 2025
0.0.14: Jan 5, 2025
0.0.13: Jan 5, 2025
0.0.12: Jan 5, 2025
0.0.11: Jan 4, 2025
0.0.10: Jan 4, 2025
0.0.9: Jan 4, 2025
0.0.8: Jan 4, 2025
0.0.7: Jan 4, 2025
0.0.6: Jan 4, 2025
0.0.5: Jan 4, 2025
0.0.4: Jan 4, 2025
0.0.3: Jan 4, 2025
0.0.2: Jan 3, 2025
0.0.1: Jan 3, 2025

Download files

Source Distribution

litechat-0.0.40.tar.gz (67.5 kB view details)

Uploaded Jan 8, 2025 Source

Built Distribution

litechat-0.0.40-py3-none-any.whl (30.6 kB view details)

Uploaded Jan 8, 2025 Python 3

File details

Details for the file litechat-0.0.40.tar.gz.

File metadata

Download URL: litechat-0.0.40.tar.gz
Upload date: Jan 8, 2025
Size: 67.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.8.4 CPython/3.11.0rc1 Linux/6.8.0-49-generic

File hashes

Hashes for litechat-0.0.40.tar.gz
Algorithm	Hash digest
SHA256	`1062092af2778daab1246d9db9dd8161da560bc2c690a402f49d35e61b811c3e`
MD5	`e3415c2f25be8e80403af0739ff69021`
BLAKE2b-256	`bd396178e19aaee26dd6618995abb68ffdbd21d5b27b1dc7a52f4d664a6a35d4`

File details

Details for the file litechat-0.0.40-py3-none-any.whl.

File metadata

Download URL: litechat-0.0.40-py3-none-any.whl
Upload date: Jan 8, 2025
Size: 30.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.8.4 CPython/3.11.0rc1 Linux/6.8.0-49-generic

File hashes

Hashes for litechat-0.0.40-py3-none-any.whl
Algorithm	Hash digest
SHA256	`da2a5575c1b6238397c8f49b60cf9ed2e138c1bd88831faaff61c2183775ac50`
MD5	`b33c2abd957814bd38b5be003308f182`
BLAKE2b-256	`d27bb9d931418b01ddf1e596b2d6c678d328f62743239ae1f29782b1cb47561e`
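
The digests above can be used to check a downloaded file before installing it. The sketch below computes a file's SHA256 with the standard library and compares it to an expected hex string; sha256_of and matches_sha256 are hypothetical helper names, and the file is assumed to be in the current directory.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_sha256(path: Path, expected_hex: str) -> bool:
    """Compare the file's digest to the published one, ignoring case and surrounding whitespace."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

For the wheel listed above, the check would be matches_sha256(Path("litechat-0.0.40-py3-none-any.whl"), "da2a5575c1b6238397c8f49b60cf9ed2e138c1bd88831faaff61c2183775ac50"). pip can enforce the same check at install time through its hash-checking mode.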
