Skip to main content

automated huggingchat openai style fastapi inference

Project description

LiteChat 🚀

LiteChat is a lightweight, OpenAI-compatible interface for running local LLM inference servers. It provides seamless integration with various open-source models while maintaining OpenAI-style API compatibility.

Features ✨

  • 🔄 OpenAI API compatibility
  • 🌐 Web search integration
  • 💬 Conversation memory
  • 🔄 Streaming responses
  • 🛠️ Easy integration with HuggingFace models
  • 📦 Compatible with both litellm and OpenAI clients
  • 🎯 Type-safe model selection

Installation 🛠️

pip install litechat playwright
playwright install

Available Models 🤖

LiteChat supports the following models:

  • Qwen/Qwen2.5-Coder-32B-Instruct: Specialized coding model
  • Qwen/Qwen2.5-72B-Instruct: Large general-purpose model
  • meta-llama/Llama-3.3-70B-Instruct: Latest Llama 3 model
  • CohereForAI/c4ai-command-r-plus-08-2024: Cohere's command model
  • Qwen/QwQ-32B-Preview: Preview version of QwQ
  • nvidia/Llama-3.1-Nemotron-70B-Instruct-HF: NVIDIA's Nemotron model
  • meta-llama/Llama-3.2-11B-Vision-Instruct: Vision-capable Llama model
  • NousResearch/Hermes-3-Llama-3.1-8B: Lightweight Hermes model
  • mistralai/Mistral-Nemo-Instruct-2407: Mistral's instruction model
  • microsoft/Phi-3.5-mini-instruct: Microsoft's compact Phi model

Model Selection Helpers 🎯

LiteChat provides helper functions for type-safe model selection:

from litechat import litechat_model, litellm_model

# For use with LiteChat native client
model = litechat_model("Qwen/Qwen2.5-72B-Instruct")

# For use with LiteLLM
model = litellm_model("Qwen/Qwen2.5-72B-Instruct")  # Returns "openai/Qwen/Qwen2.5-72B-Instruct"

Quick Start 🚀

Starting the Server

You can start the LiteChat server in two ways:

  1. Using the CLI:
litechat_server
  1. Programmatically:
from litechat import litechat_server

if __name__ == "__main__":
    litechat_server(host="0.0.0.0", port=11437)

Using with OpenAI Client

import os
from openai import OpenAI

os.environ['OPENAI_BASE_URL'] = "http://localhost:11437/v1"
os.environ['OPENAI_API_KEY'] = "key123" # required, but not used

client = OpenAI()
response = client.chat.completions.create(
    model=litechat_model("NousResearch/Hermes-3-Llama-3.1-8B"),
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
print(response.choices[0].message.content)

Using with LiteLLM

import os

from litellm import completion
from litechat import OPENAI_COMPATIBLE_BASE_URL, litellm_model

os.environ["OPENAI_API_KEY"] = "key123"

response = completion(
    model=litellm_model("NousResearch/Hermes-3-Llama-3.1-8B"),
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    api_base=OPENAI_COMPATIBLE_BASE_URL
)
print(response)

Using LiteChat's Native Client

from litechat import completion, genai, pp_completion
from litechat import litechat_model

# Basic completion
response = completion(
    prompt="What is quantum computing?",
    model="nvidia/Llama-3.1-Nemotron-70B-Instruct-HF",
    web_search=True  # Enable web search
)

# Stream with pretty printing
pp_completion(
    prompt="Explain the theory of relativity",
    model="Qwen/Qwen2.5-72B-Instruct",
    conversation_id="physics_chat"  # Enable conversation memory
)

# Get direct response
result = genai(
    prompt="Write a poem about spring",
    model="meta-llama/Llama-3.3-70B-Instruct",
    system_prompt="You are a creative poet"
)

Advanced Features 🔧

Web Search Integration

Enable web search to get up-to-date information:

response = completion(
    prompt="What are the latest developments in AI?",
    web_search=True
)

Conversation Memory

Maintain context across multiple interactions:

response = completion(
    prompt="Tell me more about that",
    conversation_id="unique_conversation_id"
)

Streaming Responses

Get token-by-token streaming:

for chunk in completion(
    prompt="Write a long story",
    stream=True
):
    print(chunk.choices[0].delta.content, end="", flush=True)

API Reference 📚

LiteAI Client

from litechat import LiteAI, litechat_model

client = LiteAI(
    api_key="optional-key",  # Optional API key
    base_url="http://localhost:11437",  # Server URL
    system_prompt="You are a helpful assistant",  # Default system prompt
    web_search=False,  # Enable/disable web search by default
    model=litechat_model("nvidia/Llama-3.1-Nemotron-70B-Instruct-HF")  # Default model
)

Completion Function Parameters

  • messages: List of conversation messages or direct prompt string
  • model: HuggingFace model identifier (use litechat_model() for type safety)
  • system_prompt: System instruction for the model
  • temperature: Control randomness (0.0 to 1.0)
  • stream: Enable streaming responses
  • web_search: Enable web search
  • conversation_id: Enable conversation memory
  • max_tokens: Maximum tokens in response
  • tools: List of available tools/functions

Contributing 🤝

Contributions are welcome! Please feel free to submit a Pull Request.

License 📄

This project is licensed under the MIT License - see the LICENSE file for details.

Support 💬

For support, please open an issue on the GitHub repository or reach out to the maintainers.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

litechat-0.0.39.tar.gz (67.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

litechat-0.0.39-py3-none-any.whl (30.6 kB view details)

Uploaded Python 3

File details

Details for the file litechat-0.0.39.tar.gz.

File metadata

  • Download URL: litechat-0.0.39.tar.gz
  • Upload date:
  • Size: 67.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.4 CPython/3.11.0rc1 Linux/6.8.0-49-generic

File hashes

Hashes for litechat-0.0.39.tar.gz
Algorithm Hash digest
SHA256 80887dcd8554d737e524cb932fd92788ac01c5b29ef154b20e3358f2ffef29b7
MD5 70a2a382b34a623e49a1c81e41e3e6b6
BLAKE2b-256 d24061186af4200d3928e3a79e052a88d8d155745653ca45cd3836abaef8814a

See more details on using hashes here.

File details

Details for the file litechat-0.0.39-py3-none-any.whl.

File metadata

  • Download URL: litechat-0.0.39-py3-none-any.whl
  • Upload date:
  • Size: 30.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.4 CPython/3.11.0rc1 Linux/6.8.0-49-generic

File hashes

Hashes for litechat-0.0.39-py3-none-any.whl
Algorithm Hash digest
SHA256 86e636e1bbc42e9af5e3b8835625d619d1214506e9b9110f8472c0492e85cf9e
MD5 f1c7498872543a7987f7bfd5de8fdc5c
BLAKE2b-256 36b3c9537b56b216fb16403ef8dae1e27a55439762004fa8dc0e702d3e73468d

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page