
Automated HuggingChat, OpenAI-style FastAPI inference


LiteChat 🚀

LiteChat is a lightweight, OpenAI-compatible interface for running local LLM inference servers. It provides seamless integration with various open-source models while maintaining OpenAI-style API compatibility.

Features ✨

  • 🔄 OpenAI API compatibility
  • 🌐 Web search integration
  • 💬 Conversation memory
  • 🔄 Streaming responses
  • 🛠️ Easy integration with HuggingFace models
  • 📦 Compatible with both litellm and OpenAI clients
  • 🎯 Type-safe model selection

Installation 🛠️

pip install litechat playwright
playwright install

Available Models 🤖

LiteChat supports the following models:

  • Qwen/Qwen2.5-Coder-32B-Instruct: Specialized coding model
  • Qwen/Qwen2.5-72B-Instruct: Large general-purpose model
  • meta-llama/Llama-3.3-70B-Instruct: Meta's Llama 3.3 instruction model
  • CohereForAI/c4ai-command-r-plus-08-2024: Cohere's command model
  • Qwen/QwQ-32B-Preview: Preview version of QwQ
  • nvidia/Llama-3.1-Nemotron-70B-Instruct-HF: NVIDIA's Nemotron model
  • meta-llama/Llama-3.2-11B-Vision-Instruct: Vision-capable Llama model
  • NousResearch/Hermes-3-Llama-3.1-8B: Lightweight Hermes model
  • mistralai/Mistral-Nemo-Instruct-2407: Mistral's instruction model
  • microsoft/Phi-3.5-mini-instruct: Microsoft's compact Phi model

Model Selection Helpers 🎯

LiteChat provides helper functions for type-safe model selection:

from litechat import litechat_model, litellm_model

# For use with LiteChat native client
model = litechat_model("Qwen/Qwen2.5-72B-Instruct")

# For use with LiteLLM
model = litellm_model("Qwen/Qwen2.5-72B-Instruct")  # Returns "openai/Qwen/Qwen2.5-72B-Instruct"
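The prefixing convention that litellm_model follows can be sketched in plain Python. This is a hedged illustration of the naming scheme only (LiteLLM routes model names beginning with "openai/" through its OpenAI-compatible provider); to_litellm_name is a hypothetical stand-in, not LiteChat's actual implementation:

```python
def to_litellm_name(hf_model_id: str) -> str:
    """Prefix a HuggingFace model id with "openai/" so LiteLLM routes it
    through an OpenAI-compatible endpoint."""
    return f"openai/{hf_model_id}"

# The prefixed name matches what litellm_model is documented to return above.
print(to_litellm_name("Qwen/Qwen2.5-72B-Instruct"))
```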

Quick Start 🚀

Starting the Server

You can start the LiteChat server in two ways:

  1. Using the CLI:
litechat_server
  2. Programmatically:
from litechat import litechat_server

if __name__ == "__main__":
    litechat_server(host="0.0.0.0", port=11437)

Using with OpenAI Client

import os
from openai import OpenAI
from litechat import litechat_model

os.environ['OPENAI_BASE_URL'] = "http://localhost:11437/v1"
os.environ['OPENAI_API_KEY'] = "key123" # required, but not used

client = OpenAI()
response = client.chat.completions.create(
    model=litechat_model("NousResearch/Hermes-3-Llama-3.1-8B"),
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
print(response.choices[0].message.content)

Using with LiteLLM

import os

from litellm import completion
from litechat import OPENAI_COMPATIBLE_BASE_URL, litellm_model

os.environ["OPENAI_API_KEY"] = "key123"

response = completion(
    model=litellm_model("NousResearch/Hermes-3-Llama-3.1-8B"),
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    api_base=OPENAI_COMPATIBLE_BASE_URL
)
print(response)

Using LiteChat's Native Client

from litechat import completion, genai, pp_completion, litechat_model

# Basic completion
response = completion(
    prompt="What is quantum computing?",
    model="nvidia/Llama-3.1-Nemotron-70B-Instruct-HF",
    web_search=True  # Enable web search
)

# Stream with pretty printing
pp_completion(
    prompt="Explain the theory of relativity",
    model="Qwen/Qwen2.5-72B-Instruct",
    conversation_id="physics_chat"  # Enable conversation memory
)

# Get direct response
result = genai(
    prompt="Write a poem about spring",
    model="meta-llama/Llama-3.3-70B-Instruct",
    system_prompt="You are a creative poet"
)

Advanced Features 🔧

Web Search Integration

Enable web search to get up-to-date information:

response = completion(
    prompt="What are the latest developments in AI?",
    web_search=True
)

Conversation Memory

Maintain context across multiple interactions:

response = completion(
    prompt="Tell me more about that",
    conversation_id="unique_conversation_id"
)
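Conceptually, conversation memory means the server replays the running message history for a given conversation_id with every request. The sketch below illustrates that idea with a simple in-memory store; the names (build_messages, record_reply) are hypothetical and not LiteChat internals:

```python
from collections import defaultdict

# conversation_id -> accumulated message history
_conversations = defaultdict(list)

def build_messages(conversation_id, prompt, system_prompt="You are a helpful assistant"):
    """Append the new user turn and return the full message list to send."""
    history = _conversations[conversation_id]
    history.append({"role": "user", "content": prompt})
    return [{"role": "system", "content": system_prompt}, *history]

def record_reply(conversation_id, reply):
    """Store the assistant's reply so the next turn sees it as context."""
    _conversations[conversation_id].append({"role": "assistant", "content": reply})
```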

Streaming Responses

Get token-by-token streaming:

for chunk in completion(
    prompt="Write a long story",
    stream=True
):
    delta = chunk.choices[0].delta.content
    if delta:  # trailing chunks may carry no content
        print(delta, end="", flush=True)
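To assemble the streamed deltas into a single string, the loop above can be wrapped in a small helper. The chunk objects below are stand-ins that mimic the OpenAI-style delta shape, so this runs without a live server:

```python
from types import SimpleNamespace

def collect_stream(chunks):
    """Join the text deltas of an OpenAI-style stream, skipping empty chunks."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta.content
        if delta:  # final chunks may have content=None
            parts.append(delta)
    return "".join(parts)

def fake_chunk(text):
    """Build a minimal stand-in for an OpenAI-style streaming chunk."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])
```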

API Reference 📚

LiteAI Client

from litechat import LiteAI, litechat_model

client = LiteAI(
    api_key="optional-key",  # Optional API key
    base_url="http://localhost:11437",  # Server URL
    system_prompt="You are a helpful assistant",  # Default system prompt
    web_search=False,  # Enable/disable web search by default
    model=litechat_model("nvidia/Llama-3.1-Nemotron-70B-Instruct-HF")  # Default model
)

Completion Function Parameters

  • messages: List of conversation messages or direct prompt string
  • model: HuggingFace model identifier (use litechat_model() for type safety)
  • system_prompt: System instruction for the model
  • temperature: Control randomness (0.0 to 1.0)
  • stream: Enable streaming responses
  • web_search: Enable web search
  • conversation_id: Enable conversation memory
  • max_tokens: Maximum tokens in response
  • tools: List of available tools/functions
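As a hedged sketch of how these parameters map onto an OpenAI-style request body (build_payload is illustrative only, not a LiteChat function; optional fields are omitted when unset):

```python
def build_payload(messages, model, temperature=0.7, stream=False,
                  web_search=False, conversation_id=None,
                  max_tokens=None, tools=None):
    """Assemble an OpenAI-style chat-completions body from the parameters above."""
    payload = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "stream": stream,
        "web_search": web_search,
    }
    # Only include optional fields the caller actually set.
    if conversation_id is not None:
        payload["conversation_id"] = conversation_id
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    if tools:
        payload["tools"] = tools
    return payload
```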

Contributing 🤝

Contributions are welcome! Please feel free to submit a Pull Request.

License 📄

This project is licensed under the MIT License - see the LICENSE file for details.

Support 💬

For support, please open an issue on the GitHub repository or reach out to the maintainers.
