
Automated HuggingChat, OpenAI-style FastAPI inference


LiteChat 🚀

LiteChat is a lightweight, OpenAI-compatible interface for running local LLM inference servers. It provides seamless integration with various open-source models while maintaining OpenAI-style API compatibility.

Features ✨

  • 🔄 OpenAI API compatibility
  • 🌐 Web search integration
  • 💬 Conversation memory
  • 🔄 Streaming responses
  • 🛠️ Easy integration with HuggingFace models
  • 📦 Compatible with both litellm and OpenAI clients
  • 🎯 Type-safe model selection

Installation 🛠️

pip install litechat playwright
playwright install

Available Models 🤖

LiteChat supports the following models:

  • Qwen/Qwen2.5-Coder-32B-Instruct: Specialized coding model
  • Qwen/Qwen2.5-72B-Instruct: Large general-purpose model
  • meta-llama/Llama-3.3-70B-Instruct: Llama 3.3 instruction-tuned model
  • CohereForAI/c4ai-command-r-plus-08-2024: Cohere's command model
  • Qwen/QwQ-32B-Preview: Preview version of QwQ
  • nvidia/Llama-3.1-Nemotron-70B-Instruct-HF: NVIDIA's Nemotron model
  • meta-llama/Llama-3.2-11B-Vision-Instruct: Vision-capable Llama model
  • NousResearch/Hermes-3-Llama-3.1-8B: Lightweight Hermes model
  • mistralai/Mistral-Nemo-Instruct-2407: Mistral's instruction model
  • microsoft/Phi-3.5-mini-instruct: Microsoft's compact Phi model

Model Selection Helpers 🎯

LiteChat provides helper functions for type-safe model selection:

from litechat import litechat_model, litellm_model

# For use with LiteChat native client
model = litechat_model("Qwen/Qwen2.5-72B-Instruct")

# For use with LiteLLM
model = litellm_model("Qwen/Qwen2.5-72B-Instruct")  # Returns "openai/Qwen/Qwen2.5-72B-Instruct"
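As the comments above indicate, litellm_model prefixes the id with openai/ so LiteLLM routes the call through its OpenAI-compatible provider, while the native helper passes the HuggingFace id through unchanged. A rough sketch of what the two helpers likely reduce to (an assumption for illustration, not LiteChat's actual source; the real versions may use a Literal type over the supported ids for the type safety mentioned above):

```python
from typing import Literal

# Hypothetical type alias over supported ids; the real package may
# define a broader or generated set.
ModelID = Literal[
    "Qwen/Qwen2.5-72B-Instruct",
    "NousResearch/Hermes-3-Llama-3.1-8B",
]

def litechat_model(model: ModelID) -> str:
    # Native client: the HuggingFace id is used as-is.
    return model

def litellm_model(model: ModelID) -> str:
    # LiteLLM routes "openai/..." ids through its OpenAI provider.
    return f"openai/{model}"

print(litellm_model("Qwen/Qwen2.5-72B-Instruct"))
# -> openai/Qwen/Qwen2.5-72B-Instruct
```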

Quick Start 🚀

Starting the Server

You can start the LiteChat server in two ways:

  1. Using the CLI:
litechat_server
  2. Programmatically:
from litechat import litechat_server

if __name__ == "__main__":
    litechat_server(host="0.0.0.0", port=11437)

Using with OpenAI Client

import os
from openai import OpenAI
from litechat import litechat_model

os.environ['OPENAI_BASE_URL'] = "http://localhost:11437/v1"
os.environ['OPENAI_API_KEY'] = "key123" # required, but not used

client = OpenAI()
response = client.chat.completions.create(
    model=litechat_model("NousResearch/Hermes-3-Llama-3.1-8B"),
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
print(response.choices[0].message.content)

Using with LiteLLM

import os

from litellm import completion
from litechat import OPENAI_COMPATIBLE_BASE_URL, litellm_model

os.environ["OPENAI_API_KEY"] = "key123"

response = completion(
    model=litellm_model("NousResearch/Hermes-3-Llama-3.1-8B"),
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    api_base=OPENAI_COMPATIBLE_BASE_URL
)
print(response)

Using LiteChat's Native Client

from litechat import completion, genai, pp_completion
from litechat import litechat_model

# Basic completion
response = completion(
    prompt="What is quantum computing?",
    model="nvidia/Llama-3.1-Nemotron-70B-Instruct-HF",
    web_search=True  # Enable web search
)

# Stream with pretty printing
pp_completion(
    prompt="Explain the theory of relativity",
    model="Qwen/Qwen2.5-72B-Instruct",
    conversation_id="physics_chat"  # Enable conversation memory
)

# Get direct response
result = genai(
    prompt="Write a poem about spring",
    model="meta-llama/Llama-3.3-70B-Instruct",
    system_prompt="You are a creative poet"
)

Advanced Features 🔧

Web Search Integration

Enable web search to get up-to-date information:

response = completion(
    prompt="What are the latest developments in AI?",
    web_search=True
)

Conversation Memory

Maintain context across multiple interactions:

response = completion(
    prompt="Tell me more about that",
    conversation_id="unique_conversation_id"
)
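Conceptually, the server keys a message history on conversation_id and prepends the prior turns to each new prompt. A toy in-memory version of that bookkeeping (illustrative only; LiteChat's actual storage mechanism is not shown here):

```python
from collections import defaultdict

# conversation_id -> list of (role, content) turns
_history: dict[str, list[tuple[str, str]]] = defaultdict(list)

def remember(conversation_id: str, role: str, content: str) -> None:
    """Record one turn under the given conversation id."""
    _history[conversation_id].append((role, content))

def context_for(conversation_id: str) -> list[tuple[str, str]]:
    """Prior turns that would be prepended to the next prompt."""
    return list(_history[conversation_id])

remember("physics_chat", "user", "Explain relativity")
remember("physics_chat", "assistant", "Relativity says ...")
print(len(context_for("physics_chat")))  # -> 2
```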

Streaming Responses

Get token-by-token streaming:

for chunk in completion(
    prompt="Write a long story",
    stream=True
):
    # The final chunk's delta.content may be None, so fall back to "".
    print(chunk.choices[0].delta.content or "", end="", flush=True)
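The chunks above follow the OpenAI SDK's delta shape, so collecting a full reply is just a matter of joining the non-empty delta contents. A minimal helper (a sketch, not part of LiteChat's API; the chunk structure is assumed to match the OpenAI client's):

```python
from types import SimpleNamespace
from typing import Iterable

def collect_stream(chunks: Iterable) -> str:
    """Join the delta contents of an OpenAI-style chunk stream.

    The final chunk's delta.content is typically None, so missing
    pieces are skipped rather than concatenated as "None".
    """
    parts = []
    for chunk in chunks:
        content = chunk.choices[0].delta.content
        if content:
            parts.append(content)
    return "".join(parts)

# Simulated stream standing in for completion(..., stream=True):
fake = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=c))])
    for c in ["Once ", "upon ", "a time", None]
]
print(collect_stream(fake))  # -> Once upon a time
```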

API Reference 📚

LiteAI Client

from litechat import LiteAI, litechat_model

client = LiteAI(
    api_key="optional-key",  # Optional API key
    base_url="http://localhost:11437",  # Server URL
    system_prompt="You are a helpful assistant",  # Default system prompt
    web_search=False,  # Enable/disable web search by default
    model=litechat_model("nvidia/Llama-3.1-Nemotron-70B-Instruct-HF")  # Default model
)

Completion Function Parameters

  • messages: List of conversation messages or direct prompt string
  • model: HuggingFace model identifier (use litechat_model() for type safety)
  • system_prompt: System instruction for the model
  • temperature: Control randomness (0.0 to 1.0)
  • stream: Enable streaming responses
  • web_search: Enable web search
  • conversation_id: Enable conversation memory
  • max_tokens: Maximum tokens in response
  • tools: List of available tools/functions
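Taken together, a fully specified call would look like the parameter set below. The values are illustrative, and the actual invocation is commented out because it needs a running LiteChat server:

```python
# A full keyword-argument set for completion(), per the list above.
request = dict(
    messages=[{"role": "user", "content": "Summarize this repo"}],
    model="Qwen/Qwen2.5-72B-Instruct",
    system_prompt="You are a concise assistant",
    temperature=0.7,      # 0.0 = deterministic, 1.0 = most random
    stream=False,
    web_search=False,
    conversation_id="repo_chat",
    max_tokens=512,
    tools=None,
)
assert 0.0 <= request["temperature"] <= 1.0
# response = completion(**request)  # requires a running LiteChat server
```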

Contributing 🤝

Contributions are welcome! Please feel free to submit a Pull Request.

License 📄

This project is licensed under the MIT License - see the LICENSE file for details.

Support 💬

For support, please open an issue on the GitHub repository or reach out to the maintainers.
