Skip to main content

automated huggingchat openai style fastapi inference

Project description

LiteChat 🚀

LiteChat is a lightweight, OpenAI-compatible interface for running local LLM inference servers. It provides seamless integration with various open-source models while maintaining OpenAI-style API compatibility.

Features ✨

  • 🔄 OpenAI API compatibility
  • 🌐 Web search integration
  • 💬 Conversation memory
  • 🔄 Streaming responses
  • 🛠️ Easy integration with HuggingFace models
  • 📦 Compatible with both litellm and OpenAI clients
  • 🎯 Type-safe model selection

Installation 🛠️

pip install litechat playwright
playwright install

Available Models 🤖

LiteChat supports the following models:

  • Qwen/Qwen2.5-Coder-32B-Instruct: Specialized coding model
  • Qwen/Qwen2.5-72B-Instruct: Large general-purpose model
  • meta-llama/Llama-3.3-70B-Instruct: Latest Llama 3 model
  • CohereForAI/c4ai-command-r-plus-08-2024: Cohere's command model
  • Qwen/QwQ-32B-Preview: Preview version of QwQ
  • nvidia/Llama-3.1-Nemotron-70B-Instruct-HF: NVIDIA's Nemotron model
  • meta-llama/Llama-3.2-11B-Vision-Instruct: Vision-capable Llama model
  • NousResearch/Hermes-3-Llama-3.1-8B: Lightweight Hermes model
  • mistralai/Mistral-Nemo-Instruct-2407: Mistral's instruction model
  • microsoft/Phi-3.5-mini-instruct: Microsoft's compact Phi model

Model Selection Helpers 🎯

LiteChat provides helper functions for type-safe model selection:

from litechat import litechat_model, litellm_model

# For use with LiteChat native client
model = litechat_model("Qwen/Qwen2.5-72B-Instruct")

# For use with LiteLLM
model = litellm_model("Qwen/Qwen2.5-72B-Instruct")  # Returns "openai/Qwen/Qwen2.5-72B-Instruct"

Quick Start 🚀

Starting the Server

You can start the LiteChat server in two ways:

  1. Using the CLI:
litechat_server
  1. Programmatically:
from litechat import litechat_server

if __name__ == "__main__":
    litechat_server(host="0.0.0.0", port=11437)

Using with OpenAI Client

import os
from openai import OpenAI

os.environ['OPENAI_BASE_URL'] = "http://localhost:11437/v1"
os.environ['OPENAI_API_KEY'] = "key123" # required, but not used

client = OpenAI()
response = client.chat.completions.create(
    model=litechat_model("NousResearch/Hermes-3-Llama-3.1-8B"),
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
print(response.choices[0].message.content)

Using with LiteLLM

import os

from litellm import completion
from litechat import OPENAI_COMPATIBLE_BASE_URL, litellm_model

os.environ["OPENAI_API_KEY"] = "key123"

response = completion(
    model=litellm_model("NousResearch/Hermes-3-Llama-3.1-8B"),
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    api_base=OPENAI_COMPATIBLE_BASE_URL
)
print(response)

Using LiteChat's Native Client

from litechat import completion, genai, pp_completion
from litechat import litechat_model

# Basic completion
response = completion(
    prompt="What is quantum computing?",
    model="nvidia/Llama-3.1-Nemotron-70B-Instruct-HF",
    web_search=True  # Enable web search
)

# Stream with pretty printing
pp_completion(
    prompt="Explain the theory of relativity",
    model="Qwen/Qwen2.5-72B-Instruct",
    conversation_id="physics_chat"  # Enable conversation memory
)

# Get direct response
result = genai(
    prompt="Write a poem about spring",
    model="meta-llama/Llama-3.3-70B-Instruct",
    system_prompt="You are a creative poet"
)

Advanced Features 🔧

Web Search Integration

Enable web search to get up-to-date information:

response = completion(
    prompt="What are the latest developments in AI?",
    web_search=True
)

Conversation Memory

Maintain context across multiple interactions:

response = completion(
    prompt="Tell me more about that",
    conversation_id="unique_conversation_id"
)

Streaming Responses

Get token-by-token streaming:

for chunk in completion(
    prompt="Write a long story",
    stream=True
):
    print(chunk.choices[0].delta.content, end="", flush=True)

API Reference 📚

LiteAI Client

from litechat import LiteAI, litechat_model

client = LiteAI(
    api_key="optional-key",  # Optional API key
    base_url="http://localhost:11437",  # Server URL
    system_prompt="You are a helpful assistant",  # Default system prompt
    web_search=False,  # Enable/disable web search by default
    model=litechat_model("nvidia/Llama-3.1-Nemotron-70B-Instruct-HF")  # Default model
)

Completion Function Parameters

  • messages: List of conversation messages or direct prompt string
  • model: HuggingFace model identifier (use litechat_model() for type safety)
  • system_prompt: System instruction for the model
  • temperature: Control randomness (0.0 to 1.0)
  • stream: Enable streaming responses
  • web_search: Enable web search
  • conversation_id: Enable conversation memory
  • max_tokens: Maximum tokens in response
  • tools: List of available tools/functions

Contributing 🤝

Contributions are welcome! Please feel free to submit a Pull Request.

License 📄

This project is licensed under the MIT License - see the LICENSE file for details.

Support 💬

For support, please open an issue on the GitHub repository or reach out to the maintainers.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

litechat-0.0.58.tar.gz (28.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

litechat-0.0.58-py3-none-any.whl (36.4 kB view details)

Uploaded Python 3

File details

Details for the file litechat-0.0.58.tar.gz.

File metadata

  • Download URL: litechat-0.0.58.tar.gz
  • Upload date:
  • Size: 28.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.4 CPython/3.11.0rc1 Linux/6.8.0-51-generic

File hashes

Hashes for litechat-0.0.58.tar.gz
Algorithm Hash digest
SHA256 23f349296d969979c37e6b7b255a70365493f3523b22b6c3e25b8b966a4a4cb0
MD5 201ed196323fdff47de5e2d8df48394d
BLAKE2b-256 badd0b805a8711c936d550f53b8a6f9062c87ce979f3b405e343b09a98503bbf

See more details on using hashes here.

File details

Details for the file litechat-0.0.58-py3-none-any.whl.

File metadata

  • Download URL: litechat-0.0.58-py3-none-any.whl
  • Upload date:
  • Size: 36.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.4 CPython/3.11.0rc1 Linux/6.8.0-51-generic

File hashes

Hashes for litechat-0.0.58-py3-none-any.whl
Algorithm Hash digest
SHA256 753b64fb434499683fd2a08d76c32e302268b50cb0b91ea4a5c3c5d5a35719fb
MD5 6e226d4dd6f8cdbf746091c712f93b5e
BLAKE2b-256 6baf673ff5a68da803e158b6e5b30a5521a5363ae5311d6501823ad1f83a0ce2

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page