
FastMindAPI


An easy-to-use, high-performance(?) backend for serving LLMs and other AI models, built on FastAPI.

✨ 1 Features

1.1 Model: Support models with various backends

  • Transformers

    • Transformers_CausalLM (AutoModelForCausalLM)
    • Peft_CausalLM (PeftModelForCausalLM)
  • llama.cpp

    • Llamacpp_LLM (Llama)
  • OpenAI

    • OpenAI_ChatModel (/chat/completions)
  • vLLM

    • vLLM_LLM (LLM)
  • MLC LLM

  • ...

1.2 Modules: More than just chatting with models

  • Function Calling (extra tools in Python; see the sketch after this list)
  • Retrieval
  • Agent
  • ...
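As a rough illustration of the function-calling idea: a tool is typically just a plain Python function whose type hints and docstring describe it to the model. The sketch below is purely hypothetical (get_weather is a made-up tool, and this page does not document how tools are registered with FastMindAPI), so only the tool definition itself is shown.

def get_weather(city: str, unit: str = "celsius") -> str:
    """Return a (mock) weather report for the given city."""
    # A real tool would query an external service here; this stub
    # returns a fixed string so the example stays self-contained.
    return f"The weather in {city} is 20 degrees {unit}."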

1.3 Flexibility: Easy to Use & Highly Customizable

  • Load models at coding time or at runtime
  • Add any APIs you want (see the sketch after this list)
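A minimal sketch of adding a custom API, assuming the FM.Server object exposes its underlying FastAPI app as server.app (the attribute name is an assumption, not documented here):

import fastmindapi as FM

server = FM.Server(API_KEY="sk-19992001")

# Assumption: the underlying FastAPI app is reachable as server.app;
# adapt the attribute name if the library exposes it differently.
@server.app.get("/ping")
def ping():
    # A trivial custom endpoint living next to the built-in model APIs.
    return {"status": "ok"}

server.run()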

🚀 2 Quick Start

2.1 Installation

pip install fastmindapi

2.2 Usage (C/S)

2.2.1 Run the server (S)

in Terminal
fastmindapi-server --port 8000 --apikey sk-19992001
in Python
import fastmindapi as FM

# Run the server with an authentication key (port 8000 by default)
server = FM.Server(API_KEY="sk-19992001")
server.run()

2.2.2 Access the service (C)

via HTTP requests
# For concise documentation
curl http://IP:PORT/docs#/

# Use Case
# 1. add model info
curl http://127.0.0.1:8000/model/add_info \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-19992001" \
  -d '{
  "model_name": "gemma2",
  "model_type": "Transformers_CausalLM",
  "model_path": ".../PTM/gemma-2-2b"
}'

# 2. load model
curl http://127.0.0.1:8000/model/load/gemma2 -H "Authorization: Bearer sk-19992001"

# 3. run model inference
# 3.1 Generation API
curl http://127.0.0.1:8000/model/generate/gemma2 \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-19992001" \
  -d '{
  "input_text": "Do you know something about Dota2?",
  "max_new_tokens": 100,
  "return_logits": true,
  "logits_top_k": 10,
  "stop_strings": ["\n"]
}'

# 3.2 OpenAI like API
curl http://127.0.0.1:8000/openai/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-19992001" \
  -d '{
  "model": "gemma2",
  "messages": [
    {
      "role": "system",
      "content": "You are a test assistant."
    },
    {
      "role": "user",
      "content": "Do you know something about Dota2?"
    }
  ],
  "max_completion_tokens": 100,
  "logprobs": true,
  "top_logprobs": 10,
  "stop": ["\n"]
}'
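Because this endpoint mirrors OpenAI's /chat/completions schema, the official openai Python SDK can presumably be pointed at the server as well. A sketch, assuming the base path /openai inferred from the curl command above:

from openai import OpenAI

# Assumption: base_url is derived from the /openai/chat/completions route above.
client = OpenAI(base_url="http://127.0.0.1:8000/openai", api_key="sk-19992001")

response = client.chat.completions.create(
    model="gemma2",
    messages=[
        {"role": "system", "content": "You are a test assistant."},
        {"role": "user", "content": "Do you know something about Dota2?"},
    ],
    max_tokens=100,
)
print(response.choices[0].message.content)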
via the Python client
import fastmindapi as FM

# The default address is 127.0.0.1:8000
client = FM.Client(IP="x.x.x.x", PORT=xxx, API_KEY="sk-19992001") 

# 1. add model info
model_info_list = [
  {
    "model_name": "gemma2",
    "model_type": "Transformers_CausalLM",
    "model_path": ".../PTM/gemma-2-2b"
  },
]
client.add_model_info_list(model_info_list)

# 2. load model
client.load_model("gemma2")

# 3. run model inference
generation_request={
  "input_text": "Do you know something about Dota2?",
  "max_new_tokens": 10,
  "return_logits": True,
  "logits_top_k": 10,
  "stop_strings": ["."]
}
client.generate("gemma2", generation_request)

🪧 We primarily maintain the backend server; the client is provided for reference only. The main usage is through sending HTTP requests. (We might release FM-GUI in the future.)
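For reference, the generation call from section 2.2.2 can be issued directly with the requests library; this sketch simply mirrors the Generation API curl example above:

import requests

BASE_URL = "http://127.0.0.1:8000"
HEADERS = {
    "Content-Type": "application/json",
    "Authorization": "Bearer sk-19992001",
}

# Mirrors the Generation API curl example above.
payload = {
    "input_text": "Do you know something about Dota2?",
    "max_new_tokens": 100,
    "return_logits": True,
    "logits_top_k": 10,
    "stop_strings": ["\n"],
}

response = requests.post(f"{BASE_URL}/model/generate/gemma2", json=payload, headers=HEADERS)
print(response.json())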
