FastMindAPI
An easy-to-use, high-performance(?) backend for serving LLMs and other AI models, built on FastAPI.
✨ 1 Features
1.1 Model: Support models with various backends
- ✅ Transformers: Transformers_CausalLM (AutoModelForCausalLM), Peft_CausalLM (PeftModelForCausalLM)
- ✅ llama.cpp: Llamacpp_LLM (Llama)
- ✅ OpenAI: OpenAI_ChatModel (/chat/completions)
- ✅ vllm: vLLM_LLM (LLM)
- ...
1.2 Modules: More than just chatting with models
- Function Calling (extra tools in Python)
- Retrieval
- Agent
- ...
1.3 Flexibility: Easy to Use & Highly Customizable
- Load models at coding time or at runtime
- Add any APIs you want
🚀 2 Quick Start
2.1 Installation
pip install fastmindapi
2.2 Usage (Client/Server)
2.2.1 Run the server (S)
In terminal:
fastmindapi-server --port 8000 --apikey sk-19992001
In Python:
import fastmindapi as FM
# Run the server with an authentication key; the port defaults to 8000
server = FM.Server(API_KEY="sk-19992001")
server.run()
2.2.2 Access the service (C)
via HTTP requests
# For concise documentation
curl http://IP:PORT/docs#/
# Use Case
# 1. add model info
curl http://127.0.0.1:8000/model/add_info \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-19992001" \
-d '{
"model_name": "gemma2",
"model_type": "Transformers_CausalLM",
"model_path": ".../PTM/gemma-2-2b"
}'
# 2. load model
curl http://127.0.0.1:8000/model/load/gemma2 -H "Authorization: Bearer sk-19992001"
# 3. run model inference
# 3.1 Generation API
curl http://127.0.0.1:8000/model/generate/gemma2 \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-19992001" \
-d '{
"input_text": "Do you know something about Dota2?",
"max_new_tokens": 100,
"return_logits": true,
"logits_top_k": 10,
"stop_strings": ["\n"]
}'
# 3.2 OpenAI like API
curl http://127.0.0.1:8000/openai/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-19992001" \
-d '{
"model": "gemma2",
"messages": [
{
"role": "system",
"content": "You are a test assistant."
},
{
"role": "user",
"content": "Do you know something about Dota2?"
}
],
"max_completion_tokens": 100,
"logprobs": true,
"top_logprobs": 10,
"stop": ["\n"]
}'
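Because this route mirrors the OpenAI chat-completions API, the official openai Python SDK can typically be pointed at the server as well. A minimal sketch, assuming the /openai prefix is compatible enough with the SDK's request format (the base_url below is an assumption, not a documented guarantee):
# Sketch: drive the OpenAI-like route with the official openai SDK.
# ASSUMPTION: the server's /openai prefix accepts SDK-formatted requests.
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8000/openai",  # SDK appends /chat/completions
    api_key="sk-19992001",                    # the server's API key
)
response = client.chat.completions.create(
    model="gemma2",
    messages=[
        {"role": "system", "content": "You are a test assistant."},
        {"role": "user", "content": "Do you know something about Dota2?"},
    ],
    max_completion_tokens=100,
)
print(response.choices[0].message.content)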
via client
import fastmindapi as FM
# The address defaults to 127.0.0.1:8000
client = FM.Client(IP="x.x.x.x", PORT=xxx, API_KEY="sk-19992001")
# 1. add model info
model_info_list = [
{
"model_name": "gemma2",
"model_type": "Transformers_CausalLM",
"model_path": ".../PTM/gemma-2-2b"
},
]
client.add_model_info_list(model_info_list)
# 2. load model
client.load_model("gemma2")
# 3. run model inference
generation_request={
"input_text": "Do you know something about Dota2?",
"max_new_tokens": 10,
"return_logits": True,
"logits_top_k": 10,
"stop_strings": ["."]
}
client.generate("gemma2", generation_request)
🪧 We primarily maintain the backend server; the client is provided for reference only. The main usage is through sending HTTP requests. (We might release FM-GUI in the future.)
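For reference, the same three steps can also be scripted with plain HTTP calls from Python. A minimal sketch using the requests library; the endpoint paths, headers, and payloads are copied from the curl examples above, while the HTTP methods are inferred from the curl flags (-d implies POST):
# Sketch: call the documented endpoints directly with the requests library.
import requests

BASE = "http://127.0.0.1:8000"
HEADERS = {"Authorization": "Bearer sk-19992001"}

# 1. add model info
requests.post(f"{BASE}/model/add_info", headers=HEADERS, json={
    "model_name": "gemma2",
    "model_type": "Transformers_CausalLM",
    "model_path": ".../PTM/gemma-2-2b",
})

# 2. load model
requests.get(f"{BASE}/model/load/gemma2", headers=HEADERS)

# 3. run model inference (generation API)
response = requests.post(f"{BASE}/model/generate/gemma2", headers=HEADERS, json={
    "input_text": "Do you know something about Dota2?",
    "max_new_tokens": 100,
    "return_logits": True,
    "logits_top_k": 10,
    "stop_strings": ["\n"],
})
print(response.json())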
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
fastmindapi-0.0.9.tar.gz
(185.7 kB
view hashes)
Built Distribution
Close
Hashes for fastmindapi-0.0.9-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | ce24a68bae95ece99f582c6063aded50001047d737d313d622509ec2f8c1bc58 |
|
MD5 | 68124f170a3423e96fde120aefc620d4 |
|
BLAKE2b-256 | d0b5142ff944d43afc21870c85277c48992ede9e29e4242ae943c5753e89c68a |