
FastMindAPI


An easy-to-use, high-performance(?) backend for serving LLMs and other AI models, built on FastAPI.

✨ 1 Features

1.1 Model: Support models with various backends

  • Transformers

    • Transformers_CausalLM (AutoModelForCausalLM)
    • Peft_CausalLM (PeftModelForCausalLM)
  • llama.cpp

    • Llamacpp_LLM (Llama)
  • OpenAI

    • OpenAI_ChatModel (/chat/completions)
  • vLLM

    • vLLM_LLM (LLM)
  • MLC LLM

  • ...

1.2 Modules: More than just chatting with models

  • Function Calling (extra tools in Python)
  • Retrieval
  • Agent
  • ...

1.3 Flexibility: Easy to Use & Highly Customizable

  • Load models either in code or at runtime
  • Add any APIs you want

🚀 2 Quick Start

2.1 Installation

pip install fastmindapi

2.2 Usage (C/S)

2.2.1 Run the server (S)

In the terminal
fastmindapi-server --port 8000 --apikey sk-19992001
In Python
import fastmindapi as FM

# Run the server with an authentication key; port 8000 by default
server = FM.Server(API_KEY="sk-19992001")
server.run()

2.2.2 Access the service (C)

via HTTP requests
# For concise documentation, open the interactive docs page
curl http://IP:PORT/docs#/

# Use Case
# 1. add model info
curl http://127.0.0.1:8000/model/add_info \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-19992001" \
  -d '{
  "model_name": "gemma2",
  "model_type": "Transformers_CausalLM",
  "model_path": ".../PTM/gemma-2-2b"
}'

# 2. load model
curl http://127.0.0.1:8000/model/load/gemma2 -H "Authorization: Bearer sk-19992001"

# 3. run model inference
# 3.1 Generation API
curl http://127.0.0.1:8000/model/generate/gemma2 \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-19992001" \
  -d '{
  "input_text": "Do you know something about Dota2?",
  "max_new_tokens": 100,
  "return_logits": true,
  "logits_top_k": 10,
  "stop_strings": ["\n"]
}'

# 3.2 OpenAI like API
curl http://127.0.0.1:8000/openai/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-19992001" \
  -d '{
  "model": "gemma2",
  "messages": [
    {
      "role": "system",
      "content": "You are a test assistant."
    },
    {
      "role": "user",
      "content": "Do you know something about Dota2?"
    }
  ],
  "max_completion_tokens": 100,
  "logprobs": true,
  "top_logprobs": 10,
  "stop": ["\n"]
}'
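
The OpenAI-like route above can also be called from Python with only the standard library. The payload mirrors the curl example; the response shape is assumed (not confirmed here) to follow the OpenAI chat-completions schema. This is a sketch, not an official client:

```python
import json
import urllib.request

BASE = "http://127.0.0.1:8000"
HEADERS = {
    "Content-Type": "application/json",
    "Authorization": "Bearer sk-19992001",
}

# Payload mirroring the curl example above.
chat_request = {
    "model": "gemma2",
    "messages": [
        {"role": "system", "content": "You are a test assistant."},
        {"role": "user", "content": "Do you know something about Dota2?"},
    ],
    "max_completion_tokens": 100,
    "logprobs": True,
    "top_logprobs": 10,
    "stop": ["\n"],
}

def post_json(path: str, payload: dict) -> dict:
    """POST a JSON body to the server and decode the JSON response."""
    req = urllib.request.Request(
        BASE + path,
        data=json.dumps(payload).encode("utf-8"),
        headers=HEADERS,
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    # Requires a running fastmindapi server with "gemma2" loaded.
    reply = post_json("/openai/chat/completions", chat_request)
    # If the response follows the OpenAI schema (an assumption),
    # the text lives at reply["choices"][0]["message"]["content"].
    print(reply)
```

If the route is fully OpenAI-compatible (again, an assumption based on its name), the official openai SDK could likely target it as well by setting its base_url to the server's /openai prefix.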
via client
import fastmindapi as FM

# Defaults to 127.0.0.1:8000
client = FM.Client(IP="x.x.x.x", PORT=xxx, API_KEY="sk-19992001")

# 1. add model info
model_info_list = [
  {
    "model_name": "gemma2",
    "model_type": "Transformers_CausalLM",
    "model_path": ".../PTM/gemma-2-2b"
  },
]
client.add_model_info_list(model_info_list)

# 2. load model
client.load_model("gemma2")

# 3. run model inference
generation_request={
  "input_text": "Do you know something about Dota2?",
  "max_new_tokens": 10,
  "return_logits": True,
  "logits_top_k": 10,
  "stop_strings": ["."]
}
client.generate("gemma2", generation_request)

🪧 We primarily maintain the backend server; the client is provided for reference only. The main usage is through sending HTTP requests. (We might release FM-GUI in the future.)
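
Since raw HTTP is the primary interface, the Generation API call from section 3.1 can be reproduced without the reference client, using only the Python standard library. The endpoint and payload below come straight from the curl example; this is a sketch, not part of the fastmindapi package:

```python
import json
import urllib.request

API_KEY = "sk-19992001"
URL = "http://127.0.0.1:8000/model/generate/gemma2"

# Payload mirroring the Generation API curl example.
generation_request = {
    "input_text": "Do you know something about Dota2?",
    "max_new_tokens": 100,
    "return_logits": True,
    "logits_top_k": 10,
    "stop_strings": ["\n"],
}

req = urllib.request.Request(
    URL,
    data=json.dumps(generation_request).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
)

if __name__ == "__main__":
    # Requires a running fastmindapi server with "gemma2" loaded.
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read()))
```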
