
Azarrot

(Early WIP) An OpenAI compatible LLM inference server, focusing on OpenVINO™ and IPEX-LLM usage.

The name azarrot is a blend of azalea and parrot.

Motivation

NVIDIA sucks on Linux, and AMD does not like people running ROCm on their consumer cards (sadly, my RX 5500 XT is not supported). Meanwhile, Intel consumer cards are cheap and have good fundamental software support, and Intel is actively maintaining and upstreaming many AI libraries.

So I bought an A770, but all the existing inference servers are lacking on Intel cards: some lack quantization, some only support a few models, some do not run at all... and they all fall short on OpenAI API features.

Finally, I decided to create my own inference server, focusing on Intel cards and targeting full OpenAI API coverage. Let's see how far I can go.

Changelog

See CHANGELOG for more details.

Supported OpenAI features

  • ✅: Fully supported
  • ⭕: Partially supported
  • ❓: Implemented but not tested; may or may not work
  • 🚧: Work in progress
  • ❌: Not supported yet
| Feature | Subfeature | IPEX-LLM | OpenVINO | Remarks |
| --- | --- | --- | --- | --- |
| Chat | Basic chat completion | | | Text generation works; parameters (like frequency_penalty, temperature) not implemented yet |
| Chat | Streaming response | | | |
| Chat | Image input | | | InternVL2 supported |
| Chat | Tool calling | | | Qwen2 supported |
| Embeddings | Create embeddings | | | encoding_format not implemented yet |
| Models | List models | | | |

Tested models

| Model | Repository | Device | Backend | Remarks |
| --- | --- | --- | --- | --- |
| CodeQwen1.5-7B | https://huggingface.co/Qwen/CodeQwen1.5-7B | Intel GPU | IPEX-LLM, OpenVINO | |
| InternVL2-8B | https://huggingface.co/OpenGVLab/InternVL2-8B | Intel GPU | IPEX-LLM | Image input supported |
| bge-m3 | https://huggingface.co/BAAI/bge-m3 | Intel GPU, CPU | OpenVINO | Accuracy may decrease if quantized to int8 |
| Qwen2-7B-Instruct | https://huggingface.co/Qwen/Qwen2-7B-Instruct | Intel GPU | IPEX-LLM | Tool calling supported |

Other, untested models may or may not work.

Prerequisites

Hardware

Azarrot supports CPUs and Intel GPUs. NVIDIA and AMD GPUs may work if you manually install the corresponding torch libraries.

Tested GPUs:

  • Intel A770 16GB
  • Intel Xe 96EU (i7 12700H)

Software

Because the xpu branch of intel-extension-for-pytorch does not yet provide a Python 3.12 build, you have to use Python 3.11 or below.

You also have to install the oneAPI Toolkit (at least 2024.0) and the GPU drivers.

Azarrot is tested on Ubuntu 22.04 with Python 3.10.

Usage

WARNING: This project is still in early stages. Bugs are expected.

First, install azarrot from PyPI:

pip install azarrot

Then, create a server.yml in the directory you want to run it:

mkdir azarrot

# Copy from examples/server.yml
cp <SOURCE_ROOT>/examples/server.yml azarrot/

<SOURCE_ROOT> refers to the path of the repository you cloned.

In server.yml you can configure things like listening port, model path, etc.
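The real schema lives in examples/server.yml; purely as a hypothetical sketch (every key name below is an assumption, not the actual Azarrot schema), such a config might look like:

```yaml
# Hypothetical sketch only -- copy examples/server.yml for the real key names.
host: 0.0.0.0         # address to listen on
port: 8080            # matches the default URL used later in this guide
models_dir: ./models  # directory scanned for *.model.yml files
```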

Next we create the models directory:

cd azarrot
mkdir models

And copy an example model file into the models directory:

cp <SOURCE_ROOT>/examples/CodeQwen1.5-7B-ipex-llm.model.yml models/

Azarrot will load all .model.yml files in this directory. You need to manually download the models from Hugging Face, or convert them if you are using the OpenVINO backend:

huggingface-cli download --local-dir models/CodeQwen1.5-7B Qwen/CodeQwen1.5-7B

Azarrot will convert it to int4 when loading the model.
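The copied CodeQwen1.5-7B-ipex-llm.model.yml tells Azarrot how to load the model. Its real fields come from the example file; purely as a hypothetical illustration (every key below is assumed, not taken from Azarrot), a model descriptor of this kind typically carries:

```yaml
# Hypothetical illustration -- consult examples/*.model.yml for the real fields.
id: CodeQwen1.5-7B    # model name exposed via /v1/models
path: CodeQwen1.5-7B  # folder under models/ holding the downloaded weights
backend: ipex-llm     # or openvino
```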

Now we can start the server:

source /opt/intel/oneapi/setvars.sh
python -m azarrot

And access http://localhost:8080/v1/models to see all loaded models.
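Because the API is OpenAI compatible, any OpenAI client should be able to talk to it. As a minimal sketch using only the Python standard library (the port and model id are assumptions carried over from the examples above):

```python
import json
import urllib.request

# Default port assumed from the URL above; adjust to match your server.yml.
BASE_URL = "http://localhost:8080/v1"


def build_chat_request(model: str, prompt: str) -> dict:
    """Build a minimal OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }


def chat(model: str, prompt: str) -> str:
    """POST the payload to /v1/chat/completions and return the reply text."""
    data = json.dumps(build_chat_request(model, prompt)).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example (requires a running server with the model loaded):
# print(chat("CodeQwen1.5-7B", "Write hello world in Python."))
```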

More details can be found in the documentation.

