OpenAI Compatible API Server using OpenVINO GenAI

OpenVINO OpenAI API

An OpenAI-compatible API server powered by OpenVINO GenAI for efficient inference on Intel hardware.

Features

  • OpenAI API compatibility for easy integration with existing applications
  • Powered by OpenVINO for optimized inference on Intel CPUs and GPUs
  • Support for both streaming and non-streaming responses
  • Simple command-line interface for launching the server

Installation

pip install openvino-openai-api

Requirements

  • Python 3.11 (the only supported version, due to dependency issues)
  • OpenVINO GenAI
  • FastAPI
  • Uvicorn

Usage

Starting the server

# Launch with default settings
openvino-openai-server --model-path /path/to/your/model

# Custom configuration
openvino-openai-server --model-path /path/to/your/model --device CPU --host 0.0.0.0 --port 8000
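When the server is launched from a script, it can help to wait until the port accepts connections before sending the first request. The following is a minimal sketch using only the standard library; `wait_for_port` is a hypothetical helper, not part of this package, and the host and port are the ones from the example command above. It only confirms that something is listening on the port, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)  # Not listening yet; retry shortly.
    return False


# Example call (assumes the server was started with --port 8000):
# wait_for_port("localhost", 8000)
```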

Sending requests

The API is compatible with OpenAI's chat completions endpoint:

import requests

url = "http://localhost:8000/v1/chat/completions"
data = {
    "model": "local-model",
    "messages": [
        {"role": "user", "content": "Hello, how are you?"}
    ],
    "max_tokens": 500
}

# json= serializes the payload and sets the Content-Type header
response = requests.post(url, json=data)
response.raise_for_status()  # Raise an exception on HTTP error statuses
print(response.json())
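To pull just the assistant's text out of the returned JSON, a small helper along these lines may be useful. This is a sketch that assumes the server returns the standard OpenAI chat completion shape (`choices[0].message.content`); `extract_reply` is a hypothetical name, and the sample dictionary is hand-written for illustration, not real server output.

```python
from typing import Any


def extract_reply(completion: dict[str, Any]) -> str:
    """Return the assistant text from a chat completion response body."""
    choices = completion.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    message = choices[0].get("message") or {}
    # content may be missing or null; fall back to an empty string.
    return message.get("content") or ""


# Hand-written sample in the OpenAI response shape (an assumption).
sample = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ]
}
print(extract_reply(sample))
```

With a live server, the same helper would be called as `extract_reply(response.json())`.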

Streaming responses

For streaming responses, set "stream": True in the request body, pass stream=True to requests.post, and handle the server-sent events:

import requests
import json

url = "http://localhost:8000/v1/chat/completions"
headers = {"Content-Type": "application/json"}
data = {
    "model": "local-model",
    "messages": [
        {"role": "user", "content": "Tell me a story"}
    ],
    "max_tokens": 500,
    "stream": True
}

response = requests.post(url, headers=headers, data=json.dumps(data), stream=True)
for line in response.iter_lines():
    if line:
        line = line.decode('utf-8')
        if line.startswith('data: ') and not line.endswith('[DONE]'):
            json_str = line[6:]  # Remove 'data: ' prefix
            try:
                chunk = json.loads(json_str)
                content = chunk['choices'][0]['delta'].get('content', '')
                if content:
                    print(content, end='', flush=True)
            except json.JSONDecodeError:
                pass
print()
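The parsing logic in the loop above can be factored into a generator so that it is reusable and testable without a running server. The sketch below mirrors that loop; `iter_content` is a hypothetical helper name, and the sample events are hand-written in the OpenAI streaming chunk shape rather than captured from the server.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from the server-sent event lines of a streamed completion."""
    for raw in lines:
        if not raw:
            continue  # Blank lines separate events.
        line = raw.decode("utf-8")
        if not line.startswith("data: "):
            continue  # Ignore anything that is not a data field.
        payload = line[len("data: "):]
        if payload.strip() == "[DONE]":
            break  # End-of-stream sentinel.
        try:
            chunk = json.loads(payload)
        except json.JSONDecodeError:
            continue  # Skip malformed chunks, as the loop above does.
        choices = chunk.get("choices") or []
        if not choices:
            continue
        content = choices[0].get("delta", {}).get("content")
        if content:
            yield content


# Hand-written sample events (an assumption about the chunk shape).
sample = [
    b'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    b"",
    b'data: {"choices": [{"delta": {"content": "Once"}}]}',
    b'data: {"choices": [{"delta": {"content": " upon"}}]}',
    b"data: [DONE]",
]
print("".join(iter_content(sample)))
```

Against a live server, the generator would be fed with `response.iter_lines()` from the streaming request.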

Model Requirements

The model directory should contain the following files:

  • openvino_model.bin
  • openvino_tokenizer.bin
  • openvino_detokenizer.bin
  • tokenizer_config.json with a valid chat_template defined
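The checklist above can be verified before launching the server. The following is a rough sketch, assuming only the file names listed above; `check_model_dir` is a hypothetical helper, not part of this package. It checks only that `chat_template` is present and non-empty, not that the template itself is valid.

```python
import json
from pathlib import Path

# File names taken from the list above.
REQUIRED_FILES = (
    "openvino_model.bin",
    "openvino_tokenizer.bin",
    "openvino_detokenizer.bin",
    "tokenizer_config.json",
)


def check_model_dir(model_dir: str) -> list[str]:
    """Return a list of problems found; an empty list means the check passed."""
    root = Path(model_dir)
    problems = [
        f"missing file: {name}"
        for name in REQUIRED_FILES
        if not (root / name).is_file()
    ]
    config_path = root / "tokenizer_config.json"
    if config_path.is_file():
        try:
            config = json.loads(config_path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            problems.append("tokenizer_config.json is not valid JSON")
        else:
            # Presence check only; the template is not rendered or validated.
            if not (isinstance(config, dict) and config.get("chat_template")):
                problems.append("tokenizer_config.json has no chat_template")
    return problems
```

A call such as `check_model_dir("/path/to/your/model")` returns an empty list when all four files are present and a chat template is defined.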

Development

Setup development environment

git clone https://github.com/yourusername/openvino-openai-api.git
cd openvino-openai-api
pip install -e ".[dev]"

Running tests

pytest

License

This project is licensed under the terms of the MIT license.
