OpenAI Compatible API Server using OpenVINO GenAI
OpenVINO OpenAI API
An OpenAI-compatible API server powered by OpenVINO GenAI for efficient inference on Intel hardware.
Features
- OpenAI API compatibility for easy integration with existing applications
- Powered by OpenVINO for optimized inference on Intel CPUs and GPUs
- Support for both streaming and non-streaming responses
- Simple command-line interface for launching the server
Installation
pip install openvino-openai-api
Requirements
- Python 3.11 (the only supported version, due to dependency issues)
- OpenVINO GenAI
- FastAPI
- Uvicorn
Usage
Starting the server
# Launch with default settings
openvino-openai-server --model-path /path/to/your/model
# Custom configuration
openvino-openai-server --model-path /path/to/your/model --device CPU --host 0.0.0.0 --port 8000
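When the server needs to be started from a Python script (for example in a test harness), the same command line can be assembled as an argument list. This is a minimal sketch that uses only the four flags shown above; `build_server_command` is a hypothetical helper, not part of the package, and its defaults mirror the example values in this README rather than the server's own defaults.

```python
import shlex


def build_server_command(model_path, device="CPU", host="0.0.0.0", port=8000):
    """Assemble the CLI invocation shown above as an argument list.

    Hypothetical helper: only the flags documented in this README are used.
    """
    return [
        "openvino-openai-server",
        "--model-path", str(model_path),
        "--device", device,
        "--host", host,
        "--port", str(port),
    ]


cmd = build_server_command("/path/to/your/model")
# Print the command in shell-quoted form for inspection.
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.Popen` would start the server without going through a shell, which avoids quoting problems with model paths that contain spaces.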
Sending requests
The API is compatible with OpenAI's chat completions endpoint:
import requests
import json

url = "http://localhost:8000/v1/chat/completions"
headers = {"Content-Type": "application/json"}
data = {
    "model": "local-model",
    "messages": [
        {"role": "user", "content": "Hello, how are you?"}
    ],
    "max_tokens": 500
}

response = requests.post(url, headers=headers, data=json.dumps(data))
print(response.json())
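The example above prints the whole JSON body. To pull out only the assistant's text, the body can be read the way OpenAI chat completion responses are laid out (`choices[0].message.content`). The sketch below assumes the server follows that layout; `build_chat_payload`, `extract_reply`, and the sample response are hypothetical and for illustration only.

```python
def build_chat_payload(prompt, model="local-model", max_tokens=500, stream=False):
    """Build a request body like the one in the example above."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    if stream:
        # Only add the key when streaming is requested.
        payload["stream"] = True
    return payload


def extract_reply(response_json):
    """Return the assistant text, or an empty string if it is absent.

    Assumes the OpenAI chat completion layout: choices[0].message.content.
    """
    choices = response_json.get("choices") or []
    if not choices:
        return ""
    return choices[0].get("message", {}).get("content") or ""


# Hand-written sample in the assumed layout, not real server output.
sample_response = {
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "I'm fine."}}
    ]
}
print(extract_reply(sample_response))
```

With a running server, `extract_reply(response.json())` would replace the final `print(response.json())` call.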
Streaming responses
For streaming responses, set "stream": True in the request payload, pass stream=True to requests.post, and handle the server-sent events:
import requests
import json

url = "http://localhost:8000/v1/chat/completions"
headers = {"Content-Type": "application/json"}
data = {
    "model": "local-model",
    "messages": [
        {"role": "user", "content": "Tell me a story"}
    ],
    "max_tokens": 500,
    "stream": True
}

response = requests.post(url, headers=headers, data=json.dumps(data), stream=True)
for line in response.iter_lines():
    if line:
        line = line.decode('utf-8')
        if line.startswith('data: ') and not line.endswith('[DONE]'):
            json_str = line[6:]  # Remove 'data: ' prefix
            try:
                chunk = json.loads(json_str)
                content = chunk['choices'][0]['delta'].get('content', '')
                if content:
                    print(content, end='', flush=True)
            except json.JSONDecodeError:
                pass
print()
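The parsing logic in the loop above can be factored into a generator, which makes it testable without a running server. This is a sketch under the same assumptions as the example (lines prefixed with `data: `, a final `[DONE]` marker, and text in `choices[0].delta.content`); `iter_stream_content` and the sample lines are hypothetical.

```python
import json


def iter_stream_content(lines):
    """Yield text fragments from server-sent event lines.

    Accepts bytes (as produced by response.iter_lines()) or str.
    """
    for raw in lines:
        if not raw:
            continue  # Skip keep-alive blank lines.
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        if not line.startswith("data: "):
            continue
        data = line[len("data: "):].strip()
        if data == "[DONE]":
            break  # End-of-stream marker.
        try:
            chunk = json.loads(data)
        except json.JSONDecodeError:
            continue  # Ignore malformed chunks, as the example above does.
        choices = chunk.get("choices") or []
        if not choices:
            continue
        content = choices[0].get("delta", {}).get("content")
        if content:
            yield content


# Hand-written sample lines in the assumed format, not real server output.
sample_lines = [
    b'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    b"",
    b'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    b'data: {"choices": [{"delta": {"content": "lo"}}]}',
    b"data: [DONE]",
]
print("".join(iter_stream_content(sample_lines)))
```

Against a live server, `iter_stream_content(response.iter_lines())` would yield the fragments as they arrive.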
Model Requirements
The model directory should contain the following files:
- openvino_model.bin
- openvino_tokenizer.bin
- openvino_detokenizer.bin
- tokenizer_config.json with a valid chat_template defined
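Checking a model directory against the files listed above before launching the server can catch a missing file or chat template early. The sketch below is a hypothetical pre-flight check, not something the package provides; it only tests that the files exist and that `tokenizer_config.json` contains a non-empty `chat_template` entry, not that the template itself is valid.

```python
import json
from pathlib import Path

# File names taken from the list above.
REQUIRED_FILES = (
    "openvino_model.bin",
    "openvino_tokenizer.bin",
    "openvino_detokenizer.bin",
    "tokenizer_config.json",
)


def check_model_dir(model_dir):
    """Return a list of problems found; an empty list means the check passed."""
    root = Path(model_dir)
    problems = [
        f"missing file: {name}"
        for name in REQUIRED_FILES
        if not (root / name).is_file()
    ]
    config_path = root / "tokenizer_config.json"
    if config_path.is_file():
        try:
            config = json.loads(config_path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            problems.append("tokenizer_config.json is not valid JSON")
        else:
            if not (isinstance(config, dict) and config.get("chat_template")):
                problems.append("tokenizer_config.json has no chat_template")
    return problems


if __name__ == "__main__":
    for problem in check_model_dir("/path/to/your/model"):
        print(problem)
```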
Development
Setup development environment
git clone https://github.com/yourusername/openvino-openai-api.git
cd openvino-openai-api
pip install -e ".[dev]"
Running tests
pytest
License
This project is licensed under the terms of the MIT license.