llama-api-server
This project is under active development. Breaking changes may happen at any time.
Llama as a Service! This project builds a RESTful API server compatible with the OpenAI API, using open-source backends such as llama.cpp.
Get started
Prepare model
llama.cpp
If you don't have a quantized llama model yet, follow the llama.cpp instructions to prepare one.
Install
```shell
pip install llama-api-server

# write a config file mapping model names to backends
cat > config.yml << EOF
models:
  completions:
    text-davinci-003:
      type: llama_cpp
      params:
        path: /absolute/path/to/your/7B/ggml-model-q4_0.bin
  embeddings:
    text-embedding-ada-002:
      type: llama_cpp
      params:
        path: /absolute/path/to/your/7B/ggml-model-q4_0.bin
EOF

# start web server
python -m llama_api_server
```
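For illustration, here is a minimal sketch (not the project's actual code) of how a server like this might resolve a requested model name to its backend and weight path. The dict mirrors the config.yml above, as PyYAML's `safe_load` would produce it; the `resolve_model` helper is hypothetical:

```python
# Hypothetical sketch: resolve a requested model name to a backend type
# and weight path, given the parsed config.yml above.
config = {
    "models": {
        "completions": {
            "text-davinci-003": {
                "type": "llama_cpp",
                "params": {"path": "/absolute/path/to/your/7B/ggml-model-q4_0.bin"},
            }
        },
        "embeddings": {
            "text-embedding-ada-002": {
                "type": "llama_cpp",
                "params": {"path": "/absolute/path/to/your/7B/ggml-model-q4_0.bin"},
            }
        },
    }
}

def resolve_model(config, api, name):
    """Return (backend_type, params) for a model name; KeyError if unknown."""
    entry = config["models"][api][name]
    return entry["type"], entry["params"]

backend, params = resolve_model(config, "completions", "text-davinci-003")
print(backend, params["path"])
```

An unknown model name raises `KeyError`, which a server would typically translate into an OpenAI-style 404 error response.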
Call with openai-python
```shell
export OPENAI_API_BASE=http://127.0.0.1:5000/v1
openai api completions.create -e text-davinci-003 -p "hello?"
```
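Under the hood, the CLI call above is a POST against the server's `/v1/completions` endpoint. A minimal sketch of that request payload (the parameter values here are illustrative, not defaults of the project):

```python
import json

# The openai CLI call translates to a POST against /v1/completions.
# The model name must match an entry under models.completions in config.yml.
url = "http://127.0.0.1:5000/v1/completions"
payload = {
    "model": "text-davinci-003",
    "prompt": "hello?",
    "max_tokens": 16,  # illustrative value
}
body = json.dumps(payload)

# To actually send it (requires the server to be running):
#   import urllib.request
#   req = urllib.request.Request(
#       url, body.encode(), {"Content-Type": "application/json"})
#   print(urllib.request.urlopen(req).read().decode())
print(body)
```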
Roadmap
Tested with
- openai-python
  - OPENAI_API_TYPE=default
  - OPENAI_API_TYPE=azure
Supported APIs
- Completions
  - set temperature, top_p, and top_k
  - set max_tokens
  - set stop
  - set stream
  - set n
  - set presence_penalty and frequency_penalty
  - set logit_bias
- Embeddings
  - batch process
- Chat
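The completion parameters above all act on the backend's next-token logits. A minimal pure-Python sketch (not the project's actual code) of how logit_bias, presence/frequency penalties, temperature, top_k, and top_p combine during sampling:

```python
import math
import random

def adjust_logits(logits, generated, logit_bias=None,
                  presence_penalty=0.0, frequency_penalty=0.0):
    """Apply logit_bias and OpenAI-style presence/frequency penalties."""
    out = list(logits)
    counts = {}
    for tok in generated:
        counts[tok] = counts.get(tok, 0) + 1
    for tok, count in counts.items():
        out[tok] -= presence_penalty            # flat penalty if token appeared
        out[tok] -= frequency_penalty * count   # scales with occurrence count
    for tok, bias in (logit_bias or {}).items():
        out[tok] += bias
    return out

def sample(logits, temperature=1.0, top_k=0, top_p=1.0, rng=random.random):
    """Sample a token index with temperature, top_k, and top_p filtering."""
    scaled = [l / max(temperature, 1e-8) for l in logits]
    m = max(scaled)                              # numerically stable softmax
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    if top_k > 0:
        order = order[:top_k]                    # keep the k most likely tokens
    kept, cum = [], 0.0
    for i in order:                              # nucleus (top_p) cutoff
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    z = sum(probs[i] for i in kept)              # renormalize and draw
    r = rng() * z
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]

# With top_k=1 sampling is greedy: the highest adjusted logit wins.
# Token 1 appeared twice, so frequency_penalty pushes it below token 2.
adjusted = adjust_logits([1.0, 3.0, 2.0], generated=[1, 1],
                         frequency_penalty=2.0)
print(sample(adjusted, top_k=1))  # prints 2
```

`stop`, `stream`, and `n` operate one level up, on the generation loop: cutting the output at a stop string, flushing tokens as server-sent events, and repeating the whole loop n times.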
Supported backends
Others
- Performance parameters like n_batch and n_thread
- Documentation
- Token auth
- Integration tests
- A tool to download/prepare pretrained models