
SimpleAI

A self-hosted alternative to the not-so-open AI API. It focuses on replicating the main endpoints for LLMs:

  • Text completion (/completions/)
    • ✔️ Non-streaming responses
    • ✔️ Streaming responses
  • Chat (/chat/completions/) [ example ]
    • ✔️ Non-streaming responses
    • ✔️ Streaming responses
  • Edits (/edits/) [ example ]
  • Embeddings (/embeddings/) [ example ]
  • Not supported (yet): images, audio, files, fine-tunes, moderations

It allows you to experiment with competing approaches quickly and easily.

Overview

Why this project?

Well, first of all, it's a fun little project, and perhaps a better use of my time than watching random dog videos on Reddit or YouTube. I also believe it can be a great way to:

  • experiment with new models without being too dependent on a specific API provider,
  • create benchmarks to decide which approach works best for you,
  • handle specific use cases where you cannot fully rely on an external service, without having to rewrite everything.

If you find interesting use cases, feel free to share your experience.

Installation

On a machine with Python 3.9+:

  • [Latest] From source:
pip install git+https://github.com/lhenault/simpleAI 
  • From PyPI:
pip install simple_ai_server

Setup

Start by creating a configuration file to declare your models:

simple_ai init

It should create models.toml, where you declare your different models (see how below). Then start the server with:

simple_ai serve [--host 127.0.0.1] [--port 8080]

You can then browse the interactive docs and try the endpoints from there.
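
As a quick sanity check, you can list the declared models from the command line (this assumes the OpenAI-style model listing endpoint, which the Python client example below relies on):

curl http://127.0.0.1:8080/models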

Integrating and declaring a model

Model integration

Models are queried through gRPC, which separates the API itself from the model inference, and lets the protocol support several languages beyond Python.

To expose, for instance, an embedding model in Python, you simply have to import a few things and implement the .embed() method of your EmbeddingModel class:

from dataclasses import dataclass

from simple_ai.api.grpc.embedding.server import serve, LanguageModelServicer

@dataclass(unsafe_hash=True)
class EmbeddingModel:
    def embed(self, inputs: list = []) -> list:
        # TODO: implement the embed method
        # Should return one embedding (a list of floats) per input
        return [[]]

if __name__ == '__main__':
    model_servicer = LanguageModelServicer(model=EmbeddingModel())
    serve(address='[::]:50051', model_servicer=model_servicer)

For a completion task, follow the same logic, but import from simple_ai.api.grpc.completion.server instead and implement a complete() method, as sketched below.
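
A minimal sketch, assuming the completion module exposes the same serve and LanguageModelServicer helpers as the embedding one (the exact complete() signature is defined by the .proto file; the prompt parameter here is an assumption):

from dataclasses import dataclass

from simple_ai.api.grpc.completion.server import serve, LanguageModelServicer

@dataclass(unsafe_hash=True)
class CompletionModel:
    def complete(self, prompt: str = '', **kwargs) -> str:
        # TODO: run your model on the prompt and return the generated text
        return ''

if __name__ == '__main__':
    model_servicer = LanguageModelServicer(model=CompletionModel())
    serve(address='[::]:50051', model_servicer=model_servicer)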

Declaring a model

To add a model, you first need to deploy a gRPC service (using the provided .proto file and/or the tools provided in src/api/). Once your model is live, you only have to add it to the models.toml configuration file. For instance, let's say you've locally deployed a llama.cpp model available on port 50051; just add:

[llama-7B-4b]
    [llama-7B-4b.metadata]
        owned_by    = 'Meta / ggerganov'
        permission  = []
        description = 'C++ implementation of the LLaMA model, 7B parameters, 4-bit quantization'
    [llama-7B-4b.network]
        url = 'localhost:50051'
        type = 'gRPC'
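
Each model lives in its own top-level table of models.toml, so serving several models side by side is just a matter of adding more entries pointing at different gRPC addresses. For instance, a second entry for the alpaca-lora-7B model used in the cURL example below (metadata and port are hypothetical here):

[alpaca-lora-7B]
    [alpaca-lora-7B.metadata]
        owned_by    = 'tloen'
        permission  = []
        description = 'LLaMA 7B fine-tuned with LoRA on the Alpaca dataset'
    [alpaca-lora-7B.network]
        url = 'localhost:50052'
        type = 'gRPC'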

You can see and try the provided examples in the examples/ directory (a GPU might be required).

Usage

Thanks to the Swagger UI, you can see and try the different endpoints in your browser (with FastAPI, the interactive docs are served at /docs by default).

Example query with cURL

Or you can use the API directly with the tool of your choice:

curl -X 'POST' \
  'http://127.0.0.1:8080/edits/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "alpaca-lora-7B",
  "instruction": "Make this message nicer and more formal",
  "input": "This meeting was useless and should have been a bloody email",
  "top_p": 1,
  "n": 1,
  "temperature": 1,
  "max_tokens": 256
}'

It's also compatible with the OpenAI Python client:

import openai

# Put anything you want as the API key
openai.api_key = 'Free the models'

# Point to your own url
openai.api_base = "http://127.0.0.1:8080"

# Do your usual things, for instance a completion query:
print(openai.Model.list())
completion = openai.Completion.create(model="llama-7B", prompt="Hello everyone this is")
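
Since the chat endpoint supports streaming, you can also consume tokens as they are generated. A minimal sketch with the same client, assuming the chunks follow the OpenAI streaming format that SimpleAI replicates (the model name is whatever you declared in models.toml):

response = openai.ChatCompletion.create(
    model="llama-7B",
    messages=[{"role": "user", "content": "Hello, who are you?"}],
    stream=True,
)
for chunk in response:
    # Each chunk carries a partial message in its `delta` field
    print(chunk["choices"][0]["delta"].get("content", ""), end="", flush=True)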

Common issues and solutions

Adding a CORS middleware

If you encounter CORS issues, it is suggested not to use the simple_ai serve command, but rather to use your own script to add your CORS configuration, using the FastAPI CORS middleware.

For instance you can create my_server.py with:

import uvicorn
from fastapi.middleware.cors import CORSMiddleware

from simple_ai.server import app

def add_cors(app):
    origins = [
        "http://localhost",
        "http://localhost:8080",
    ]
    app.add_middleware(
        CORSMiddleware,
        allow_origins=origins,
        allow_credentials=True,
        allow_methods=["*"],
        allow_headers=["*"],
    )
    return app

def serve_app(app, host="127.0.0.1", port=8080, **kwargs):
    uvicorn.run(app=app, host=host, port=port)

if __name__ == "__main__":
    serve_app(add_cors(app), host="127.0.0.1", port=8080)
    

And run it as python3 my_server.py instead.

I need the /v1 prefix in the endpoints

Some projects have decided to include the /v1 prefix as part of the endpoints, while the OpenAI client includes it in its api_base parameter. If you need to have it as part of the endpoints for your project, you can use a custom script instead of simple_ai serve:

import uvicorn
from fastapi import FastAPI

from simple_ai.server import app as v1_app

# Mount the whole SimpleAI app under the /v1 prefix
sai_app = FastAPI()
sai_app.mount("/v1", v1_app)

def serve_app(app=sai_app, host="0.0.0.0", port=8080):
    uvicorn.run(app=app, host=host, port=port)

if __name__ == "__main__":
    serve_app()

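With this setup, the client-side configuration matches the OpenAI default, where /v1 is part of api_base. A minimal sketch, reusing the client example above:

import openai

openai.api_key = 'Free the models'
# The /v1 prefix now lives in the endpoints, so include it in api_base
openai.api_base = "http://127.0.0.1:8080/v1"

print(openai.Model.list())
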
Contribute

This is very much a work in progress and far from perfect, so let me know if you want to help. PRs, issues, documentation, a cool logo: all the usual candidates are welcome.

Development Environment

The following steps require make and poetry to be installed on your system.

To install the development environment run:

make install-dev 

This will install all dev dependencies and configure your pre-commit hooks.

