Potassium

A Flask-like HTTP server for serving large AI models.

Potassium is an open-source web framework built to tackle the unique challenges of serving custom models in production.

The goals of this project are to:

  • Provide a familiar web framework similar to Flask/FastAPI
  • Bake in best practices for handling large, GPU-bound ML models
  • Provide a set of primitives common in ML serving, such as:
    • POST request handlers
    • Websocket / streaming connections
    • Async handlers w/ webhooks
  • Maintain a standard interface, to allow the code and models to compile to specialized hardware (ideally on Banana Serverless 😉)

Potassium optionally works in tandem with other tools:

  • Banana CLI: (open-source) an npm-like CLI for downloading boilerplate, running tests, managing packages, and running hot-reload dev servers to tighten the development loop to milliseconds
  • Banana SDKs: (open-source) clients to call your Potassium backend
  • Banana Serverless: (closed-source) purpose-built hosting for Potassium apps
    • Build system: compiles models to be as fast / inexpensive as possible
    • Serverless infra: infrastructure that scales from zero with minimal cold-boots

Stability Note:

  • This is a v0 release, meaning it is not stable and the interface may change in future versions without notice.
  • This release is currently runnable on Banana Serverless (as is any custom code), but cold-boot optimizations are not yet supported for Potassium apps.

Quickstart: Serving a Hugging Face BERT model

Install the potassium package

pip3 install potassium

Create a Python file called app.py containing:

from potassium import Potassium
from transformers import pipeline
import torch

app = Potassium("server")

@app.init
def init():
    # Runs once at server startup: load the model onto the GPU if one is available
    device = 0 if torch.cuda.is_available() else -1
    model = pipeline('fill-mask', model='bert-base-uncased', device=device)

    # Mark the model for Banana's Fastboot optimization (see app.optimize below)
    app.optimize(model)

    # Save heavy objects to the cache for reuse across requests
    return app.set_cache({
        "model": model
    })

@app.handler
def handler(cache: dict, json_in: dict) -> dict:
    # Runs on every HTTP call: pull the model from the cache and run inference
    prompt = json_in.get('prompt', None)
    model = cache.get("model")

    outputs = model(prompt)
    return {"outputs": outputs}

if __name__ == "__main__":
    app.serve()

This runs a Hugging Face BERT model.

For this example, you'll also need to install transformers and torch.

pip3 install transformers torch

Start the server with:

python3 app.py

Test the running server with:

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "Hello I am a [MASK] model."}' http://localhost:8000
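
Equivalently, you can test the endpoint from Python. A minimal sketch using the requests library (it assumes the Quickstart server is running on port 8000):

import requests

res = requests.post(
    "http://localhost:8000",
    json={"prompt": "Hello I am a [MASK] model."},
)
print(res.json())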

Documentation

potassium.Potassium

from potassium import Potassium

app = Potassium("server")

This instantiates your HTTP app, similar to popular frameworks like Flask.

This HTTP server is production-ready out of the box, with a built-in queue to safely handle concurrent requests.
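
For example, a client can fire several requests at once and the built-in queue handles them safely. A client-side sketch (assuming the Quickstart server is running on port 8000 and the requests package is installed):

import requests
from concurrent.futures import ThreadPoolExecutor

def call(prompt):
    return requests.post("http://localhost:8000", json={"prompt": prompt}).json()

# Concurrent calls are queued server-side rather than racing the model
prompts = [
    "Hello I am a [MASK] model.",
    "Paris is the [MASK] of France.",
    "The sky is [MASK] today.",
]
with ThreadPoolExecutor(max_workers=3) as pool:
    for result in pool.map(call, prompts):
        print(result)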


@app.init

@app.init
def init():
    device = 0 if torch.cuda.is_available() else -1
    model = pipeline('fill-mask', model='bert-base-uncased', device=device)

    app.optimize(model)

    return app.set_cache({
        "model": model
    })

The @app.init decorated function runs once on server startup, and is used to load any reusable, heavy objects such as:

  • Your AI model, loaded to GPU
  • Tokenizers
  • Precalculated embeddings

Once initialized, you must save those variables to the cache with app.set_cache({}) so they can be referenced later.

There may only be one @app.init function.
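
For example, an init that caches both a model and its tokenizer might look like the sketch below (using Hugging Face's Auto classes; any heavy, reusable object can be cached the same way):

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

@app.init
def init():
    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").to(device)

    # Cache everything the handler will need later
    return app.set_cache({
        "tokenizer": tokenizer,
        "model": model,
    })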


@app.handler

@app.handler
def handler(cache: dict, json_in: dict) -> dict:
    prompt = json_in.get('prompt', None)
    model = cache.get("model")

    outputs = model(prompt)
    return {"outputs": outputs}

The @app.handler decorated function runs for every http call, and is used to run inference or training workloads against your model(s).

Arguments:

  • cache (dict): The app's cache, set with set_cache().
  • json_in (dict): The JSON body of the input call. If using the Banana client SDK, this is the same as model_inputs.

Returns:

  • json_out (dict): The JSON body to return to the client. If using the Banana client SDK, this is the same as model_outputs.

There may only be one @app.handler function.
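
Since json_in is a plain dict, input validation lives in your single handler. A sketch that guards against a missing prompt (the error shape here is illustrative, not a Potassium convention):

@app.handler
def handler(cache: dict, json_in: dict) -> dict:
    prompt = json_in.get("prompt")
    if prompt is None:
        # Illustrative error shape; use whatever contract your clients expect
        return {"error": "missing required field: prompt"}

    model = cache.get("model")
    outputs = model(prompt)
    return {"outputs": outputs}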


app.serve()

app.serve runs the server, and is a blocking operation.


app.set_cache()

app.set_cache({})

app.set_cache saves the input dictionary to the app's cache, for reuse in future calls. It may be used in both the @app.init and @app.handler functions.

app.set_cache overwrites any preexisting cache.
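
Because set_cache overwrites the whole cache, updating a single key from inside a handler means merging with the existing cache yourself. A minimal sketch (the request_count key is purely illustrative):

cache = app.get_cache()                                     # fetch the current cache dict
cache["request_count"] = cache.get("request_count", 0) + 1  # hypothetical counter
app.set_cache(cache)                                        # write the merged dict back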


app.get_cache()

cache = app.get_cache()

app.get_cache fetches the app's cache dictionary. This value is automatically provided for you as the cache argument in the @app.handler function.


app.optimize(model)

model = ...  # some PyTorch model
app.optimize(model)

app.optimize is a feature specific to users hosting on Banana's serverless GPU infrastructure. It runs at build time rather than runtime, and is used to locate the model(s) to be targeted for Banana's Fastboot optimization.

Multiple models may be optimized. Only PyTorch models are currently supported.
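
For instance, two pipelines can both be marked for optimization during init (a sketch; the second pipeline is illustrative, and Hugging Face pipelines wrap PyTorch models):

@app.init
def init():
    device = 0 if torch.cuda.is_available() else -1
    fill_mask = pipeline('fill-mask', model='bert-base-uncased', device=device)
    sentiment = pipeline('sentiment-analysis', device=device)

    # Each call marks one model for Fastboot optimization at build time
    app.optimize(fill_mask)
    app.optimize(sentiment)

    return app.set_cache({
        "fill_mask": fill_mask,
        "sentiment": sentiment,
    })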


@app.result_webhook(url)

@app.handler
@app.result_webhook(url="http://localhost:8001/")
def handler(cache: dict, json_in: dict) -> dict:
    # ...
    return {"outputs": outputs}

app.result_webhook is an optional decorator for the handler function. If added, it POSTs the handler's return JSON onward to the given webhook URL.
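
To observe the webhook locally, you can run a small receiver on port 8001. A sketch using Flask (a separate dependency, not part of Potassium):

from flask import Flask, request

receiver = Flask(__name__)

@receiver.route("/", methods=["POST"])
def receive():
    # Potassium POSTs the handler's return JSON here
    print("webhook payload:", request.get_json())
    return "", 200

if __name__ == "__main__":
    receiver.run(port=8001)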
