Extensions for MLflow to make the dev loop better for custom models.


mlflow extensions

Overview

The goal of this project is to extend the capabilities of MLflow to support additional features such as testing pyfunc models in Databricks notebooks and deploying complex LLM server infrastructure such as vLLM, SGLang, Ollama, etc.

Features

  1. Testing pyfunc models using mlflow_extensions.serving.fixtures.LocalTestServer in Databricks notebooks.
  2. Deploying vision models and other LLMs using mlflow_extensions.serving.engines.vllm_engine in Databricks model serving.

Installation

pip install mlflow-extensions

Supported Server Frameworks

  1. vLLM
  2. SGLang (TBD)
  3. Ollama (TBD)

Usage

Testing Pyfunc Models

The local test server spawns a server that serves the model and can be queried using the query method. The server runs in its own process group; if you need to control the port, pass test_serving_port.

from mlflow_extensions.serving.fixtures import LocalTestServer
from mlflow.utils.databricks_utils import get_databricks_host_creds


fixture = LocalTestServer(
  model_uri="<uri to the model or run>",
  registry_host=get_databricks_host_creds().host,
  registry_token=get_databricks_host_creds().token
)

fixture.start()

fixture.wait_and_assert_healthy()

# assert fixture.query(payload={"inputs": [....]}) == ...

fixture.stop()
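
If you need to control the port the server binds to, pass test_serving_port as mentioned above. A minimal sketch (the port value is only an illustration):

fixture = LocalTestServer(
  model_uri="<uri to the model or run>",
  registry_host=get_databricks_host_creds().host,
  registry_token=get_databricks_host_creds().token,
  test_serving_port=5088  # hypothetical free port; pick any unused port
)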

Deploying Models using vLLM

vLLM is a server optimized for running LLMs and multimodal LMs. It is a complex server that exposes many configuration knobs to improve performance. This documentation will be updated as we test more configurations.

Registering a model

import mlflow

from mlflow_extensions.serving.engines import VLLMEngineProcess, VLLMEngineConfig
from mlflow_extensions.serving.wrapper import CustomServingEnginePyfuncWrapper

mlflow.set_registry_uri("databricks-uc")

# optionally, if you need to download a model from Hugging Face that is not public:
# import os
# os.environ["HF_TOKEN"] = ...

model = CustomServingEnginePyfuncWrapper(
    engine=VLLMEngineProcess,
    engine_config=VLLMEngineConfig(
        model="microsoft/Phi-3.5-vision-instruct",
        trust_remote_code=True,
        max_model_len=64000,  # max token length for context
        guided_decoding_backend="outlines"
    )
)

model.setup()  # download artifacts from huggingface

with mlflow.start_run() as run:
    mlflow.pyfunc.log_model(
        "model",
        python_model=model,
        artifacts=model.artifacts,
        pip_requirements=model.get_pip_reqs(),
        registered_model_name="<catalog>.<schema>.<model-name>"
    )
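
Once logged, the model can be smoke-tested in a notebook with the LocalTestServer fixture described earlier. A minimal sketch, assuming the standard runs:/ model URI produced by the log_model call above:

from mlflow_extensions.serving.fixtures import LocalTestServer
from mlflow.utils.databricks_utils import get_databricks_host_creds

fixture = LocalTestServer(
    model_uri=f"runs:/{run.info.run_id}/model",
    registry_host=get_databricks_host_creds().host,
    registry_token=get_databricks_host_creds().token
)
fixture.start()
fixture.wait_and_assert_healthy()
# ... query the served model, then shut it down
fixture.stop()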

Calling a model using the OpenAI SDK

mlflow-extensions offers a wrapper on top of the OpenAI SDK that intercepts requests and conforms them to the Databricks model serving infrastructure.

from mlflow_extensions.serving.adapters import OpenAIWrapper as OpenAI

client = OpenAI(base_url="https://<>.com/serving-endpoints/<model-name>", api_key="<dapi...>")
response = client.chat.completions.create(
  model="microsoft/Phi-3.5-vision-instruct",
  messages=[
    {"role": "user", "content": [
                {"type": "text", "text": "Is the image indoors or outdoors?"},
                {
                    "type": "image_url",
                    "image_url": {
                      "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                    },
                },
            ],
     }
  ],
  # if you want to use guided decoding to improve performance and control the output:
  # extra_body={
  #   "guided_choice": ["outside", "indoors"]
  # }
)
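
Because the wrapper sits on top of the OpenAI SDK, the response should follow the standard chat completion shape, so the reply can be read as usual:

print(response.choices[0].message.content)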

Calling a model using the LangChain ChatOpenAI SDK

from mlflow_extensions.serving.adapters import ChatOpenAIWrapper as ChatOpenAI

model = ChatOpenAI(base_url="https://<>.com/serving-endpoints/<model-name>", api_key="<dapi...>")
model.invoke("hello world")
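
Since the model above is a vision model, you can also send multimodal content. A minimal sketch, assuming the wrapper accepts the same content blocks as the upstream LangChain ChatOpenAI client:

from langchain_core.messages import HumanMessage

# hypothetical multimodal call; the image URL is the same example used above
message = HumanMessage(content=[
    {"type": "text", "text": "Is the image indoors or outdoors?"},
    {
        "type": "image_url",
        "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
        },
    },
])
response = model.invoke([message])
print(response.content)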

Supported models

vLLM engine

Here is the list of supported models for the vLLM engine: https://docs.vllm.ai/en/latest/models/supported_models.html

We have not tested all of them; please raise an issue if one does not work. We will work on documenting models and configs. When you run into an issue, please document the model, its size, and the config you used to deploy it.

Disclaimer

mlflow-extensions is not developed, endorsed, or supported by Databricks. It is provided as-is; no warranty is derived from using this package. For more details, please refer to the license.
