

llama-stack-provider-lmeval

Llama Stack Remote Eval Provider for TrustyAI LM-Eval

About

This repository implements TrustyAI's LM-Eval as an out-of-tree Llama Stack remote provider.

It also includes end-to-end instructions demonstrating how to use LM-Eval on Llama Stack to run DK-Bench benchmark evaluations against a Phi-3-mini-4k-instruct model deployed on OpenShift.

Use

Prerequisites

  • Admin access to an OpenShift cluster with RHOAI installed
  • Installation of uv
  • Installation of oc cli tool
  • Installation of llama stack cli tool
  1. Clone this repository

    git clone https://github.com/trustyai-explainability/llama-stack-provider-lmeval.git
    
  2. Set llama-stack-provider-lmeval/demo as your working directory.

    cd llama-stack-provider-lmeval/demo
    
  3. Deploy microsoft/Phi-3-mini-4k-instruct on vLLM Serving Runtime

    a. Create a namespace with a name of your choice

    TEST_NS=<NAMESPACE>
    oc create ns $TEST_NS
    oc get ns $TEST_NS
    

    b. Deploy the model via vLLM

    oc apply -k resources
    
  4. Before continuing, perform a sanity check to make sure the model was successfully deployed

    oc get pods | grep "predictor"
    

    Expected output:

    phi-3-predictor-00002-deployment-794fb6b4b-clhj7   3/3     Running   0          5h55m
    
  5. Retrieve the model route

    VLLM_URL=$(oc get $(oc get ksvc -o name | grep predictor) --template={{.status.url}})
    
  6. Create and activate a virtual environment

    uv venv .llamastack-venv
    
    source .llamastack-venv/bin/activate
    
  7. Install the required libraries

    uv pip install -e .
    
  8. In run.yaml, make the following changes:

    a. Replace the remote::vllm url

    providers:
      inference:
      - provider_id: vllm-0
        provider_type: remote::vllm
        config:
          url: ${env.VLLM_URL:https://phi-3-predictor-llama-test.apps.rosa.p2i7w2k6p6w7t7e.3emk.p3.openshiftapps.com/v1/completions}
    

    b. Replace the remote::lmeval base_url and namespace

    - provider_id: lmeval-1
      provider_type: remote::lmeval
      config:
        use_k8s: True
        base_url: https://vllm-test.apps.rosa.p2i7w2k6p6w7t7e.3emk.p3.openshiftapps.com/v1/completions
        namespace: "llama-test"
    
  9. Start the Llama Stack server in the virtual environment

    llama stack run run.yaml --image-type venv
    

    Expected output:

    INFO:     Application startup complete.
    INFO:     Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
    
  10. Open demo.ipynb to run the evaluation
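As an optional check after step 5, the route stored in VLLM_URL should accept an OpenAI-style completions request. A minimal sketch of building such a request (the model name "phi-3" is an assumption matching the predictor name above; adjust it to your deployment):

```python
import json
import urllib.request


def completion_request(base_url: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style /v1/completions request for the deployed model.

    The model name "phi-3" is an assumption based on the predictor name above;
    change it if your InferenceService uses a different one.
    """
    body = json.dumps({"model": "phi-3", "prompt": prompt, "max_tokens": 16}).encode()
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )


# Example (requires network access to the cluster):
#   import os
#   req = completion_request(os.environ["VLLM_URL"], "Hello")
#   print(json.load(urllib.request.urlopen(req))["choices"][0]["text"])
```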
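In step 8, the `${env.VLLM_URL:...}` value uses Llama Stack's environment-variable substitution: if VLLM_URL is set (as in step 5), its value is used; otherwise the default after the colon applies. An illustrative sketch of that resolution rule (not the actual Llama Stack implementation):

```python
import os
import re

# Matches ${env.VAR} or ${env.VAR:default} placeholders as used in run.yaml.
_PLACEHOLDER = re.compile(r"\$\{env\.([A-Za-z_][A-Za-z0-9_]*)(?::([^}]*))?\}")


def resolve_env(value: str, environ=os.environ) -> str:
    """Replace each placeholder with the env var if set, else its default."""
    def sub(m: re.Match) -> str:
        name, default = m.group(1), m.group(2)
        return environ.get(name, default if default is not None else "")
    return _PLACEHOLDER.sub(sub, value)
```

With this rule, exporting VLLM_URL before step 9 makes the server target the freshly retrieved route instead of the hard-coded default.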
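The notebook drives the evaluation through the Llama Stack client. A hedged sketch of the flow, assuming `llama-stack-client` is installed and the server from step 9 is running; the benchmark id, model name, and config fields below are placeholders (see demo.ipynb for the exact values), and call signatures may differ across client versions:

```python
def run_dk_bench(base_url: str = "http://localhost:8321"):
    """Register a benchmark with the remote::lmeval provider and start an eval job.

    All ids below are placeholders; see demo.ipynb for the exact values.
    """
    from llama_stack_client import LlamaStackClient  # needs the running server

    client = LlamaStackClient(base_url=base_url)
    client.benchmarks.register(
        benchmark_id="trustyai_lmeval::dk-bench",  # placeholder id
        dataset_id="trustyai_lmeval::dk-bench",    # placeholder id
        scoring_functions=["string"],
        provider_id="lmeval-1",                    # matches run.yaml above
    )
    # Kick off the evaluation job; its status is polled from the notebook.
    return client.eval.run_eval(
        benchmark_id="trustyai_lmeval::dk-bench",
        benchmark_config={
            "eval_candidate": {
                "type": "model",
                "model": "phi-3",                  # the model deployed in step 3
                "sampling_params": {"max_tokens": 256},
            },
        },
    )
```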
