
llama-stack-provider-lmeval

Llama Stack Remote Eval Provider for TrustyAI LM-Eval

About

This repository implements TrustyAI's LM-Eval as an out-of-tree Llama Stack remote provider.

It also includes end-to-end instructions demonstrating how to use LM-Eval on Llama Stack to run DK-Bench benchmark evaluations against a Phi-3-mini-4k-instruct model deployed on OpenShift.

Use

Prerequisites

  • Admin access to an OpenShift cluster with RHOAI installed
  • Installation of uv
  • Installation of the oc CLI tool
  • Installation of the llama stack CLI tool
  1. Clone this repository

    git clone https://github.com/trustyai-explainability/llama-stack-provider-lmeval.git
    
  2. Set llama-stack-provider-lmeval/demo as your working directory.

    cd llama-stack-provider-lmeval/demo
    
  3. Deploy microsoft/Phi-3-mini-4k-instruct on the vLLM Serving Runtime

    a. Create a namespace with a name of your choice

    TEST_NS=<NAMESPACE>
    oc create ns $TEST_NS
    oc get ns $TEST_NS
    

    b. Deploy the model via vLLM

    oc apply -k resources/
    
  4. Before continuing, perform a sanity check to make sure the model was successfully deployed

    oc get pods | grep "predictor"
    

    Expected output:

    phi-3-predictor-00002-deployment-794fb6b4b-clhj7   3/3     Running   0          5h55m
    
  5. Retrieve the model route

    VLLM_URL=$(oc get $(oc get ksvc -o name | grep predictor) --template={{.status.url}})
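Before pasting the route into run.yaml, it can help to confirm that the captured value is a well-formed HTTPS URL and to append the `/v1/completions` path that vLLM's OpenAI-compatible API serves. A minimal stdlib-only Python sketch (the hostname below is illustrative, not the one your cluster will return):

```python
from urllib.parse import urlparse

def completions_endpoint(route: str) -> str:
    """Validate a model route and return it with the /v1/completions suffix."""
    parsed = urlparse(route)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"unexpected route: {route!r}")
    # vLLM exposes the OpenAI-compatible completions API under /v1/completions
    return route.rstrip("/") + "/v1/completions"

print(completions_endpoint("https://phi-3-predictor-llama-test.example.com"))
# https://phi-3-predictor-llama-test.example.com/v1/completions
```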
    
  6. Create and activate a virtual environment

    uv venv .llamastack-venv
    
    source .llamastack-venv/bin/activate
    
  7. Install the required libraries

    uv pip install -e .
    
  8. In the run.yaml, make the following changes:

    a. Replace the remote::vllm url

    providers:
      inference:
      - provider_id: vllm-0
        provider_type: remote::vllm
        config:
          url: ${env.VLLM_URL:https://phi-3-predictor-llama-test.apps.rosa.p2i7w2k6p6w7t7e.3emk.p3.openshiftapps.com/v1/completions}
    

    b. Replace the remote::lmeval base_url and namespace

    - provider_id: lmeval-1
      provider_type: remote::lmeval
      config:
        use_k8s: True
        base_url: https://vllm-test.apps.rosa.p2i7w2k6p6w7t7e.3emk.p3.openshiftapps.com/v1/completions
        namespace: "llama-test"
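The `${env.VLLM_URL:...}` value above uses environment-variable substitution: the server reads `VLLM_URL` from the environment and falls back to the inline default when it is unset. A rough stdlib-only sketch of that substitution pattern (an illustration, not the actual llama-stack implementation):

```python
import os
import re

# Matches ${env.VAR:default}; the default may contain anything except "}"
_ENV_PATTERN = re.compile(r"\$\{env\.([A-Za-z_][A-Za-z0-9_]*):([^}]*)\}")

def resolve_env(value: str) -> str:
    """Replace ${env.VAR:default} with $VAR when set, else the default."""
    return _ENV_PATTERN.sub(lambda m: os.environ.get(m.group(1), m.group(2)), value)

os.environ.pop("VLLM_URL", None)
print(resolve_env("${env.VLLM_URL:https://fallback.example.com/v1/completions}"))
# https://fallback.example.com/v1/completions
```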
    
  9. Start the llama stack server in a virtual environment

    llama stack run ../run.yaml --image-type venv
    

    Expected output:

    INFO:     Application startup complete.
    INFO:     Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
    
  10. Navigate to demo.ipynb to run the evaluation
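The notebook drives the DK-Bench evaluation through the Llama Stack client against the server started above. The following is a hedged sketch of the general shape of such a request; the benchmark ID, model name, and config fields are assumptions rather than the provider's confirmed schema, and demo.ipynb contains the authoritative calls:

```python
# Hypothetical eval request payload; the IDs and field names are assumptions.
benchmark_id = "trustyai_lmeval::dk-bench"
benchmark_config = {
    "eval_candidate": {
        "type": "model",
        "model": "phi-3",  # the model as registered with the stack
        "sampling_params": {"temperature": 0.0, "max_tokens": 256},
    },
}

# With the server running on localhost:8321, the call would look roughly like:
#   from llama_stack_client import LlamaStackClient
#   client = LlamaStackClient(base_url="http://localhost:8321")
#   job = client.eval.run_eval(benchmark_id=benchmark_id,
#                              benchmark_config=benchmark_config)
print(benchmark_id, benchmark_config["eval_candidate"]["model"])
```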

