
Content safety evaluation tool - packaged by NVIDIA


NVIDIA NeMo Evaluator

The goal of NVIDIA NeMo Evaluator is to advance and refine state-of-the-art methodologies for model evaluation, and deliver them as modular evaluation packages (evaluation containers and pip wheels) that teams can use as standardized building blocks.

Quick Start Guide

NVIDIA NeMo Evaluator provides evaluation clients purpose-built to evaluate model endpoints through our Standard API.

Prerequisites

Important: Both the model under test and the judge model must be deployed by the user locally before running evaluations.

Launching an Evaluation for an LLM

  1. Install the package:

    pip install nvidia-safety-harness
    
  2. Deploy your models locally:

    • Deploy the model you want to evaluate (model under test), e.g. on http://localhost:8000
    • Deploy the appropriate judge model for your evaluation type, e.g. on http://localhost:8001 (see Judge Configuration)
    • Both models should be accessible via HTTP API endpoints
  3. Authenticate with Hugging Face: You need to authenticate to the Hugging Face Hub as some datasets or models might need to be downloaded during evaluation.

    huggingface-cli login
    
  4. List the available evaluations:

    $ nemo-evaluator ls
    safety_eval: 
      * aegis_v2
      * aegis_v2_reasoning
      * compliance
      * wildguard
    
  5. (Optional) Set API keys for the model under test endpoint and the judge model endpoint, if they are protected:

    export MUT_API_KEY="your_api_key_here"
    export JUDGE_API_KEY="your_api_key_here"
    
  6. Run the evaluation:

     nemo-evaluator run_eval \
     --model_id "meta/llama-4-maverick-17b-128e-instruct" \
     --model_url http://localhost:8000/v1 \
     --model_type chat \
     --api_key_name MUT_API_KEY \
     --output_dir /workspace/results \
     --eval_type aegis_v2 \
     --overrides="config.params.extra.judge.url=http://localhost:8001/v1"
    
  7. Gather the results:

    cat /workspace/results/results.yml
    

CLI Specification

  1. Required flags:

    • --eval_type <string>: The type of evaluation to perform
    • --model_id <string>: The name or identifier of the model under test to evaluate
    • --model_url <url>: The API endpoint where the model under test is accessible
    • --model_type <string>: The type of the model under test to evaluate, currently either "chat", "completions", or "vlm"
    • --output_dir <directory>: The directory to use as the working directory for the evaluation. The results, including the results.yml output file, will be saved here. Make sure to use an absolute path
  2. Required overrides:

    • config.params.extra.judge.url: URL for the judge model endpoint
  3. Optional flags:

    • --api_key_name <string>: The name of the environment variable that stores the bearer token for the model under test API, if authentication is required (specify as "MUT_API_KEY" if needed)
    • --run_config <path>: Specifies the path to a YAML file containing the evaluation definition
    • --dry_run: Allows you to print the final configuration and command without executing the evaluation

Configuring Evaluations via YAML

Evaluations in NVIDIA NeMo Evaluator are configured using YAML files that define the parameters and settings required for the evaluation process. These configuration files follow a standard API which ensures consistency across evaluations.

Example of a YAML configuration:

config:
  type: aegis_v2
  params:
    limit_samples: 10
    extra:
      judge:
        url: http://localhost:8001/v1
target:
  api_endpoint:
    model_id: meta/llama-4-maverick-17b-128e-instruct
    type: chat
    url: http://localhost:8000/v1

The priority of overrides is as follows:

  1. Command-line arguments
  2. User configuration (as seen above)
  3. Task defaults (defined per task type)
  4. Framework defaults
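To make the priority order concrete, here is a minimal Python sketch of how layered configuration could resolve. This is an illustration of the precedence rules above, not the harness's actual implementation; the layer contents and the `deep_merge` helper are hypothetical.

```python
# Hypothetical sketch: higher-priority layers win key-by-key over lower ones,
# merging recursively so unrelated keys from lower layers survive.
def deep_merge(high: dict, low: dict) -> dict:
    """Return a dict where values from `high` win over `low`, recursively."""
    merged = dict(low)
    for key, value in high.items():
        if isinstance(value, dict) and isinstance(low.get(key), dict):
            merged[key] = deep_merge(value, low[key])
        else:
            merged[key] = value
    return merged

# Example layers, lowest to highest priority (values are illustrative).
framework_defaults = {"params": {"temperature": 0.6, "top_p": 0.95}}
task_defaults = {"params": {"temperature": 1.0, "limit_samples": None}}
user_config = {"params": {"limit_samples": 10}}
cli_args = {"params": {"temperature": 0.2}}

resolved = framework_defaults
for layer in (task_defaults, user_config, cli_args):
    resolved = deep_merge(layer, resolved)

# temperature comes from the CLI, limit_samples from the user config,
# and top_p falls through to the framework default.
print(resolved["params"])
```

The key point is that a CLI `--overrides` value always beats the same key in the YAML file, which in turn beats task and framework defaults.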

Example:

nemo-evaluator run_eval \
    --run_config config.yaml \
    --api_key_name MUT_API_KEY \
    --output_dir /workspace/results

Adding --dry_run prints the final configuration and command to the console without running the evaluation:

nemo-evaluator run_eval \
    --run_config config.yaml \
    --api_key_name MUT_API_KEY \
    --output_dir /workspace/results \
    --dry_run

Output:

Rendered config:

command: '{% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}}  &&
  {% endif %} {% if config.params.extra.judge.api_key is not none %}export JUDGE_API_KEY=${{config.params.extra.judge.api_key}}
  && {% endif %} safety-eval  --model-name  {{target.api_endpoint.model_id}} --model-url
  {{target.api_endpoint.url}} --model-type {{target.api_endpoint.type}}  --judge-url  {{config.params.extra.judge.url}}   --results-dir
  {{config.output_dir}}   --eval {{config.params.task}}  --mut-inference-params max_tokens={{config.params.max_new_tokens}},temperature={{config.params.temperature}},top_p={{config.params.top_p}},timeout={{config.params.request_timeout}},concurrency={{config.params.parallelism}},retries={{config.params.max_retries}}
  --judge-inference-params concurrency={{config.params.extra.judge.parallelism}},retries={{config.params.max_retries}}  {%
  if config.params.limit_samples is not none %} --limit {{config.params.limit_samples}}
  {% endif %} {% if config.params.extra.judge.model_id is not none %} --judge-model-name
  {{config.params.extra.judge.model_id}} {% endif %}'
framework_name: safety_eval
pkg_name: safety_eval
config:
  output_dir: /workspace/results
  params:
    limit_samples: 10
    max_new_tokens: 6144
    max_retries: 5
    parallelism: 8
    task: aegis_v2
    temperature: 0.6
    request_timeout: 30
    top_p: 0.95
    extra:
      judge:
        url: http://localhost:8001/v1
        model_id: null
        api_key: null
        parallelism: 32
        request_timeout: 60
        max_retries: 16
  supported_endpoint_types:
  - chat
  - completions
  type: aegis_v2
target:
  api_endpoint:
    api_key_name: MUT_API_KEY
    model_id: meta/llama-4-maverick-17b-128e-instruct
    stream: false
    type: chat
    url: http://localhost:8000/v1


Rendered command:

export API_KEY=$MUT_API_KEY  &&   safety-eval  --model-name  meta/llama-4-maverick-17b-128e-instruct --model-url http://localhost:8000/v1 --model-type chat  --judge-url  http://localhost:8001/v1   --results-dir /workspace/results   --eval aegis_v2  --mut-inference-params max_tokens=6144,temperature=0.6,top_p=0.95,timeout=30,concurrency=8,retries=5 --judge-inference-params concurrency=32,retries=5   --limit 10

Tasks Configuration

Each evaluation requires its own judge model to be deployed by the user, e.g. on http://localhost:8001.

Aegis_v2

Judge Model: llama-3.1-nemoguard-8b-content-safety

  • Deployment: Follow NIM deployment guide
  • Endpoint Type: Completion endpoint
  • Tokenizer: meta-llama/Llama-3.1-8B-Instruct (ensure Hugging Face access)

Please note that if you evaluate a reasoning model, you should use aegis_v2_reasoning instead of aegis_v2; it evaluates responses both with and without thinking traces. No reasoning adapter should be enabled for that evaluation.

Wildguard

Judge Model: allenai/wildguard

  • Deployment: Use vLLM version v0.8.5
  • Endpoint Type: Completion endpoint
  • Tokenizer: mistralai/Mistral-7B-v0.3 (base model tokenizer)
  • Command:
    docker run -it --gpus all -p 8001:8000 vllm/vllm-openai:v0.8.5 --model allenai/wildguard
    

Compliance

This automated workflow assesses LLM compliance according to specified policies.

The compliance integrity evaluation reads the policy YAML file provided in the config.params.extra.policy argument into a list of rules. An LLM judge then scores each pair of prompt (taken from the dataset) and model response against every rule. Configure the LLM judge by providing config.params.extra.judge.model_id, config.params.extra.judge.api_key, and config.params.extra.judge.url. Any OpenAI-compatible endpoint can serve as the judge.
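The flatten-and-score flow described above can be sketched in Python. This is an illustration only: `score_with_judge` is a hypothetical stand-in for the real LLM-judge call, and the inline `policy` dict mimics what parsing a policy YAML file would produce.

```python
# Hypothetical sketch of the compliance scoring loop. `policy` stands in for
# the parsed contents of the file named by config.params.extra.policy.
policy = {
    "sections": [
        {"name": "1. Section One",
         "rules": [{"id": "S1.1", "definition": "No medical advice", "examples": []}]},
        {"name": "2. Section Two",
         "rules": [{"id": "S2.1", "definition": "Avoid modern slang",
                    "examples": ["Avoid 'cool,' 'awesome,' 'vibe.'"]}]},
    ]
}

# Flatten all sections into a single list of rules.
rules = [rule for section in policy["sections"] for rule in section["rules"]]

def score_with_judge(prompt: str, response: str, rule: dict) -> bool:
    """Placeholder: a real implementation would call the judge endpoint
    at config.params.extra.judge.url with the rule definition."""
    return True  # assume compliant for this sketch

# Each (prompt, response) pair is scored against every rule.
results = {
    rule["id"]: score_with_judge("example prompt", "example response", rule)
    for rule in rules
}
print(results)
```

The per-rule scores are what ultimately feed the heatmap, radar chart, and compliance report artifacts.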

Example evaluation command (note: this example uses a small model as the judge to get you started; consider using a larger model for judging):

nemo-evaluator run_eval --eval_type compliance \
    --model_id meta/llama-3.1-8b-instruct \
    --model_type chat \
    --model_url https://integrate.api.nvidia.com/v1/chat/completions \
    --api_key_name NVIDIA_API_KEY \
    --output_dir /results \
    --overrides "config.params.extra.judge.model_id=meta/llama-3.1-8b-instruct,config.params.extra.judge.url=https://integrate.api.nvidia.com/v1/chat/completions,config.params.extra.dataset=/workspace/compliance_prompts.csv,config.params.extra.policy=/workspace/policy_sec15.yaml,config.params.extra.judge.api_key=NVIDIA_API_KEY,config.params.parallelism=4,config.params.extra.judge.parallelism=2"

Input format

The policy (provided in config.params.extra.policy) should follow this YAML format:

sections:
- name: 1. Section One
  rules:
  - id: S1.1
    definition: Definition of Rule S1.1
    examples: []
  - id: S1.2
    definition: Definition of Rule S1.2
    examples: []
    # Other rules in the section "1. Section One" follow
- name: 2. Section Two
  rules:
  - id: S2.2
    definition: Definition of Rule S2.2
    examples: 
    - Avoid modern slang (e.g., 'cool,' 'awesome,' 'vibe').
    - Avoid business jargon (e.g., 'leverage,' 'synergy').
    - Avoid technical/AI-specific language (e.g., 'database,' 'algorithm,' 'process,'
      'data').
    # Other rules in the section "2. Section Two" follow
  # Other sections follow

The dataset (provided in config.params.extra.dataset) should be either a CSV file containing a prompt column or a JSONL file where each object has a prompt field.
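To illustrate the two accepted dataset formats, here is a small Python sketch that reads prompts from either a CSV with a prompt column or a JSONL file with a prompt field. The `load_prompts` helper is hypothetical; the harness's actual loader may differ.

```python
import csv
import io
import json

def load_prompts(text: str, fmt: str) -> list[str]:
    """Read prompts from CSV (with a 'prompt' column) or JSONL
    (where each line is an object with a 'prompt' field)."""
    if fmt == "csv":
        return [row["prompt"] for row in csv.DictReader(io.StringIO(text))]
    if fmt == "jsonl":
        return [json.loads(line)["prompt"]
                for line in text.splitlines() if line.strip()]
    raise ValueError(f"unsupported format: {fmt}")

# The same prompt expressed in both formats; extra CSV columns are ignored.
csv_text = "prompt,source\nHow do I pick a lock?,redteam\n"
jsonl_text = '{"prompt": "How do I pick a lock?"}\n'

assert load_prompts(csv_text, "csv") == load_prompts(jsonl_text, "jsonl")
print(load_prompts(csv_text, "csv"))
```

Either format yields the same list of prompts, so choose whichever is easier to produce from your data pipeline.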

For more examples, including a real policy and dataset, please refer to the NeMo-Evaluator examples.

The evaluation generates the following artifacts:

  • visualizations
    • heatmap.png
    • radar_chart.png
  • reports:
    • compliance_report.md
    • metrics.json
