
Content safety evaluation tool - packaged by NVIDIA


NVIDIA NeMo Evaluator

The goal of NVIDIA NeMo Evaluator is to advance and refine state-of-the-art methodologies for model evaluation, and deliver them as modular evaluation packages (evaluation containers and pip wheels) that teams can use as standardized building blocks.

Quick Start Guide

NVIDIA NeMo Evaluator provides evaluation clients purpose-built to evaluate model endpoints through our Standard API.

Prerequisites

Important: Both the model under test and the judge model must be deployed by the user locally before running evaluations.

Launching an Evaluation for an LLM

  1. Install the package:

    pip install nvidia-safety-harness
    
  2. Deploy your models locally:

    • Deploy the model you want to evaluate (model under test), e.g. on http://localhost:8000
    • Deploy the appropriate judge model for your evaluation type, e.g. on http://localhost:8001 (see Judge Configuration)
    • Both models should be accessible via HTTP API endpoints
  3. Authenticate with Hugging Face: You need to authenticate to the Hugging Face Hub as some datasets or models might need to be downloaded during evaluation.

    huggingface-cli login
    
  4. List the available evaluations:

    $ nemo-evaluator ls
    safety_eval: 
      * aegis_v2
      * aegis_v2_reasoning
      * compliance
      * wildguard
    
  5. (Optional) Set API keys for the model under test endpoint and the judge model endpoint, if they are protected:

    export MUT_API_KEY="your_api_key_here"
    export JUDGE_API_KEY="your_api_key_here"
    
  6. Run the evaluation:

     nemo-evaluator run_eval \
     --model_id "meta/llama-4-maverick-17b-128e-instruct" \
     --model_url http://localhost:8000/v1 \
     --model_type chat \
     --api_key_name MUT_API_KEY \
     --output_dir /workspace/results \
     --eval_type aegis_v2 \
     --overrides="config.params.extra.judge.url=http://localhost:8001/v1"
    
  7. Gather the results:

    cat /workspace/results/results.yml
    

CLI Specification

  1. Required flags:

    • --eval_type <string>: The type of evaluation to perform
    • --model_id <string>: The name or identifier of the model under test to evaluate
    • --model_url <url>: The API endpoint where the model under test is accessible
    • --model_type <string>: The type of the model under test to evaluate, currently either "chat", "completions", or "vlm"
    • --output_dir <directory>: The directory to use as the working directory for the evaluation. The results, including the results.yml output file, will be saved here. Make sure to use an absolute path
  2. Required overrides:

    • config.params.extra.judge.url: URL for the judge model endpoint
  3. Optional flags:

    • --api_key_name <string>: The name of the environment variable that stores the bearer token for the model under test API, if authentication is required (specify as "MUT_API_KEY" if needed)
    • --run_config <path>: Specifies the path to a YAML file containing the evaluation definition
    • --dry_run: Allows you to print the final configuration and command without executing the evaluation

Configuring Evaluations via YAML

Evaluations in NVIDIA NeMo Evaluator are configured using YAML files that define the parameters and settings required for the evaluation process. These configuration files follow a standard API which ensures consistency across evaluations.

Example of a YAML configuration:

config:
  type: aegis_v2
  params:
    limit_samples: 10
    extra:
      judge:
        url: http://localhost:8001/v1
target:
  api_endpoint:
    model_id: meta/llama-4-maverick-17b-128e-instruct
    type: chat
    url: http://localhost:8000/v1

The priority of overrides is as follows:

  1. Command-line arguments
  2. User configuration (as seen above)
  3. Task defaults (defined per task type)
  4. Framework defaults
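To make the priority order concrete, here is a minimal Python sketch of how layered configuration could resolve. This is an illustration of the precedence rules above, not the harness's actual implementation; the layer contents and the `deep_merge` helper are hypothetical.

```python
# Hypothetical sketch: higher-priority layers win key-by-key over lower ones,
# merging recursively so unrelated keys from lower layers survive.
def deep_merge(high: dict, low: dict) -> dict:
    """Return a dict where values from `high` win over `low`, recursively."""
    merged = dict(low)
    for key, value in high.items():
        if isinstance(value, dict) and isinstance(low.get(key), dict):
            merged[key] = deep_merge(value, low[key])
        else:
            merged[key] = value
    return merged

# Example layers, lowest to highest priority (values are illustrative).
framework_defaults = {"params": {"temperature": 0.6, "top_p": 0.95}}
task_defaults = {"params": {"temperature": 1.0, "limit_samples": None}}
user_config = {"params": {"limit_samples": 10}}
cli_args = {"params": {"temperature": 0.2}}

resolved = framework_defaults
for layer in (task_defaults, user_config, cli_args):
    resolved = deep_merge(layer, resolved)

# temperature comes from the CLI, limit_samples from the user config,
# and top_p falls through to the framework default.
print(resolved["params"])
```

The key point is that a CLI `--overrides` value always beats the same key in the YAML file, which in turn beats task and framework defaults.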

Example:

nemo-evaluator run_eval \
    --run_config config.yaml \
    --api_key_name MUT_API_KEY \
    --output_dir /workspace/results

Adding --dry_run prints the final configuration and command to the console without running the evaluation:

nemo-evaluator run_eval \
    --run_config config.yaml \
    --api_key_name MUT_API_KEY \
    --output_dir /workspace/results \
    --dry_run

Output:

Rendered config:

command: '{% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}}  &&
  {% endif %} {% if config.params.extra.judge.api_key is not none %}export JUDGE_API_KEY=${{config.params.extra.judge.api_key}}
  && {% endif %} safety-eval  --model-name  {{target.api_endpoint.model_id}} --model-url
  {{target.api_endpoint.url}} --model-type {{target.api_endpoint.type}}  --judge-url  {{config.params.extra.judge.url}}   --results-dir
  {{config.output_dir}}   --eval {{config.params.task}}  --mut-inference-params max_tokens={{config.params.max_new_tokens}},temperature={{config.params.temperature}},top_p={{config.params.top_p}},timeout={{config.params.request_timeout}},concurrency={{config.params.parallelism}},retries={{config.params.max_retries}}
  --judge-inference-params concurrency={{config.params.extra.judge.parallelism}},retries={{config.params.max_retries}}  {%
  if config.params.limit_samples is not none %} --limit {{config.params.limit_samples}}
  {% endif %} {% if config.params.extra.judge.model_id is not none %} --judge-model-name
  {{config.params.extra.judge.model_id}} {% endif %}'
framework_name: safety_eval
pkg_name: safety_eval
config:
  output_dir: /workspace/results
  params:
    limit_samples: 10
    max_new_tokens: 6144
    max_retries: 5
    parallelism: 8
    task: aegis_v2
    temperature: 0.6
    request_timeout: 30
    top_p: 0.95
    extra:
      judge:
        url: http://localhost:8001/v1
        model_id: null
        api_key: null
        parallelism: 32
        request_timeout: 60
        max_retries: 16
  supported_endpoint_types:
  - chat
  - completions
  type: aegis_v2
target:
  api_endpoint:
    api_key_name: MUT_API_KEY
    model_id: meta/llama-4-maverick-17b-128e-instruct
    stream: false
    type: chat
    url: http://localhost:8000/v1


Rendered command:

export API_KEY=$MUT_API_KEY  &&   safety-eval  --model-name  meta/llama-4-maverick-17b-128e-instruct --model-url http://localhost:8000/v1 --model-type chat  --judge-url  http://localhost:8001/v1   --results-dir /workspace/results   --eval aegis_v2  --mut-inference-params max_tokens=6144,temperature=0.6,top_p=0.95,timeout=30,concurrency=8,retries=5 --judge-inference-params concurrency=32,retries=5   --limit 10

Tasks Configuration

Each evaluation requires its own judge model to be deployed by the user, e.g. on http://localhost:8001.

Aegis_v2

Judge Model: llama-3.1-nemoguard-8b-content-safety

  • Deployment: Follow NIM deployment guide
  • Endpoint Type: Completion endpoint
  • Tokenizer: meta-llama/Llama-3.1-8B-Instruct (ensure Hugging Face access)

Please note that if you evaluate a reasoning model, you should use aegis_v2_reasoning instead of aegis_v2; it evaluates responses both with and without thinking traces. No reasoning adapter should be enabled for that evaluation.

Wildguard

Judge Model: allenai/wildguard

  • Deployment: Use vLLM version v0.8.5
  • Endpoint Type: Completion endpoint
  • Tokenizer: mistralai/Mistral-7B-v0.3 (base model tokenizer)
  • Command:
    docker run -it --gpus all -p 8001:8000 vllm/vllm-openai:v0.8.5 --model allenai/wildguard
    

Compliance

This automated workflow assesses LLM compliance according to specified policies.

The compliance integrity evaluation reads the policy YAML file provided in the config.params.extra.policy argument into a list of rules. An LLM judge then scores each pair of prompt (taken from the dataset) and model response against every rule. Configure the LLM judge by providing config.params.extra.judge.model_id, config.params.extra.judge.api_key, and config.params.extra.judge.url. Any OpenAI-compatible endpoint can serve as the judge.
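The flatten-and-score flow described above can be sketched in Python. This is an illustration only: `score_with_judge` is a hypothetical stand-in for the real LLM-judge call, and the inline `policy` dict mimics what parsing a policy YAML file would produce.

```python
# Hypothetical sketch of the compliance scoring loop. `policy` stands in for
# the parsed contents of the file named by config.params.extra.policy.
policy = {
    "sections": [
        {"name": "1. Section One",
         "rules": [{"id": "S1.1", "definition": "No medical advice", "examples": []}]},
        {"name": "2. Section Two",
         "rules": [{"id": "S2.1", "definition": "Avoid modern slang",
                    "examples": ["Avoid 'cool,' 'awesome,' 'vibe.'"]}]},
    ]
}

# Flatten all sections into a single list of rules.
rules = [rule for section in policy["sections"] for rule in section["rules"]]

def score_with_judge(prompt: str, response: str, rule: dict) -> bool:
    """Placeholder: a real implementation would call the judge endpoint
    at config.params.extra.judge.url with the rule definition."""
    return True  # assume compliant for this sketch

# Each (prompt, response) pair is scored against every rule.
results = {
    rule["id"]: score_with_judge("example prompt", "example response", rule)
    for rule in rules
}
print(results)
```

The per-rule scores are what ultimately feed the heatmap, radar chart, and compliance report artifacts.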

Example evaluation command (note: this example uses a small model as the judge to get you started; consider using a larger model for judging):

nemo-evaluator run_eval --eval_type compliance \
    --model_id meta/llama-3.1-8b-instruct \
    --model_type chat \
    --model_url https://integrate.api.nvidia.com/v1/chat/completions \
    --api_key_name NVIDIA_API_KEY \
    --output_dir /results \
    --overrides "config.params.extra.judge.model_id=meta/llama-3.1-8b-instruct,config.params.extra.judge.url=https://integrate.api.nvidia.com/v1/chat/completions,config.params.extra.dataset=/workspace/compliance_prompts.csv,config.params.extra.policy=/workspace/policy_sec15.yaml,config.params.extra.judge.api_key=NVIDIA_API_KEY,config.params.parallelism=4,config.params.extra.judge.parallelism=2"

Input format

The policy (provided in config.params.extra.policy) should follow this YAML format:

sections:
- name: 1. Section One
  rules:
  - id: S1.1
    definition: Definition of Rule S1.1
    examples: []
  - id: S1.2
    definition: Definition of Rule S1.2
    examples: []
    # Other rules in the section "1. Section One" follow
- name: 2. Section Two
  rules:
  - id: S2.2
    definition: Definition of Rule S2.2
    examples: 
    - Avoid modern slang (e.g., 'cool,' 'awesome,' 'vibe').
    - Avoid business jargon (e.g., 'leverage,' 'synergy').
    - Avoid technical/AI-specific language (e.g., 'database,' 'algorithm,' 'process,'
      'data').
    # Other rules in the section "2. Section Two" follow
  # Other sections follow

The dataset (provided in config.params.extra.dataset) should be either a CSV file containing a prompt column or a JSONL file where each object has a prompt field.
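To illustrate the two accepted dataset formats, here is a small Python sketch that reads prompts from either a CSV with a prompt column or a JSONL file with a prompt field. The `load_prompts` helper is hypothetical; the harness's actual loader may differ.

```python
import csv
import io
import json

def load_prompts(text: str, fmt: str) -> list[str]:
    """Read prompts from CSV (with a 'prompt' column) or JSONL
    (where each line is an object with a 'prompt' field)."""
    if fmt == "csv":
        return [row["prompt"] for row in csv.DictReader(io.StringIO(text))]
    if fmt == "jsonl":
        return [json.loads(line)["prompt"]
                for line in text.splitlines() if line.strip()]
    raise ValueError(f"unsupported format: {fmt}")

# The same prompt expressed in both formats; extra CSV columns are ignored.
csv_text = "prompt,source\nHow do I pick a lock?,redteam\n"
jsonl_text = '{"prompt": "How do I pick a lock?"}\n'

assert load_prompts(csv_text, "csv") == load_prompts(jsonl_text, "jsonl")
print(load_prompts(csv_text, "csv"))
```

Either format yields the same list of prompts, so choose whichever is easier to produce from your data pipeline.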

For more examples, including a real policy and dataset, please refer to the NeMo-Evaluator examples.

The evaluation generates the following artifacts:

  • visualizations
    • heatmap.png
    • radar_chart.png
  • reports:
    • compliance_report.md
    • metrics.json
