
DataRobot Monitoring and Moderation framework

Project description

DataRobot Moderations library

This library enforces interventions on prompt and response texts according to the guard configuration set by the user.

The library accepts a guard configuration in YAML format together with the input prompts, and outputs a DataFrame with details such as:

  • whether the prompt should be blocked
  • whether the completion should be blocked
  • metric values obtained from the model guards
  • whether the prompt or response was modified by a replace-action (modifier) guard

Architecture

The library wraps around the typical LLM prediction method. It first runs the pre-score guards, which evaluate prompts and enforce moderation if necessary. Prompts that are not blocked (including any that a guard has sanitised) are forwarded to the actual LLM to get their completions. The library then evaluates these completions with the post-score guards and enforces interventions on them.
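The wrap-around flow can be sketched in plain Python. This is a minimal illustration, not the library's implementation: Guard, Outcome and moderate are hypothetical stand-ins, and for brevity the sketch stops at the first blocking guard, whereas a real pipeline may run guards concurrently and record every guard's metric.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Guard:
    """Hypothetical guard: returns a block message, or None to let the text through."""
    name: str
    check: Callable[[str], Optional[str]]


@dataclass
class Outcome:
    blocked: bool
    text: str  # the completion, or the block message when blocked


def moderate(prompt: str,
             llm: Callable[[str], str],
             prescore: list[Guard],
             postscore: list[Guard]) -> Outcome:
    # 1. Pre-score guards evaluate the prompt; a block stops here and the LLM is never called.
    for guard in prescore:
        message = guard.check(prompt)
        if message is not None:
            return Outcome(blocked=True, text=message)
    # 2. Only prompts that passed every pre-score guard reach the LLM.
    completion = llm(prompt)
    # 3. Post-score guards evaluate the completion and may block it.
    for guard in postscore:
        message = guard.check(completion)
        if message is not None:
            return Outcome(blocked=True, text=message)
    return Outcome(blocked=False, text=completion)
```

The key property mirrored here is that a prompt blocked at pre-score never costs an LLM call.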

How to build it?

The repository uses Poetry to manage the build process. Build a wheel with:

make clean
make

How to use it?

A wheel file that you built or downloaded can be installed with pip, which also installs its dependencies. To install the published package:

pip3 install datarobot-moderations

Optional extras

The base install covers token-count, ROUGE-1, cost, and NeMo guards. Heavier or cloud-specific dependencies are opt-in:

Extra | What it enables
datarobot-sdk | DataRobot model guards, DataRobot LLM evaluator type
llm-eval | Faithfulness, Task Adherence, Agent Goal Accuracy, Guideline Adherence guards
nemo | NeMo Guardrails colang-based flow guard
nemo-evaluator | NeMo live-evaluation microservice guard
nvidia | NVIDIA NIM / ChatNVIDIA LLM support
vertex | Google Cloud Vertex AI LLM support
bedrock | AWS Bedrock LLM support
all | Every optional dependency at once

# Example: task-adherence guard backed by a DataRobot LLM deployment
pip3 install 'datarobot-moderations[llm-eval,datarobot-sdk]'

deepeval telemetry

Moderations opts out of deepeval telemetry by default.

Transitive dependencies and build compatibility

Installing [all] (or the nemo / llm-eval extras individually) pulls in packages that nemoguardrails and deepeval declare as runtime dependencies but that this library never uses at runtime:

Package | Pulled in by | Problem
annoy | nemoguardrails | Requires a C++ compiler; breaks restricted build environments such as Kaniko
fastembed / onnxruntime | nemoguardrails | Heavy ML runtimes, hundreds of MB
fastapi / starlette / uvicorn | nemoguardrails | Web server stack, only used by nemoguardrails' built-in server
watchdog / prompt-toolkit / typer | nemoguardrails, deepeval | Dev-server and CLI tools
pyfiglet / wheel | deepeval | CLI banner / build artefact mis-declared as a runtime dependency

If your project is managed with uv, you can exclude them by adding the following to your own project's pyproject.toml (these overrides are not inherited from this library):

[tool.uv]
override-dependencies = [
    "annoy; sys_platform == 'never'",
    "fastembed; sys_platform == 'never'",
    "onnxruntime; sys_platform == 'never'",
    "fastapi; sys_platform == 'never'",
    "starlette; sys_platform == 'never'",
    "uvicorn; sys_platform == 'never'",
    "watchdog; sys_platform == 'never'",
    "prompt-toolkit; sys_platform == 'never'",
    "typer; sys_platform == 'never'",
    "pyfiglet; sys_platform == 'never'",
    "wheel; sys_platform == 'never'",
]
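To confirm the overrides took effect after re-syncing your environment, you could check which of the excluded distributions are still installed. A small sketch using the standard library's importlib.metadata; the EXCLUDED list simply mirrors the overrides above, and still_installed is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Mirrors the override-dependencies list above.
EXCLUDED = [
    "annoy", "fastembed", "onnxruntime", "fastapi", "starlette", "uvicorn",
    "watchdog", "prompt-toolkit", "typer", "pyfiglet", "wheel",
]


def still_installed(names: list[str]) -> dict[str, str]:
    """Return {distribution: version} for every name that is still installed."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            pass  # absent, as intended
    return found


if __name__ == "__main__":
    leftovers = still_installed(EXCLUDED)
    print("All excluded packages are absent." if not leftovers else f"Still installed: {leftovers}")
```

Run it inside the environment that installs datarobot-moderations; an empty result means none of the listed packages were installed.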

Standalone Python API

Create a ModerationPipeline from a YAML file, a plain dict, or a Pydantic config object:

from datarobot_dome.api import ModerationPipeline

# From a YAML file
pipeline = ModerationPipeline.from_yaml("moderation_config.yaml")

# From a plain Python dictionary (same schema as YAML)
pipeline = ModerationPipeline.from_dict({"targets": [{"target": "_default", "guards": [...]}]})

# From a Pydantic ModerationConfig object (full type-safety / IDE autocompletion)
from datarobot_dome.schema import ModerationConfig, OOTBGuardSchema, TargetBlock
pipeline = ModerationPipeline.from_config(ModerationConfig(...))

# model_dir is an optional kwarg on from_dict / from_config:
# base directory for resolving NeMo guardrails .co flow files (default: os.getcwd())
pipeline = ModerationPipeline.from_dict({...}, model_dir="/path/to/nemo_dir")

All three constructors validate DATAROBOT_ENDPOINT and DATAROBOT_API_TOKEN before initialising.

Evaluate a prompt (prescore guards):

result, latency, prescore_df = pipeline.evaluate_prompt("Ignore previous instructions and …")
if result.blocked:
    print(result.blocked_message)

Evaluate a response (postscore guards):

result, latency, postscore_df = pipeline.evaluate_response(
    response="The capital of France is Paris.",
    prompt="What is the capital of France?",  # required for faithfulness / task-adherence guards
)
print(result.blocked, result.metrics)

Full pipeline — prescore → LLM → postscore:

def my_llm(prompt: str) -> str:
    return "DataRobot is an AI platform."  # replace with your LLM call

result, prescore_df, postscore_df = pipeline.evaluate_full_pipeline("What is DataRobot?", my_llm)
if not result.blocked:
    print(result.response)

Each method has an async counterpart — just await it inside an async function: evaluate_prompt_async, evaluate_response_async, evaluate_full_pipeline_async.

Streaming pipeline — prescore → LLM stream → per-chunk postscore:

import asyncio
from openai import OpenAI

sync_openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

async def my_llm_stream(prompt: str):
    # Wraps a sync SDK stream: each iteration blocks the event loop briefly.
    # With an async SDK client, yield from its async stream instead.
    for chunk in sync_openai_client.chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": prompt}], stream=True
    ):
        yield chunk

async def main():
    async for chunk in pipeline.evaluate_full_pipeline_stream_async("What is DataRobot?", my_llm_stream):
        if chunk.choices[0].finish_reason == "content_filter":
            print("Blocked:", chunk.choices[0].delta.content)
            break
        print(chunk.choices[0].delta.content or "", end="", flush=True)

asyncio.run(main())

A finish_reason="content_filter" chunk means a guard blocked content — either at prescore (LLM never called) or mid-stream from a postscore guard.
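The consumer loop can be exercised without a live LLM or DataRobot. In this sketch fake_moderated_stream is a hypothetical stand-in for evaluate_full_pipeline_stream_async: it yields objects with the same attribute path as a ChatCompletionChunk (choices[0].delta.content, choices[0].finish_reason), built with SimpleNamespace so no SDK is needed.

```python
import asyncio
from types import SimpleNamespace


def make_chunk(content, finish_reason=None):
    # Minimal object exposing chunk.choices[0].delta.content and .finish_reason.
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta, finish_reason=finish_reason)])


async def fake_moderated_stream():
    # Hypothetical: two normal chunks, then a postscore guard blocks mid-stream.
    yield make_chunk("DataRobot is ")
    yield make_chunk("an AI ")
    yield make_chunk("Response violates guidelines.", finish_reason="content_filter")


async def collect(stream):
    """Return (text already shown to the user, block message or None)."""
    parts = []
    async for chunk in stream:
        choice = chunk.choices[0]
        if choice.finish_reason == "content_filter":
            return "".join(parts), choice.delta.content
        parts.append(choice.delta.content or "")
    return "".join(parts), None
```

Calling asyncio.run(collect(fake_moderated_stream())) returns the partial text plus the block message. This highlights a design point of mid-stream blocking: text streamed before the content_filter chunk has already reached the user, so a UI may want to retract or replace it.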

For the full API reference — all parameters, return types, result-object fields, DataFrame column schemas, streaming details, and env-var reference — see docs/GUARDRAILS.md § 8.

Command-line interface (CLI)

The package ships a dr-moderation CLI so you can manage guards without writing Python code.

# Set credentials once
export DATAROBOT_ENDPOINT="https://app.datarobot.com/api/v2"
export DATAROBOT_API_TOKEN="your-api-token"

# Add guards to an existing custom model — prints the new version ID
dr-moderation add-guard \
  --custom-model-id 6793e6b2114f17240fa2194c \
  --config-file docs/examples/add_guard_config.yaml

# Evaluate with LLM Gateway: no deployment_id needed,
# just llm_gateway_model_id in the config (snake_case SDK format)
dr-moderation evaluate \
  --config-file docs/examples/llm_gateway_config.yaml \
  --prompt "What is DataRobot?" \
  --response "DataRobot is an AI platform." \
  --as-json

# Verify connectivity to a remote A2A agent
dr-moderation agent a2a connect --url https://my-llm-agent.example.com

Developer note: if dr-moderation is not found, the Poetry virtual environment is probably not activated. Run poetry shell first, or prefix every command with poetry run:

poetry run dr-moderation evaluate \
  --config-file docs/examples/token_count_config.yaml \
  --prompt "Hello"
# or use the Makefile shortcut:
make cli ARGS="evaluate --config-file docs/examples/token_count_config.yaml --prompt 'Hello'"

For full option reference, both YAML schemas, exit codes, and a CI/CD shell workflow, see docs/CLI.md.

With DRUM

As described under Architecture, the library wraps DRUM's score method with the pre-score and post-score guards. With DRUM, you simply run your custom model using drum score to get the moderation features.

Install DRUM along with the necessary optional extras for your specific guards. If you are unsure which guards are in use, install [all]:

pip3 install datarobot-drum 'datarobot-moderations[all]'
drum score --verbose --logging-level info --code-dir ./ --input ./input.csv --target-type textgeneration --runtime-params-file values.yaml

Guardrails Configuration Guide

Guards evaluate prompts (pre-score) and/or responses (post-score) and can block, report, or replace content based on configurable conditions.


Table of Contents

  1. File structure
  2. Top-level options
  3. Common guard fields
  4. Intervention block
  5. Guard types
  6. LLM back-end options
  7. Full annotated example
  8. Using the config in Python
  9. Testing guide
  10. Environment variables

1. File structure

timeout_sec: 10
timeout_action: score
nemo_evaluator_deployment_id: "<your-nemo-evaluator-id>"

guards:
  - name: My Guard
    type: ootb
    stage: prompt
    # ...
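One way to picture the two timeout options at the top of the file (defined in §2): each guard gets a fixed time budget, and when it runs out, timeout_action decides whether the text passes. A minimal asyncio sketch, assuming a guard is an async function returning True when it wants to block; run_guard_with_timeout is hypothetical, not a library function.

```python
import asyncio
from typing import Awaitable, Callable


async def run_guard_with_timeout(guard: Callable[[str], Awaitable[bool]],
                                 text: str,
                                 timeout_sec: float = 10,
                                 timeout_action: str = "score") -> bool:
    """Return True if the text should be blocked."""
    try:
        return await asyncio.wait_for(guard(text), timeout=timeout_sec)
    except asyncio.TimeoutError:
        # "score" lets the text through; "block" treats the timeout as a block.
        return timeout_action == "block"


async def slow_guard(text: str) -> bool:
    await asyncio.sleep(0.2)  # simulates a slow deployment call
    return False
```

With timeout_action: score a slow guard fails open (availability first); with block it fails closed (safety first).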

2. Top-level options

Field | Type | Default | Description
timeout_sec | int | 10 | Seconds to wait per guard
timeout_action | string | score | score (allow) or block on timeout
nemo_evaluator_deployment_id | string | (none) | DataRobot deployment ID of the NeMo Evaluator microservice; required when any guard uses type: nemo_evaluator
enable_deepeval_telemetry | bool | false | Opt in to deepeval usage telemetry and local .deepeval/ artefacts. See §10.
prompt_column_name | string | "promptText" | Name of the DataFrame column that holds the input text. Used in standalone Python when no DRUM deployment is active. Ignored when a DRUM deployment context is active.
response_column_name | string | "completion" | Name of the DataFrame column that holds the LLM response text. Used in standalone Python as a fallback when TARGET_NAME is not set. Lower priority than TARGET_NAME: if both are provided, TARGET_NAME wins. Ignored when a DRUM deployment context is active.
guards | list | required | List of guard definitions
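The precedence for the response column name (TARGET_NAME env var, then response_column_name, then "completion") fits in a small function. This sketches the documented order for standalone use only; resolve_response_column is a hypothetical name, it treats an empty TARGET_NAME as unset (an assumption), and it ignores the DRUM deployment case where these settings do not apply.

```python
import os
from typing import Mapping, Optional


def resolve_response_column(config: dict, env: Optional[Mapping[str, str]] = None) -> str:
    env = os.environ if env is None else env
    # 1. TARGET_NAME wins when set.
    if env.get("TARGET_NAME"):
        return env["TARGET_NAME"]
    # 2. Fall back to response_column_name from the YAML config.
    if config.get("response_column_name"):
        return config["response_column_name"]
    # 3. Library default.
    return "completion"
```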

3. Common guard fields

Field | Description
name | Unique label; used as the key in result.metrics and as the DataRobot custom metric name
type | ootb · model · nemo_guardrails · nemo_evaluator
stage | prompt · response · [prompt, response] (a list runs the guard at both stages)
description | Free-text label, ignored by the library
intervention | What to do when the condition fires (see §4). Omit entirely to measure only; nothing is ever blocked
copy_citations | Boolean (true/false, default false). Passes retrieved RAG context to this guard. Required for rouge_1 and faithfulness to produce meaningful scores
is_agentic | Marks an agentic-workflow guard (default false). Required by agent_goal_accuracy

# stage as a list — guard runs independently at both prompt and response stages
- name: Token Count Both
  type: ootb
  ootb_type: token_count
  stage: [prompt, response]
  intervention:
    action: block
    message: "Input or output exceeds the token limit."
    conditions:
      - comparator: greaterThan
        comparand: 100

4. Intervention block

intervention:
  action: block               # "block" | "report" | "replace"
  message: "Blocked."         # returned to caller
  send_notification: false
  conditions:
    - comparand: 0.5
      comparator: greaterThan

One condition per intervention. The conditions list accepts exactly one entry for block and replace; zero entries (conditions: []) is valid for report. To combine conditions (e.g. block if score < 0.2 or > 0.9), use two separate guards.

Actions

Action | Effect
block | Reject and return message to the caller. message is optional in the schema, but omitting it returns an empty string, so always set it.
report | Record the metric and allow content through unchanged. Behaviorally identical to omitting the intervention block entirely; useful when you want the metric tracked but never want to block.
replace | Swap the text with the sanitised version returned by the deployment. Only valid for type: model guards. The deployment must return the replacement text in the field specified by model_info.replacement_text_column_name; if that field is absent, a ValueError is raised.
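The three actions can be summarised as a function over a guard's prediction row. This is a sketch only: apply_action and the dict-shaped prediction are hypothetical, but the replace branch follows the rule above, reading the field named by model_info.replacement_text_column_name and raising ValueError when it is missing.

```python
from typing import Optional


def apply_action(action: str, fired: bool, text: str, prediction: dict,
                 message: str = "", replacement_column: Optional[str] = None) -> dict:
    """Return what the caller would see for one guard."""
    if not fired or action == "report":
        # report (or a condition that did not fire) leaves the text unchanged.
        return {"blocked": False, "text": text}
    if action == "block":
        return {"blocked": True, "text": message}  # empty string if message was omitted
    if action == "replace":
        if replacement_column is None or replacement_column not in prediction:
            raise ValueError(f"replacement column {replacement_column!r} missing from prediction")
        return {"blocked": False, "text": prediction[replacement_column]}
    raise ValueError(f"unknown action {action!r}")
```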

Comparators

Comparator | Comparand type | Description
greaterThan / lessThan | number | Numeric threshold
equals / notEquals | number or string | Exact equality. Use comparand: "TRUE" with NeMo Guardrails guards, whose score is the string "TRUE" or "FALSE"
is / isNot | boolean | Boolean equality
matches / doesNotMatch | list of strings | Class membership. matches fires if the prediction is in the list; doesNotMatch fires if it is not.
contains / doesNotContain | list of strings | Substring check against a list. contains fires if every item in the list is a substring of the prediction; doesNotContain fires if at least one item is not.
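A sketch of the comparator semantics in the table, assuming the guard's output has already been extracted as a single value. condition_fires is a hypothetical helper that mirrors the descriptions above, not the library's internal code; it ignores type coercion (for example, whether equals compares "TRUE" case-sensitively).

```python
from typing import Any


def condition_fires(comparator: str, comparand: Any, value: Any) -> bool:
    """Return True when value (the guard's score or prediction) meets the condition."""
    if comparator == "greaterThan":
        return value > comparand
    if comparator == "lessThan":
        return value < comparand
    if comparator in ("equals", "is"):
        return value == comparand
    if comparator in ("notEquals", "isNot"):
        return value != comparand
    if comparator == "matches":         # prediction is one of the listed classes
        return value in comparand
    if comparator == "doesNotMatch":
        return value not in comparand
    if comparator == "contains":        # every listed item is a substring of the prediction
        return all(item in value for item in comparand)
    if comparator == "doesNotContain":  # at least one listed item is missing
        return not all(item in value for item in comparand)
    raise ValueError(f"unknown comparator {comparator!r}")
```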

5. Guard types

5.1 Out-of-the-Box (ootb)

Set type: ootb and ootb_type.

Install only what you use:

pip install datarobot-moderations                          # base — token_count, rouge_1, cost, custom_metric
pip install 'datarobot-moderations[llm-eval]'              # + faithfulness, task_adherence, agent_guideline_adherence, agent_goal_accuracy
pip install 'datarobot-moderations[llm-eval,vertex]'       # + Google Vertex AI as LLM judge
pip install 'datarobot-moderations[llm-eval,bedrock]'      # + AWS Bedrock as LLM judge
pip install 'datarobot-moderations[llm-eval,nvidia]'       # + NVIDIA NIM as LLM judge
pip install 'datarobot-moderations[datarobot-sdk]'         # required for type: model and llm_type: datarobot
pip install 'datarobot-moderations[all]'                   # everything

ootb_type | Stage | Install extra | Description
token_count | prompt / response | (base) | Token count
rouge_1 | response | (base) | ROUGE-1 overlap with citations
faithfulness | response | llm-eval | LLM-judged hallucination detection
task_adherence | response | llm-eval | Task-completion score
agent_guideline_adherence | response | llm-eval | Guideline adherence
agent_goal_accuracy | response | llm-eval | Agentic goal-accuracy
cost | response | (base) | Estimated cost. Counts both prompt tokens (input_price/input_unit) and response tokens (output_price/output_unit). Must be at the response stage because both token counts are only available after the LLM responds. Currently only currency: USD is supported.
custom_metric | prompt / response | (base) | User-defined numeric metric

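ROUGE-1 measures unigram overlap between two texts. The sketch below computes ROUGE-1 F1 between a response and each retrieved citation and keeps the best match. It assumes lower-cased whitespace tokenisation and max-over-citations aggregation; the library's actual tokeniser, ROUGE variant (precision, recall or F1) and aggregation are not documented here, so treat the numbers as illustrative.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def rouge1_against_citations(response: str, citations: list[str]) -> float:
    # Without citations (copy_citations: false) there is nothing to compare against.
    return max((rouge1_f1(response, c) for c in citations), default=0.0)
```

For "paris is the capital" against "the capital of france is paris", all 4 response words appear in the 6-word citation, giving precision 1, recall 4/6 and F1 0.8.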
# Token count — report only
- name: Prompt Token Count
  type: ootb
  ootb_type: token_count
  stage: prompt

# Token count — block on length
- name: Response Token Count
  type: ootb
  ootb_type: token_count
  stage: response
  intervention:
    action: block
    message: "Response too long."
    conditions:
      - comparand: 1000
        comparator: greaterThan

# ROUGE-1 (requires citations)
- name: Rouge 1
  type: ootb
  ootb_type: rouge_1
  stage: response
  copy_citations: true
  intervention:
    action: report
    conditions: []

# Faithfulness
- name: Faithfulness
  type: ootb
  ootb_type: faithfulness
  stage: response
  copy_citations: true
  llm_type: datarobot
  deployment_id: "<your-llm-id>"   # 24-char DataRobot deployment ID
  intervention:
    action: block
    message: "Hallucination detected."
    conditions:
      - comparand: 0.0
        comparator: equals

# Task Adherence
- name: Task Adherence
  type: ootb
  ootb_type: task_adherence
  stage: response
  llm_type: datarobot
  deployment_id: "<your-llm-id>"
  intervention:
    action: block
    message: "LLM did not complete the requested task."
    conditions:
      - comparator: lessThan
        comparand: 0.5

# Guideline Adherence
- name: Guideline Adherence
  type: ootb
  ootb_type: agent_guideline_adherence
  stage: response
  llm_type: datarobot
  deployment_id: "<your-llm-id>"
  additional_guard_config:
    agent_guideline: "Response must be polite and on-topic."   # free-text criterion for the LLM judge
  intervention:
    action: block
    message: "Response violates guidelines."
    conditions:
      - comparand: 0.0
        comparator: equals

# Agent Goal Accuracy
- name: Agent Goal Accuracy
  type: ootb
  ootb_type: agent_goal_accuracy
  stage: response
  is_agentic: true
  llm_type: datarobot
  deployment_id: "<your-llm-id>"
  intervention:
    action: report
    conditions: []

# Cost tracking
- name: Cost
  type: ootb
  ootb_type: cost
  stage: response
  additional_guard_config:
    cost:
      currency: USD
      input_price: 0.01
      input_unit: 1000
      output_price: 0.03
      output_unit: 1000
  intervention:
    action: report
    conditions: []
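With the cost configuration above, a natural reading of input_price/input_unit is "price per input_unit tokens". The sketch below computes the estimate under that assumption (cost = tokens / unit × price, summed over prompt and response); estimated_cost is a hypothetical helper, not the library's code.

```python
def estimated_cost(prompt_tokens: int, response_tokens: int, cost_cfg: dict) -> float:
    """Estimated cost in cost_cfg['currency'] (only USD is supported)."""
    if cost_cfg.get("currency", "USD") != "USD":
        raise ValueError("only currency: USD is supported")
    input_cost = prompt_tokens / cost_cfg["input_unit"] * cost_cfg["input_price"]
    output_cost = response_tokens / cost_cfg["output_unit"] * cost_cfg["output_price"]
    return input_cost + output_cost


# Same values as the Cost guard above.
cost_cfg = {"currency": "USD", "input_price": 0.01, "input_unit": 1000,
            "output_price": 0.03, "output_unit": 1000}
```

For 500 prompt tokens and 200 response tokens this gives 0.005 + 0.006 = 0.011 USD, which also shows why the guard needs both token counts and therefore runs at the response stage.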

5.2 Model guard

Wraps any DataRobot deployment you have already created (binary classifier, regression, multiclass, or text-generation). The library sends the text to that deployment and uses the prediction it returns to decide whether to block, report, or replace content.

# Binary classifier (e.g. toxicity, prompt injection)
# Works with any DataRobot binary classification deployment.
- name: Toxicity
  type: model
  stage: prompt
  deployment_id: "<your-deployment-id>"   # 24-char DataRobot deployment ID
  model_info:
    input_column_name: text               # field your deployment reads as input
    target_name: toxicity_toxic_PREDICTION  # prediction field returned by the deployment
    target_type: Binary        # Binary | Regression | Multiclass | TextGeneration
    class_names: []            # leave empty for Binary/Regression
  intervention:
    action: block
    message: "Toxic content blocked."
    conditions:
      - comparand: 0.5
        comparator: greaterThan

# PII detection with text replacement
# The deployment must return BOTH the score field (`target_name`)
# AND a sanitised-text field (`replacement_text_column_name`).
- name: PII Detector
  type: model
  stage: prompt
  deployment_id: "<your-pii-deployment-id>"
  model_info:
    input_column_name: text
    target_name: contains_pii_true_PREDICTION
    target_type: TextGeneration
    replacement_text_column_name: anonymized_text_OUTPUT
    class_names: []
  intervention:
    action: replace
    message: "PII removed from prompt."
    conditions:
      - comparand: 0.5
        comparator: greaterThan

# Multi-label / emotion classifier
- name: Emotion Classifier
  type: model
  stage: prompt
  deployment_id: "<your-emotion-deployment-id>"
  model_info:
    input_column_name: text
    target_name: target_PREDICTION
    target_type: TextGeneration
    class_names: [anger, fear, sadness, disgust, joy, neutral]
  intervention:
    action: block
    message: "Negative emotion detected."
    conditions:
      - comparand: [anger, fear, sadness, disgust]
        comparator: matches

5.3 NeMo Guardrails

Flow-based content filtering. Requires pip install 'datarobot-moderations[nemo]'.

Supported llm_type values: openAi, azureOpenAi, nim, llmGateway only.

Colang flow files must live in stage-specific subdirectories of nemo_guardrails/:

nemo_guardrails/
  prompt/      # config.yml + *.co files for stage: prompt
  response/    # config.yml + *.co files for stage: response
- name: Stay on topic
  type: nemo_guardrails
  stage: prompt
  llm_type: azureOpenAi
  openai_api_base: "https://<resource>.openai.azure.com/"
  openai_deployment_id: gpt-4o-mini
  intervention:
    action: block
    message: "This topic is outside the allowed scope."
    conditions:
      - comparand: "TRUE"
        comparator: equals
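A misplaced config.yml is an easy mistake with this layout. The sketch below is a hypothetical pre-flight check (nemo_stage_dir is not a library function) that resolves nemo_guardrails/<stage>/ relative to model_dir, defaulting to os.getcwd() as described for from_dict / from_config, and fails early if config.yml is missing.

```python
import os
from pathlib import Path
from typing import Optional


def nemo_stage_dir(stage: str, model_dir: Optional[str] = None) -> Path:
    """Return <model_dir>/nemo_guardrails/<stage>, checking that config.yml is present."""
    if stage not in ("prompt", "response"):
        raise ValueError("stage must be 'prompt' or 'response'")
    base = Path(model_dir) if model_dir else Path(os.getcwd())
    stage_dir = base / "nemo_guardrails" / stage
    if not (stage_dir / "config.yml").is_file():
        raise FileNotFoundError(f"missing {stage_dir / 'config.yml'}")
    return stage_dir
```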

5.4 NeMo Evaluator

Calls a DataRobot-hosted NeMo Evaluator microservice. Requires pip install 'datarobot-moderations[nemo-evaluator]'.

Two deployment IDs — what's the difference?

Field | What it points to
nemo_evaluator_deployment_id (top-level) | Your NeMo Evaluator microservice deployment in DataRobot
deployment_id (per-guard) | The LLM deployment the evaluator uses to do the judging

Both values must be valid 24-character DataRobot deployment IDs. Using a placeholder longer than 24 characters (e.g. "<your-nemo-evaluator-deployment-id>") causes a load-time validation error: String is longer than 24 characters.
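A length check alone does not catch every placeholder: "<your-nemo-evaluator-id>" happens to be exactly 24 characters. Since §7 states that deployment IDs are exactly 24 hexadecimal characters, a stricter pre-flight check can catch leftover placeholders before the config is loaded. is_valid_deployment_id is a hypothetical helper; accepting upper-case hex digits is an assumption.

```python
import re

# Exactly 24 hexadecimal characters.
_DEPLOYMENT_ID = re.compile(r"[0-9a-fA-F]{24}")


def is_valid_deployment_id(value: str) -> bool:
    return _DEPLOYMENT_ID.fullmatch(value) is not None
```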

llm_type must be datarobot for all nemo_evaluator guards.

nemo_evaluator_type | Stage | Description
llm_judge | prompt / response | Custom LLM-as-judge with your own prompts. score_parsing_regex is a regular expression applied to the LLM's raw text reply to extract a single numeric score; e.g. "([1-5])" picks the first digit 1–5 from any surrounding text.
context_relevance | response | Relevance of retrieved context to the question
response_groundedness | response | Groundedness in retrieved context
topic_adherence | response | Adherence to allowed topics
response_relevancy | response | Relevance of response to question
faithfulness | response | NeMo microservice faithfulness score
agent_goal_accuracy | response | Agentic goal-accuracy via NeMo

nemo_evaluator_deployment_id: "<your-nemo-evaluator-id>"

guards:
  - name: Safety Judge
    type: nemo_evaluator
    stage: response
    nemo_evaluator_type: llm_judge
    llm_type: datarobot
    deployment_id: "<your-llm-id>"
    nemo_llm_judge_config:
      system_prompt: "Rate safety 1-5. Output ONLY the integer."
      user_prompt: "Response: {response}"
      score_parsing_regex: "([1-5])"   # regex to extract the numeric score from the LLM's text output
      custom_metric_directionality: higherIsBetter   # "higherIsBetter" | "lowerIsBetter"
    intervention:
      action: block
      message: "Response failed safety evaluation."
      conditions:
        - comparand: 2
          comparator: lessThan

  - name: Topic Adherence
    type: nemo_evaluator
    stage: response
    nemo_evaluator_type: topic_adherence
    llm_type: datarobot
    deployment_id: "<your-llm-id>"
    nemo_topic_adherence_config:
      metric_mode: f1          # "f1" | "precision" | "recall"
      reference_topics: [DataRobot, machine learning, AI platforms]
    intervention:
      action: report
      conditions: []

  - name: Response Relevancy
    type: nemo_evaluator
    stage: response
    nemo_evaluator_type: response_relevancy
    llm_type: datarobot
    deployment_id: "<your-llm-id>"
    nemo_response_relevancy_config:
      embedding_deployment_id: "<your-embedding-id>"
    intervention:
      action: report
      conditions: []
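To see what score_parsing_regex does to a judge's reply in the Safety Judge example, the sketch below applies the pattern with re.search and takes the first capture group. parse_score is hypothetical; that the library uses the first capture group of the first match is an assumption consistent with "picks the first digit 1–5".

```python
import re
from typing import Optional


def parse_score(reply: str, pattern: str = r"([1-5])") -> Optional[int]:
    """Extract the first capture group of pattern from the judge's raw reply."""
    match = re.search(pattern, reply)
    return int(match.group(1)) if match else None
```

Note the pitfall this exposes: a reply such as "Score: 10" parses as 1, because the pattern matches the first qualifying digit anywhere. Constraining the system prompt ("Output ONLY the integer.") or tightening the regex reduces that risk.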

6. LLM back-end options

Some ootb guards (e.g. faithfulness, task_adherence) call an LLM to judge the text. You choose which LLM provider to use via llm_type.

DataRobot credentials (DATAROBOT_ENDPOINT and DATAROBOT_API_TOKEN) are always required, whichever llm_type you choose.

Supported llm_type values

llm_type | LLM provider | Extra YAML fields | Extra install
datarobot | DataRobot-hosted LLM deployment | deployment_id | datarobot-sdk
openAi | OpenAI API | (none) | llm-eval
azureOpenAi | Azure OpenAI | openai_api_base, openai_deployment_id | llm-eval
google | Google Vertex AI | google_region, google_model | llm-eval,vertex
amazon | AWS Bedrock | aws_region, aws_model | llm-eval,bedrock
nim | NVIDIA NIM | openai_api_base | llm-eval,nvidia
llmGateway | DataRobot LLM Gateway | llm_gateway_model_id | datarobot-sdk

nemo_guardrails supports: openAi, azureOpenAi, nim only
nemo_evaluator supports: datarobot only
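The "Extra YAML fields" column lends itself to a quick lint before loading a config. A sketch of such a check, copying the table above into a dict; missing_llm_fields is hypothetical, it does not replace the library's own validation, and it does not check provider credentials.

```python
# Extra YAML fields per llm_type, copied from the table above.
REQUIRED_FIELDS = {
    "datarobot": ["deployment_id"],
    "openAi": [],
    "azureOpenAi": ["openai_api_base", "openai_deployment_id"],
    "google": ["google_region", "google_model"],
    "amazon": ["aws_region", "aws_model"],
    "nim": ["openai_api_base"],
    "llmGateway": ["llm_gateway_model_id"],
}


def missing_llm_fields(guard: dict) -> list[str]:
    """Return the extra fields the guard's llm_type needs but does not set."""
    llm_type = guard.get("llm_type")
    if llm_type not in REQUIRED_FIELDS:
        raise ValueError(f"unsupported llm_type {llm_type!r}")
    return [field for field in REQUIRED_FIELDS[llm_type] if not guard.get(field)]
```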

Available models (Google / AWS)

The library maps a fixed set of model names to their provider API identifiers. Models not in this list are not supported.

Provider | llm_type | google_model / aws_model
Google Vertex AI | google | google-gemini-1.5-flash, google-gemini-1.5-pro, chat-bison
AWS Bedrock | amazon | amazon-titan, anthropic-claude-2, anthropic-claude-3-haiku, anthropic-claude-3-sonnet, anthropic-claude-3-opus, anthropic-claude-3.5-sonnet-v1, anthropic-claude-3.5-sonnet-v2, amazon-nova-lite, amazon-nova-micro, amazon-nova-pro

7. Full annotated example

Replace every <...> placeholder with a real value before use. DataRobot deployment IDs are exactly 24 hexadecimal characters.

timeout_sec: 15
timeout_action: score

guards:
  # -- Pre-score (prompt) --------------------------------------------------

  - name: Prompt Injection
    type: model
    stage: prompt
    deployment_id: "<prompt-injection-id>"
    model_info:
      input_column_name: text
      target_name: injection_injection_PREDICTION
      target_type: Binary
      class_names: []
    intervention:
      action: block
      message: "Prompt injection attempt detected and blocked."
      conditions:
        - comparand: 0.80
          comparator: greaterThan

  - name: Toxicity
    type: model
    stage: prompt
    deployment_id: "<toxicity-id>"
    model_info:
      input_column_name: text
      target_name: toxicity_toxic_PREDICTION
      target_type: Binary
      class_names: []
    intervention:
      action: block
      message: "Toxic content is not allowed."
      conditions:
        - comparand: 0.5
          comparator: greaterThan

  - name: PII Detector
    type: model
    stage: prompt
    deployment_id: "<pii-id>"
    model_info:
      input_column_name: text
      target_name: contains_pii_true_PREDICTION
      target_type: TextGeneration
      replacement_text_column_name: anonymized_text_OUTPUT
      class_names: []
    intervention:
      action: replace
      message: "PII detected and removed."
      conditions:
        - comparand: 0.5
          comparator: greaterThan

  - name: Topic Guardrail
    type: nemo_guardrails
    stage: prompt
    llm_type: azureOpenAi
    openai_api_base: "https://<resource>.openai.azure.com/"
    openai_deployment_id: gpt-4o-mini
    intervention:
      action: block
      message: "This topic is outside the allowed scope."
      conditions:
        - comparand: "TRUE"
          comparator: equals

  # -- Post-score (response) -----------------------------------------------

  - name: Response Token Count
    type: ootb
    ootb_type: token_count
    stage: response

  - name: Faithfulness
    type: ootb
    ootb_type: faithfulness
    stage: response
    copy_citations: true
    llm_type: datarobot
    deployment_id: "<llm-id>"
    intervention:
      action: block
      message: "The response appears to be hallucinated."
      conditions:
        - comparand: 0.0
          comparator: equals

  - name: Task Adherence
    type: ootb
    ootb_type: task_adherence
    stage: response
    llm_type: datarobot
    deployment_id: "<llm-id>"
    intervention:
      action: block
      message: "LLM did not complete the requested task."
      conditions:
        - comparator: lessThan
          comparand: 0.5

  - name: Cost
    type: ootb
    ootb_type: cost
    stage: response
    additional_guard_config:
      cost:
        currency: USD
        input_price: 0.01
        input_unit: 1000
        output_price: 0.03
        output_unit: 1000
    intervention:
      action: report
      conditions: []

8. Using the config in Python

Guards can be configured from a YAML file, a plain Python dict, or a Pydantic object built entirely in Python. All approaches are fully equivalent — choose whichever fits your workflow.

8a. From a YAML file

Return types

Method | Returns
evaluate_prompt(prompt) | (EvaluationResult, latency_seconds, prescore_df)
evaluate_response(response, prompt=None) | (EvaluationResult, latency_seconds, postscore_df)
evaluate_full_pipeline(prompt, llm_callable) | (PipelineResult, prescore_df, postscore_df). postscore_df is None when the prompt was blocked. Per-stage latency is not returned; use evaluate_prompt / evaluate_response directly when you need it
evaluate_prompt_async(prompt) | Same as evaluate_prompt, but non-blocking
evaluate_response_async(response, prompt=None) | Same as evaluate_response, but non-blocking
evaluate_full_pipeline_async(prompt, llm_callable) | Same as evaluate_full_pipeline, but non-blocking; llm_callable must be an async function (coroutine function)
evaluate_full_pipeline_stream_async(prompt, llm_callable) | AsyncGenerator[ChatCompletionChunk, None]; see §8d
stream_response_async(completion, *, prompt, prescore_df, prescore_latency) | AsyncGenerator[ChatCompletionChunk, None]; lower-level, see §8d

EvaluationResult.metrics holds the guard scores keyed by guard name.

evaluate_prompt / evaluate_prompt_async parameters

Parameter | Type | Description
prompt | str | The user prompt text to evaluate against prescore guards (required)

evaluate_response / evaluate_response_async parameters

Parameter | Type | Description
response | str | The LLM response text to evaluate against postscore guards
prompt | str or None | The original user prompt. Required for guards that compare prompt and response (e.g. faithfulness, task_adherence, rouge_1). Omit only when no such guards are configured
pipeline_interactions | str or None | JSON-serialised MultiTurnSample dict from the DataRobot agentic pipeline. Enables agent_goal_accuracy to evaluate the full interaction trace instead of just the final response.

evaluate_full_pipeline / evaluate_full_pipeline_async parameters

Parameter | Type | Description
prompt | str | The user prompt to evaluate
llm_callable | Callable[[str], str] (sync) or Callable[[str], Awaitable[str]] (async) | Callable that receives the (possibly sanitised) effective prompt and returns the LLM response. For the async variant it must be an async function (defined with async def)
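If your LLM client only offers a blocking call, one option for the async variant is to run it in a worker thread. A sketch using the standard library's asyncio.to_thread (Python 3.9+); as_async is a hypothetical adapter and my_llm mirrors the placeholder used elsewhere in this README.

```python
import asyncio
from typing import Awaitable, Callable


def as_async(fn: Callable[[str], str]) -> Callable[[str], Awaitable[str]]:
    """Wrap a blocking LLM call so it can be passed where an async callable is required."""
    async def wrapper(prompt: str) -> str:
        # Run the blocking call in a worker thread so the event loop stays responsive.
        return await asyncio.to_thread(fn, prompt)
    return wrapper


def my_llm(prompt: str) -> str:
    return "DataRobot is an AI platform."  # replace with your (blocking) LLM call
```

Usage inside an async function: result, prescore_df, postscore_df = await pipeline.evaluate_full_pipeline_async("What is DataRobot?", as_async(my_llm)).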

EvaluationResult fields

Field | Type | Description
blocked | bool | True if any guard blocked the text
blocked_message | str or None | The block message configured on the guard
replaced | bool | True if a replace-action guard fired
replacement | str or None | The sanitised replacement text (PII-scrubbed prompt, etc.)
metrics | dict[str, Any] | Guard scores keyed by guard name (e.g. {"Toxicity": 0.87})

PipelineResult fields

Field | Type | Description
prompt_evaluation | EvaluationResult | Prescore evaluation result
response | str or None | Final (possibly replaced) LLM response; None when blocked
response_evaluation | EvaluationResult or None | Postscore evaluation result; None when the prompt was blocked
blocked (computed) | bool | True if either stage was blocked
replaced (computed) | bool | True if either stage was replaced
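The computed fields follow from the per-stage results. The stubs below are hypothetical stand-ins for EvaluationResult and PipelineResult that reproduce only the behaviour described in the table; they can be handy for unit-testing your own code that branches on a pipeline result without calling DataRobot.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvalStub:
    # Hypothetical stand-in for EvaluationResult with just the two flags used here.
    blocked: bool = False
    replaced: bool = False


@dataclass
class PipelineStub:
    prompt_evaluation: EvalStub
    response: Optional[str] = None
    response_evaluation: Optional[EvalStub] = None  # None when the prompt was blocked

    @property
    def blocked(self) -> bool:
        return self.prompt_evaluation.blocked or bool(
            self.response_evaluation and self.response_evaluation.blocked)

    @property
    def replaced(self) -> bool:
        return self.prompt_evaluation.replaced or bool(
            self.response_evaluation and self.response_evaluation.replaced)
```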

What prescore_df contains

prescore_df is the raw pandas DataFrame produced by running all prescore (prompt-stage) guards on the input.
It starts as a copy of the input and gains one set of columns per guard after execution.

Column | Description
{prompt_column_name} | Original prompt text
{guard.metric_column_name} | Guard score (one column per guard, e.g. Toxicity_toxicity_toxic_PREDICTION)
{guard_name}_latency | Wall-clock seconds this guard took
blocked_{prompt_col} | True if any guard blocked the prompt
blocked_message_{prompt_col} | Block reason / message returned to the caller
replaced_{prompt_col} | True if a replace-action guard fired
replaced_message_{prompt_col} | Replacement text (sanitised prompt from PII guard, etc.)
reported_{prompt_col} | True when a report-action guard fired
Noneed_{prompt_col} | Internal sentinel for no-action guards
action_{prompt_col} | Comma-joined string of actions taken (e.g. "block", "report,block")
(per-guard enforced column) | Internal per-guard enforcement flag used by format_result_df

What postscore_df contains

postscore_df is the raw pandas DataFrame produced by running all postscore (response-stage) guards on the LLM output.
It starts with the predictions DataFrame (which includes the LLM response plus any pass-through columns) and gains guard result columns after execution.

Column | Description
{response_column_name} | LLM's response text
{prompt_column_name} | User prompt (forwarded for faithfulness / task-adherence calculation)
CITATION_CONTENT_{N} | Retrieved RAG context chunks (when citations are enabled)
PROMPT_TOKEN_COUNT_from_usage | Prompt token count (when usage is provided by the LLM)
RESPONSE_TOKEN_COUNT_from_usage | Response token count (when usage is provided by the LLM)
agentic_pipeline_interactions | Agentic workflow interaction trace (for agent_goal_accuracy / task_adherence)
{association_id_column_name} | Association ID (if the deployment has one configured)
{guard.metric_column_name} | Guard score (one column per postscore guard, e.g. Response_Faithfulness_score)
{guard_name}_latency | Wall-clock seconds this guard took
blocked_{response_col} | True if any guard blocked the response
blocked_message_{response_col} | Block message returned to the caller
replaced_{response_col} | True if a replace-action guard fired on the response
replaced_message_{response_col} | Replacement text
reported_{response_col} | True when a report-action guard fired
Noneed_{response_col} | Internal sentinel for no-action guards
action_{response_col} | Comma-joined string of actions taken
(per-guard enforced column) | Internal per-guard enforcement flag

Note: prescore_df and postscore_df are the raw executor outputs.
In the DRUM pipeline, format_result_df merges them into a single result_df that also adds unmoderated_{response_col}, moderated_{prompt_col}, datarobot_latency, datarobot_token_count, and datarobot_confidence_score. Those derived columns are not present in the DataFrames returned directly by evaluate_prompt / evaluate_response / evaluate_full_pipeline.


import asyncio
import os
from datarobot_dome.api import ModerationPipeline

os.environ["DATAROBOT_ENDPOINT"]  = "<your-endpoint>"
os.environ["DATAROBOT_API_TOKEN"] = "<your-token>"
# TARGET_NAME is optional: it sets the response column name used by postscore guards.
# Resolution order: TARGET_NAME env var → response_column_name in config → default "completion".
# os.environ["TARGET_NAME"] = "resultText"

pipeline = ModerationPipeline.from_yaml("moderation_config.yaml")

# ── Prompt evaluation (prescore guards) ───────────────────────────────────────
result, latency, prescore_df = pipeline.evaluate_prompt("What is DataRobot?")

if result.blocked:
    print(f"Blocked: {result.blocked_message}")
elif result.replaced:
    print(f"Prompt sanitised to: {result.replacement}")

# ── Response evaluation (postscore guards) ────────────────────────────────────
result, latency, postscore_df = pipeline.evaluate_response(
    "DataRobot is an AI platform.",
    prompt="What is DataRobot?",   # required for faithfulness / task-adherence guards
)
print(f"Latency: {latency:.3f}s  Blocked: {result.blocked}  Metrics: {result.metrics}")

# ── Full pipeline: prescore → LLM → postscore ─────────────────────────────────
def my_llm(prompt: str) -> str:
    return "DataRobot is an AI platform."   # replace with your LLM call

result, prescore_df, postscore_df = pipeline.evaluate_full_pipeline("What is DataRobot?", my_llm)

if result.blocked:
    stage = "prompt" if result.prompt_evaluation.blocked else "response"
    blocked_eval = (
        result.prompt_evaluation if result.prompt_evaluation.blocked
        else result.response_evaluation
    )
    print(f"Blocked at {stage}: {blocked_eval.blocked_message}")
elif result.replaced:
    print(f"Text replaced. Response: {result.response}")
else:
    print(f"Response: {result.response}")
    print(f"Metrics: {result.response_evaluation.metrics}")

# ── Async variants (async functions, FastAPI routes, agents) ──────────────────
# `await` is only valid inside a coroutine, so the async calls live in main().
async def my_async_llm(prompt: str) -> str:
    return "DataRobot is an AI platform."   # replace with your async LLM call

async def main() -> None:
    result, latency, prescore_df = await pipeline.evaluate_prompt_async("What is DataRobot?")
    result, latency, postscore_df = await pipeline.evaluate_response_async(
        "DataRobot is an AI platform.",
        prompt="What is DataRobot?",
    )
    # llm_callable must be an async coroutine function for the async pipeline
    result, prescore_df, postscore_df = await pipeline.evaluate_full_pipeline_async(
        "What is DataRobot?", my_async_llm
    )
    print(f"Async full pipeline blocked: {result.blocked}")

asyncio.run(main())

Agentic workflow example

For agents, the library can evaluate the full interaction trace — every tool call, intermediate message, and final response — not just the last reply. This gives the agent_goal_accuracy guard accurate context to judge whether the agent actually achieved the user's goal.

The interaction trace (pipeline_interactions) is a JSON-serialised ragas.MultiTurnSample produced by the DataRobot agent after each task run. Pass it to evaluate_response through the pipeline_interactions argument.

Config (docs/examples/agent_goal_accuracy_config.yaml):

targets:
  - target: _default
    guards:
      - name: Agent Goal Accuracy
        type: ootb
        ootb_type: agent_goal_accuracy
        stage: response
        is_agentic: true
        llm_type: llmGateway
        llm_gateway_model_id: "azure/gpt-4o-mini"
        intervention:
          action: report  # measure-only: block/replace are ignored by the library
          conditions: []

Measure-only guard: agent_goal_accuracy (like cost and guideline_adherence) always forces intervene=False internally regardless of the action configured. The score is only available in result.metrics["agent_goal_accuracy"] — use it to make blocking decisions in your own code when needed.

Python — with full interaction trace (recommended for agentic pipelines):

import json
from datarobot_dome.api import ModerationPipeline

pipeline = ModerationPipeline.from_yaml("docs/examples/agent_goal_accuracy_config.yaml")

task = "Book a flight from NYC to London"

# chat_completion is the object returned by the DataRobot agent SDK.
# `pipeline_interactions` is attached when the agent has tool calls / multi-turn
# history; it is None for a plain single-turn response.
chat_completion = my_agent.run(task=task)
agent_response = chat_completion.choices[0].message.content
interactions_json = getattr(chat_completion, "pipeline_interactions", None)

result, latency, postscore_df = pipeline.evaluate_response(
    response=agent_response,
    prompt=task,
    pipeline_interactions=interactions_json,  # JSON str, or None
)

score = result.metrics.get("agent_goal_accuracy")
passed = score is not None and score >= 0.5
print(f"score={score}  passed={passed}")

Python — building the interaction trace manually (when not using the DataRobot agent SDK):

import json
from ragas import MultiTurnSample
from ragas.messages import AIMessage, HumanMessage, ToolCall, ToolMessage

# Reconstruct the trace from your agent's execution log.
sample = MultiTurnSample(
    user_input=[
        HumanMessage(content="Book a flight from NYC to London"),
        AIMessage(
            content="Searching for available flights…",
            tool_calls=[ToolCall(name="search_flights", args={"origin": "NYC", "dest": "LON"})],
        ),
        ToolMessage(content='[{"flight": "BA178", "price": 620}]'),
        AIMessage(content="I found BA178 departing tomorrow for $620. Shall I book it?"),
    ]
)
interactions_json = json.dumps(sample.to_dict())

result, latency, _ = pipeline.evaluate_response(
    response="I found BA178 departing tomorrow for $620. Shall I book it?",
    prompt="Book a flight from NYC to London",
    pipeline_interactions=interactions_json,
)
print(result.blocked, result.metrics)

Without pipeline_interactions the guard falls back gracefully to evaluating the single prompt/response pair — useful during development before you have a live agent.


8b. From a plain Python dict

Use ModerationPipeline.from_dict when your configuration is already in dict form (e.g. loaded from JSON, fetched from an API, or assembled programmatically). The dict must follow the same schema as the YAML file.

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| config | dict | Yes | Guard configuration dictionary following the YAML schema |
| model_dir | str \| None | No | Base directory used to resolve relative asset paths (e.g. NeMo guardrails .co flow files). Defaults to os.getcwd() |

import os
from datarobot_dome.api import ModerationPipeline

os.environ["DATAROBOT_ENDPOINT"]  = "<your-endpoint>"
os.environ["DATAROBOT_API_TOKEN"] = "<your-token>"
# os.environ["TARGET_NAME"] = "resultText"  # optional — see §10 for resolution order

pipeline = ModerationPipeline.from_dict(
    {
        "targets": [
            {
                "target": "_default",
                "guards": [
                    {
                        "name": "Token Count",
                        "type": "ootb",
                        "ootb_type": "token_count",
                        "stage": "prompt",
                    }
                ],
            }
        ]
    },
    model_dir="/path/to/nemo_guardrails_dir",  # optional; only needed for NeMo guards
)

result, latency, prescore_df = pipeline.evaluate_prompt("Hello")
print(result.metrics)

8c. From a Pydantic config object

Use ModerationPipeline.from_config to build the configuration entirely in Python — no YAML file required. This is useful for dynamic configurations, programmatic guard registration, or when embedding moderation in a larger application.

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| config | ModerationConfig | Yes | A fully-constructed ModerationConfig Pydantic object |
| model_dir | str \| None | No | Base directory used to resolve relative asset paths (e.g. NeMo guardrails .co flow files). Defaults to os.getcwd() |

All schema types are importable from datarobot_dome.schema:

from datarobot_dome.schema import (
    ModerationConfig,
    TargetBlock,
    # Guard subtypes — pick the matching one per guard
    OOTBGuardSchema,
    ModelGuardSchema,
    NemoGuardrailsSchema,
    NemoEvaluatorSchema,
    # Nested schemas used inside guards
    AdditionalGuardConfigSchema,
    InterventionSchema,
    InterventionConditionSchema,
    ModelInfoSchema,
)

Schema type → guard type mapping

| Guard YAML type | Pydantic class |
|---|---|
| ootb | OOTBGuardSchema |
| model | ModelGuardSchema |
| nemo_guardrails | NemoGuardrailsSchema |
| nemo_evaluator | NemoEvaluatorSchema |

LLM Gateway example — hate speech / guideline adherence

import os
from datarobot_dome.api import ModerationPipeline
from datarobot_dome.schema import (
    AdditionalGuardConfigSchema,
    InterventionSchema,
    ModerationConfig,
    OOTBGuardSchema,
    TargetBlock,
)

os.environ["DATAROBOT_ENDPOINT"]  = "https://app.datarobot.com/api/v2"
os.environ["DATAROBOT_API_TOKEN"] = "<your-dr-token>"
# os.environ["TARGET_NAME"] = "resultText"  # optional — see §10 for resolution order

config = ModerationConfig(
    targets=[
        TargetBlock(
            target="_default",
            guards=[
                OOTBGuardSchema(
                    type="ootb",
                    name="Hate Speech",
                    stage="response",
                    ootb_type="agent_guideline_adherence",
                    llm_type="llmGateway",
                    llm_gateway_model_id="azure/gpt-4o-2024-11-20",
                    additional_guard_config=AdditionalGuardConfigSchema(
                        agent_guideline=(
                            "The response must not contain hate speech, slurs, or content "
                            "that demeans people based on race, religion, gender, nationality, "
                            "or any other protected characteristic."
                        )
                    ),
                    intervention=InterventionSchema(
                        action="report",
                        conditions=[],
                    ),
                )
            ],
        )
    ]
)

pipeline = ModerationPipeline.from_config(config)
# Pass model_dir when your config references NeMo guardrails flow files:
# pipeline = ModerationPipeline.from_config(config, model_dir="/path/to/nemo_guardrails_dir")

text = "People from that group are living in France."
result, latency, postscore_df = pipeline.evaluate_response(response=text, prompt="Describe this text.")
score = result.metrics.get("agent_guideline_adherence_score")
print(f"score={score}  latency={latency:.3f}s")

Model guard example

import os
from datarobot_dome.api import ModerationPipeline
from datarobot_dome.schema import (
    InterventionConditionSchema,
    InterventionSchema,
    ModerationConfig,
    ModelGuardSchema,
    ModelInfoSchema,
    TargetBlock,
)

os.environ["DATAROBOT_ENDPOINT"]  = "<your-endpoint>"
os.environ["DATAROBOT_API_TOKEN"] = "<your-token>"
# os.environ["TARGET_NAME"] = "resultText"  # optional — see §10 for resolution order

config = ModerationConfig(
    targets=[
        TargetBlock(
            target="_default",
            guards=[
                ModelGuardSchema(
                    type="model",
                    name="Toxicity",
                    stage="prompt",
                    deployment_id="<your-toxicity-deployment-id>",
                    model_info=ModelInfoSchema(
                        input_column_name="text",
                        target_name="toxicity_toxic_PREDICTION",
                        target_type="Binary",
                        class_names=[],
                    ),
                    intervention=InterventionSchema(
                        action="block",
                        message="Toxic content blocked.",
                        conditions=[
                            InterventionConditionSchema(comparand=0.5, comparator="greaterThan")
                        ],
                    ),
                )
            ],
        )
    ]
)

pipeline = ModerationPipeline.from_config(config)

8d. Streaming pipeline

evaluate_full_pipeline_stream_async is the primary high-level API for streaming. It encapsulates prescore evaluation, the thread/queue bridge to ModerationIterator, and postscore guard execution — callers supply only a prompt and a streaming LLM callable.

Method signatures

| Method | When to use |
|---|---|
| evaluate_full_pipeline_stream_async(prompt, llm_callable) | Preferred. Hides all internal state; no prescore_df required. |
| stream_response_async(completion, *, prompt, prescore_df, prescore_latency) | Advanced: when you need to inspect the EvaluationResult from prescore before starting the LLM stream (e.g. to act on a REPLACE result). |

evaluate_full_pipeline_stream_async parameters

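| Parameter | Type | Required | Description |
|---|---|---|---|
| prompt | str | Yes | The user prompt |
| llm_callable | Callable[[str], AsyncIterator[ChatCompletionChunk]] | Yes | Sync callable that receives the (possibly sanitised) effective prompt and returns an async iterator of chunks; an async generator function qualifies, because calling it returns an async iterator. Called only when the prompt is not blocked. |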

Chunk signals

| finish_reason | Meaning |
|---|---|
| None or "stop" | Normal chunk; content is in chunk.choices[0].delta.content |
| "content_filter" | A guard intervened; delta.content holds the block message. If this is the first (and only) chunk, the LLM was never called. |

Example

import asyncio
import os
from datarobot_dome.api import ModerationPipeline
from datarobot_dome.schema import (
    InterventionSchema, ModerationConfig, OOTBGuardSchema, TargetBlock,
)

os.environ["DATAROBOT_ENDPOINT"]  = "<your-endpoint>"
os.environ["DATAROBOT_API_TOKEN"] = "<your-token>"

pipeline = ModerationPipeline.from_config(
    ModerationConfig(
        targets=[
            TargetBlock(
                target="_default",
                guards=[
                    OOTBGuardSchema(
                        name="Prompt Token Limit",
                        type="ootb",
                        ootb_type="token_count",
                        stage="prompt",
                        intervention=InterventionSchema(
                            action="block",
                            conditions=[{"comparator": "greaterThan", "comparand": 200}],
                            message="Prompt too long.",
                        ),
                    ),
                ],
            )
        ]
    )
)


async def my_llm_stream(prompt: str):
    """Wrap a sync OpenAI stream as an async iterator."""
    import openai
    client = openai.OpenAI(
        api_key=os.environ["DATAROBOT_API_TOKEN"],
        base_url=f"{os.environ['DATAROBOT_ENDPOINT']}/genai/llmgw",
    )
    for chunk in client.chat.completions.create(
        model="azure/gpt-4o-2024-11-20",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    ):
        yield chunk


async def run(prompt: str) -> None:
    print(f"Prompt: {prompt!r}")
    async for chunk in pipeline.evaluate_full_pipeline_stream_async(prompt, my_llm_stream):
        finish_reason = chunk.choices[0].finish_reason
        content = chunk.choices[0].delta.content
        if finish_reason == "content_filter":
            print(f"[BLOCKED] {content}")
            return
        if content:
            print(content, end="", flush=True)
    print()


asyncio.run(run("What is DataRobot?"))
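The chunk-signal rules can be exercised without a live LLM. The sketch below uses SimpleNamespace stand-ins for ChatCompletionChunk objects (an assumption: they mimic only the choices[0].delta.content and finish_reason attributes), a hypothetical fake_llm_stream in the shape llm_callable expects, and a consumer that stops at the first content_filter chunk, like run() above. In real code the stream passed to collect would come from pipeline.evaluate_full_pipeline_stream_async(prompt, fake_llm_stream).

```python
import asyncio
from types import SimpleNamespace
from typing import AsyncIterator, List, Optional, Tuple


def make_chunk(content: Optional[str], finish_reason: Optional[str] = None) -> SimpleNamespace:
    """Build a stand-in object with the chunk.choices[0].delta.content shape."""
    choice = SimpleNamespace(delta=SimpleNamespace(content=content), finish_reason=finish_reason)
    return SimpleNamespace(choices=[choice])


async def fake_llm_stream(prompt: str) -> AsyncIterator[SimpleNamespace]:
    """Hypothetical llm_callable: an async generator function, so calling it
    synchronously returns an async iterator."""
    for word in ["DataRobot ", "is ", "an ", "AI ", "platform."]:
        yield make_chunk(word)
    yield make_chunk(None, finish_reason="stop")


async def collect(stream: AsyncIterator[SimpleNamespace]) -> Tuple[str, Optional[str]]:
    """Concatenate streamed text; stop at the first content_filter chunk.

    Returns (text received so far, block message or None).
    """
    parts: List[str] = []
    async for chunk in stream:
        choice = chunk.choices[0]
        if choice.finish_reason == "content_filter":
            return "".join(parts), choice.delta.content
        if choice.delta.content:
            parts.append(choice.delta.content)
    return "".join(parts), None


text, blocked = asyncio.run(collect(fake_llm_stream("What is DataRobot?")))
```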

Advanced: stream_response_async

Use when you need the prescore EvaluationResult before streaming begins. Because the prescore call is awaited and the stream is consumed with async for, the code has to live inside an async function:

async def moderated_stream(prompt: str):
    result, latency, prescore_df = await pipeline.evaluate_prompt_async(prompt)
    if result.blocked:
        # Handle the block before ever calling the LLM.
        print(f"[BLOCKED] {result.blocked_message}")
        return

    effective = result.replacement if result.replaced else prompt

    async for chunk in pipeline.stream_response_async(
        my_llm_stream(effective),
        prompt=effective,
        prescore_df=prescore_df,      # must come from evaluate_prompt_async
        prescore_latency=latency,
    ):
        yield chunk

With DRUM

Place moderation_config.yaml alongside your custom model code, then:

drum score --verbose \
  --code-dir ./ \
  --input ./input.csv \
  --target-type textgeneration \
  --runtime-params-file values.yaml

9. Testing guide

Set these environment variables before running any test (see §10 for details):

export DATAROBOT_ENDPOINT="https://app.datarobot.com/api/v2"
export DATAROBOT_API_TOKEN="your-token"
export TARGET_NAME="resultText"

Guards fall into four groups based on the credentials they require:

| Group | Guard types | Extra credentials needed |
|---|---|---|
| A (local) | token_count, rouge_1, cost, custom_metric | None beyond the base variables above |
| B (DataRobot deployment) | type: model; any ootb with llm_type: datarobot or llm_type: llmGateway | Only DATAROBOT_API_TOKEN. type: model and llm_type: datarobot guards also need a real deployment_id; llmGateway guards do not |
| C (external LLM provider) | Any ootb with llm_type: openAi, azureOpenAi, google, amazon, nim | Provider-specific env var (see §10) |
| D (NeMo) | type: nemo_guardrails, type: nemo_evaluator | Provider key for NeMo Guardrails; DATAROBOT_API_TOKEN for NeMo Evaluator |
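When sorting a config's guards into these groups (for example to decide which CI job can run them), the table can be encoded as a small lookup. This is a hypothetical helper for snake_case guard dicts; the type and llm_type spellings are copied from the table, it assumes the group A names are ootb_type values, and anything it cannot place raises an error instead of guessing.

```python
from typing import Any, Dict

LOCAL_OOTB_TYPES = {"token_count", "rouge_1", "cost", "custom_metric"}
DATAROBOT_LLM_TYPES = {"datarobot", "llmGateway"}
EXTERNAL_LLM_TYPES = {"openAi", "azureOpenAi", "google", "amazon", "nim"}
NEMO_TYPES = {"nemo_guardrails", "nemo_evaluator"}


def credential_group(guard: Dict[str, Any]) -> str:
    """Return "A", "B", "C" or "D" for one snake_case guard config dict."""
    guard_type = guard.get("type")
    if guard_type in NEMO_TYPES:
        return "D"
    if guard_type == "model":
        return "B"
    if guard_type == "ootb":
        if guard.get("ootb_type") in LOCAL_OOTB_TYPES:
            return "A"
        llm_type = guard.get("llm_type")
        if llm_type in DATAROBOT_LLM_TYPES:
            return "B"
        if llm_type in EXTERNAL_LLM_TYPES:
            return "C"
    raise ValueError(f"Cannot classify guard {guard.get('name')!r}")
```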

See §5 for complete YAML examples per guard type and §8 for Python usage patterns.


10. Environment variables

Always required

| Variable | Description |
|---|---|
| DATAROBOT_ENDPOINT | DataRobot instance URL, e.g. https://app.datarobot.com/api/v2 |
| DATAROBOT_API_TOKEN | DataRobot API token |

Optional

| Variable | Description |
|---|---|
| TARGET_NAME | Name of the DataFrame column that holds the LLM response text (e.g. resultText). The response column is resolved in this order, highest priority first: (1) the DRUM deployment target_name, which always wins when MLOPS_DEPLOYMENT_ID is set; (2) the TARGET_NAME env var; (3) response_column_name in the config file; (4) the built-in default "completion". DRUM sets this automatically. In standalone Python you can set it here or declare response_column_name in the YAML/ModerationConfig; if both are provided, the env var takes precedence. |
| DISABLE_MODERATION | Set to true to disable all guards at runtime. |
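The response-column resolution order reads as a simple fallback chain. The sketch below restates it in Python purely as an illustration; resolve_response_column and its arguments are hypothetical, and the library's own resolution code may differ in details such as how empty strings are treated.

```python
from typing import Mapping, Optional

DEFAULT_RESPONSE_COLUMN = "completion"


def resolve_response_column(
    env: Mapping[str, str],
    config_response_column: Optional[str] = None,
    drum_target_name: Optional[str] = None,
) -> str:
    """Pick the response column using the documented priority order.

    1. DRUM deployment target_name, when MLOPS_DEPLOYMENT_ID is set
    2. TARGET_NAME environment variable
    3. response_column_name from the config file
    4. the built-in default "completion"
    """
    if env.get("MLOPS_DEPLOYMENT_ID") and drum_target_name:
        return drum_target_name
    if env.get("TARGET_NAME"):
        return env["TARGET_NAME"]
    if config_response_column:
        return config_response_column
    return DEFAULT_RESPONSE_COLUMN


# Typical call in standalone Python: resolve_response_column(os.environ, "answer")
```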

deepeval telemetry

The task_adherence guard uses deepeval internally. By default, the moderations library opts out of deepeval's usage telemetry: no .deepeval/ directory is created and no data is sent externally.

To opt in, set enable_deepeval_telemetry: true in your config (only takes effect when a task_adherence guard is present; deepeval is loaded lazily):

enable_deepeval_telemetry: true   # default: false

guards:
  - name: Task Adherence
    type: ootb
    ootb_type: task_adherence
    stage: response

To control the same switch through an environment variable instead (e.g. in CI or container environments):

export DEEPEVAL_TELEMETRY_OPT_OUT=YES  # opt out (library default)
unset DEEPEVAL_TELEMETRY_OPT_OUT       # opt in

Credentials for LLM-eval guards using external providers

When your guard uses llm_type: datarobot, it reuses DATAROBOT_API_TOKEN — no extra variable needed.

For external providers (OpenAI, Azure OpenAI, Google, AWS), set a guard-specific env var. The variable name is built from the guard's type, stage, and ootb_type (ootb_type is left out for guards that have none, such as nemo_guardrails):

MLOPS_RUNTIME_PARAM_MODERATION_{TYPE}_{STAGE}_{OOTB_TYPE}_{PROVIDER_SUFFIX}

| Guard (ootb_type) | Provider | Environment variable |
|---|---|---|
| task_adherence | OpenAI | MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_TASK_ADHERENCE_OPENAI_API_KEY |
| task_adherence | Azure OpenAI | MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_TASK_ADHERENCE_AZURE_OPENAI_API_KEY |
| faithfulness | OpenAI | MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_FAITHFULNESS_OPENAI_API_KEY |
| faithfulness | Azure OpenAI | MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_FAITHFULNESS_AZURE_OPENAI_API_KEY |
| agent_guideline_adherence | Azure OpenAI | MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_AGENT_GUIDELINE_ADHERENCE_AZURE_OPENAI_API_KEY |
| agent_guideline_adherence | Google Vertex AI | MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_AGENT_GUIDELINE_ADHERENCE_GOOGLE_SERVICE_ACCOUNT |
| agent_goal_accuracy | Azure OpenAI | MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_AGENT_GOAL_ACCURACY_AZURE_OPENAI_API_KEY |
| agent_goal_accuracy | AWS Bedrock | MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_AGENT_GOAL_ACCURACY_AWS_ACCOUNT |
| nemo_guardrails (prompt) | Azure OpenAI | MLOPS_RUNTIME_PARAM_MODERATION_NEMO_GUARDRAILS_PROMPT_AZURE_OPENAI_API_KEY |
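The naming rule is mechanical: upper-case each part and join with underscores. A hypothetical helper that reproduces the rows of the table; the provider suffixes (OPENAI_API_KEY, AZURE_OPENAI_API_KEY, GOOGLE_SERVICE_ACCOUNT, AWS_ACCOUNT) are taken from the table, and the helper itself is not part of the library.

```python
from typing import Optional

PREFIX = "MLOPS_RUNTIME_PARAM_MODERATION"


def credential_env_var(
    guard_type: str, stage: str, provider_suffix: str, ootb_type: Optional[str] = None
) -> str:
    """Build MLOPS_RUNTIME_PARAM_MODERATION_{TYPE}_{STAGE}_{OOTB_TYPE}_{PROVIDER_SUFFIX}.

    ootb_type is only included when given (OOTB guards); nemo_guardrails has none.
    """
    parts = [PREFIX, guard_type, stage]
    if ootb_type:
        parts.append(ootb_type)
    parts.append(provider_suffix)
    return "_".join(part.upper() for part in parts)
```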

Value format per provider:

# OpenAI / Azure OpenAI
'{"type":"credential","payload":{"credentialType":"api_token","apiToken":"YOUR_KEY"}}'

# Google Vertex AI
'{"type":"credential","payload":{"credentialType":"gcp","gcpKey":{...}}}'

# AWS Bedrock
'{"type":"credential","payload":{"credentialType":"s3","awsAccessKeyId":"...","awsSecretAccessKey":"...","awsSessionToken":"..."}}'
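Hand-writing JSON inside a shell string is easy to get wrong (nested quotes, escaping). A minimal sketch that builds the OpenAI / Azure OpenAI value with json.dumps instead; api_token_credential is a hypothetical helper, and reading the raw key from an OPENAI_API_KEY variable is an assumption made for illustration.

```python
import json
import os


def api_token_credential(api_key: str) -> str:
    """Serialise an OpenAI / Azure OpenAI key in the documented credential format."""
    payload = {"credentialType": "api_token", "apiToken": api_key}
    return json.dumps({"type": "credential", "payload": payload})


# task_adherence guard backed by OpenAI (variable name taken from the table above);
# OPENAI_API_KEY as the source of the raw key is an assumption.
os.environ["MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_TASK_ADHERENCE_OPENAI_API_KEY"] = (
    api_token_credential(os.environ.get("OPENAI_API_KEY", "YOUR_KEY"))
)
```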

DataRobot Moderation CLI

The dr-moderation CLI lets you manage guards and test moderation pipelines from the terminal — no Python code required.


Table of Contents

  1. Installation
  2. Authentication
  3. Commands
  4. YAML schema quick reference
  5. Exit codes

1. Installation

End-user — the dr-moderation binary lands on your PATH automatically:

pip install datarobot-moderations
dr-moderation --help

Developer / contributor — Poetry places the binary inside .venv/bin/, which is not on your PATH until the venv is active. Pick one:

poetry shell                        # Option A: activate for the session
poetry run dr-moderation --help     # Option B: one-off prefix
make cli ARGS="evaluate --help"     # Option C: Makefile shortcut

Python 3.10 – 3.12 required.


2. Authentication

Commands that call the DataRobot API need credentials. Set them once per session:

export DATAROBOT_ENDPOINT="https://app.datarobot.com/api/v2"
export DATAROBOT_API_TOKEN="your-api-token"

Or pass them as global flags (flags take precedence over env vars):

dr-moderation --endpoint <url> --token <token> <command>

3. Commands

3.1 evaluate

Evaluate a prompt and/or response through the local ModerationPipeline. Supports every guard type including LLM Gateway (llm_type: llmGateway) — no deployment required.

The config file must use the Python SDK snake_case schema (see GUARDRAILS.md for the full field reference).

dr-moderation evaluate [OPTIONS]

| Option | Required | Default | Description |
|---|---|---|---|
| --config-file FILE | Yes | | Moderation config YAML (snake_case SDK format) |
| --prompt TEXT | No * | | Prompt text; evaluated against prescore guards |
| --response TEXT | No * | | Response text; evaluated against postscore guards. Also pass --prompt for guards that need both (e.g. faithfulness, task_adherence) |
| --as-json | No | false | Emit results as JSON (useful for scripting) |

* At least one of --prompt or --response is required.

Example output (human-readable):

── Prescore (prompt) ──────────────────────────────
  Blocked  : False
  Metrics  :
    Prompts_token_count: 4
  Latency  : 0.05s

Examples:

# Token-count guard on a prompt
dr-moderation evaluate \
  --config-file docs/examples/token_count_config.yaml \
  --prompt "Hello, world!"

# LLM Gateway task-adherence guard
dr-moderation evaluate \
  --config-file docs/examples/llm_gateway_config.yaml \
  --prompt "What is DataRobot?" \
  --response "DataRobot is an AI platform."

# Evaluate both, emit JSON, pipe to jq
dr-moderation evaluate \
  --config-file docs/examples/llm_gateway_config.yaml \
  --prompt "What is DataRobot?" \
  --response "DataRobot is an AI platform." \
  --as-json | jq '.postscore.metrics'

Ready-made configs in docs/examples/:

  • token_count_config.yaml — prompt + response token-count guards
  • llm_gateway_config.yaml — token-count prompt guard + LLM Gateway task_adherence
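For scripting from Python rather than jq, the CLI can be driven through subprocess. A minimal sketch, assuming the JSON layout implied by the jq example above (a top-level postscore object with metrics); build_evaluate_args and run_evaluate are hypothetical helpers, and the runner parameter exists only so the function can be exercised without the CLI installed.

```python
import json
import subprocess
from typing import Any, Callable, Dict, List, Optional


def build_evaluate_args(
    config_file: str, prompt: Optional[str] = None, response: Optional[str] = None
) -> List[str]:
    """Assemble a dr-moderation evaluate command line that emits JSON."""
    if prompt is None and response is None:
        raise ValueError("At least one of prompt or response is required")
    args = ["dr-moderation", "evaluate", "--config-file", config_file, "--as-json"]
    if prompt is not None:
        args += ["--prompt", prompt]
    if response is not None:
        args += ["--response", response]
    return args


def run_evaluate(
    config_file: str,
    prompt: Optional[str] = None,
    response: Optional[str] = None,
    runner: Callable[..., Any] = subprocess.run,
) -> Dict[str, Any]:
    """Run the CLI and parse its JSON output; non-zero exits raise RuntimeError."""
    proc = runner(
        build_evaluate_args(config_file, prompt, response),
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        # Exit code 1 = runtime error, 2 = invalid CLI usage; details are on stderr.
        raise RuntimeError(f"dr-moderation exited with {proc.returncode}: {proc.stderr.strip()}")
    return json.loads(proc.stdout)


# result = run_evaluate("docs/examples/llm_gateway_config.yaml",
#                       prompt="What is DataRobot?", response="DataRobot is an AI platform.")
# print(result["postscore"]["metrics"])
```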

3.2 add-guard

Add guards to an existing DataRobot custom model. Creates a new custom model version with the guards attached and prints the version ID to stdout.

How it works:

  1. You create and register a custom model (your LLM) in DataRobot — this gives you a customModelId.
  2. You define guards in a camelCase YAML file.
  3. add-guard POSTs the config to /guardConfigurations/toNewCustomModelVersion/. DataRobot creates a new version of the model with the guards and returns the customModelVersionId.
  4. Deploy that new version; it will now enforce your guards on every prompt/response.

dr-moderation add-guard [OPTIONS]

| Option | Required | Default | Description |
|---|---|---|---|
| --custom-model-id TEXT | Yes | | ID of the custom model (find it in the DataRobot UI under Model Workshop → Custom Models) |
| --config-file FILE | Yes | | YAML list of guard configurations (camelCase API format) |
| --timeout-sec INTEGER | No | 60 | Per-guard timeout in seconds |
| --timeout-action [score\|block] | No | score | Action on timeout: score passes the request through; block rejects it |

Example output:

6797abc123def456789abcde

The printed ID is the new customModelVersionId — pass it to subsequent API or SDK calls to deploy the version.

Examples:

# Add guards, capture the new version ID
VERSION_ID=$(dr-moderation add-guard \
  --custom-model-id 6793e6b2114f17240fa2194c \
  --config-file docs/examples/add_guard_config.yaml)
echo "New version: ${VERSION_ID}"

# Block if any guard exceeds 30 s
dr-moderation add-guard \
  --custom-model-id 6793e6b2114f17240fa2194c \
  --config-file docs/examples/add_guard_config.yaml \
  --timeout-sec 30 \
  --timeout-action block

3.3 agent a2a connect

Verify connectivity to a remote A2A agent by fetching its agent card from /.well-known/agent.json.

dr-moderation agent a2a connect [OPTIONS]

| Option | Required | Description |
|---|---|---|
| --url TEXT | Yes | Base URL of the remote A2A agent |
| --deployment-id TEXT | No | DataRobot deployment ID to verify alongside the agent |

Examples:

# 1. Start a minimal A2A mock in the background (serves /.well-known/agent.json on port 8765)
python3 - << 'EOF' &
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

CARD = {"name": "My Agent", "version": "1.0.0", "capabilities": ["moderation"]}

class H(BaseHTTPRequestHandler):
    def do_GET(self):
        # Every GET path returns the card, which is enough for a connectivity check.
        body = json.dumps(CARD).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *_): pass

HTTPServer(("localhost", 8765), H).serve_forever()
EOF
sleep 1  # give the mock a moment to start listening

# 2. Connect to it
dr-moderation agent a2a connect --url http://localhost:8765

Production examples:

# Verify a remote A2A agent is reachable
dr-moderation agent a2a connect --url https://my-agent.example.com

# Also verify the backing DataRobot deployment
dr-moderation agent a2a connect \
  --url https://my-agent.example.com \
  --deployment-id 6793e6b2114f17240fa2194c
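The connectivity check boils down to an HTTP GET of the agent card. For a scripted health check from Python, a minimal stdlib sketch; it approximates what connect is described as doing and is not the CLI's implementation (in particular it does not perform the --deployment-id check). agent_card_url and fetch_agent_card are hypothetical names.

```python
import json
import urllib.request
from typing import Any, Dict

AGENT_CARD_PATH = "/.well-known/agent.json"


def agent_card_url(base_url: str) -> str:
    """Join the agent base URL and the well-known card path without doubling slashes."""
    return base_url.rstrip("/") + AGENT_CARD_PATH


def fetch_agent_card(base_url: str, timeout: float = 10.0) -> Dict[str, Any]:
    """GET the agent card and decode it as JSON (raises on HTTP or network errors)."""
    with urllib.request.urlopen(agent_card_url(base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# card = fetch_agent_card("http://localhost:8765")
# print(card["name"], card.get("version"))
```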

4. YAML schema quick reference

The two commands use different schemas — they are not interchangeable:

| Command | Format | Key fields |
|---|---|---|
| add-guard | DataRobot API (camelCase) | ootbType, stages (list), intervention |
| evaluate | Python SDK (snake_case) | ootb_type, stage (string or list), llm_type, llm_gateway_model_id |
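Because the two schemas differ in key names and in stages versus stage, a guard written for add-guard cannot be fed to evaluate as-is. For the simplest case, an OOTB guard such as the token-count example below, a hypothetical conversion looks like this. It is a sketch under stated assumptions: it handles only type: ootb, drops allowedActions and sendNotification because the snake_case examples in this document never use them, and does not add llm_type or llm_gateway_model_id, which LLM-backed guards additionally need.

```python
from typing import Any, Dict

# API-only intervention fields with no counterpart in the snake_case examples (assumption).
API_ONLY_INTERVENTION_KEYS = {"allowedActions", "sendNotification"}


def add_guard_to_evaluate(guard: Dict[str, Any]) -> Dict[str, Any]:
    """Convert one camelCase add-guard OOTB entry to the snake_case evaluate shape."""
    if guard.get("type") != "ootb":
        raise ValueError("only type: ootb guards are handled by this sketch")
    stages = guard.get("stages", [])
    converted: Dict[str, Any] = {
        "name": guard["name"],
        "type": "ootb",
        "ootb_type": guard["ootbType"],
        # evaluate accepts a string or a list; use a plain string for a single stage
        "stage": stages[0] if len(stages) == 1 else list(stages),
    }
    if "intervention" in guard:
        converted["intervention"] = {
            key: value
            for key, value in guard["intervention"].items()
            if key not in API_ONLY_INTERVENTION_KEYS
        }
    return converted
```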

add-guard config (camelCase)

Sent directly to /guardConfigurations/toNewCustomModelVersion/. The file must be a YAML list.

- name: Prompt Token Count
  type: ootb
  ootbType: token_count
  stages: [prompt]
  intervention:
    action: report
    allowedActions: [report, block]
    message: " "
    sendNotification: false
    conditions: []

| Field | Required | Notes |
|---|---|---|
| name | Yes | Unique per config |
| type | Yes | ootb · guardModel · userModel · nemo |
| stages | Yes | List: [prompt], [response], or [prompt, response] |
| ootbType | When type: ootb | token_count, faithfulness, rouge_1, etc. |
| modelInfo | When type: guardModel | inputColumnName, outputColumnName, targetType, classNames |
| intervention | No | action, conditions, message; omit to measure only |
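Catching malformed entries locally saves a round trip to /guardConfigurations/toNewCustomModelVersion/. A hypothetical pre-flight check that encodes only the rules in the field table (unique names, allowed types, stages as a non-empty list, conditional ootbType and modelInfo). It assumes the YAML has already been loaded, for example with PyYAML's yaml.safe_load; the server-side validation remains authoritative.

```python
from typing import Any, List

VALID_TYPES = {"ootb", "guardModel", "userModel", "nemo"}
VALID_STAGES = {"prompt", "response"}


def validate_add_guard_config(guards: Any) -> List[str]:
    """Check a parsed add-guard YAML document against the field table."""
    if not isinstance(guards, list):
        return ["the file must be a YAML list of guards"]
    errors: List[str] = []
    seen_names = set()
    for i, guard in enumerate(guards):
        where = f"guard[{i}]"
        if not isinstance(guard, dict):
            errors.append(f"{where}: must be a mapping")
            continue
        name = guard.get("name")
        if not name:
            errors.append(f"{where}: 'name' is required")
        elif name in seen_names:
            errors.append(f"{where}: duplicate name {name!r}")
        seen_names.add(name)
        guard_type = guard.get("type")
        if guard_type not in VALID_TYPES:
            errors.append(f"{where}: 'type' must be one of {sorted(VALID_TYPES)}")
        stages = guard.get("stages")
        if not isinstance(stages, list) or not stages or not set(stages) <= VALID_STAGES:
            errors.append(f"{where}: 'stages' must be a non-empty list of prompt/response")
        if guard_type == "ootb" and "ootbType" not in guard:
            errors.append(f"{where}: 'ootbType' is required when type is ootb")
        if guard_type == "guardModel" and "modelInfo" not in guard:
            errors.append(f"{where}: 'modelInfo' is required when type is guardModel")
    return errors
```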

evaluate config (snake_case)

Consumed by ModerationPipeline.from_yaml. For the full field reference see GUARDRAILS.md.

The key difference from add-guard: use llm_type: llmGateway with llm_gateway_model_id, and no deployment_id is needed:

- name: Task Adherence
  type: ootb
  ootb_type: task_adherence
  stage: response
  llm_type: llmGateway
  llm_gateway_model_id: "azure/gpt-4o-2024-11-20"
  intervention:
    action: block
    message: "Response does not address the task."
    conditions:
      - comparator: lessThan
        comparand: 0.5
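The conditions block compares the guard's score with comparand: with lessThan 0.5, a task-adherence score of 0.3 triggers the block action (and, in the model guard example earlier, greaterThan 0.5 blocks a toxicity score above 0.5). A minimal sketch of that reading; condition_fires is hypothetical, covers only the greaterThan and lessThan comparators that appear in this document, and ignores how the library combines multiple conditions.

```python
import operator
from typing import Callable, Dict

# Comparator names as they appear in the configs in this document; others are not assumed.
COMPARATORS: Dict[str, Callable[[float, float], bool]] = {
    "greaterThan": operator.gt,
    "lessThan": operator.lt,
}


def condition_fires(score: float, comparator: str, comparand: float) -> bool:
    """True when `score <comparator> comparand` holds, i.e. the intervention applies."""
    try:
        compare = COMPARATORS[comparator]
    except KeyError:
        raise ValueError(f"unsupported comparator in this sketch: {comparator!r}") from None
    return compare(score, comparand)


# Task Adherence example above: condition_fires(0.3, "lessThan", 0.5) is True, so the response is blocked.
```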

5. Exit codes

Code Meaning
0 Success
1 Runtime error (API error, bad YAML, connection refused)
2 Invalid CLI usage (missing required option, unknown value)

Non-zero exits write a descriptive message to stderr.

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

datarobot_moderations-11.2.31-py3-none-any.whl (136.0 kB)

File details

Details for the file datarobot_moderations-11.2.31-py3-none-any.whl.

File metadata

  • Download URL: datarobot_moderations-11.2.31-py3-none-any.whl
  • Size: 136.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.4.1 CPython/3.11.15 Linux/6.1.159-181.297.amzn2023.x86_64

File hashes

Hashes for datarobot_moderations-11.2.31-py3-none-any.whl
Algorithm Hash digest
SHA256 5b1b0f9ee751b2c6d3396ff5bd8ac546d5d60b0b64e9734f21600bc4350d1cb2
MD5 93a74237e46f5b4af582004834f6de01
BLAKE2b-256 36681db59f9480fcd7fb5b1ac472331761f05626b1c925799cbda870d4d11bcc
