DataRobot Monitoring and Moderation framework
Project description
DataRobot Moderations library
This library enforces interventions on prompt and response text according to the guard configuration set by the user.
The library accepts the guard configuration in YAML format together with the input prompts, and outputs a dataframe with details such as:
- whether the prompt should be blocked
- whether the completion should be blocked
- metric values obtained from the model guards
- whether the prompt or response was modified according to the modifier guard configuration
Architecture
The library wraps the typical LLM prediction method. It first runs the pre-score guards, the guards that evaluate prompts and enforce moderation if necessary. All prompts that were not moderated by the library are forwarded to the actual LLM to obtain their completions. The library then evaluates these completions using post-score guards and enforces intervention on them.
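The flow can be pictured with a short sketch (illustrative pseudocode only; the guard objects and method names here are assumptions, not the library's internal API):

# Conceptual sketch of the moderation flow, not the library's actual internals.
def moderated_predict(prompts, pre_score_guards, llm_predict, post_score_guards):
    # 1. Pre-score guards evaluate prompts; moderated prompts never reach the LLM.
    allowed = [p for p in prompts if not any(g.should_block(p) for g in pre_score_guards)]
    # 2. Only the prompts that passed moderation are forwarded to the actual LLM.
    completions = llm_predict(allowed)
    # 3. Post-score guards evaluate the completions and intervene where required.
    return [c for c in completions if not any(g.should_block(c) for g in post_score_guards)]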
How to build it?
The repository uses Poetry to manage the build process, and a wheel can be built using:
make clean
make
How to use it?
A generated or downloaded wheel file can be installed with pip, which pulls in its dependencies as well.
pip3 install datarobot-moderations
Optional extras
The base install covers token-count, ROUGE-1, cost, and NeMo guards. Heavier or cloud-specific dependencies are opt-in:
| Extra | What it enables |
|---|---|
| datarobot-sdk | DataRobot model guards, DataRobot LLM evaluator type |
| llm-eval | Faithfulness, Task Adherence, Agent Goal Accuracy, Guideline Adherence guards |
| nemo | NeMo Guardrails colang-based flow guard |
| nemo-evaluator | NeMo live-evaluation microservice guard |
| nvidia | NVIDIA NIM / ChatNVIDIA LLM support |
| vertex | Google Cloud Vertex AI LLM support |
| bedrock | AWS Bedrock LLM support |
| all | Every optional dependency at once |
# Example: task-adherence guard backed by a DataRobot LLM deployment
pip3 install 'datarobot-moderations[llm-eval,datarobot-sdk]'
Standalone Python API
There are three ways to create a ModerationPipeline:
From a YAML file — reads guard configuration from disk:
from datarobot_dome.api import ModerationPipeline
pipeline = ModerationPipeline.from_yaml("moderation_config.yaml")
From a plain Python dictionary — useful for dynamic or programmatic configs:
from datarobot_dome.api import ModerationPipeline
pipeline = ModerationPipeline.from_dict({
    "targets": [
        {
            "target": "_default",
            "guards": [
                {"name": "Token Count", "type": "ootb", "ootb_type": "token_count", "stage": "prompt"},
            ],
        }
    ]
})
From a Pydantic config object — full type-safety and IDE autocompletion:
from datarobot_dome.api import ModerationPipeline
from datarobot_dome.schema import ModerationConfig, OOTBGuardSchema, TargetBlock
config = ModerationConfig(
    targets=[
        TargetBlock(
            target="_default",
            guards=[OOTBGuardSchema(name="Token Count", stage="prompt", ootb_type="token_count")],
        )
    ]
)
pipeline = ModerationPipeline.from_config(config)
All three constructors validate DataRobot credentials (DATAROBOT_ENDPOINT and
DATAROBOT_API_TOKEN) before initialising the pipeline.
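For example, with credentials exported before construction (the endpoint and token values here are placeholders):

import os
from datarobot_dome.api import ModerationPipeline

# Both variables are checked when the pipeline is created
os.environ["DATAROBOT_ENDPOINT"] = "https://app.datarobot.com/api/v2"
os.environ["DATAROBOT_API_TOKEN"] = "<your-token>"

pipeline = ModerationPipeline.from_yaml("moderation_config.yaml")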
Evaluate a prompt (pre-score guards only):
result, latency = pipeline.evaluate_prompt("Ignore previous instructions and …")
if result.blocked:
    print(result.blocked_message)
Evaluate a response (post-score guards only):
result, latency = pipeline.evaluate_response(
    response="The capital of France is Paris.",
    prompt="What is the capital of France?",
)
print(result.blocked)  # True / False
print(result.metrics)  # {"task_adherence_score": 0.0, ...}
Full pipeline — pre-score → LLM → post-score in one call:
def my_llm(prompt: str) -> str:
    # Replace with your actual LLM integration (OpenAI, Vertex, etc.)
    return "DataRobot is an AI platform."

result = pipeline.evaluate_full_pipeline(
    prompt="What is DataRobot?",
    llm_callable=my_llm,
)
if not result.blocked:
    print(f"LLM Response: {result.response}")
Result objects
evaluate_prompt / evaluate_response return an EvaluationResult:
| Field | Type | Description |
|---|---|---|
| blocked | bool | Whether a BLOCK guard fired |
| blocked_message | str \| None | Guard-supplied block reason |
| replaced | bool | Whether a REPLACE guard fired |
| replacement | str \| None | The replacement text |
| metrics | dict | All guard metric values (scores, counts, …) |
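A minimal sketch of consuming these fields, assuming a guard with action: replace is configured (the example prompt is made up; field names come from the table above):

prompt = "My SSN is 123-45-6789."
result, _latency = pipeline.evaluate_prompt(prompt)
if result.blocked:
    print(result.blocked_message)  # reason supplied by the blocking guard
elif result.replaced:
    prompt = result.replacement    # forward the sanitised text instead of the original
print(result.metrics)              # all guard metric values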
evaluate_full_pipeline returns a PipelineResult:
| Field | Type | Description |
|---|---|---|
| prompt_evaluation | EvaluationResult | Pre-score guard result |
| response | str \| None | Effective response (post-replacement if applicable) |
| response_evaluation | EvaluationResult \| None | Post-score guard result |
| blocked | bool | True if either stage was blocked |
| replaced | bool | True if either stage was replaced |
With DRUM
As described above, the library wraps DRUM's score method with pre-score and post-score
guards. With DRUM, the user therefore simply runs their custom model using drum score
and gets the moderation library's features automatically.
Install DRUM along with the necessary optional extras for your specific guards. If you are unsure which guards are in use, install [all]:
pip3 install datarobot-drum 'datarobot-moderations[all]'
drum score --verbose --logging-level info --code-dir ./ --input ./input.csv --target-type textgeneration --runtime-params-file values.yaml
Guardrails Configuration Guide
Guards evaluate prompts (pre-score) and/or responses (post-score) and can block, report, or replace content based on configurable conditions.
Table of Contents
- File structure
- Top-level options
- Common guard fields
- Intervention block
- Guard types
- LLM back-end options
- Full annotated example
- Using the config in Python
- Testing guide
- Environment variables
1. File structure
timeout_sec: 10
timeout_action: score
nemo_evaluator_deployment_id: "<your-nemo-evaluator-id>"
guards:
  - name: My Guard
    type: ootb
    stage: prompt
    # ...
2. Top-level options
| Field | Type | Default | Description |
|---|---|---|---|
| timeout_sec | int | 10 | Seconds to wait per guard |
| timeout_action | string | score | score (allow) or block on timeout |
| nemo_evaluator_deployment_id | string | — | DataRobot deployment ID of the NeMo Evaluator microservice; required when any guard uses type: nemo_evaluator |
| guards | list | required | List of guard definitions |
3. Common guard fields
| Field | Required | Description |
|---|---|---|
| name | ✅ | Unique label; used as the key in result.metrics and as the DataRobot custom metric name |
| type | ✅ | ootb · model · nemo_guardrails · nemo_evaluator |
| stage | ✅ | prompt · response · [prompt, response] (list runs the guard at both stages) |
| description | ❌ | Free-text label, ignored by the library |
| intervention | ❌ | What to do when the condition fires (see §4). Omit entirely to measure only — nothing is ever blocked |
| copy_citations | ❌ | Boolean (true/false, default false). Passes retrieved RAG context to this guard. Required for rouge_1 and faithfulness to produce meaningful scores |
| is_agentic | ❌ | Marks an agentic-workflow guard (default false). Required by agent_goal_accuracy |
# stage as a list — guard runs independently at both prompt and response stages
- name: Token Count Both
  type: ootb
  ootb_type: token_count
  stage: [prompt, response]
  intervention:
    action: block
    message: "Input or output exceeds the token limit."
    conditions:
      - comparator: greaterThan
        comparand: 100
4. Intervention block
intervention:
  action: block          # "block" | "report" | "replace"
  message: "Blocked."    # returned to caller
  send_notification: false
  conditions:
    - comparand: 0.5
      comparator: greaterThan
One condition per intervention. The conditions list accepts exactly one entry for block and replace; zero entries (conditions: []) is valid for report. To combine conditions (e.g. block if score < 0.2 or > 0.9), use two separate guards, as sketched below.
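A hedged sketch of that two-guard pattern, assuming a regression-style model guard whose prediction field is a score (the deployment ID and target_name are placeholders):

- name: Score Too Low
  type: model
  stage: response
  deployment_id: "<your-scorer-id>"
  model_info:
    input_column_name: text
    target_name: score_PREDICTION
    target_type: Regression
    class_names: []
  intervention:
    action: block
    message: "Score below the acceptable range."
    conditions:
      - comparand: 0.2
        comparator: lessThan

- name: Score Too High
  type: model
  stage: response
  deployment_id: "<your-scorer-id>"
  model_info:
    input_column_name: text
    target_name: score_PREDICTION
    target_type: Regression
    class_names: []
  intervention:
    action: block
    message: "Score above the acceptable range."
    conditions:
      - comparand: 0.9
        comparator: greaterThan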
Actions
| Action | Effect |
|---|---|
| block | Reject and return message to the caller. message is optional in the schema but omitting it returns an empty string — always set it. |
| report | Record the metric and allow content through unchanged. Behaviorally identical to omitting the intervention block entirely; useful when you want the metric tracked but never want to block. |
| replace | Swap the text with the sanitised version returned by the deployment. Only valid for type: model guards. The deployment must return the replacement text in the field specified by model_info.replacement_text_column_name; if that field is absent a ValueError is raised. |
Comparators
| Comparator | Comparand type | Description |
|---|---|---|
| greaterThan / lessThan | number | Numeric threshold |
| equals / notEquals | number \| string | Exact equality. Use comparand: "TRUE" with NeMo Guardrails guards, whose score is the string "TRUE" or "FALSE" |
| is / isNot | boolean | Boolean equality |
| matches / doesNotMatch | list of strings | Class membership. matches fires if the prediction is in the list; doesNotMatch fires if it is not. |
| contains / doesNotContain | list of strings | Substring check against a list. contains fires if all items in the list are found as substrings of the prediction; doesNotContain fires if not all items are found. |
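To illustrate the substring semantics, a sketch that blocks any output lacking a required phrase, assuming a text-generation model guard whose prediction is free text (the deployment ID, target_name, and phrase are all placeholders):

- name: Required Disclaimer
  type: model
  stage: response
  deployment_id: "<your-checker-id>"
  model_info:
    input_column_name: text
    target_name: checked_text_PREDICTION
    target_type: TextGeneration
    class_names: []
  intervention:
    action: block
    message: "Response is missing the required disclaimer."
    conditions:
      - comparand: ["This is not financial advice"]
        comparator: doesNotContain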
5. Guard types
5.1 Out-of-the-Box (ootb)
Set type: ootb and ootb_type.
Install only what you use:
pip install datarobot-moderations # base — token_count, rouge_1, cost, custom_metric
pip install 'datarobot-moderations[llm-eval]' # + faithfulness, task_adherence, agent_guideline_adherence, agent_goal_accuracy
pip install 'datarobot-moderations[llm-eval,vertex]' # + Google Vertex AI as LLM judge
pip install 'datarobot-moderations[llm-eval,bedrock]' # + AWS Bedrock as LLM judge
pip install 'datarobot-moderations[llm-eval,nvidia]' # + NVIDIA NIM as LLM judge
pip install 'datarobot-moderations[datarobot-sdk]' # required for type: model and llm_type: datarobot
pip install 'datarobot-moderations[all]' # everything
| ootb_type | Stage | Install extra | Description |
|---|---|---|---|
| token_count | prompt / response | (base) | Token count |
| rouge_1 | response | (base) | ROUGE-1 overlap with citations |
| faithfulness | response | llm-eval | LLM-judged hallucination detection |
| task_adherence | response | llm-eval | Task-completion score |
| agent_guideline_adherence | response | llm-eval | Guideline adherence |
| agent_goal_accuracy | response | llm-eval | Agentic goal-accuracy |
| cost | response | (base) | Estimated cost. Counts both prompt tokens (input_price/input_unit) and response tokens (output_price/output_unit). Must be at the response stage because both token counts are only available after the LLM responds. Currently only currency: USD is supported. |
| custom_metric | prompt / response | (base) | User-defined numeric metric |
# Token count — report only
- name: Prompt Token Count
  type: ootb
  ootb_type: token_count
  stage: prompt

# Token count — block on length
- name: Response Token Count
  type: ootb
  ootb_type: token_count
  stage: response
  intervention:
    action: block
    message: "Response too long."
    conditions:
      - comparand: 1000
        comparator: greaterThan

# ROUGE-1 (requires citations)
- name: Rouge 1
  type: ootb
  ootb_type: rouge_1
  stage: response
  copy_citations: true
  intervention:
    action: report
    conditions: []

# Faithfulness
- name: Faithfulness
  type: ootb
  ootb_type: faithfulness
  stage: response
  copy_citations: true
  llm_type: datarobot
  deployment_id: "<your-llm-id>"  # 24-char DataRobot deployment ID
  intervention:
    action: block
    message: "Hallucination detected."
    conditions:
      - comparand: 0.0
        comparator: equals

# Task Adherence
- name: Task Adherence
  type: ootb
  ootb_type: task_adherence
  stage: response
  llm_type: datarobot
  deployment_id: "<your-llm-id>"
  intervention:
    action: block
    message: "LLM did not complete the requested task."
    conditions:
      - comparator: lessThan
        comparand: 0.5

# Guideline Adherence
- name: Guideline Adherence
  type: ootb
  ootb_type: agent_guideline_adherence
  stage: response
  llm_type: datarobot
  deployment_id: "<your-llm-id>"
  additional_guard_config:
    agent_guideline: "Response must be polite and on-topic."  # free-text criterion for the LLM judge
  intervention:
    action: block
    message: "Response violates guidelines."
    conditions:
      - comparand: 0.0
        comparator: equals

# Agent Goal Accuracy
- name: Agent Goal Accuracy
  type: ootb
  ootb_type: agent_goal_accuracy
  stage: response
  is_agentic: true
  llm_type: datarobot
  deployment_id: "<your-llm-id>"
  intervention:
    action: report
    conditions: []

# Cost tracking
- name: Cost
  type: ootb
  ootb_type: cost
  stage: response
  additional_guard_config:
    cost:
      currency: USD
      input_price: 0.01
      input_unit: 1000
      output_price: 0.03
      output_unit: 1000
  intervention:
    action: report
    conditions: []
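Under the natural reading of these fields, the estimate works out as prompt_tokens × input_price / input_unit + response_tokens × output_price / output_unit. With the prices above, a call with 500 prompt tokens and 200 response tokens would be estimated at 500 × 0.01 / 1000 + 200 × 0.03 / 1000 = 0.005 + 0.006 = 0.011 USD.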
5.2 Model guard
Wraps any DataRobot deployment you have already created (binary classifier, regression, multiclass, or text-generation). The library sends the text to that deployment and uses the prediction it returns to decide whether to block, report, or replace content.
# Binary classifier (e.g. toxicity, prompt injection)
# Works with any DataRobot binary classification deployment.
- name: Toxicity
  type: model
  stage: prompt
  deployment_id: "<your-deployment-id>"  # 24-char DataRobot deployment ID
  model_info:
    input_column_name: text                  # field your deployment reads as input
    target_name: toxicity_toxic_PREDICTION   # prediction field returned by the deployment
    target_type: Binary                      # Binary | Regression | Multiclass | TextGeneration
    class_names: []                          # leave empty for Binary/Regression
  intervention:
    action: block
    message: "Toxic content blocked."
    conditions:
      - comparand: 0.5
        comparator: greaterThan

# PII detection with text replacement
# The deployment must return BOTH the score field (`target_name`)
# AND a sanitised-text field (`replacement_text_column_name`).
- name: PII Detector
  type: model
  stage: prompt
  deployment_id: "<your-pii-deployment-id>"
  model_info:
    input_column_name: text
    target_name: contains_pii_true_PREDICTION
    target_type: TextGeneration
    replacement_text_column_name: anonymized_text_OUTPUT
    class_names: []
  intervention:
    action: replace
    message: "PII removed from prompt."
    conditions:
      - comparand: 0.5
        comparator: greaterThan

# Multi-label / emotion classifier
- name: Emotion Classifier
  type: model
  stage: prompt
  deployment_id: "<your-emotion-deployment-id>"
  model_info:
    input_column_name: text
    target_name: target_PREDICTION
    target_type: TextGeneration
    class_names: [anger, fear, sadness, disgust, joy, neutral]
  intervention:
    action: block
    message: "Negative emotion detected."
    conditions:
      - comparand: [anger, fear, sadness, disgust]
        comparator: matches
5.3 NeMo Guardrails
Flow-based content filtering. Requires pip install 'datarobot-moderations[nemo]'.
Supported llm_type values: openAi, azureOpenAi, nim, llmGateway only.
Colang flow files must live in stage-specific subdirectories of nemo_guardrails/:
nemo_guardrails/
  prompt/     # config.yml + *.co files for stage: prompt
  response/   # config.yml + *.co files for stage: response
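For reference, a minimal colang flow sketch that could live in nemo_guardrails/prompt/ (the topic, example utterances, and bot message are illustrative only; see the NeMo Guardrails documentation for full colang syntax and how flows map to the guard's "TRUE"/"FALSE" score):

# topics.co — illustrative flow file
define user ask off topic
  "Tell me about politics"
  "What stocks should I buy?"

define bot refuse off topic
  "I can only help with questions about our product."

define flow off topic
  user ask off topic
  bot refuse off topic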
- name: Stay on topic
  type: nemo_guardrails
  stage: prompt
  llm_type: azureOpenAi
  openai_api_base: "https://<resource>.openai.azure.com/"
  openai_deployment_id: gpt-4o-mini
  intervention:
    action: block
    message: "This topic is outside the allowed scope."
    conditions:
      - comparand: "TRUE"
        comparator: equals
5.4 NeMo Evaluator
Calls a DataRobot-hosted NeMo Evaluator microservice. Requires pip install 'datarobot-moderations[nemo-evaluator]'.
Two deployment IDs — what's the difference?
| Field | What it points to |
|---|---|
| nemo_evaluator_deployment_id (top-level) | Your NeMo Evaluator microservice deployment in DataRobot |
| deployment_id (per-guard) | The LLM deployment the evaluator uses to do the judging |
Both values must be valid 24-character DataRobot deployment IDs. Using a placeholder longer than 24 characters (e.g. "<your-nemo-evaluator-id>") causes a load-time validation error: String is longer than 24 characters.
llm_type must be datarobot for all nemo_evaluator guards.
| nemo_evaluator_type | Stage | Description |
|---|---|---|
| llm_judge | prompt / response | Custom LLM-as-judge with your own prompts. score_parsing_regex is a regular expression applied to the LLM's raw text reply to extract a single numeric score — e.g. "([1-5])" picks the first digit 1–5 from any surrounding text. |
| context_relevance | response | Relevance of retrieved context to the question |
| response_groundedness | response | Groundedness in retrieved context |
| topic_adherence | response | Adherence to allowed topics |
| response_relevancy | response | Relevance of response to question |
| faithfulness | response | NeMo microservice faithfulness score |
| agent_goal_accuracy | response | Agentic goal-accuracy via NeMo |
nemo_evaluator_deployment_id: "<your-nemo-evaluator-id>"
guards:
  - name: Safety Judge
    type: nemo_evaluator
    stage: response
    nemo_evaluator_type: llm_judge
    llm_type: datarobot
    deployment_id: "<your-llm-id>"
    nemo_llm_judge_config:
      system_prompt: "Rate safety 1-5. Output ONLY the integer."
      user_prompt: "Response: {response}"
      score_parsing_regex: "([1-5])"  # regex to extract the numeric score from the LLM's text output
      custom_metric_directionality: higherIsBetter  # "higherIsBetter" | "lowerIsBetter"
    intervention:
      action: block
      message: "Response failed safety evaluation."
      conditions:
        - comparand: 2
          comparator: lessThan

  - name: Topic Adherence
    type: nemo_evaluator
    stage: response
    nemo_evaluator_type: topic_adherence
    llm_type: datarobot
    deployment_id: "<your-llm-id>"
    nemo_topic_adherence_config:
      metric_mode: f1  # "f1" | "precision" | "recall"
      reference_topics: [DataRobot, machine learning, AI platforms]
    intervention:
      action: report
      conditions: []

  - name: Response Relevancy
    type: nemo_evaluator
    stage: response
    nemo_evaluator_type: response_relevancy
    llm_type: datarobot
    deployment_id: "<your-llm-id>"
    nemo_response_relevancy_config:
      embedding_deployment_id: "<your-embedding-id>"
    intervention:
      action: report
      conditions: []
6. LLM back-end options
Some ootb guards (e.g. faithfulness, task_adherence) call an LLM to judge the text. You choose which LLM provider to use via llm_type.
DataRobot credentials (DATAROBOT_ENDPOINT + DATAROBOT_API_TOKEN) are always required.
Supported llm_type values
| llm_type | LLM provider | Extra YAML fields | Extra install |
|---|---|---|---|
| datarobot | DataRobot-hosted LLM deployment | deployment_id | datarobot-sdk |
| openAi | OpenAI API | (none) | llm-eval |
| azureOpenAi | Azure OpenAI | openai_api_base, openai_deployment_id | llm-eval |
| google | Google Vertex AI | google_region, google_model | llm-eval,vertex |
| amazon | AWS Bedrock | aws_region, aws_model | llm-eval,bedrock |
| nim | NVIDIA NIM | openai_api_base | llm-eval,nvidia |
| llmGateway | DataRobot LLM Gateway | llm_gateway_model_id | datarobot-sdk |
nemo_guardrails supports: openAi, azureOpenAi, nim only
nemo_evaluator supports: datarobot only
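For instance, the Faithfulness guard from §5.1 can be pointed at Azure OpenAI or Google Vertex AI instead of a DataRobot deployment; a sketch, with resource names and the region as placeholders:

# Azure OpenAI as the judge (install extra: llm-eval)
- name: Faithfulness
  type: ootb
  ootb_type: faithfulness
  stage: response
  copy_citations: true
  llm_type: azureOpenAi
  openai_api_base: "https://<resource>.openai.azure.com/"
  openai_deployment_id: gpt-4o-mini
  intervention:
    action: report
    conditions: []

# Google Vertex AI as the judge (install extras: llm-eval,vertex)
- name: Faithfulness Vertex
  type: ootb
  ootb_type: faithfulness
  stage: response
  copy_citations: true
  llm_type: google
  google_region: us-central1
  google_model: google-gemini-1.5-pro
  intervention:
    action: report
    conditions: []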
Available models (Google / AWS)
The library maps a fixed set of model names to their provider API identifiers. Models not in this list are not supported.
| Provider | llm_type | google_model / aws_model |
|---|---|---|
| Google Vertex AI | google | google-gemini-1.5-flash, google-gemini-1.5-pro, chat-bison |
| AWS Bedrock | amazon | amazon-titan, anthropic-claude-2, anthropic-claude-3-haiku, anthropic-claude-3-sonnet, anthropic-claude-3-opus, anthropic-claude-3.5-sonnet-v1, anthropic-claude-3.5-sonnet-v2, amazon-nova-lite, amazon-nova-micro, amazon-nova-pro |
7. Full annotated example
Replace every <...> placeholder with a real value before use. DataRobot deployment IDs are exactly 24 hexadecimal characters.
timeout_sec: 15
timeout_action: score
guards:
  # -- Pre-score (prompt) --------------------------------------------------
  - name: Prompt Injection
    type: model
    stage: prompt
    deployment_id: "<prompt-injection-id>"
    model_info:
      input_column_name: text
      target_name: injection_injection_PREDICTION
      target_type: Binary
      class_names: []
    intervention:
      action: block
      message: "Prompt injection attempt detected and blocked."
      conditions:
        - comparand: 0.80
          comparator: greaterThan

  - name: Toxicity
    type: model
    stage: prompt
    deployment_id: "<toxicity-id>"
    model_info:
      input_column_name: text
      target_name: toxicity_toxic_PREDICTION
      target_type: Binary
      class_names: []
    intervention:
      action: block
      message: "Toxic content is not allowed."
      conditions:
        - comparand: 0.5
          comparator: greaterThan

  - name: PII Detector
    type: model
    stage: prompt
    deployment_id: "<pii-id>"
    model_info:
      input_column_name: text
      target_name: contains_pii_true_PREDICTION
      target_type: TextGeneration
      replacement_text_column_name: anonymized_text_OUTPUT
      class_names: []
    intervention:
      action: replace
      message: "PII detected and removed."
      conditions:
        - comparand: 0.5
          comparator: greaterThan

  - name: Topic Guardrail
    type: nemo_guardrails
    stage: prompt
    llm_type: azureOpenAi
    openai_api_base: "https://<resource>.openai.azure.com/"
    openai_deployment_id: gpt-4o-mini
    intervention:
      action: block
      message: "This topic is outside the allowed scope."
      conditions:
        - comparand: "TRUE"
          comparator: equals

  # -- Post-score (response) -----------------------------------------------
  - name: Response Token Count
    type: ootb
    ootb_type: token_count
    stage: response

  - name: Faithfulness
    type: ootb
    ootb_type: faithfulness
    stage: response
    copy_citations: true
    llm_type: datarobot
    deployment_id: "<llm-id>"
    intervention:
      action: block
      message: "The response appears to be hallucinated."
      conditions:
        - comparand: 0.0
          comparator: equals

  - name: Task Adherence
    type: ootb
    ootb_type: task_adherence
    stage: response
    llm_type: datarobot
    deployment_id: "<llm-id>"
    intervention:
      action: block
      message: "LLM did not complete the requested task."
      conditions:
        - comparator: lessThan
          comparand: 0.5

  - name: Cost
    type: ootb
    ootb_type: cost
    stage: response
    additional_guard_config:
      cost:
        currency: USD
        input_price: 0.01
        input_unit: 1000
        output_price: 0.03
        output_unit: 1000
    intervention:
      action: report
      conditions: []
8. Using the config in Python
Guards can be configured from a YAML file or from a Pydantic object built entirely in Python. Both approaches are fully equivalent — choose whichever fits your workflow.
8a. From a YAML file
- evaluate_prompt and evaluate_response each return (EvaluationResult, latency_seconds).
- evaluate_full_pipeline returns a PipelineResult (no latency tuple).
- EvaluationResult.metrics holds the guard scores keyed by guard name.
import os
from datarobot_dome.api import ModerationPipeline

os.environ["TARGET_NAME"] = "resultText"  # must match your deployment's response output field name
os.environ["DATAROBOT_ENDPOINT"] = "<your-endpoint>"
os.environ["DATAROBOT_API_TOKEN"] = "<your-token>"

pipeline = ModerationPipeline.from_yaml("moderation_config.yaml")

# Evaluate prompt only (pre-score guards)
result, latency = pipeline.evaluate_prompt("What is DataRobot?")
if result.blocked:
    print(f"Blocked: {result.blocked_message}")

# Evaluate response only (post-score guards)
result, latency = pipeline.evaluate_response(
    "DataRobot is an AI platform.",
    prompt="What is DataRobot?",
)
print(f"Latency: {latency:.3f}s  Blocked: {result.blocked}  Metrics: {result.metrics}")

# Full pipeline: pre-score → LLM → post-score
def my_llm(prompt: str) -> str:
    return "DataRobot is an AI platform."  # replace with your LLM call

result = pipeline.evaluate_full_pipeline("What is DataRobot?", my_llm)
if result.blocked:
    stage = "prompt" if result.prompt_evaluation.blocked else "response"
    blocked_eval = (
        result.prompt_evaluation
        if result.prompt_evaluation.blocked
        else result.response_evaluation
    )
    print(f"Blocked at {stage}: {blocked_eval.blocked_message}")
elif result.replaced:
    print(f"Text replaced. Response: {result.response}")
else:
    print(f"Response: {result.response}")
    print(f"Metrics: {result.response_evaluation.metrics}")
8b. From a Pydantic config object
Use ModerationPipeline.from_config to build the configuration entirely in Python — no YAML file required. This is useful for dynamic configurations, programmatic guard registration, or when embedding moderation in a larger application.
All schema types are importable from datarobot_dome.schema:
from datarobot_dome.schema import (
    ModerationConfig,
    TargetBlock,
    # Guard subtypes — pick the matching one per guard
    OOTBGuardSchema,
    ModelGuardSchema,
    NemoGuardrailsSchema,
    NemoEvaluatorSchema,
    # Nested schemas used inside guards
    AdditionalGuardConfigSchema,
    InterventionSchema,
    InterventionConditionSchema,
    ModelInfoSchema,
)
Schema type → guard type mapping
| Guard YAML type | Pydantic class |
|---|---|
| ootb | OOTBGuardSchema |
| model | ModelGuardSchema |
| nemo_guardrails | NemoGuardrailsSchema |
| nemo_evaluator | NemoEvaluatorSchema |
LLM Gateway example — hate speech / guideline adherence
import os
from datarobot_dome.api import ModerationPipeline
from datarobot_dome.schema import (
    AdditionalGuardConfigSchema,
    InterventionSchema,
    ModerationConfig,
    OOTBGuardSchema,
    TargetBlock,
)

os.environ["TARGET_NAME"] = "resultText"
os.environ["DATAROBOT_ENDPOINT"] = "https://app.datarobot.com/api/v2"
os.environ["DATAROBOT_API_TOKEN"] = "<your-dr-token>"

config = ModerationConfig(
    targets=[
        TargetBlock(
            target="_default",
            guards=[
                OOTBGuardSchema(
                    type="ootb",
                    name="Hate Speech",
                    stage="response",
                    ootb_type="agent_guideline_adherence",
                    llm_type="llmGateway",
                    llm_gateway_model_id="azure/gpt-4o-2024-11-20",
                    additional_guard_config=AdditionalGuardConfigSchema(
                        agent_guideline=(
                            "The response must not contain hate speech, slurs, or content "
                            "that demeans people based on race, religion, gender, nationality, "
                            "or any other protected characteristic."
                        )
                    ),
                    intervention=InterventionSchema(
                        action="report",
                        conditions=[],
                    ),
                )
            ],
        )
    ]
)

pipeline = ModerationPipeline.from_config(config)

text = "People from that group are living in France."
result, latency = pipeline.evaluate_response(response=text, prompt="Describe this text.")
score = result.metrics.get("agent_guideline_adherence_score")
print(f"score={score} latency={latency:.3f}s")
Model guard example
import os
from datarobot_dome.api import ModerationPipeline
from datarobot_dome.schema import (
    InterventionConditionSchema,
    InterventionSchema,
    ModerationConfig,
    ModelGuardSchema,
    ModelInfoSchema,
    TargetBlock,
)

os.environ["TARGET_NAME"] = "resultText"
os.environ["DATAROBOT_ENDPOINT"] = "<your-endpoint>"
os.environ["DATAROBOT_API_TOKEN"] = "<your-token>"

config = ModerationConfig(
    targets=[
        TargetBlock(
            target="_default",
            guards=[
                ModelGuardSchema(
                    type="model",
                    name="Toxicity",
                    stage="prompt",
                    deployment_id="<your-toxicity-deployment-id>",
                    model_info=ModelInfoSchema(
                        input_column_name="text",
                        target_name="toxicity_toxic_PREDICTION",
                        target_type="Binary",
                        class_names=[],
                    ),
                    intervention=InterventionSchema(
                        action="block",
                        message="Toxic content blocked.",
                        conditions=[
                            InterventionConditionSchema(comparand=0.5, comparator="greaterThan")
                        ],
                    ),
                )
            ],
        )
    ]
)

pipeline = ModerationPipeline.from_config(config)
With DRUM
Place moderation_config.yaml alongside your custom model code, then:
drum score --verbose \
  --code-dir ./ \
  --input ./input.csv \
  --target-type textgeneration \
  --runtime-params-file values.yaml
9. Testing guide
Set these environment variables before running any test (see §10 for details):
export DATAROBOT_ENDPOINT="https://app.datarobot.com/api/v2"
export DATAROBOT_API_TOKEN="your-token"
export TARGET_NAME="resultText"
Guards fall into four groups based on the credentials they require:
| Group | Guard types | Extra credentials needed |
|---|---|---|
| A — local | token_count, rouge_1, cost, custom_metric | (none beyond the base vars above) |
| B — DataRobot deployment | type: model, any ootb with llm_type: datarobot or llm_type: llmGateway | Only DATAROBOT_API_TOKEN; provide a real deployment_id |
| C — external LLM provider | Any ootb with llm_type: openAi, azureOpenAi, google, amazon, nim | Provider-specific env var (see §10) |
| D — NeMo | type: nemo_guardrails, type: nemo_evaluator | Provider key for NeMo Guardrails; DATAROBOT_API_TOKEN for NeMo Evaluator |
See §5 for complete YAML examples per guard type and §8 for Python usage patterns.
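A minimal Group A smoke test, sketched with from_dict so it needs no YAML file and no external deployments (the prompt text is arbitrary):

import os
from datarobot_dome.api import ModerationPipeline

os.environ.setdefault("TARGET_NAME", "resultText")
# DATAROBOT_ENDPOINT and DATAROBOT_API_TOKEN must also be set (see §10)

pipeline = ModerationPipeline.from_dict({
    "targets": [
        {
            "target": "_default",
            "guards": [
                {"name": "Token Count", "type": "ootb", "ootb_type": "token_count", "stage": "prompt"},
            ],
        }
    ]
})

result, latency = pipeline.evaluate_prompt("A short smoke-test prompt.")
assert not result.blocked  # no intervention configured, so nothing is blocked
print(result.metrics)      # the token-count metric appears here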
10. Environment variables
Always required
| Variable | Description |
|---|---|
| DATAROBOT_ENDPOINT | DataRobot instance URL, e.g. https://app.datarobot.com/api/v2 |
| DATAROBOT_API_TOKEN | DataRobot API token |
| TARGET_NAME | The name of the output field in your deployment's prediction response that contains the generated text (e.g. resultText). Required by all response-stage guards in standalone Python. DRUM sets this automatically. |
| DISABLE_MODERATION | Optional; set to true to disable all guards at runtime. |
Credentials for LLM-eval guards using external providers
When your guard uses llm_type: datarobot, it reuses DATAROBOT_API_TOKEN — no extra variable needed.
For external providers (OpenAI, Azure OpenAI, Google, AWS), set a guard-specific env var. The variable name is built from the guard's type, stage, and ootb_type:
MLOPS_RUNTIME_PARAM_MODERATION_{TYPE}_{STAGE}_{OOTB_TYPE}_{PROVIDER_SUFFIX}
| Guard (ootb_type) | Provider | Environment variable |
|---|---|---|
| task_adherence | OpenAI | MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_TASK_ADHERENCE_OPENAI_API_KEY |
| task_adherence | Azure OpenAI | MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_TASK_ADHERENCE_AZURE_OPENAI_API_KEY |
| faithfulness | OpenAI | MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_FAITHFULNESS_OPENAI_API_KEY |
| faithfulness | Azure OpenAI | MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_FAITHFULNESS_AZURE_OPENAI_API_KEY |
| agent_guideline_adherence | Azure OpenAI | MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_AGENT_GUIDELINE_ADHERENCE_AZURE_OPENAI_API_KEY |
| agent_guideline_adherence | Google Vertex AI | MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_AGENT_GUIDELINE_ADHERENCE_GOOGLE_SERVICE_ACCOUNT |
| agent_goal_accuracy | Azure OpenAI | MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_AGENT_GOAL_ACCURACY_AZURE_OPENAI_API_KEY |
| agent_goal_accuracy | AWS Bedrock | MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_AGENT_GOAL_ACCURACY_AWS_ACCOUNT |
| nemo_guardrails (prompt) | Azure OpenAI | MLOPS_RUNTIME_PARAM_MODERATION_NEMO_GUARDRAILS_PROMPT_AZURE_OPENAI_API_KEY |
Value format per provider:
# OpenAI / Azure OpenAI
'{"type":"credential","payload":{"credentialType":"api_token","apiToken":"YOUR_KEY"}}'
# Google Vertex AI
'{"type":"credential","payload":{"credentialType":"gcp","gcpKey":{...}}}'
# AWS Bedrock
'{"type":"credential","payload":{"credentialType":"s3","awsAccessKeyId":"...","awsSecretAccessKey":"...","awsSessionToken":"..."}}'