LLM Guardrails
LLM guardrails let you check prompts and responses for problematic content, and either modify or flag them before they reach an external LLM or your application. Guardrails are applied to all prompts just before they are sent and to all responses as soon as they return.
Configure LLM guardrails
See Genai.LlmGuardrails.Manager for details.
Configure the node on which the agent is deployed with at least one T4 GPU, as LLM guardrails use a local model loaded into memory.
Configure input processors
The following input processors are available:
| Processor | Description |
|---|---|
| Genai.LlmGuardrails.Processor.ToxicSpeech | Detects toxic or malicious input using a local model. Raises an error on detection. |
| Genai.LlmGuardrails.Processor.PromptInjection | Detects prompt injection attempts using a local model. Raises an error on detection. |
| Genai.LlmGuardrails.Processor.AzureTextModeration | Uses the Azure Content Safety API to moderate text across configurable harm categories. |
| Genai.LlmGuardrails.Processor.AzurePromptShield | (Beta) Uses Azure Content Safety to detect jailbreaks and prompt injections. Not recommended for production. |
To enable input processors, run the following:
Genai.LlmGuardrails.Manager.setConfigValue('inputProcessors', [
Genai.LlmGuardrails.Processor.ToxicSpeech.inst(),
Genai.LlmGuardrails.Processor.PromptInjection.inst()
]);All input processors raise errors if they detect a problematic prompt.
Azure Text Moderation
Genai.LlmGuardrails.Processor.AzureTextModeration uses the Azure Content Safety API to moderate prompts across configurable harm categories. Unlike the local-model processors, it requires Azure credentials.
Prerequisites: Configure Genai.LlmGuardrails.Processor.AzureContentModeration.Config with your Azure Content Safety resource details:
Genai.LlmGuardrails.Processor.AzureContentModeration.Config.inst().setSecretValue('apiKey', '<your-azure-content-safety-key>', ConfigOverride.APP);
Genai.LlmGuardrails.Processor.AzureContentModeration.Config.inst().setConfigValue('endPoint', 'https://<your-resource>.cognitiveservices.azure.com/', ConfigOverride.APP);
Genai.LlmGuardrails.Processor.AzureContentModeration.Config.inst().setConfigValue('region', '<your-region>', ConfigOverride.APP);The apiVersion defaults to 2024-09-01 and does not need to be set unless you require a different version.
Configuration options:
| Field | Default | Description |
|---|---|---|
categories | ['Hate', 'Sexual', 'Violence', 'SelfHarm'] | Harm categories to check. All four are used by default. |
blocklistNames | (none) | Names of custom Azure blocklists to apply. |
haltOnBlocklistHit | false | When true, stops further analysis as soon as a blocklist entry matches. |
severityLevel | "FourSeverityLevels" | Scale to use: "FourSeverityLevels" (0, 2, 4, 6) or "EightSeverityLevels" (0–7). |
severityLevelThreshold | -1 (use provider default) | Prompts at or above this level are blocked. Default is 4 for four-level and 6 for eight-level. |
Example — enable with default settings:
Genai.LlmGuardrails.Manager.setConfigValue('inputProcessors', [
Genai.LlmGuardrails.Processor.AzureTextModeration.inst()
]);Example — restrict to violence only, using the eight-level scale with threshold 3:
Genai.LlmGuardrails.Manager.setConfigValue('inputProcessors', [
Genai.LlmGuardrails.Processor.AzureTextModeration.make({
categories: ['Violence'],
severityLevel: 'EightSeverityLevels',
severityLevelThreshold: 3
})
]);Azure Prompt Shield (Beta)
Genai.LlmGuardrails.Processor.AzurePromptShield is in beta and is not recommended for production use.
This processor uses the Azure Content Safety Prompt Shield API to detect jailbreak attempts and indirect prompt injection attacks. It uses the same Genai.LlmGuardrails.Processor.AzureContentModeration.Config credentials as AzureTextModeration.
To learn more, see Prompt Shield quickstart.
Genai.LlmGuardrails.Manager.setConfigValue('inputProcessors', [
Genai.LlmGuardrails.Processor.AzurePromptShield.inst()
]);Configure output processors
To enable the current output processor, run the following:
Genai.LlmGuardrails.Manager.setConfigValue('outputProcessors', [Genai.LlmGuardrails.Processor.PiiMasking.inst()]);The output processor behaves as follows:
- The PII masking processor redacts PII it finds in the response from the LLM.
- The classes of PII that are redacted are specified in Genai.LlmGuardrails.Processor.PiiMasking#piiClasses.
Configure processors for dynamic agent
To configure guardrails for the dynamic agent, run the following:
c3.Genai.LlmGuardrails.Manager.setConfigValue("inputProcessors", [c3.Genai.LlmGuardrails.Processor.ToxicSpeech.inst()]) # replace with the input processors you want to use
c3.Genai.LlmGuardrails.Manager.setConfigValue("outputProcessors", [c3.Genai.LlmGuardrails.Processor.PiiMasking.inst()]) # replace with the output processors you want to use
def preprocess(messages):
"""
Replace with any preprocessing logic
"""
message = messages[-1]
updated_text = c3.Genai.LlmGuardrails.Manager.processInput(message["content"][0]["text"]).updatedValue.toString()
message["content"][0]["text"] = updated_text
return messages[:-1] + [message]
def postprocess(response):
"""
Replace with any postprocessing logic
"""
response.choices[0].message.original_content = response.choices[0].message.content
response.choices[0].message.content = c3.Genai.LlmGuardrails.Manager.processOutput(response.choices[0].message.content).updatedValue.toString()
return response
preprocess_lambda = c3.Lambda.fromPyFunc(preprocess)
postprocess_lambda = c3.Lambda.fromPyFunc(postprocess)
processor = c3.GenaiCore.Llm.Processor.Lambda(
preprocessLambda=preprocess_lambda, postprocessLambda=postprocess_lambda
)
c3.GenaiCore.Llm.Completion.Client.make({
"name": "default-completions",
"model": {
"type": "GenaiCore.Llm.AzureOpenAi.Model",
"model": "gpt-4o",
"processor": processor,
"auth": {
"type": "GenaiCore.Llm.AzureOpenAi.Auth",
"name": "default-auth"
},
"defaultOptions": {
"stop": ["</plan>", "</thought>", "</execute>", "</solution>"],
"temperature": 0.0
}
}
}).setConfig()Test the guardrail for the dynamic agent
In a test, you can make sure that the guardrail is working.
sample_prompt = "My phone number is 555-555-5555"
result = c3.Genai.LlmGuardrails.Manager.inst().processOutput(sample_prompt)
print("Original Value:", result.originalValue)
print("Current Value:", result.currentValue)
print("Updated Value:", result.updatedValue)
You will see
Original Value: My phone number is 555-555-5555
Current Value: My phone number is 555-555-5555
Updated Value: My phone number is [REDACTED_PHONE_NUMBER_5]If you are using another LLM, set the processor for the respective config.
Clear guardrails configuration
- To disable all guardrails, run
Genai.LlmGuardrails.Manager.clearConfigAndSecretOverride(ConfigOverride.APP). - To disable just the input processors, run
Genai.LlmGuardrails.Manager.clearConfigValue('inputProcessors', ConfigOverride.APP). - To disable just the output processors, run
Genai.LlmGuardrails.Manager.clearConfigValue('outputProcessors', ConfigOverride.APP).