LLM Guardrails

LLM guardrails let you check prompts and responses for problematic content, and either modify or flag them before they reach an external LLM or your application. Guardrails are applied to all prompts just before they are sent and to all responses as soon as they return.

Configure LLM guardrails

See Genai.LlmGuardrails.Manager for details.

Configure the node on which the agent is deployed with at least one T4 GPU, as LLM guardrails use a local model loaded into memory.

Configure input processors

The following input processors are available:

Processor	Description
Genai.LlmGuardrails.Processor.ToxicSpeech	Detects toxic or malicious input using a local model. Raises an error on detection.
Genai.LlmGuardrails.Processor.PromptInjection	Detects prompt injection attempts using a local model. Raises an error on detection.
Genai.LlmGuardrails.Processor.AzureTextModeration	Uses the Azure Content Safety API to moderate text across configurable harm categories.
Genai.LlmGuardrails.Processor.AzurePromptShield	(Beta) Uses Azure Content Safety to detect jailbreaks and prompt injections. Not recommended for production.

To enable input processors, run the following:

JavaScript

Genai.LlmGuardrails.Manager.setConfigValue('inputProcessors', [
  Genai.LlmGuardrails.Processor.ToxicSpeech.inst(),
  Genai.LlmGuardrails.Processor.PromptInjection.inst()
]);

All input processors raise errors if they detect a problematic prompt.

Azure Text Moderation

Genai.LlmGuardrails.Processor.AzureTextModeration uses the Azure Content Safety API to moderate prompts across configurable harm categories. Unlike the local-model processors, it requires Azure credentials.

Prerequisites: Configure Genai.LlmGuardrails.Processor.AzureContentModeration.Config with your Azure Content Safety resource details:

JavaScript

Genai.LlmGuardrails.Processor.AzureContentModeration.Config.inst().setSecretValue('apiKey', '<your-azure-content-safety-key>', ConfigOverride.APP);
Genai.LlmGuardrails.Processor.AzureContentModeration.Config.inst().setConfigValue('endPoint', 'https://<your-resource>.cognitiveservices.azure.com/', ConfigOverride.APP);
Genai.LlmGuardrails.Processor.AzureContentModeration.Config.inst().setConfigValue('region', '<your-region>', ConfigOverride.APP);

The apiVersion defaults to 2024-09-01 and does not need to be set unless you require a different version.

Configuration options:

Field	Default	Description
`categories`	`['Hate', 'Sexual', 'Violence', 'SelfHarm']`	Harm categories to check. All four are used by default.
`blocklistNames`	(none)	Names of custom Azure blocklists to apply.
`haltOnBlocklistHit`	`false`	When `true`, stops further analysis as soon as a blocklist entry matches.
`severityLevel`	`"FourSeverityLevels"`	Scale to use: `"FourSeverityLevels"` (0, 2, 4, 6) or `"EightSeverityLevels"` (0–7).
`severityLevelThreshold`	`-1` (use provider default)	Prompts at or above this level are blocked. Default is 4 for four-level and 6 for eight-level.

Example — enable with default settings:

JavaScript

Genai.LlmGuardrails.Manager.setConfigValue('inputProcessors', [
  Genai.LlmGuardrails.Processor.AzureTextModeration.inst()
]);

Example — restrict to violence only, using the eight-level scale with threshold 3:

JavaScript

Genai.LlmGuardrails.Manager.setConfigValue('inputProcessors', [
  Genai.LlmGuardrails.Processor.AzureTextModeration.make({
    categories: ['Violence'],
    severityLevel: 'EightSeverityLevels',
    severityLevelThreshold: 3
  })
]);

Azure Prompt Shield (Beta)

Genai.LlmGuardrails.Processor.AzurePromptShield is in beta and is not recommended for production use.

This processor uses the Azure Content Safety Prompt Shield API to detect jailbreak attempts and indirect prompt injection attacks. It uses the same Genai.LlmGuardrails.Processor.AzureContentModeration.Config credentials as AzureTextModeration.

To learn more, see Prompt Shield quickstart.

JavaScript

Genai.LlmGuardrails.Manager.setConfigValue('inputProcessors', [
  Genai.LlmGuardrails.Processor.AzurePromptShield.inst()
]);

Configure output processors

To enable the current output processor, run the following:

JavaScript

Genai.LlmGuardrails.Manager.setConfigValue('outputProcessors', [Genai.LlmGuardrails.Processor.PiiMasking.inst()]);

The output processor behaves as follows:

The PII masking processor redacts PII it finds in the response from the LLM.
The classes of PII that are redacted are specified in Genai.LlmGuardrails.Processor.PiiMasking#piiClasses.

Configure processors for dynamic agent

To configure guardrails for the dynamic agent, run the following:

Python

c3.Genai.LlmGuardrails.Manager.setConfigValue("inputProcessors", [c3.Genai.LlmGuardrails.Processor.ToxicSpeech.inst()]) # replace with the input processors you want to use
c3.Genai.LlmGuardrails.Manager.setConfigValue("outputProcessors", [c3.Genai.LlmGuardrails.Processor.PiiMasking.inst()]) # replace with the output processors you want to use

def preprocess(messages):
    """
    Replace with any preprocessing logic
    """
    message = messages[-1]
    updated_text = c3.Genai.LlmGuardrails.Manager.processInput(message["content"][0]["text"]).updatedValue.toString()
    message["content"][0]["text"] = updated_text
    return messages[:-1] + [message]


def postprocess(response):
    """
    Replace with any postprocessing logic
    """
    response.choices[0].message.original_content = response.choices[0].message.content
    response.choices[0].message.content = c3.Genai.LlmGuardrails.Manager.processOutput(response.choices[0].message.content).updatedValue.toString()
    return response

preprocess_lambda = c3.Lambda.fromPyFunc(preprocess)
postprocess_lambda = c3.Lambda.fromPyFunc(postprocess)

processor = c3.GenaiCore.Llm.Processor.Lambda(
    preprocessLambda=preprocess_lambda, postprocessLambda=postprocess_lambda
)

c3.GenaiCore.Llm.Completion.Client.make({
  "name": "default-completions",
  "model": {
    "type": "GenaiCore.Llm.AzureOpenAi.Model",
    "model": "gpt-4o",
    "processor": processor,
    "auth": {
      "type": "GenaiCore.Llm.AzureOpenAi.Auth",
      "name": "default-auth"
    },
    "defaultOptions": {
      "stop": ["</plan>", "</thought>", "</execute>", "</solution>"],
      "temperature": 0.0
    }
  }
}).setConfig()

Test the guardrail for the dynamic agent

In a test, you can make sure that the guardrail is working.

Python

sample_prompt = "My phone number is 555-555-5555"

result = c3.Genai.LlmGuardrails.Manager.inst().processOutput(sample_prompt)

print("Original Value:", result.originalValue)
print("Current Value:", result.currentValue)
print("Updated Value:", result.updatedValue)

You will see

Text

Original Value: My phone number is 555-555-5555
Current Value: My phone number is 555-555-5555
Updated Value: My phone number is [REDACTED_PHONE_NUMBER_5]

If you are using another LLM, set the processor for the respective config.

Clear guardrails configuration

To disable all guardrails, run Genai.LlmGuardrails.Manager.clearConfigAndSecretOverride(ConfigOverride.APP).
To disable just the input processors, run Genai.LlmGuardrails.Manager.clearConfigValue('inputProcessors', ConfigOverride.APP).
To disable just the output processors, run Genai.LlmGuardrails.Manager.clearConfigValue('outputProcessors', ConfigOverride.APP).

Copy link to this sectionConfigure LLM guardrails

Copy link to this sectionConfigure input processors

Copy link to this sectionAzure Text Moderation

Copy link to this sectionAzure Prompt Shield (Beta)

Copy link to this sectionConfigure output processors

Copy link to this sectionConfigure processors for dynamic agent

Copy link to this sectionTest the guardrail for the dynamic agent

Copy link to this sectionClear guardrails configuration

Copy link to this sectionSee also

Configure LLM guardrails

Configure input processors

Azure Text Moderation

Azure Prompt Shield (Beta)

Configure output processors

Configure processors for dynamic agent

Test the guardrail for the dynamic agent

Clear guardrails configuration

See also