C3 AI Documentation Home

LLM Guardrails

LLM guardrails let you check prompts and responses for problematic content, and either modify or flag them before they reach an external LLM or your application. Guardrails are applied to all prompts just before they are sent and to all responses as soon as they return.

Configure LLM guardrails

See Genai.LlmGuardrails.Manager for details.

Configure the node on which the agent is deployed with at least one T4 GPU, as LLM guardrails use a local model loaded into memory.

Configure input processors

The following input processors are available:

ProcessorDescription
Genai.LlmGuardrails.Processor.ToxicSpeechDetects toxic or malicious input using a local model. Raises an error on detection.
Genai.LlmGuardrails.Processor.PromptInjectionDetects prompt injection attempts using a local model. Raises an error on detection.
Genai.LlmGuardrails.Processor.AzureTextModerationUses the Azure Content Safety API to moderate text across configurable harm categories.
Genai.LlmGuardrails.Processor.AzurePromptShield(Beta) Uses Azure Content Safety to detect jailbreaks and prompt injections. Not recommended for production.

To enable input processors, run the following:

JavaScript
Genai.LlmGuardrails.Manager.setConfigValue('inputProcessors', [
  Genai.LlmGuardrails.Processor.ToxicSpeech.inst(),
  Genai.LlmGuardrails.Processor.PromptInjection.inst()
]);

All input processors raise errors if they detect a problematic prompt.

Azure Text Moderation

Genai.LlmGuardrails.Processor.AzureTextModeration uses the Azure Content Safety API to moderate prompts across configurable harm categories. Unlike the local-model processors, it requires Azure credentials.

Prerequisites: Configure Genai.LlmGuardrails.Processor.AzureContentModeration.Config with your Azure Content Safety resource details:

JavaScript
Genai.LlmGuardrails.Processor.AzureContentModeration.Config.inst().setSecretValue('apiKey', '<your-azure-content-safety-key>', ConfigOverride.APP);
Genai.LlmGuardrails.Processor.AzureContentModeration.Config.inst().setConfigValue('endPoint', 'https://<your-resource>.cognitiveservices.azure.com/', ConfigOverride.APP);
Genai.LlmGuardrails.Processor.AzureContentModeration.Config.inst().setConfigValue('region', '<your-region>', ConfigOverride.APP);

The apiVersion defaults to 2024-09-01 and does not need to be set unless you require a different version.

Configuration options:

FieldDefaultDescription
categories['Hate', 'Sexual', 'Violence', 'SelfHarm']Harm categories to check. All four are used by default.
blocklistNames(none)Names of custom Azure blocklists to apply.
haltOnBlocklistHitfalseWhen true, stops further analysis as soon as a blocklist entry matches.
severityLevel"FourSeverityLevels"Scale to use: "FourSeverityLevels" (0, 2, 4, 6) or "EightSeverityLevels" (0–7).
severityLevelThreshold-1 (use provider default)Prompts at or above this level are blocked. Default is 4 for four-level and 6 for eight-level.

Example — enable with default settings:

JavaScript
Genai.LlmGuardrails.Manager.setConfigValue('inputProcessors', [
  Genai.LlmGuardrails.Processor.AzureTextModeration.inst()
]);

Example — restrict to violence only, using the eight-level scale with threshold 3:

JavaScript
Genai.LlmGuardrails.Manager.setConfigValue('inputProcessors', [
  Genai.LlmGuardrails.Processor.AzureTextModeration.make({
    categories: ['Violence'],
    severityLevel: 'EightSeverityLevels',
    severityLevelThreshold: 3
  })
]);

Azure Prompt Shield (Beta)

This processor uses the Azure Content Safety Prompt Shield API to detect jailbreak attempts and indirect prompt injection attacks. It uses the same Genai.LlmGuardrails.Processor.AzureContentModeration.Config credentials as AzureTextModeration.

To learn more, see Prompt Shield quickstart.

JavaScript
Genai.LlmGuardrails.Manager.setConfigValue('inputProcessors', [
  Genai.LlmGuardrails.Processor.AzurePromptShield.inst()
]);

Configure output processors

To enable the current output processor, run the following:

JavaScript
Genai.LlmGuardrails.Manager.setConfigValue('outputProcessors', [Genai.LlmGuardrails.Processor.PiiMasking.inst()]);

The output processor behaves as follows:

Configure processors for dynamic agent

To configure guardrails for the dynamic agent, run the following:

Python
c3.Genai.LlmGuardrails.Manager.setConfigValue("inputProcessors", [c3.Genai.LlmGuardrails.Processor.ToxicSpeech.inst()]) # replace with the input processors you want to use
c3.Genai.LlmGuardrails.Manager.setConfigValue("outputProcessors", [c3.Genai.LlmGuardrails.Processor.PiiMasking.inst()]) # replace with the output processors you want to use

def preprocess(messages):
    """
    Replace with any preprocessing logic
    """
    message = messages[-1]
    updated_text = c3.Genai.LlmGuardrails.Manager.processInput(message["content"][0]["text"]).updatedValue.toString()
    message["content"][0]["text"] = updated_text
    return messages[:-1] + [message]


def postprocess(response):
    """
    Replace with any postprocessing logic
    """
    response.choices[0].message.original_content = response.choices[0].message.content
    response.choices[0].message.content = c3.Genai.LlmGuardrails.Manager.processOutput(response.choices[0].message.content).updatedValue.toString()
    return response

preprocess_lambda = c3.Lambda.fromPyFunc(preprocess)
postprocess_lambda = c3.Lambda.fromPyFunc(postprocess)

processor = c3.GenaiCore.Llm.Processor.Lambda(
    preprocessLambda=preprocess_lambda, postprocessLambda=postprocess_lambda
)

c3.GenaiCore.Llm.Completion.Client.make({
  "name": "default-completions",
  "model": {
    "type": "GenaiCore.Llm.AzureOpenAi.Model",
    "model": "gpt-4o",
    "processor": processor,
    "auth": {
      "type": "GenaiCore.Llm.AzureOpenAi.Auth",
      "name": "default-auth"
    },
    "defaultOptions": {
      "stop": ["</plan>", "</thought>", "</execute>", "</solution>"],
      "temperature": 0.0
    }
  }
}).setConfig()

Test the guardrail for the dynamic agent

In a test, you can make sure that the guardrail is working.

Python
sample_prompt = "My phone number is 555-555-5555"

result = c3.Genai.LlmGuardrails.Manager.inst().processOutput(sample_prompt)

print("Original Value:", result.originalValue)
print("Current Value:", result.currentValue)
print("Updated Value:", result.updatedValue)

You will see

Text
Original Value: My phone number is 555-555-5555
Current Value: My phone number is 555-555-5555
Updated Value: My phone number is [REDACTED_PHONE_NUMBER_5]

If you are using another LLM, set the processor for the respective config.

Clear guardrails configuration

  • To disable all guardrails, run Genai.LlmGuardrails.Manager.clearConfigAndSecretOverride(ConfigOverride.APP).
  • To disable just the input processors, run Genai.LlmGuardrails.Manager.clearConfigValue('inputProcessors', ConfigOverride.APP).
  • To disable just the output processors, run Genai.LlmGuardrails.Manager.clearConfigValue('outputProcessors', ConfigOverride.APP).

See also

Was this page helpful?