Guardrails Configuration

Configure safety checks and validation rules for your application.

Reference Configuration

The following guardrail policy mirrors the project's config.yml and serves as the canonical reference. Options it does not set (such as check_off_topic and check_sql_syntax) are described under Configuration Options.

yaml
# Guardrail policy configuration

guardrails:
  input_checks: true
  output_checks: true

  check_toxicity: true
  check_sensitive_data: true
  check_semantic: true
  toxicity_threshold: 0.7
  block_toxic: true
  block_sensitive_data: true

  # PII/PHI detection policy
  pii:
    enabled: true
    input_checks: true
    output_checks: true
    language: en
    default_confidence_threshold: 0.5
    below_threshold_action: flag
    default_action: flag
    default_mask: true
    enable_phi_detection: true
    entity_types:
      - PERSON
      - LOCATION
      - EMAIL_ADDRESS
      - PHONE_NUMBER
      - CREDIT_CARD
      - NRP
      - MEDICAL_LICENSE
      - US_SSN
      - IBAN_CODE
      - IP_ADDRESS
    entity_thresholds:
      PERSON: 0.7
    entity_policies:
      CREDIT_CARD:
        action: block
        mask: true
      US_SSN:
        action: block
        mask: true
      EMAIL_ADDRESS:
        action: flag
        mask: true
      PHONE_NUMBER:
        action: flag
        mask: true
      PHI_MRN:
        action: review
        mask: true
      PHI_PATIENT_ID:
        action: review
        mask: true

  # Token budget enforcement policy
  token_budget:
    enabled: true
    input_checks: true
    output_checks: true
    max_request_tokens: 50
    max_run_tokens: 80
    reserved_output_tokens: 10

Configuration Options

Basic Settings

yaml
guardrails:
  input_checks: true    # Enable input validation
  output_checks: true   # Enable output validation

Check Types

yaml
guardrails:
  check_toxicity: true        # Enable toxicity detection
  check_sensitive_data: true  # Enable sensitive data detection
  check_semantic: true        # Enable content classification
  check_off_topic: false      # Enable off-topic detection
  check_sql_syntax: false     # Enable SQL syntax validation

Toxicity Settings

yaml
guardrails:
  check_toxicity: true
  toxicity_threshold: 0.7  # Threshold for blocking (0.0-1.0)
  block_toxic: true        # Block toxic content when detected

Toxicity Threshold: Content with toxicity confidence above this threshold will be blocked if block_toxic is enabled.
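
The blocking rule above can be sketched in a few lines of Python. This is only an illustration, not the library's implementation: the function name `should_block_toxic` is hypothetical, and the strict `>` comparison is an assumption based on the wording "above this threshold".

```python
def should_block_toxic(score: float, threshold: float = 0.7, block_toxic: bool = True) -> bool:
    """Hypothetical helper: decide whether content is blocked for toxicity.

    `score` is the toxicity confidence reported by the detector (0.0-1.0).
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("toxicity_threshold must be between 0.0 and 1.0")
    # Blocked only when blocking is enabled AND the score exceeds the threshold.
    return block_toxic and score > threshold


print(should_block_toxic(0.8))                     # True
print(should_block_toxic(0.7))                     # False (not above the threshold)
print(should_block_toxic(0.95, block_toxic=False)) # False (blocking disabled)
```

Under this reading, a score exactly equal to the threshold is allowed through. If the library treats the boundary as inclusive, the comparison would be `>=` instead.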

Sensitive Data Settings

yaml
guardrails:
  check_sensitive_data: true
  block_sensitive_data: true  # Block content containing sensitive data

Detected sensitive data types include:

  • Email addresses
  • Phone numbers
  • Credit card numbers
  • Social security numbers
  • IP addresses
  • And more...

Content Classification

yaml
guardrails:
  check_semantic: true  # Enable content classification

Content classification detects:

  • Jailbreak attempts: Attempts to bypass safety restrictions
  • Malicious content: Requests for harmful activities
  • Prompt injection: Attempts to inject malicious instructions
  • Malicious code injection: Attempts to inject harmful code

Off-Topic Detection

yaml
guardrails:
  check_off_topic: true
  block_off_topic: true
  allowed_topics:
    - name: "Product Information"
      description: "Questions about product features, specifications, and pricing"
    - name: "Technical Support"
      description: "Help with installation, troubleshooting, and technical issues"

Off-topic detection helps keep conversations focused on allowed subjects. See Off-Topic Detection for details.

SQL Syntax Validation

yaml
guardrails:
  check_sql_syntax: true
  sql_dialect: "mysql"  # postgresql, mysql, sqlserver, sqlite, mongodb, oracle, redshift

SQL syntax validation checks SQL queries for syntax errors. Supported dialects:

  • postgresql - PostgreSQL
  • mysql - MySQL/MariaDB
  • sqlserver - Microsoft SQL Server
  • sqlite - SQLite
  • mongodb - MongoDB
  • oracle - Oracle Database
  • redshift - Amazon Redshift

See SQL Syntax Validation for details.
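
Because sql_dialect accepts only the values listed above, it can be useful to check the setting before starting the application. The following is a minimal sketch under that assumption. `validate_sql_dialect` is a hypothetical helper that works on a plain dictionary and is not part of the library.

```python
# Dialect names taken from the supported list above.
SUPPORTED_SQL_DIALECTS = {
    "postgresql", "mysql", "sqlserver", "sqlite", "mongodb", "oracle", "redshift",
}


def validate_sql_dialect(guardrails: dict) -> str:
    """Hypothetical helper: return the configured dialect or raise if unsupported."""
    # "mysql" is the documented default when sql_dialect is not set.
    dialect = guardrails.get("sql_dialect", "mysql")
    if dialect not in SUPPORTED_SQL_DIALECTS:
        raise ValueError(
            f"Unsupported sql_dialect {dialect!r}; expected one of {sorted(SUPPORTED_SQL_DIALECTS)}"
        )
    return dialect


print(validate_sql_dialect({"check_sql_syntax": True, "sql_dialect": "postgresql"}))  # postgresql
print(validate_sql_dialect({"check_sql_syntax": True}))                               # mysql
```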

PII/PHI Detection and Data Masking

PII/PHI detection requires the spaCy en_core_web_lg model. Download it after installing the package:

bash
python -m spacy download en_core_web_lg

yaml
guardrails:
  pii:
    enabled: true
    input_checks: true
    output_checks: true
    language: en
    default_confidence_threshold: 0.5
    below_threshold_action: flag
    default_action: flag
    default_mask: true
    enable_phi_detection: true
    entity_types:
      - PERSON
      - LOCATION
      - EMAIL_ADDRESS
      - PHONE_NUMBER
      - CREDIT_CARD
      - NRP
      - MEDICAL_LICENSE
      - US_SSN
      - IBAN_CODE
      - IP_ADDRESS
    entity_thresholds:
      PERSON: 0.7
    entity_policies:
      CREDIT_CARD:
        action: block
        mask: true
      US_SSN:
        action: block
        mask: true
      EMAIL_ADDRESS:
        action: flag
        mask: true
      PHONE_NUMBER:
        action: flag
        mask: true
      PHI_MRN:
        action: review
        mask: true
      PHI_PATIENT_ID:
        action: review
        mask: true

Supported entity types:

| Entity Type | Description |
| --- | --- |
| PERSON | Personal names |
| LOCATION | Geographic locations |
| EMAIL_ADDRESS | Email addresses |
| PHONE_NUMBER | Phone numbers |
| CREDIT_CARD | Credit card numbers |
| NRP | Nationalities, religious, or political groups |
| MEDICAL_LICENSE | Medical license numbers |
| US_SSN | U.S. Social Security numbers |
| IBAN_CODE | International bank account numbers |
| IP_ADDRESS | IP addresses |
| PHI_MRN | Medical record numbers (regex-based PHI detection) |
| PHI_PATIENT_ID | Patient identifiers (regex-based PHI detection) |

PII/PHI detection identifies sensitive entities using Microsoft Presidio Analyzer, applies configurable policy actions (flag, block, review, pass), supports data masking, and logs detection events. See PII/PHI Detection for details.
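
To make the interaction between thresholds, per-entity policies, and defaults concrete, here is a minimal sketch of how a single detection might be resolved into an action. It is not the library's code. `PiiDecision` and `resolve_pii_decision` are hypothetical names, and the precedence shown (per-entity threshold over the global one, below_threshold_action for low-confidence hits, entity policy over default_action) is an assumption inferred from the option descriptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PiiDecision:
    action: str  # one of: flag, block, review, pass
    mask: bool


def resolve_pii_decision(entity_type: str, score: float, pii: dict) -> PiiDecision:
    """Hypothetical resolver for one detected entity and its confidence score."""
    # Assumption: a per-entity threshold overrides the global default.
    thresholds = pii.get("entity_thresholds", {})
    threshold = thresholds.get(entity_type, pii.get("default_confidence_threshold", 0.5))
    policy = pii.get("entity_policies", {}).get(entity_type, {})
    mask = policy.get("mask", pii.get("default_mask", True))
    if score < threshold:
        # Assumption: low-confidence detections receive below_threshold_action.
        return PiiDecision(pii.get("below_threshold_action", "flag"), mask)
    # Otherwise use the entity's own policy, falling back to default_action.
    return PiiDecision(policy.get("action", pii.get("default_action", "flag")), mask)


pii_config = {
    "default_confidence_threshold": 0.5,
    "below_threshold_action": "flag",
    "default_action": "flag",
    "default_mask": True,
    "entity_thresholds": {"PERSON": 0.7},
    "entity_policies": {"CREDIT_CARD": {"action": "block", "mask": True}},
}

print(resolve_pii_decision("CREDIT_CARD", 0.9, pii_config))  # PiiDecision(action='block', mask=True)
print(resolve_pii_decision("CREDIT_CARD", 0.3, pii_config))  # PiiDecision(action='flag', mask=True)
```

Under these assumptions, a credit card number detected with high confidence is blocked, while the same entity detected below the threshold is only flagged.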

Token Budget Enforcement

yaml
guardrails:
  token_budget:
    enabled: true
    input_checks: true
    output_checks: true
    max_request_tokens: 50
    max_run_tokens: 80
    reserved_output_tokens: 10

Token budget enforcement computes token usage across the full request context and rejects oversized requests. See Token Budget Enforcement for details.
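
The arithmetic behind these limits can be illustrated with a short sketch. How the three settings combine is not spelled out here, so the rule below is an assumption: the request context plus reserved_output_tokens must fit within max_request_tokens, and the running total plus the same amount must fit within max_run_tokens. `check_token_budget` is a hypothetical helper, not a library function.

```python
from typing import Tuple


def check_token_budget(request_tokens: int, run_tokens_used: int, budget: dict) -> Tuple[bool, str]:
    """Hypothetical check: does a request fit the per-request and per-run limits?"""
    reserved = budget.get("reserved_output_tokens", 0)
    # Assumption: room for the model response is counted against the request limit.
    if request_tokens + reserved > budget["max_request_tokens"]:
        return False, "request exceeds max_request_tokens"
    # Assumption: the run limit covers tokens already used plus this request.
    if run_tokens_used + request_tokens + reserved > budget["max_run_tokens"]:
        return False, "run exceeds max_run_tokens"
    return True, "ok"


budget = {"max_request_tokens": 50, "max_run_tokens": 80, "reserved_output_tokens": 10}

print(check_token_budget(40, 0, budget))   # (True, 'ok')
print(check_token_budget(41, 0, budget))   # (False, 'request exceeds max_request_tokens')
print(check_token_budget(30, 45, budget))  # (False, 'run exceeds max_run_tokens')
```

With the reference values, this reading leaves at most 40 tokens for the request context once 10 are reserved for the response.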

Configuration Reference

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| input_checks | bool | true | Enable input validation |
| output_checks | bool | true | Enable output validation |
| check_toxicity | bool | true | Enable toxicity detection |
| check_sensitive_data | bool | true | Enable sensitive data detection |
| check_semantic | bool | true | Enable content classification |
| check_off_topic | bool | false | Enable off-topic detection |
| check_sql_syntax | bool | false | Enable SQL syntax validation |
| toxicity_threshold | float | 0.7 | Threshold for blocking toxic content (0.0-1.0) |
| block_toxic | bool | true | Block toxic content |
| block_sensitive_data | bool | true | Block sensitive data |
| block_off_topic | bool | true | Block off-topic inputs |
| allowed_topics | list | None | List of allowed topics (required for off-topic detection) |
| sql_dialect | str | "mysql" | SQL dialect for syntax validation |
| pii | dict | | PII/PHI detection and data masking policy (see below) |
| token_budget | dict | | Token budget enforcement policy (see below) |
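
The defaults in the table above can be pictured as a dictionary that user settings are merged over. The sketch below illustrates that idea only. `GUARDRAIL_DEFAULTS` and `apply_guardrail_defaults` are hypothetical names, and rejecting unknown keys is an assumption about sensible behavior rather than something the library is known to do.

```python
# Defaults transcribed from the Configuration Reference table.
GUARDRAIL_DEFAULTS = {
    "input_checks": True,
    "output_checks": True,
    "check_toxicity": True,
    "check_sensitive_data": True,
    "check_semantic": True,
    "check_off_topic": False,
    "check_sql_syntax": False,
    "toxicity_threshold": 0.7,
    "block_toxic": True,
    "block_sensitive_data": True,
    "block_off_topic": True,
    "allowed_topics": None,
    "sql_dialect": "mysql",
}
# Nested policy sections have their own option tables and are passed through as-is.
NESTED_SECTIONS = {"pii", "token_budget"}


def apply_guardrail_defaults(user_config: dict) -> dict:
    """Hypothetical helper: merge user settings over the documented defaults."""
    unknown = set(user_config) - set(GUARDRAIL_DEFAULTS) - NESTED_SECTIONS
    if unknown:
        raise ValueError(f"Unknown guardrail options: {sorted(unknown)}")
    merged = {**GUARDRAIL_DEFAULTS, **user_config}
    # The table marks allowed_topics as required for off-topic detection.
    if merged["check_off_topic"] and not merged["allowed_topics"]:
        raise ValueError("allowed_topics is required when check_off_topic is enabled")
    return merged


config = apply_guardrail_defaults({"toxicity_threshold": 0.5})
print(config["toxicity_threshold"], config["sql_dialect"])  # 0.5 mysql
```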

PII/PHI Detection Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| pii.enabled | bool | false | Enable PII/PHI detection |
| pii.input_checks | bool | true | Run detection on user input |
| pii.output_checks | bool | true | Run detection on model output |
| pii.language | str | "en" | Language code for entity analysis |
| pii.default_confidence_threshold | float | 0.5 | Global minimum confidence for entity recognition |
| pii.below_threshold_action | str | "flag" | Action for entities below their threshold |
| pii.default_action | str | "flag" | Default action when no entity policy is defined |
| pii.default_mask | bool | true | Mask detected values by default |
| pii.enable_phi_detection | bool | true | Enable regex-based PHI pattern detection |
| pii.entity_types | list | | Entity types to detect |
| pii.entity_thresholds | dict | | Per-entity confidence overrides |
| pii.entity_policies | dict | | Per-entity action and masking rules |

Entity Policy Options

Each key under entity_policies is an entity type name. Each policy supports:

| Field | Type | Values | Description |
| --- | --- | --- | --- |
| action | str | flag, block, review, pass | Policy action applied when the entity is detected |
| mask | bool | true, false | Whether to mask the detected value before downstream processing |

Example entity policies from config.yml:

| Entity | Action | Mask | Behavior |
| --- | --- | --- | --- |
| CREDIT_CARD | block | true | Block request and mask value |
| US_SSN | block | true | Block request and mask value |
| EMAIL_ADDRESS | flag | true | Flag detection and mask value |
| PHONE_NUMBER | flag | true | Flag detection and mask value |
| PHI_MRN | review | true | Mark for review and mask value |
| PHI_PATIENT_ID | review | true | Mark for review and mask value |
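
Since each policy accepts only a fixed set of action values and a boolean mask, a small check can catch typos before the configuration is loaded. The following is a sketch only. `validate_entity_policies` is a hypothetical helper, and it assumes that a missing field is acceptable because default_action and default_mask supply the fallback.

```python
# Action values taken from the Entity Policy Options table.
ALLOWED_ACTIONS = {"flag", "block", "review", "pass"}


def validate_entity_policies(entity_policies: dict) -> list:
    """Hypothetical helper: return a list of human-readable problems (empty if none)."""
    errors = []
    for entity, policy in entity_policies.items():
        # Only fields that are present are checked; absent ones fall back to defaults.
        if "action" in policy and policy["action"] not in ALLOWED_ACTIONS:
            errors.append(f"{entity}: unknown action {policy['action']!r}")
        if "mask" in policy and not isinstance(policy["mask"], bool):
            errors.append(f"{entity}: mask must be true or false")
    return errors


policies = {
    "CREDIT_CARD": {"action": "block", "mask": True},
    "US_SSN": {"action": "deny", "mask": "yes"},  # two deliberate mistakes
}
for problem in validate_entity_policies(policies):
    print(problem)
```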

Token Budget Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| token_budget.enabled | bool | false | Enable token budget enforcement |
| token_budget.input_checks | bool | true | Enforce limits on incoming requests |
| token_budget.output_checks | bool | true | Enforce limits on model output |
| token_budget.max_request_tokens | int | | Maximum tokens for a single request context |
| token_budget.max_run_tokens | int | | Maximum total tokens for an entire run |
| token_budget.reserved_output_tokens | int | | Tokens reserved for the model response |

Use Cases

Strict Mode

Block all potentially problematic content:

yaml
guardrails:
  input_checks: true
  output_checks: true
  check_toxicity: true
  check_sensitive_data: true
  check_semantic: true
  toxicity_threshold: 0.5  # Lower threshold = more strict
  block_toxic: true
  block_sensitive_data: true
  pii:
    enabled: true
    default_action: block
    default_mask: true

Permissive Mode

Only block clearly problematic content:

yaml
guardrails:
  input_checks: true
  output_checks: true
  check_toxicity: true
  check_sensitive_data: false  # Allow sensitive data
  check_semantic: true
  toxicity_threshold: 0.9  # Higher threshold = more permissive
  block_toxic: true
  block_sensitive_data: false
  pii:
    enabled: true
    default_action: flag
    default_mask: false

Input-Only Mode

Only validate input, not output:

yaml
guardrails:
  input_checks: true
  output_checks: false
  check_toxicity: true
  check_sensitive_data: true
  check_semantic: true
  pii:
    enabled: true
    input_checks: true
    output_checks: false
  token_budget:
    enabled: true
    input_checks: true
    output_checks: false

Released under the MIT License.