Guardrails Configuration

Configure safety checks and validation rules for your application.

Configuration Options

Basic Settings

yaml
guardrails:
  input_checks: true    # Enable input validation
  output_checks: true   # Enable output validation

Check Types

yaml
guardrails:
  check_toxicity: true        # Enable toxicity detection
  check_sensitive_data: true  # Enable sensitive data detection
  check_semantic: true        # Enable content classification
  check_off_topic: false      # Enable off-topic detection
  check_sql_syntax: false     # Enable SQL syntax validation

Toxicity Settings

yaml
guardrails:
  check_toxicity: true
  toxicity_threshold: 0.7  # Threshold for blocking (0.0-1.0)
  block_toxic: true        # Block toxic content when detected

Toxicity Threshold: Content with toxicity confidence above this threshold will be blocked if block_toxic is enabled.
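The blocking rule can be sketched as follows (a minimal illustration with hypothetical helper names; it assumes "above the threshold" means a strict comparison):

```python
def should_block(toxicity_score: float,
                 toxicity_threshold: float = 0.7,
                 block_toxic: bool = True) -> bool:
    """Return True when content should be blocked for toxicity.

    toxicity_score is the detector's confidence in [0.0, 1.0].
    Nothing is blocked unless block_toxic is enabled.
    """
    return block_toxic and toxicity_score > toxicity_threshold

# should_block(0.85)                    -> True  (above the 0.7 default)
# should_block(0.65)                    -> False (below the threshold)
# should_block(0.85, block_toxic=False) -> False (detect only, never block)
```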

Sensitive Data Settings

yaml
guardrails:
  check_sensitive_data: true
  block_sensitive_data: true  # Block content containing sensitive data

Detected sensitive data types include:

  • Email addresses
  • Phone numbers
  • Credit card numbers
  • Social security numbers
  • IP addresses
  • And more...
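As an illustration only, a regex-based detector for a few of these types might look like the sketch below. The patterns are deliberately simplified; a production detector handles many more formats and validates matches (e.g. Luhn checksums for card numbers).

```python
import re

# Simplified patterns for illustration only.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}

def detect_sensitive_data(text: str) -> list[str]:
    """Return the names of sensitive data types found in text."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]
```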

Content Classification

yaml
guardrails:
  check_semantic: true  # Enable content classification

Content classification detects:

  • Jailbreak attempts: Attempts to bypass safety restrictions
  • Malicious content: Requests for harmful activities
  • Prompt injection: Attempts to inject malicious instructions
  • Malicious code injection: Code injection attempts
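These four categories can be modeled with a small result type like the one below (a hypothetical shape for illustration, not the library's actual API):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class ThreatCategory(Enum):
    JAILBREAK = "jailbreak"                  # bypassing safety restrictions
    MALICIOUS_CONTENT = "malicious_content"  # requests for harmful activities
    PROMPT_INJECTION = "prompt_injection"    # injected malicious instructions
    CODE_INJECTION = "code_injection"        # code injection attempts

@dataclass
class ClassificationResult:
    flagged: bool
    category: Optional[ThreatCategory] = None  # None when content is benign
    confidence: float = 0.0
```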

Off-Topic Detection

yaml
guardrails:
  check_off_topic: true
  block_off_topic: true
  allowed_topics:
    - name: "Product Information"
      description: "Questions about product features, specifications, and pricing"
    - name: "Technical Support"
      description: "Help with installation, troubleshooting, and technical issues"

Off-topic detection helps keep conversations focused on allowed subjects. See Off-Topic Detection for details.

SQL Syntax Validation

yaml
guardrails:
  check_sql_syntax: true
  sql_dialect: "mysql"  # postgresql, mysql, sqlserver, sqlite, mongodb, oracle, redshift

SQL syntax validation checks queries for syntax errors in the configured dialect. Supported dialects:

  • postgresql - PostgreSQL
  • mysql - MySQL/MariaDB
  • sqlserver - Microsoft SQL Server
  • sqlite - SQLite
  • mongodb - MongoDB
  • oracle - Oracle Database
  • redshift - Amazon Redshift

See SQL Syntax Validation for details.
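For the sqlite dialect, the kind of check performed can be illustrated with Python's built-in sqlite3 module (a simplified stand-in, not the library's actual validator):

```python
import sqlite3

def is_valid_sqlite(query: str) -> bool:
    """Check a statement for SQLite errors without executing it.

    Compiling EXPLAIN against an empty in-memory database catches syntax
    errors, but also rejects queries referencing tables that do not exist,
    so this is stricter than a pure syntax check.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(f"EXPLAIN {query}")
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()

# is_valid_sqlite("SELECT 1 + 1") -> True
# is_valid_sqlite("SELEC 1")      -> False (syntax error near "SELEC")
```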

Complete Example

yaml
guardrails:
  # Enable/disable checks
  input_checks: true
  output_checks: true
  
  # Specific check types
  check_toxicity: true
  check_sensitive_data: true
  check_semantic: true
  check_off_topic: false
  check_sql_syntax: false
  
  # Toxicity configuration
  toxicity_threshold: 0.7
  block_toxic: true
  
  # Sensitive data configuration
  block_sensitive_data: true
  
  # Off-topic detection configuration
  block_off_topic: true
  allowed_topics:
    - name: "Allowed Topic"
      description: "Description of allowed topic"
  
  # SQL syntax validation configuration
  sql_dialect: "mysql"

Configuration Reference

Option                Type    Default   Description
--------------------  ------  --------  -----------------------------------------------
input_checks          bool    true      Enable input validation
output_checks         bool    true      Enable output validation
check_toxicity        bool    true      Enable toxicity detection
check_sensitive_data  bool    true      Enable sensitive data detection
check_semantic        bool    true      Enable content classification
check_off_topic       bool    false     Enable off-topic detection
check_sql_syntax      bool    false     Enable SQL syntax validation
toxicity_threshold    float   0.7       Threshold for blocking toxic content (0.0-1.0)
block_toxic           bool    true      Block toxic content
block_sensitive_data  bool    true      Block sensitive data
block_off_topic       bool    true      Block off-topic inputs
allowed_topics        list    None      List of allowed topics (required for off-topic detection)
sql_dialect           str     "mysql"   SQL dialect for syntax validation
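A minimal loader that applies these defaults and enforces the constraints noted above could look like this (a hypothetical helper for illustration; the library's own loader may differ):

```python
# Defaults taken from the configuration reference table.
DEFAULTS = {
    "input_checks": True,
    "output_checks": True,
    "check_toxicity": True,
    "check_sensitive_data": True,
    "check_semantic": True,
    "check_off_topic": False,
    "check_sql_syntax": False,
    "toxicity_threshold": 0.7,
    "block_toxic": True,
    "block_sensitive_data": True,
    "block_off_topic": True,
    "allowed_topics": None,
    "sql_dialect": "mysql",
}

def load_guardrails_config(user_config: dict) -> dict:
    """Merge user settings over the defaults and validate key constraints."""
    config = {**DEFAULTS, **user_config}
    if not 0.0 <= config["toxicity_threshold"] <= 1.0:
        raise ValueError("toxicity_threshold must be in [0.0, 1.0]")
    if config["check_off_topic"] and not config["allowed_topics"]:
        raise ValueError("allowed_topics is required when check_off_topic is enabled")
    return config
```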

Use Cases

Strict Mode

Block all potentially problematic content:

yaml
guardrails:
  input_checks: true
  output_checks: true
  check_toxicity: true
  check_sensitive_data: true
  check_semantic: true
  toxicity_threshold: 0.5  # Lower threshold = more strict
  block_toxic: true
  block_sensitive_data: true

Permissive Mode

Only block clearly problematic content:

yaml
guardrails:
  input_checks: true
  output_checks: true
  check_toxicity: true
  check_sensitive_data: false  # Allow sensitive data
  check_semantic: true
  toxicity_threshold: 0.9  # Higher threshold = more permissive
  block_toxic: true
  block_sensitive_data: false

Input-Only Mode

Only validate input, not output:

yaml
guardrails:
  input_checks: true
  output_checks: false
  check_toxicity: true
  check_sensitive_data: true
  check_semantic: true
