Skip to content

Guardrails Configuration

Configure safety checks and validation rules for your application.

Configuration Options

Basic Settings

yaml
guardrails:
  input_checks: true    # Enable input validation
  output_checks: true   # Enable output validation

Check Types

yaml
guardrails:
  check_toxicity: true        # Enable toxicity detection
  check_sensitive_data: true  # Enable sensitive data detection
  check_semantic: true       # Enable content classification

Toxicity Settings

yaml
guardrails:
  check_toxicity: true
  toxicity_threshold: 0.7  # Threshold for blocking (0.0-1.0)
  block_toxic: true        # Block toxic content when detected

Toxicity Threshold: Content with toxicity confidence above this threshold will be blocked if block_toxic is enabled.

Sensitive Data Settings

yaml
guardrails:
  check_sensitive_data: true
  block_sensitive_data: true  # Block content containing sensitive data

Detected sensitive data types include:

  • Email addresses
  • Phone numbers
  • Credit card numbers
  • Social security numbers
  • IP addresses
  • And more...

Content Classification

yaml
guardrails:
  check_semantic: true  # Enable content classification

Content classification detects:

  • Jailbreak attempts: Attempts to bypass safety restrictions
  • Malicious content: Requests for harmful activities
  • Prompt injection: Attempts to inject malicious instructions
  • Malicious code injection: Code injection attempts

Complete Example

yaml
guardrails:
  # Enable/disable checks
  input_checks: true
  output_checks: true
  
  # Specific check types
  check_toxicity: true
  check_sensitive_data: true
  check_semantic: true
  
  # Toxicity configuration
  toxicity_threshold: 0.7
  block_toxic: true
  
  # Sensitive data configuration
  block_sensitive_data: true

Configuration Reference

OptionTypeDefaultDescription
input_checksbooltrueEnable input validation
output_checksbooltrueEnable output validation
check_toxicitybooltrueEnable toxicity detection
check_sensitive_databooltrueEnable sensitive data detection
check_semanticbooltrueEnable content classification
toxicity_thresholdfloat0.7Threshold for blocking toxic content (0.0-1.0)
block_toxicbooltrueBlock toxic content
block_sensitive_databooltrueBlock sensitive data

Use Cases

Strict Mode

Block all potentially problematic content:

yaml
guardrails:
  input_checks: true
  output_checks: true
  check_toxicity: true
  check_sensitive_data: true
  check_semantic: true
  toxicity_threshold: 0.5  # Lower threshold = more strict
  block_toxic: true
  block_sensitive_data: true

Permissive Mode

Only block clearly problematic content:

yaml
guardrails:
  input_checks: true
  output_checks: true
  check_toxicity: true
  check_sensitive_data: false  # Allow sensitive data
  check_semantic: true
  toxicity_threshold: 0.9  # Higher threshold = more permissive
  block_toxic: true
  block_sensitive_data: false

Input-Only Mode

Only validate input, not output:

yaml
guardrails:
  input_checks: true
  output_checks: false
  check_toxicity: true
  check_sensitive_data: true
  check_semantic: true

Next Steps

Released under the MIT License.