PII/PHI Detection and Data Masking
Identify sensitive personal and health information in user input and model output, apply policy-driven actions, and mask detected values before downstream processing.
Overview
PII/PHI detection runs as a pre-processing middleware step before agent or LLM processing and can optionally scan model output as well. It integrates Microsoft Presidio Analyzer to identify sensitive entities in text, supports configurable confidence thresholds and per-entity policies, and complements entity detection with regex-based PHI pattern matching for healthcare identifiers.
Use this guardrail when you need fine-grained control over how each data type is handled (flagged, blocked, sent for human review, or passed through, with optional masking) while keeping an audit trail of every detection event.
Prerequisites
PII/PHI detection requires the spaCy English language model. Install it after installing the Elsai Guardrails package:
```bash
pip install --extra-index-url https://elsai-core-package.optisolbusiness.com/root/elsai-guardrails/ elsai-guardrails==0.1.3
python -m spacy download en_core_web_lg
```
This download is a one-time setup step and is required before enabling PII/PHI detection in your configuration.
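You can check that the setup step succeeded using only the standard library. spaCy installs downloaded pipelines such as en_core_web_lg as ordinary Python packages, so `importlib` can locate the model without loading spaCy. The helper `spacy_model_installed` is a hypothetical convenience, not part of Elsai Guardrails.

```python
import importlib.util


def spacy_model_installed(model_name: str = "en_core_web_lg") -> bool:
    """Return True if the named spaCy model package can be found.

    `python -m spacy download <name>` installs the model as a regular
    Python package, so find_spec() can locate it without importing spaCy.
    """
    return importlib.util.find_spec(model_name) is not None


if __name__ == "__main__":
    if spacy_model_installed():
        print("en_core_web_lg is installed")
    else:
        print("Missing model: run `python -m spacy download en_core_web_lg`")
```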
How It Works
- Text is analyzed before agent or LLM processing (and optionally on output).
- Presidio identifies configured entity types with confidence scores.
- Additional regex-based patterns detect PHI identifiers such as medical record numbers and patient IDs.
- Each entity is evaluated against global and per-entity confidence thresholds.
- The configured policy action is applied: flag, block, review, or pass.
- When masking is enabled, detected values are replaced before downstream processing.
- Detection events are logged with entity type, confidence score, action taken, session ID, and timestamp.
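The threshold and policy steps above can be sketched as a small decision function. This is a minimal sketch, not the Elsai Guardrails implementation. It works on detections that have already been scored and reads a plain dict shaped like the `guardrails.pii` YAML shown under Configuration. The names `decide`, `Decision`, and `request_blocked` are hypothetical. The sketch also assumes two behaviors this page does not specify:

- A score equal to the threshold passes.
- A below-threshold detection keeps its entity's mask setting.

```python
from __future__ import annotations

from dataclasses import dataclass

# Plain-dict stand-in for the guardrails.pii YAML settings (values mirror
# the Configuration section); no YAML parser or Elsai import is needed.
PII_CONFIG = {
    "default_confidence_threshold": 0.5,
    "below_threshold_action": "flag",
    "entity_thresholds": {"PERSON": 0.7},
    "default_action": "flag",
    "default_mask": True,
    "entity_policies": {
        "CREDIT_CARD": {"action": "block", "mask": True},
        "PHI_MRN": {"action": "review", "mask": True},
    },
}


@dataclass(frozen=True)
class Decision:
    entity_type: str
    score: float
    action: str  # one of: flag, block, review, pass
    mask: bool


def decide(entity_type: str, score: float, config: dict) -> Decision:
    """Evaluate one detection against its threshold, then its policy."""
    # Per-entity threshold override, else the global default.
    threshold = config.get("entity_thresholds", {}).get(
        entity_type, config.get("default_confidence_threshold", 0.5)
    )
    # Per-entity policy; missing fields fall back to the defaults.
    policy = config.get("entity_policies", {}).get(entity_type, {})
    mask = policy.get("mask", config.get("default_mask", True))
    if score < threshold:
        # Assumption: a below-threshold detection keeps the entity's mask
        # setting; only its action comes from below_threshold_action.
        action = config.get("below_threshold_action", "flag")
        return Decision(entity_type, score, action, mask)
    action = policy.get("action", config.get("default_action", "flag"))
    return Decision(entity_type, score, action, mask)


def request_blocked(decisions: list[Decision]) -> bool:
    """A single 'block' decision rejects the whole request."""
    return any(d.action == "block" for d in decisions)


if __name__ == "__main__":
    detections = [("PERSON", 0.65), ("CREDIT_CARD", 0.92), ("PHI_MRN", 0.99)]
    decisions = [decide(t, s, PII_CONFIG) for t, s in detections]
    for d in decisions:
        print(d)
    print("blocked:", request_blocked(decisions))
```

Under this reading, a CREDIT_CARD match scoring 0.4 falls below the 0.5 default and is flagged rather than blocked. Thresholds and block policies therefore interact, and both are worth tuning together.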
Configuration
Enable PII/PHI Detection
```yaml
guardrails:
  pii:
    enabled: true
    input_checks: true
    output_checks: true
    language: en
```
Supported Entity Types
The following entity types can be detected and configured individually:
| Entity Type | Description |
|---|---|
| PERSON | Personal names |
| LOCATION | Geographic locations |
| EMAIL_ADDRESS | Email addresses |
| PHONE_NUMBER | Phone numbers |
| CREDIT_CARD | Credit card numbers |
| NRP | Nationalities, religious, or political groups |
| MEDICAL_LICENSE | Medical license numbers |
| US_SSN | U.S. Social Security numbers |
| IBAN_CODE | International bank account numbers |
| IP_ADDRESS | IP addresses |
| PHI_MRN | Medical record numbers (regex-based PHI detection) |
| PHI_PATIENT_ID | Patient identifiers (regex-based PHI detection) |
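The two PHI_ types come from regex pattern matching rather than from Presidio's entity recognizers. The sketch below shows the general shape of such a detector using Python's `re` module. Several parts are hypothetical, because this page does not document the patterns the package actually uses:

- the two patterns themselves,
- the fixed score of 1.0,
- the names `detect_phi` and `Detection`.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical patterns for illustration only: the patterns Elsai Guardrails
# actually uses for PHI_MRN and PHI_PATIENT_ID are not documented here.
PHI_PATTERNS = {
    "PHI_MRN": re.compile(r"\bMRN[:#\s-]*(\d{6,10})\b", re.IGNORECASE),
    "PHI_PATIENT_ID": re.compile(
        r"\bpatient\s+id[:#\s-]*([A-Z]{0,3}-?\d{4,10})\b", re.IGNORECASE
    ),
}


@dataclass
class Detection:
    entity_type: str
    start: int    # offset where the identifier starts
    end: int      # offset just past the identifier
    score: float  # regex hits have no model confidence, so a fixed value is used


def detect_phi(text: str, score: float = 1.0) -> list[Detection]:
    """Return regex-based PHI detections sorted by position in the text."""
    found = []
    for entity_type, pattern in PHI_PATTERNS.items():
        for match in pattern.finditer(text):
            # Group 1 is the identifier itself, without the "MRN:" style label.
            found.append(Detection(entity_type, match.start(1), match.end(1), score))
    return sorted(found, key=lambda d: d.start)


if __name__ == "__main__":
    note = "MRN: 00123456, Patient ID: PT-88421 admitted."
    for d in detect_phi(note):
        print(d.entity_type, note[d.start:d.end])
```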
Specify which types to scan:
```yaml
guardrails:
  pii:
    entity_types:
      - PERSON
      - LOCATION
      - EMAIL_ADDRESS
      - PHONE_NUMBER
      - CREDIT_CARD
      - NRP
      - MEDICAL_LICENSE
      - US_SSN
      - IBAN_CODE
      - IP_ADDRESS
```
Confidence Thresholds
Set a global default threshold and override it for specific entity types. Entities below the applicable threshold are handled according to below_threshold_action.
```yaml
guardrails:
  pii:
    default_confidence_threshold: 0.5
    below_threshold_action: flag
    entity_thresholds:
      PERSON: 0.7
```
Policy-Based Actions
Each entity type can be assigned an action and optional masking behavior:
| Action | Behavior |
|---|---|
| flag | Record the detection and allow processing to continue |
| block | Stop processing and reject the request |
| review | Mark for human review while allowing or holding the request |
| pass | Allow the entity through without intervention |
```yaml
guardrails:
  pii:
    default_action: flag
    default_mask: true
    enable_phi_detection: true
    entity_policies:
      CREDIT_CARD:
        action: block
        mask: true
      US_SSN:
        action: block
        mask: true
      EMAIL_ADDRESS:
        action: flag
        mask: true
      PHONE_NUMBER:
        action: flag
        mask: true
      PHI_MRN:
        action: review
        mask: true
      PHI_PATIENT_ID:
        action: review
        mask: true
```
Complete Example
```yaml
guardrails:
  input_checks: true
  output_checks: true
  check_toxicity: true
  check_sensitive_data: true
  check_semantic: true
  pii:
    enabled: true
    input_checks: true
    output_checks: true
    language: en
    default_confidence_threshold: 0.5
    below_threshold_action: flag
    default_action: flag
    default_mask: true
    enable_phi_detection: true
    entity_types:
      - PERSON
      - LOCATION
      - EMAIL_ADDRESS
      - PHONE_NUMBER
      - CREDIT_CARD
      - NRP
      - MEDICAL_LICENSE
      - US_SSN
      - IBAN_CODE
      - IP_ADDRESS
    entity_thresholds:
      PERSON: 0.7
    entity_policies:
      CREDIT_CARD:
        action: block
        mask: true
      US_SSN:
        action: block
        mask: true
      EMAIL_ADDRESS:
        action: flag
        mask: true
      PHONE_NUMBER:
        action: flag
        mask: true
      PHI_MRN:
        action: review
        mask: true
      PHI_PATIENT_ID:
        action: review
        mask: true
```
Configuration Reference
| Option | Type | Default | Description |
|---|---|---|---|
| enabled | bool | false | Enable PII/PHI detection |
| input_checks | bool | true | Run detection on user input |
| output_checks | bool | true | Run detection on model output |
| language | str | "en" | Language code for entity analysis |
| default_confidence_threshold | float | 0.5 | Global minimum confidence for entity recognition |
| below_threshold_action | str | "flag" | Action for entities below their threshold |
| default_action | str | "flag" | Default action when no entity policy is defined |
| default_mask | bool | true | Mask detected values by default |
| enable_phi_detection | bool | true | Enable regex-based PHI pattern detection |
| entity_types | list | — | Entity types to detect |
| entity_thresholds | dict | — | Per-entity confidence overrides |
| entity_policies | dict | — | Per-entity action and masking rules |
Each key under entity_policies supports:
| Field | Type | Values | Description |
|---|---|---|---|
| action | str | flag, block, review, pass | Policy action applied when the entity is detected |
| mask | bool | true, false | Whether to mask the detected value before downstream processing |
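The two tables above are enough to catch common configuration mistakes before deployment. The sketch below checks types, action names, and threshold ranges for a `guardrails.pii` mapping that has already been loaded into a dict. It is not a validator shipped with Elsai Guardrails, and `validate_pii_config` is a hypothetical name. It also assumes the reference table lists every supported option, so any other key is reported as unknown.

```python
from __future__ import annotations

ACTIONS = {"flag", "block", "review", "pass"}

# Accepted Python types per option, following the reference table above.
# int is also accepted for the float option because YAML reads `1` as an int.
OPTION_TYPES = {
    "enabled": (bool,),
    "input_checks": (bool,),
    "output_checks": (bool,),
    "language": (str,),
    "default_confidence_threshold": (int, float),
    "below_threshold_action": (str,),
    "default_action": (str,),
    "default_mask": (bool,),
    "enable_phi_detection": (bool,),
    "entity_types": (list,),
    "entity_thresholds": (dict,),
    "entity_policies": (dict,),
}


def _is_action(value) -> bool:
    return isinstance(value, str) and value in ACTIONS


def _is_score(value) -> bool:
    # bool is a subclass of int, so rule it out explicitly.
    return (
        isinstance(value, (int, float))
        and not isinstance(value, bool)
        and 0 <= value <= 1
    )


def validate_pii_config(pii: dict) -> list[str]:
    """Return problems found in a guardrails.pii mapping (empty list if none)."""
    problems = []
    for key, value in pii.items():
        expected = OPTION_TYPES.get(key)
        if expected is None:
            problems.append(f"unknown option: {key}")
        elif not isinstance(value, expected):
            names = " or ".join(t.__name__ for t in expected)
            problems.append(f"{key} should be {names}")

    for key in ("below_threshold_action", "default_action"):
        if key in pii and not _is_action(pii[key]):
            problems.append(f"{key} must be one of {sorted(ACTIONS)}")
    threshold = pii.get("default_confidence_threshold", 0.5)
    if not _is_score(threshold):
        problems.append("default_confidence_threshold must be between 0 and 1")

    thresholds = pii.get("entity_thresholds", {})
    if isinstance(thresholds, dict):
        for entity, value in thresholds.items():
            if not _is_score(value):
                problems.append(f"entity_thresholds.{entity} must be between 0 and 1")

    policies = pii.get("entity_policies", {})
    if isinstance(policies, dict):
        for entity, policy in policies.items():
            if not isinstance(policy, dict):
                problems.append(f"entity_policies.{entity} must be a mapping")
                continue
            if "action" in policy and not _is_action(policy["action"]):
                problems.append(f"entity_policies.{entity}.action must be one of {sorted(ACTIONS)}")
            if "mask" in policy and not isinstance(policy["mask"], bool):
                problems.append(f"entity_policies.{entity}.mask must be true or false")
    return problems
```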
Audit Logging
Each detection event is logged with the following fields:
- entity_type: The detected entity category
- confidence_score: Model confidence for the detection
- action_taken: Policy action applied (flag, block, review, pass)
- session_id: Session identifier for traceability
- timestamp: Time of the detection event
Use these logs for compliance reporting, security monitoring, and tuning confidence thresholds over time.
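A detection event can be represented as a flat record with exactly the five fields listed above. This minimal sketch assumes events are written as JSON Lines with ISO 8601 UTC timestamps; this page does not describe the real log format or destination. The names `DetectionEvent` and `to_json_line` are hypothetical. Like the field list, the record carries no raw detected value.

```python
from __future__ import annotations

import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class DetectionEvent:
    """One audit record with the documented field names."""
    entity_type: str
    confidence_score: float
    action_taken: str  # flag, block, review, or pass
    session_id: str
    # Assumed format: ISO 8601 timestamp in UTC, set when the event is created.
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_json_line(event: DetectionEvent) -> str:
    """Serialize an event as one JSON object per line (JSON Lines)."""
    return json.dumps(asdict(event), sort_keys=True)


if __name__ == "__main__":
    event = DetectionEvent("US_SSN", 0.85, "block", session_id=str(uuid.uuid4()))
    print(to_json_line(event))
```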
Data Masking
When mask: true is set (globally via default_mask or per entity in entity_policies), detected values are replaced in the text before it reaches the agent or LLM. This reduces exposure of sensitive data in prompts, logs, and downstream systems while still allowing the request to proceed when the policy permits.
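When several values are masked in one string, replacing them from the end backwards keeps the earlier offsets valid. This minimal sketch makes three assumptions:

- Detections arrive as (start, end, entity_type) character offsets.
- Masked values become an `<ENTITY_TYPE>` placeholder. This page does not specify the replacement format Elsai Guardrails actually uses.
- `mask_text` is a hypothetical name.

```python
from __future__ import annotations


def mask_text(text: str, spans: list[tuple[int, int, str]]) -> str:
    """Replace each (start, end, entity_type) span with an <ENTITY_TYPE> placeholder.

    Spans are applied from the end of the string backwards so offsets to the
    left of each replacement stay valid. When spans overlap, the one that
    starts furthest right is kept and the others are skipped.
    """
    result = text
    last_start = len(text) + 1  # start of the most recently replaced span
    for start, end, entity_type in sorted(spans, key=lambda s: s[0], reverse=True):
        if end > last_start:
            continue  # overlaps a span that was already replaced
        result = result[:start] + f"<{entity_type}>" + result[end:]
        last_start = start
    return result


if __name__ == "__main__":
    msg = "Email jane@example.com or call 555-0100."
    e, p = msg.index("jane@example.com"), msg.index("555-0100")
    print(mask_text(msg, [(e, e + 16, "EMAIL_ADDRESS"), (p, p + 8, "PHONE_NUMBER")]))
```

In a full pipeline, only spans whose resolved policy has mask: true would be passed to a function like this.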
Input vs Output Checks
The pii block has its own input_checks and output_checks flags, so you can control where PII/PHI detection runs independently of the top-level guardrails input_checks and output_checks settings:
```yaml
guardrails:
  pii:
    enabled: true
    input_checks: true   # Scan user messages
    output_checks: true  # Scan model responses
```
For applications that only need to protect inbound user data, set output_checks: false so that only input checks run. For applications that must prevent the model from leaking sensitive information, enable both.
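The defaults in the Configuration Reference matter here. enabled is false by default while both check flags default to true. Under those defaults, a pii block that sets only input_checks would not scan anything until enabled is true. The sketch below shows that resolution and consults only the flags inside the pii block. `pii_scan_stages` is a hypothetical helper, not part of Elsai Guardrails.

```python
from __future__ import annotations


def pii_scan_stages(guardrails: dict) -> list[str]:
    """Return the stages PII/PHI detection runs on: "input", "output", both, or none.

    Defaults follow the Configuration Reference: enabled=false,
    input_checks=true, output_checks=true. Only the `pii` block is read.
    """
    pii = guardrails.get("pii", {})
    if not pii.get("enabled", False):
        return []
    stages = []
    if pii.get("input_checks", True):
        stages.append("input")
    if pii.get("output_checks", True):
        stages.append("output")
    return stages


if __name__ == "__main__":
    print(pii_scan_stages({"pii": {"enabled": True, "output_checks": False}}))
```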
Use Cases
Healthcare Applications
Block or review PHI while masking patient identifiers:
```yaml
guardrails:
  pii:
    enabled: true
    enable_phi_detection: true
    entity_policies:
      PHI_MRN:
        action: review
        mask: true
      PHI_PATIENT_ID:
        action: review
        mask: true
      US_SSN:
        action: block
        mask: true
```
Financial Services
Strict blocking for high-risk financial identifiers:
```yaml
guardrails:
  pii:
    enabled: true
    entity_policies:
      CREDIT_CARD:
        action: block
        mask: true
      IBAN_CODE:
        action: block
        mask: true
```
Monitoring Mode
Detect and log without blocking:
```yaml
guardrails:
  pii:
    enabled: true
    default_action: flag
    default_mask: false
    entity_policies:
      EMAIL_ADDRESS:
        action: flag
        mask: false
```
Best Practices
- Start with flag and mask: Use non-blocking actions while tuning thresholds before enforcing blocks in production.
- Set entity-specific thresholds: Names and locations often need higher thresholds than structured identifiers like email addresses.
- Enable PHI detection for healthcare: Turn on enable_phi_detection when handling medical or patient-related content.
- Review audit logs regularly: Use logged confidence scores to refine thresholds and policies.
- Combine with existing checks: PII/PHI detection complements Sensitive Data Detection and Toxicity Detection for layered protection.
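The "Review audit logs regularly" practice can start with a per-entity summary of logged confidence scores. This minimal sketch assumes the audit events are already loaded as dicts with the entity_type and confidence_score fields described under Audit Logging. `summarize_scores` is a hypothetical helper.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import median


def summarize_scores(events: list[dict]) -> dict[str, dict]:
    """Group audit events by entity_type and summarize their confidence scores."""
    scores = defaultdict(list)
    for event in events:
        scores[event["entity_type"]].append(event["confidence_score"])
    # Sorted by entity type so the report is stable between runs.
    return {
        entity: {
            "count": len(values),
            "min": min(values),
            "median": median(values),
            "max": max(values),
        }
        for entity, values in sorted(scores.items())
    }


if __name__ == "__main__":
    sample = [
        {"entity_type": "PERSON", "confidence_score": 0.55},
        {"entity_type": "PERSON", "confidence_score": 0.85},
        {"entity_type": "EMAIL_ADDRESS", "confidence_score": 1.0},
    ]
    for entity, stats in summarize_scores(sample).items():
        print(entity, stats)
```

Suppose many PERSON detections cluster just under the 0.7 override. Before changing entity_thresholds, compare a sample of them by hand.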
Next Steps
- Token Budget Enforcement — Limit request and run token usage
- Sensitive Data Detection — Pattern-based sensitive data checks
- Guardrails Configuration — Full configuration reference
- YAML Configuration — Complete configuration examples
