Data Exfiltration Detection
Detect and block LLM responses that may leak credentials, bulk sensitive data, or suspicious export-style payloads before they reach users.
Overview
Data exfiltration detection runs on model output only. It scores each response using multiple detectors, then applies one of three actions:
| Action | Behavior |
|---|---|
| allow | Risk score below the warn threshold; response passes unchanged |
| warn | Risk score at or above the warn threshold but below the block threshold; sensitive spans are masked, response still passes |
| block | Risk score at or above the block threshold; response is blocked |
Use this guardrail when agents or RAG pipelines can access internal data and you need to prevent accidental credential leaks, bulk PII dumps, or encoded data exports in LLM replies.
How It Works
- Each output check runs through the configured detectors.
- Detectors contribute a risk score when they find matches.
- Scores are summed and compared to action_thresholds.
- On warn, matched spans are replaced with the configured mask_token.
- On block, the check fails and the response is not returned.
- Results appear on GuardrailResult.exfiltration and are persisted when ARMS storage is enabled.
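The flow above can be sketched in a few lines of plain Python. This is only an illustration of the sum, compare and mask steps, not the library's implementation: `Finding`, `decide_action`, `mask_spans`, and `evaluate` are hypothetical names, and the sketch assumes matched spans do not overlap.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    """One detector's contribution (hypothetical shape, for illustration only)."""
    name: str
    score: int
    spans: list = field(default_factory=list)  # (start, end) character offsets


def decide_action(risk_score, warn=20, block=80):
    """Map a summed risk score onto allow / warn / block."""
    if risk_score >= block:
        return "block"
    if risk_score >= warn:
        return "warn"
    return "allow"


def mask_spans(text, spans, mask_token="[REDACTED]"):
    """Replace each span with the mask token, right to left so offsets stay valid."""
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + mask_token + text[end:]
    return text


def evaluate(text, findings):
    """Sum detector scores, pick an action, and mask on warn."""
    risk_score = sum(f.score for f in findings)
    action = decide_action(risk_score)
    if action == "warn":
        processed = mask_spans(text, [s for f in findings for s in f.spans])
    elif action == "block":
        processed = None  # a blocked response is not returned
    else:
        processed = text
    return {
        "passed": action != "block",
        "action": action,
        "risk_score": risk_score,
        "processed_text": processed,
    }
```

With the default thresholds, a single finding that scores 25 lands in the warn band, so the matched span is masked and the response still passes.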
Detectors
Secret Detection
Finds credentials and API keys using regex patterns and, when available, the optional detect-secrets plugin.
Built-in patterns include:
- AWS access keys (AKIA…)
- GitHub tokens (ghp_…)
- OpenAI-style keys (sk-…)
- Generic API key / secret strings
- JWTs
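To give a feel for prefix-based secret matching, here is a minimal sketch using Python's `re` module. The patterns are illustrative approximations of the formats listed above, not the patterns the guardrail ships with, and `ILLUSTRATIVE_PATTERNS` and `find_secrets` are hypothetical names.

```python
import re

# Illustrative patterns only; the guardrail's real patterns may differ.
ILLUSTRATIVE_PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "openai_style_key": re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"),
    "jwt": re.compile(r"\beyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
}


def find_secrets(text):
    """Return (pattern_name, start, end) for every match in the text."""
    hits = []
    for name, pattern in ILLUSTRATIVE_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((name, match.start(), match.end()))
    return hits
```

Returning character offsets rather than the matched strings keeps the secret itself out of logs and gives a masking step the spans it needs.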
Bulk Sensitive Data
Detects large-scale disclosure of structured identifiers such as emails, phone numbers, SSNs, and AWS keys. Triggers when the total match count reaches bulk_sensitive.threshold.
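A rough sketch of count-based scoring follows. The formula is an assumption: it treats the score as zero below `threshold` and as `hits * score_per_hit` capped at `max_score` otherwise, which is one plausible reading of the three `bulk_sensitive` options. The two patterns and the name `bulk_sensitive_score` are hypothetical.

```python
import re

# Two illustrative identifier patterns; the real detector covers more types.
IDENTIFIER_PATTERNS = [
    re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),  # email address
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),                           # US SSN layout
]


def bulk_sensitive_score(text, threshold=20, score_per_hit=2, max_score=40):
    """Assumed formula: no score below the threshold, capped linear score above it."""
    hits = sum(len(p.findall(text)) for p in IDENTIFIER_PATTERNS)
    if hits < threshold:
        return 0
    return min(hits * score_per_hit, max_score)
```

Under this reading the defaults already reach the cap at the trigger point (20 hits × 2 = 40), so `score_per_hit` and `max_score` only start to matter once `threshold` is lowered.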
Abnormal Output
Flags responses that look like data exports rather than normal chat replies:
- Very large responses (word/character limits)
- CSV-like tabular rows
- Large JSON-like structures
- Base64-encoded blocks
- High-entropy tokens
- Pipe- or tab-delimited tables
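Two of these heuristics are easy to illustrate. The sketch below computes Shannon entropy in bits per character for whitespace-separated tokens and looks for long runs of Base64 alphabet characters. It assumes the entropy threshold is measured in bits per character and that a token is flagged when it exceeds the threshold; the function names are hypothetical.

```python
import math
import re
from collections import Counter


def shannon_entropy(token):
    """Bits per character of the token's empirical character distribution."""
    if not token:
        return 0.0
    n = len(token)
    return -sum((c / n) * math.log2(c / n) for c in Counter(token).values())


def high_entropy_tokens(text, min_length=64, threshold=4.85):
    """Whitespace-separated tokens that are long enough and look random."""
    return [
        t for t in text.split()
        if len(t) >= min_length and shannon_entropy(t) > threshold
    ]


def base64_blocks(text, min_block_length=200):
    """Runs of Base64 alphabet characters at least min_block_length long."""
    pattern = re.compile(r"[A-Za-z0-9+/]{%d,}={0,2}" % min_block_length)
    return pattern.findall(text)
```

Ordinary prose rarely contains 64-character tokens at all, so the length requirement does most of the work in keeping false positives down; the entropy check then separates random-looking strings from long but repetitive ones.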
Configuration
Enable Data Exfiltration
```yaml
guardrails:
  output_checks: true
  data_exfiltration:
    enabled: true
    output_checks: true
    action_thresholds:
      warn: 20    # Mask sensitive spans at this score
      block: 80   # Block the response at this score
    mask_token: "[REDACTED]"
    detectors:
      secrets: true
      bulk_sensitive: true
      abnormal_patterns: true
    use_detect_secrets_plugin: true
    bulk_sensitive:
      threshold: 20
      score_per_hit: 2
      max_score: 40
    abnormal_patterns:
      max_words: 2500
      max_chars: 20000
      csv_min_commas: 4
      csv_row_threshold: 30
      json_brace_threshold: 40
      json_min_chars: 3000
      base64_min_block_length: 200
      base64_block_count_threshold: 2
      base64_short_block_length: 80
      high_entropy_min_length: 64
      high_entropy_threshold: 4.85
```
Parameters
| Option | Type | Default | Description |
|---|---|---|---|
| enabled | bool | false | Enable data exfiltration detection |
| output_checks | bool | inherits guardrails.output_checks | Run on model output |
| action_thresholds.warn | int | 20 | Minimum score to mask sensitive spans |
| action_thresholds.block | int | 80 | Minimum score to block the response |
| mask_token | str | "[REDACTED]" | Replacement text for masked spans |
| detectors.secrets | bool | true | Enable secret/credential detection |
| detectors.bulk_sensitive | bool | true | Enable bulk identifier detection |
| detectors.abnormal_patterns | bool | true | Enable export-style pattern detection |
| use_detect_secrets_plugin | bool | true | Use detect-secrets when installed (falls back to regex) |
| bulk_sensitive.threshold | int | 20 | Minimum total matches to trigger bulk detector |
| bulk_sensitive.score_per_hit | int | 2 | Score added per bulk match |
| bulk_sensitive.max_score | int | 40 | Maximum score from bulk detector |
| abnormal_patterns.* | various | see YAML above | Thresholds for export-style heuristics |
Usage
With GuardrailSystem
```python
from elsai_guardrails.guardrails import GuardrailSystem, GuardrailConfig
from elsai_guardrails.guardrails.guardrail_policy import GuardrailPolicy

# Load the policy file that holds the data_exfiltration settings
policy = GuardrailPolicy.from_yaml("config.yml")

# Other checks are switched off so only exfiltration detection runs
guardrail = GuardrailSystem(
    config=GuardrailConfig(
        check_toxicity=False,
        check_sensitive_data=False,
        check_semantic=False,
    ),
    output_checks=True,
    guardrail_policy=policy,
)

# Dummy GitHub-style token to trigger secret detection
result = guardrail.check_output("Here is the key: ghp_" + "x" * 36)

if not result.passed:
    # Response was blocked
    print(result.message)
    print(result.exfiltration)
elif result.exfiltration and result.exfiltration["action"] == "warn":
    # Response passed with sensitive spans masked
    print("Masked text:", result.exfiltration["processed_text"])
```
With LLMRails
When data_exfiltration.enabled: true and output_checks: true, LLMRails.generate() automatically runs exfiltration checks on the model response:
```python
from elsai_guardrails.guardrails import LLMRails

rails = LLMRails.from_config("config.yml")

response = rails.generate(
    messages=[{"role": "user", "content": "Summarize the customer export"}],
    return_details=True,
)

if response.output_result and response.output_result.exfiltration:
    print(response.output_result.exfiltration)
```
Result Structure
The exfiltration field on GuardrailResult contains:
```python
{
    "passed": True,           # False when action is "block"
    "action": "warn",         # "allow", "warn", or "block"
    "risk_score": 25,
    "findings": [
        {
            "name": "Secret Detection",
            "detected": True,
            "score": 15,
            "details": [...]
        }
    ],
    "processed_text": "...",  # Masked text when action is "warn"
    "message": "Sensitive content masked in response (risk score 25)."
}
```
Tuning Guidelines
Strict mode (lower thresholds to catch more leaks):
```yaml
data_exfiltration:
  enabled: true
  action_thresholds:
    warn: 10
    block: 40
  bulk_sensitive:
    threshold: 5
```
Permissive mode (raise thresholds for chat-heavy apps with long replies):
```yaml
data_exfiltration:
  enabled: true
  action_thresholds:
    warn: 30
    block: 100
  detectors:
    abnormal_patterns: false
```
Optional Dependency
The detect-secrets package is optional. When installed, secret detection uses both regex patterns and the plugin scanner. When not installed, regex-only detection still runs.
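To check which mode a given environment will run in, one option is to test whether the package can be imported. This is a sketch using only the standard library; `detect_secrets_available` is a hypothetical helper, not part of the guardrail API, and it assumes the package's import name is `detect_secrets`.

```python
import importlib.util


def detect_secrets_available():
    """True when the optional detect-secrets package is importable."""
    return importlib.util.find_spec("detect_secrets") is not None


mode = "regex + plugin" if detect_secrets_available() else "regex only"
print(f"Secret detection mode: {mode}")
```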
Examples
See Basic Examples and Integration Examples for complete code samples.
```python
from elsai_guardrails.guardrails import GuardrailConfig, GuardrailSystem
from elsai_guardrails.guardrails.guardrail_policy import GuardrailPolicy

guardrails = GuardrailSystem(
    config=GuardrailConfig(check_toxicity=False, check_sensitive_data=False, check_semantic=False),
    output_checks=True,
    guardrail_policy=GuardrailPolicy.from_file("config.yaml"),
)

result = guardrails.check_output("Here is the key: ghp_" + "x" * 36)
print(result.passed, result.exfiltration)
```
Use Cases
- RAG / internal knowledge assistants — prevent the model from dumping large record sets or credentials from retrieved context
- Agent workflows with tool access — block responses that echo API keys or database exports
- Compliance-sensitive deployments — mask or block bulk PII before it leaves the application boundary
Next Steps
- Output Rails — how output validation fits into the pipeline
- ARMS Storage — persist exfiltration check results with guardrail runs
- Guardrails Configuration — full policy reference
