Data Exfiltration Detection

Detect and block LLM responses that may leak credentials, bulk sensitive data, or suspicious export-style payloads before they reach users.

Overview

Data exfiltration detection runs on model output only. It scores each response using multiple detectors, then applies one of three actions:

| Action | Behavior |
| --- | --- |
| allow | Risk score below the warn threshold: response passes unchanged |
| warn | Risk score at or above the warn threshold and below the block threshold: sensitive spans are masked, response still passes |
| block | Risk score at or above the block threshold: response is blocked |

Use this guardrail when agents or RAG pipelines can access internal data and you need to prevent accidental credential leaks, bulk PII dumps, or encoded data exports in LLM replies.

How It Works

  1. Each output check runs through the configured detectors.
  2. Detectors contribute a risk score when they find matches.
  3. Scores are summed and compared to action_thresholds.
  4. On warn, matched spans are replaced with the configured mask_token.
  5. On block, the check fails and the response is not returned.
  6. Results appear on GuardrailResult.exfiltration and are persisted when ARMS storage is enabled.
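
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the library's implementation: the `Finding` class, the `decide` function, and the assumption that matched spans do not overlap are all hypothetical and exist only to make the flow concrete.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """One detector's result (hypothetical shape, for illustration only)."""
    name: str
    score: int
    spans: list  # list of (start, end) offsets of matched text


def decide(text, findings, warn=20, block=80, mask_token="[REDACTED]"):
    """Sum detector scores, pick an action, and mask spans on warn."""
    risk_score = sum(f.score for f in findings)

    if risk_score >= block:
        # Blocked responses are not returned, so there is no processed text.
        return {"action": "block", "risk_score": risk_score, "processed_text": None}
    if risk_score < warn:
        return {"action": "allow", "risk_score": risk_score, "processed_text": text}

    # Warn: replace spans from right to left so earlier offsets stay valid.
    # Assumes spans do not overlap.
    masked = text
    spans = sorted({s for f in findings for s in f.spans}, reverse=True)
    for start, end in spans:
        masked = masked[:start] + mask_token + masked[end:]
    return {"action": "warn", "risk_score": risk_score, "processed_text": masked}
```

Checking the block threshold first means a score that satisfies both thresholds is always blocked rather than masked.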

Detectors

Secret Detection

Finds credentials and API keys using regex patterns and, when available, the optional detect-secrets plugin.

Built-in patterns include:

  • AWS access keys (AKIA…)
  • GitHub tokens (ghp_…)
  • OpenAI-style keys (sk-…)
  • Generic API key / secret strings
  • JWTs
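
To give a feel for what regex-based secret detection looks like, here is a small sketch. The expressions below are illustrative approximations of the well-known token shapes, not the patterns the library actually ships, and `SECRET_PATTERNS` and `find_secrets` are hypothetical names.

```python
import re

# Illustrative patterns only; the library's real expressions may differ.
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "openai_style_key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
    "jwt": re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
}


def find_secrets(text):
    """Return (pattern_name, start, end) for every match, in text order."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((name, match.start(), match.end()))
    return sorted(hits, key=lambda hit: hit[1])
```

Returning offsets rather than the matched strings keeps the secret itself out of logs and gives a masking step the spans it needs.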

Bulk Sensitive Data

Detects large-scale disclosure of structured identifiers such as emails, phone numbers, SSNs, and AWS keys. Triggers when the total match count reaches bulk_sensitive.threshold.
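
One plausible reading of how `threshold`, `score_per_hit`, and `max_score` interact is sketched below. The formula (zero below the threshold, then hits times `score_per_hit`, capped at `max_score`) is an assumption based on the parameter names, and the two simplified patterns stand in for the detector's real identifier set.

```python
import re

# Simplified identifier patterns, for illustration only.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def bulk_score(text, threshold=20, score_per_hit=2, max_score=40):
    """Score a response by how many identifiers it discloses."""
    hits = len(EMAIL.findall(text)) + len(SSN.findall(text))
    if hits < threshold:
        return 0  # below the trigger: the detector contributes nothing
    return min(hits * score_per_hit, max_score)  # assumed capping rule
```

Under this reading, the default values make the detector contribute its full `max_score` of 40 as soon as it triggers, because 20 hits at 2 points each already reach the cap.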

Abnormal Output

Flags responses that look like data exports rather than normal chat replies:

  • Very large responses (word/character limits)
  • CSV-like tabular rows
  • Large JSON-like structures
  • Base64-encoded blocks
  • High-entropy tokens
  • Pipe- or tab-delimited tables
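
The high-entropy heuristic is the least obvious item in this list, so here is a minimal sketch of it. It assumes the detector measures Shannon entropy in bits per character over whitespace-delimited tokens and flags tokens strictly above the threshold. The function names and the tokenization are hypothetical. The defaults mirror `high_entropy_min_length` and `high_entropy_threshold` from the configuration.

```python
import math
from collections import Counter


def shannon_entropy(token):
    """Shannon entropy of a string, in bits per character."""
    counts = Counter(token)
    total = len(token)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def high_entropy_tokens(text, min_length=64, threshold=4.85):
    """Return whitespace-delimited tokens that are long and look random."""
    return [
        tok for tok in text.split()
        if len(tok) >= min_length and shannon_entropy(tok) > threshold
    ]
```

Ordinary English words sit well below 4.85 bits per character and are rarely 64 characters long, while random keys and encoded blobs tend to exceed both limits.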

Configuration

Enable Data Exfiltration

yaml
guardrails:
  output_checks: true

  data_exfiltration:
    enabled: true
    output_checks: true

    action_thresholds:
      warn: 20    # Mask sensitive spans at this score
      block: 80   # Block the response at this score

    mask_token: "[REDACTED]"

    detectors:
      secrets: true
      bulk_sensitive: true
      abnormal_patterns: true

    use_detect_secrets_plugin: true

    bulk_sensitive:
      threshold: 20
      score_per_hit: 2
      max_score: 40

    abnormal_patterns:
      max_words: 2500
      max_chars: 20000
      csv_min_commas: 4
      csv_row_threshold: 30
      json_brace_threshold: 40
      json_min_chars: 3000
      base64_min_block_length: 200
      base64_block_count_threshold: 2
      base64_short_block_length: 80
      high_entropy_min_length: 64
      high_entropy_threshold: 4.85

Parameters

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | false | Enable data exfiltration detection |
| output_checks | bool | inherits guardrails.output_checks | Run on model output |
| action_thresholds.warn | int | 20 | Minimum score to mask sensitive spans |
| action_thresholds.block | int | 80 | Minimum score to block the response |
| mask_token | str | "[REDACTED]" | Replacement text for masked spans |
| detectors.secrets | bool | true | Enable secret/credential detection |
| detectors.bulk_sensitive | bool | true | Enable bulk identifier detection |
| detectors.abnormal_patterns | bool | true | Enable export-style pattern detection |
| use_detect_secrets_plugin | bool | true | Use detect-secrets when installed (falls back to regex) |
| bulk_sensitive.threshold | int | 20 | Minimum total matches to trigger bulk detector |
| bulk_sensitive.score_per_hit | int | 2 | Score added per bulk match |
| bulk_sensitive.max_score | int | 40 | Maximum score from bulk detector |
| abnormal_patterns.* | various | see YAML above | Thresholds for export-style heuristics |

Usage

With GuardrailSystem

python
from elsai_guardrails.guardrails import GuardrailSystem, GuardrailConfig
from elsai_guardrails.guardrails.guardrail_policy import GuardrailPolicy

policy = GuardrailPolicy.from_yaml("config.yml")
guardrail = GuardrailSystem(
    config=GuardrailConfig(
        check_toxicity=False,
        check_sensitive_data=False,
        check_semantic=False,
    ),
    output_checks=True,
    guardrail_policy=policy,
)

result = guardrail.check_output("Here is the key: ghp_" + "x" * 36)

if not result.passed:
    print(result.message)
    print(result.exfiltration)
elif result.exfiltration and result.exfiltration["action"] == "warn":
    print("Masked text:", result.exfiltration["processed_text"])

With LLMRails

When both data_exfiltration.enabled and output_checks are set to true, LLMRails.generate() automatically runs exfiltration checks on the model response:

python
from elsai_guardrails.guardrails import LLMRails

rails = LLMRails.from_config("config.yml")
response = rails.generate(
    messages=[{"role": "user", "content": "Summarize the customer export"}],
    return_details=True,
)

if response.output_result and response.output_result.exfiltration:
    print(response.output_result.exfiltration)

Result Structure

The exfiltration field on GuardrailResult contains:

python
{
    "passed": True,           # False when action is "block"
    "action": "warn",         # "allow", "warn", or "block"
    "risk_score": 25,
    "findings": [
        {
            "name": "Secret Detection",
            "detected": True,
            "score": 15,
            "details": [...]
        }
    ],
    "processed_text": "...",  # Masked text when action is "warn"
    "message": "Sensitive content masked in response (risk score 25)."
}
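
A caller typically needs to turn this dictionary into the text that is actually shown to the user. The helper below is a hypothetical sketch that relies only on the `action` and `processed_text` keys shown above. The `resolve_reply` name and the fallback message are assumptions.

```python
def resolve_reply(original_text, exfiltration, fallback="Response withheld."):
    """Pick the text to show the user from an exfiltration result dict."""
    if not exfiltration:
        return original_text  # no exfiltration result was produced
    action = exfiltration.get("action", "allow")
    if action == "block":
        return fallback
    if action == "warn":
        # Fail closed: if the masked text is missing, do not show the original.
        return exfiltration.get("processed_text", fallback)
    return original_text
```

For example, `resolve_reply(raw, result.exfiltration)` would return the masked text for a warn result and the fallback message for a block result.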

Tuning Guidelines

Strict mode — lower thresholds to catch more leaks:

yaml
data_exfiltration:
  enabled: true
  action_thresholds:
    warn: 10
    block: 40
  bulk_sensitive:
    threshold: 5

Permissive mode — raise thresholds for chat-heavy apps with long replies:

yaml
data_exfiltration:
  enabled: true
  action_thresholds:
    warn: 30
    block: 100
  detectors:
    abnormal_patterns: false
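
To see what the threshold changes alone do, the sketch below maps the same risk scores to actions under the strict, default, and permissive thresholds from this page. It holds the score fixed, which is a simplification: lowering bulk_sensitive.threshold or disabling abnormal_patterns also changes the score itself. `PROFILES` and `action_for` are hypothetical names.

```python
# Threshold pairs taken from the YAML examples on this page.
PROFILES = {
    "strict": {"warn": 10, "block": 40},
    "default": {"warn": 20, "block": 80},
    "permissive": {"warn": 30, "block": 100},
}


def action_for(score, warn, block):
    """Map a summed risk score to allow / warn / block."""
    if score >= block:
        return "block"
    if score >= warn:
        return "warn"
    return "allow"


for score in (15, 25, 45):
    row = {name: action_for(score, **t) for name, t in PROFILES.items()}
    print(score, row)
```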

Optional Dependency

The detect-secrets package is optional. When installed, secret detection uses both regex patterns and the plugin scanner. When not installed, regex-only detection still runs.
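
To check which mode applies in a given environment, you can test whether the package is importable. This sketch uses only the standard library and the `detect_secrets` module name that the detect-secrets package installs. The helper name is hypothetical, and the library's own fallback logic may differ.

```python
import importlib.util


def detect_secrets_available():
    """True when the optional detect-secrets package can be imported."""
    return importlib.util.find_spec("detect_secrets") is not None


mode = "regex + plugin" if detect_secrets_available() else "regex only"
print(f"Secret detection mode: {mode}")
```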

Examples

See Basic Examples and Integration Examples for complete code samples.

python
from elsai_guardrails.guardrails import GuardrailConfig, GuardrailSystem
from elsai_guardrails.guardrails.guardrail_policy import GuardrailPolicy

guardrails = GuardrailSystem(
    config=GuardrailConfig(check_toxicity=False, check_sensitive_data=False, check_semantic=False),
    output_checks=True,
    guardrail_policy=GuardrailPolicy.from_file("config.yaml"),
)

result = guardrails.check_output("Here is the key: ghp_" + "x" * 36)
print(result.passed, result.exfiltration)

Use Cases

  • RAG / internal knowledge assistants — prevent the model from dumping large record sets or credentials from retrieved context
  • Agent workflows with tool access — block responses that echo API keys or database exports
  • Compliance-sensitive deployments — mask or block bulk PII before it leaves the application boundary

Released under the MIT License.