Data Exfiltration Detection

Detect and block LLM responses that may leak credentials, bulk sensitive data, or suspicious export-style payloads before they reach users.

Overview

Data exfiltration detection runs on model output only. It scores each response using multiple detectors, then applies one of three actions:

Action	Behavior
`allow`	Risk score below the warn threshold; the response passes unchanged
`warn`	Risk score at or above the warn threshold but below the block threshold; sensitive spans are masked and the response still passes
`block`	Risk score at or above the block threshold; the response is blocked

Use this guardrail when agents or RAG pipelines can access internal data and you need to prevent accidental credential leaks, bulk PII dumps, or encoded data exports in LLM replies.

How It Works

1. Each output check runs through the configured detectors.
2. Detectors contribute a risk score when they find matches.
3. Scores are summed and compared to action_thresholds.
4. On warn, matched spans are replaced with the configured mask_token.
5. On block, the check fails and the response is not returned.
6. Results appear on GuardrailResult.exfiltration and are persisted when ARMS storage is enabled.
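The flow above can be sketched in plain Python. This is only an illustration of the sum-compare-mask idea, not the library's implementation: `toy_secret_detector`, `mask_spans`, `run_check`, and the 25-points-per-token weight are all hypothetical, and the sketch assumes detector spans do not overlap.

```python
import re
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # (start, end) offsets into the text
Detector = Callable[[str], Tuple[int, List[Span]]]


def toy_secret_detector(text: str) -> Tuple[int, List[Span]]:
    """Hypothetical detector: 25 points per GitHub-style token found."""
    spans = [m.span() for m in re.finditer(r"ghp_[A-Za-z0-9]{36}", text)]
    return 25 * len(spans), spans


def mask_spans(text: str, spans: List[Span], mask_token: str) -> str:
    """Replace each span with mask_token, working right to left so that
    earlier offsets stay valid (assumes spans do not overlap)."""
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + mask_token + text[end:]
    return text


def run_check(text: str, detectors: List[Detector],
              warn: int = 20, block: int = 80,
              mask_token: str = "[REDACTED]") -> dict:
    """Sum detector scores, then allow, mask, or block."""
    total, all_spans = 0, []
    for detect in detectors:
        score, spans = detect(text)
        total += score
        all_spans.extend(spans)
    if total >= block:
        return {"action": "block", "risk_score": total, "processed_text": None}
    if total >= warn:
        masked = mask_spans(text, all_spans, mask_token)
        return {"action": "warn", "risk_score": total, "processed_text": masked}
    return {"action": "allow", "risk_score": total, "processed_text": text}


print(run_check("key: ghp_" + "x" * 36, [toy_secret_detector]))
```

With the default thresholds, one token in this toy setup scores 25, which lands in the warn range, so the token is replaced by the mask token and the rest of the text passes through.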

Detectors

Secret Detection

Finds credentials and API keys using regex patterns and, when available, the optional detect-secrets plugin.

Built-in patterns include:

AWS access keys (AKIA…)
GitHub tokens (ghp_…)
OpenAI-style keys (sk-…)
Generic API key / secret strings
JWTs
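To give a feel for what regex-based secret detection looks like, here is a minimal sketch. The patterns are illustrative guesses at common token shapes and are not the library's built-in patterns; `SECRET_PATTERNS` and `find_secrets` are hypothetical names.

```python
import re

# Illustrative patterns only; the library's built-in regexes may differ.
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "openai_style_key": re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"),
    "jwt": re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
}


def find_secrets(text: str) -> list:
    """Return (pattern_name, start, end) for every match, in text order."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        hits.extend((name, m.start(), m.end()) for m in pattern.finditer(text))
    return sorted(hits, key=lambda hit: hit[1])


sample = "aws AKIAABCDEFGHIJKLMNOP and ghp_" + "a" * 36
for name, start, end in find_secrets(sample):
    print(name, start, end)
```

Returning offsets rather than just a boolean is what makes masking possible later, since the warn action needs to know which spans to replace.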

Bulk Sensitive Data

Detects large-scale disclosure of structured identifiers such as emails, phone numbers, SSNs, and AWS keys. Triggers when the total match count reaches bulk_sensitive.threshold.
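One plausible way the three bulk_sensitive settings could combine is sketched below. The scoring rule (zero below the threshold, then score_per_hit per match, capped at max_score) is an assumption made for illustration, as are the two identifier patterns and the `bulk_score` helper; the library's actual rule may differ.

```python
import re

# Illustrative identifier patterns; the library's own patterns may differ.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def bulk_score(text: str, threshold: int = 20,
               score_per_hit: int = 2, max_score: int = 40) -> int:
    """Assumed rule: no score below the threshold, then score_per_hit
    per match, capped at max_score."""
    hits = len(EMAIL.findall(text)) + len(SSN.findall(text))
    if hits < threshold:
        return 0
    return min(hits * score_per_hit, max_score)


few = " ".join(f"user{i}@example.com" for i in range(3))
many = " ".join(f"user{i}@example.com" for i in range(25))
print(bulk_score(few), bulk_score(many))
```

Under this assumed rule, a reply that mentions a handful of addresses scores nothing, while a reply listing dozens of them contributes up to the cap.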

Abnormal Output

Flags responses that look like data exports rather than normal chat replies:

Very large responses (word/character limits)
CSV-like tabular rows
Large JSON-like structures
Base64-encoded blocks
High-entropy tokens
Pipe- or tab-delimited tables
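Three of these heuristics are easy to sketch with the standard library. The functions below are hypothetical and only approximate what such checks might do; the default arguments borrow the values of high_entropy_min_length, high_entropy_threshold, csv_min_commas, and base64_min_block_length from the configuration, but how the library actually applies those settings is an assumption.

```python
import math
import re
from collections import Counter


def shannon_entropy(token: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not token:
        return 0.0
    n = len(token)
    return -sum((c / n) * math.log2(c / n) for c in Counter(token).values())


def high_entropy_tokens(text: str, min_length: int = 64,
                        threshold: float = 4.85) -> list:
    """Whitespace-separated tokens that are both long and random-looking."""
    return [t for t in text.split()
            if len(t) >= min_length and shannon_entropy(t) >= threshold]


def csv_like_rows(text: str, min_commas: int = 4) -> int:
    """Count lines that contain at least min_commas commas."""
    return sum(1 for line in text.splitlines() if line.count(",") >= min_commas)


def base64_blocks(text: str, min_length: int = 200) -> list:
    """Runs of base64-alphabet characters at least min_length long."""
    pattern = re.compile(r"[A-Za-z0-9+/]{%d,}={0,2}" % min_length)
    return pattern.findall(text)


print(shannon_entropy("aaaa"), shannon_entropy("ab"))
print(csv_like_rows("id,name,email,phone,city\n1,a,b,c,d\nplain sentence"))
```

Entropy is a useful signal because long random base64 strings approach 6 bits per character, the maximum for a 64-symbol alphabet, while ordinary prose tokens sit well below that.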

Configuration

Enable Data Exfiltration

yaml

guardrails:
  output_checks: true

  data_exfiltration:
    enabled: true
    output_checks: true

    action_thresholds:
      warn: 20    # Mask sensitive spans at this score
      block: 80   # Block the response at this score

    mask_token: "[REDACTED]"

    detectors:
      secrets: true
      bulk_sensitive: true
      abnormal_patterns: true

    use_detect_secrets_plugin: true

    bulk_sensitive:
      threshold: 20
      score_per_hit: 2
      max_score: 40

    abnormal_patterns:
      max_words: 2500
      max_chars: 20000
      csv_min_commas: 4
      csv_row_threshold: 30
      json_brace_threshold: 40
      json_min_chars: 3000
      base64_min_block_length: 200
      base64_block_count_threshold: 2
      base64_short_block_length: 80
      high_entropy_min_length: 64
      high_entropy_threshold: 4.85

Parameters

Option	Type	Default	Description
`enabled`	bool	`false`	Enable data exfiltration detection
`output_checks`	bool	inherits `guardrails.output_checks`	Run on model output
`action_thresholds.warn`	int	`20`	Minimum score to mask sensitive spans
`action_thresholds.block`	int	`80`	Minimum score to block the response
`mask_token`	str	`"[REDACTED]"`	Replacement text for masked spans
`detectors.secrets`	bool	`true`	Enable secret/credential detection
`detectors.bulk_sensitive`	bool	`true`	Enable bulk identifier detection
`detectors.abnormal_patterns`	bool	`true`	Enable export-style pattern detection
`use_detect_secrets_plugin`	bool	`true`	Use `detect-secrets` when installed (falls back to regex)
`bulk_sensitive.threshold`	int	`20`	Minimum total matches to trigger bulk detector
`bulk_sensitive.score_per_hit`	int	`2`	Score added per bulk match
`bulk_sensitive.max_score`	int	`40`	Maximum score from bulk detector
`abnormal_patterns.*`	various	see YAML above	Thresholds for export-style heuristics
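Before deploying a policy, it can help to sanity-check the parsed configuration. The sketch below works on a plain dict, such as the data_exfiltration mapping after the YAML has been loaded. `validate_exfiltration_config` is a hypothetical helper, and its checks are suggestions rather than rules the library enforces.

```python
def validate_exfiltration_config(cfg: dict) -> list:
    """Return a list of human-readable problems (empty if none found).

    Hypothetical helper; missing keys fall back to the documented defaults.
    """
    problems = []

    thresholds = cfg.get("action_thresholds", {})
    warn = thresholds.get("warn", 20)
    block = thresholds.get("block", 80)
    if warn >= block:
        problems.append("action_thresholds.warn should be lower than block")

    detectors = cfg.get("detectors", {})
    names = ("secrets", "bulk_sensitive", "abnormal_patterns")
    if cfg.get("enabled", False) and not any(detectors.get(n, True) for n in names):
        problems.append("enabled is true but every detector is disabled")

    if not cfg.get("mask_token", "[REDACTED]"):
        problems.append("mask_token is empty; masked spans would vanish silently")

    return problems


print(validate_exfiltration_config(
    {"enabled": True, "action_thresholds": {"warn": 90, "block": 80}}
))
```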

Usage

With GuardrailSystem

python

from elsai_guardrails.guardrails import GuardrailSystem, GuardrailConfig
from elsai_guardrails.guardrails.guardrail_policy import GuardrailPolicy

policy = GuardrailPolicy.from_yaml("config.yml")
guardrail = GuardrailSystem(
    config=GuardrailConfig(
        check_toxicity=False,
        check_sensitive_data=False,
        check_semantic=False,
    ),
    output_checks=True,
    guardrail_policy=policy,
)

result = guardrail.check_output("Here is the key: ghp_" + "x" * 36)

if not result.passed:
    print(result.message)
    print(result.exfiltration)
elif result.exfiltration and result.exfiltration["action"] == "warn":
    print("Masked text:", result.exfiltration["processed_text"])

With LLMRails

When both data_exfiltration.enabled and output_checks are set to true, LLMRails.generate() automatically runs exfiltration checks on the model response:

python

from elsai_guardrails.guardrails import LLMRails

rails = LLMRails.from_config("config.yml")
response = rails.generate(
    messages=[{"role": "user", "content": "Summarize the customer export"}],
    return_details=True,
)

if response.output_result and response.output_result.exfiltration:
    print(response.output_result.exfiltration)

Result Structure

The exfiltration field on GuardrailResult contains:

python

{
    "passed": True,           # False when action is "block"
    "action": "warn",         # "allow", "warn", or "block"
    "risk_score": 25,
    "findings": [
        {
            "name": "Secret Detection",
            "detected": True,
            "score": 25,
            "details": [...]
        }
    ],
    "processed_text": "...",  # Masked text when action is "warn"
    "message": "Sensitive content masked in response (risk score 25)."
}
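Application code usually needs to turn this dict into a single decision: which text, if any, goes back to the user. A minimal sketch follows, assuming only the keys shown above; `text_to_deliver` is a hypothetical helper, and it fails closed on warn when no masked text is present.

```python
from typing import Optional


def text_to_deliver(exfiltration: Optional[dict], original: str) -> Optional[str]:
    """Pick the text to send to the user, or None if nothing should be sent."""
    if exfiltration is None:
        # No exfiltration result is available, so nothing to act on.
        return original
    action = exfiltration.get("action")
    if action == "block":
        return None
    if action == "warn":
        # Fail closed: without masked text, send nothing rather than the original.
        return exfiltration.get("processed_text")
    return original


masked = {"action": "warn", "processed_text": "Here is the key: [REDACTED]"}
print(text_to_deliver(masked, "Here is the key: ghp_..."))
```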

Tuning Guidelines

Strict mode lowers the action thresholds and the bulk match threshold to catch more leaks:

yaml

data_exfiltration:
  enabled: true
  action_thresholds:
    warn: 10
    block: 40
  bulk_sensitive:
    threshold: 5

Permissive mode raises the action thresholds and disables the abnormal output detector, which suits chat-heavy apps with long replies:

yaml

data_exfiltration:
  enabled: true
  action_thresholds:
    warn: 30
    block: 100
  detectors:
    abnormal_patterns: false
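To see what these profiles change in practice, the sketch below maps a few sample risk scores to actions under the strict, default, and permissive thresholds. `PROFILES` and `action_for` are hypothetical names, and the sketch compares thresholds only; it does not model the detector settings that the profiles also change.

```python
# Threshold values taken from the strict, default, and permissive settings.
PROFILES = {
    "strict": {"warn": 10, "block": 40},
    "default": {"warn": 20, "block": 80},
    "permissive": {"warn": 30, "block": 100},
}


def action_for(score: int, profile: str) -> str:
    """Return the action a given score triggers under a named profile."""
    thresholds = PROFILES[profile]
    if score >= thresholds["block"]:
        return "block"
    return "warn" if score >= thresholds["warn"] else "allow"


for score in (15, 25, 45, 90):
    print(score, {name: action_for(score, name) for name in PROFILES})
```

A score of 45, for example, is blocked under the strict profile but only masked under the default and permissive ones, while a score of 15 is masked under strict and allowed elsewhere.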

Optional Dependency

The detect-secrets package is optional. When installed, secret detection uses both regex patterns and the plugin scanner. When not installed, regex-only detection still runs.
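If you want to check at startup which mode you are in, the standard library can tell you whether the package is importable. This is a sketch of the general optional-dependency pattern, not the library's own fallback code; `active_scanners` is a hypothetical helper.

```python
import importlib.util

# True when the detect-secrets package (import name: detect_secrets) is installed.
HAS_DETECT_SECRETS = importlib.util.find_spec("detect_secrets") is not None


def active_scanners(use_plugin: bool = True) -> list:
    """List which secret scanners would run under the described behavior."""
    scanners = ["regex"]  # regex detection always runs
    if use_plugin and HAS_DETECT_SECRETS:
        scanners.append("detect-secrets")
    return scanners


print("detect-secrets installed:", HAS_DETECT_SECRETS)
print("scanners:", active_scanners())
```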

Examples

See Basic Examples and Integration Examples for complete code samples.

python

from elsai_guardrails.guardrails import GuardrailConfig, GuardrailSystem
from elsai_guardrails.guardrails.guardrail_policy import GuardrailPolicy

guardrails = GuardrailSystem(
    config=GuardrailConfig(check_toxicity=False, check_sensitive_data=False, check_semantic=False),
    output_checks=True,
    guardrail_policy=GuardrailPolicy.from_file("config.yaml"),
)

result = guardrails.check_output("Here is the key: ghp_" + "x" * 36)
print(result.passed, result.exfiltration)

Use Cases

RAG / internal knowledge assistants — prevent the model from dumping large record sets or credentials from retrieved context
Agent workflows with tool access — block responses that echo API keys or database exports
Compliance-sensitive deployments — mask or block bulk PII before it leaves the application boundary

Next Steps

Output Rails — how output validation fits into the pipeline
ARMS Storage — persist exfiltration check results with guardrail runs
Guardrails Configuration — full policy reference
