PerplexityGuard

Detects adversarial suffixes (GCG, AutoDAN) by analyzing character-level perplexity.

How It Works

Adversarial attacks often append gibberish suffixes whose perplexity is abnormally high compared with natural-language text:

Normal: "What is the capital of France?"
Attack: "What is the capital of France? describing.]LATEaj sentence surely..."
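To make the idea concrete, here is a minimal sketch of character-level scoring. It is not PerplexityGuard's implementation, which this page does not describe. It assumes a character bigram model with crude add-one smoothing, trained on a small reference string, and the names `char_entropy` and `CharBigramModel` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_entropy(text: str) -> float:
    """Shannon entropy (bits per character) of the character distribution."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


class CharBigramModel:
    """Hypothetical character bigram model with add-one smoothing."""

    def __init__(self, reference: str) -> None:
        # bigrams[prev][cur] = how often `cur` follows `prev` in the reference.
        self.bigrams = defaultdict(Counter)
        for prev, cur in zip(reference, reference[1:]):
            self.bigrams[prev][cur] += 1
        # Distinct reference characters, plus one slot for unseen characters.
        self.vocab_size = len(set(reference)) + 1

    def perplexity(self, text: str) -> float:
        """exp of the mean negative log-probability per character transition."""
        pairs = list(zip(text, text[1:]))
        if not pairs:
            return 1.0
        log_prob = 0.0
        for prev, cur in pairs:
            context = self.bigrams.get(prev, Counter())
            p = (context[cur] + 1) / (sum(context.values()) + self.vocab_size)
            log_prob += math.log(p)
        return math.exp(-log_prob / len(pairs))


# Toy reference text; a real deployment would need a far larger corpus.
reference = "the quick brown fox jumps over the lazy dog " * 20
model = CharBigramModel(reference)

normal_score = model.perplexity("the lazy dog jumps over the quick brown fox")
gibberish_score = model.perplexity("]LATEaj.]ZQ}{")

print(f"normal:    {normal_score:.2f}")
print(f"gibberish: {gibberish_score:.2f}")
```

Text that resembles the reference scores low because its character transitions were seen often. A suffix made of transitions the model never saw pushes the average negative log-probability, and therefore the perplexity, up.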

Usage

Rust

use oxideshield_guard::{Guard, PerplexityGuard};

let guard = PerplexityGuard::new("perplexity")
    .with_max_perplexity(max_perplexity)  // Block if perplexity exceeds this (configurable)
    .with_min_entropy(min_entropy);       // Block if entropy below this (configurable)

let result = guard.check(user_input);
if !result.passed {
    println!("Adversarial suffix detected: {}", result.reason);
}

Python

from oxideshield import perplexity_guard, PerplexityGuard

# Uses default thresholds (configurable per-deployment)
guard = perplexity_guard()
result = guard.check(user_input)

# Custom thresholds (use class directly)
guard = PerplexityGuard("perplexity", max_perplexity=your_threshold, min_entropy=your_entropy)
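The two thresholds combine into a simple rule: block when perplexity exceeds `max_perplexity` or entropy falls below `min_entropy`. The sketch below illustrates that rule in isolation. It is an assumption about the decision logic, not the library's code: `CheckResult` and `decide` are hypothetical names that mirror the `passed` and `reason` fields used above, and the numbers in the usage lines are placeholders rather than recommended values.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    """Hypothetical result type with the two fields used in the examples."""
    passed: bool
    reason: str = ""


def decide(perplexity: float, entropy: float,
           max_perplexity: float, min_entropy: float) -> CheckResult:
    """Fail the input if either score crosses its threshold."""
    if perplexity > max_perplexity:
        return CheckResult(
            False, f"perplexity {perplexity:.2f} exceeds {max_perplexity:.2f}")
    if entropy < min_entropy:
        return CheckResult(
            False, f"entropy {entropy:.2f} below {min_entropy:.2f}")
    return CheckResult(True)


# Placeholder scores and thresholds, for illustration only.
high_perplexity = decide(50.0, 4.0, max_perplexity=30.0, min_entropy=2.0)
low_entropy = decide(10.0, 1.0, max_perplexity=30.0, min_entropy=2.0)
clean = decide(10.0, 4.0, max_perplexity=30.0, min_entropy=2.0)

for result in (high_perplexity, low_entropy, clean):
    print(result.passed, result.reason)
```

Checking entropy alongside perplexity lets the rule also flag highly repetitive input, which a perplexity ceiling alone might not catch.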