PerplexityGuard
Detects adversarial suffixes (GCG, AutoDAN) by analyzing character-level perplexity.
How It Works
Adversarial attacks often append a gibberish suffix to an otherwise ordinary prompt. Because such a suffix does not read like natural language, it has an abnormally high perplexity score:
Normal: "What is the capital of France?"
Attack: "What is the capital of France? describing.]LATEaj sentence surely..."
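The idea can be sketched with a tiny character-bigram language model. This is a hypothetical illustration, not the oxideshield_guard implementation: the add-one smoothing, the toy `REFERENCE_CORPUS`, and the names `CharBigramModel`, `train`, and `perplexity` are all assumptions. Perplexity here is the exponential of the mean negative log-probability of each character given the previous one, PP = exp(-(1/(N-1)) Σ ln p(c_i | c_{i-1})), so text full of unusual character transitions scores higher.

```rust
use std::collections::{HashMap, HashSet};

/// Toy reference text (an assumption for this sketch); a real model would be
/// trained on a large body of ordinary text.
const REFERENCE_CORPUS: &str = "What is the capital of France? The capital of France is Paris. \
What is the capital of Spain? The capital of Spain is Madrid.";

/// Hypothetical character-bigram model with add-one (Laplace) smoothing.
struct CharBigramModel {
    pair_counts: HashMap<(char, char), u32>, // count of each (prev, next) pair
    prev_counts: HashMap<char, u32>,         // how often each char starts a pair
    vocab_size: f64,                         // distinct chars + 1 unseen slot
}

impl CharBigramModel {
    fn train(corpus: &str) -> Self {
        let chars: Vec<char> = corpus.chars().collect();
        let mut pair_counts = HashMap::new();
        let mut prev_counts = HashMap::new();
        for w in chars.windows(2) {
            *pair_counts.entry((w[0], w[1])).or_insert(0) += 1;
            *prev_counts.entry(w[0]).or_insert(0) += 1;
        }
        // Distinct training characters plus one slot for characters never seen.
        let distinct: HashSet<char> = chars.iter().copied().collect();
        CharBigramModel {
            pair_counts,
            prev_counts,
            vocab_size: (distinct.len() + 1) as f64,
        }
    }

    /// Smoothed P(next | prev); unseen pairs still get a small nonzero share.
    fn prob(&self, prev: char, next: char) -> f64 {
        let pair = self.pair_counts.get(&(prev, next)).copied().unwrap_or(0) as f64;
        let total = self.prev_counts.get(&prev).copied().unwrap_or(0) as f64;
        (pair + 1.0) / (total + self.vocab_size)
    }

    /// exp of the mean negative log-likelihood per character transition.
    fn perplexity(&self, text: &str) -> f64 {
        let chars: Vec<char> = text.chars().collect();
        if chars.len() < 2 {
            return 1.0; // no transitions to score
        }
        let nll: f64 = chars.windows(2).map(|w| -self.prob(w[0], w[1]).ln()).sum();
        (nll / (chars.len() - 1) as f64).exp()
    }
}
```

With this toy corpus, appending the visible part of the example suffix raises the prompt's perplexity, because transitions such as `.]`, `]L`, and `aj` never occur in the reference text and receive only the add-one smoothing share of probability.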
Usage
```rust
use oxideshield_guard::{Guard, PerplexityGuard};

// `max_perplexity` and `min_entropy` are your deployment-specific thresholds;
// `user_input` is the text to screen.
let guard = PerplexityGuard::new("perplexity")
    .with_max_perplexity(max_perplexity) // block if perplexity exceeds this
    .with_min_entropy(min_entropy);      // block if entropy falls below this

let result = guard.check(user_input);
if !result.passed {
    println!("Adversarial suffix detected: {}", result.reason);
}
```
```python
from oxideshield import perplexity_guard, PerplexityGuard

# Uses default thresholds (configurable per deployment)
guard = perplexity_guard()
result = guard.check(user_input)

# Custom thresholds (use the class directly);
# `your_threshold` and `your_entropy` are placeholders for your own values
guard = PerplexityGuard("perplexity", max_perplexity=your_threshold, min_entropy=your_entropy)
```
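The two thresholds in the examples above act independently: input is blocked if its perplexity exceeds `max_perplexity` or its entropy falls below `min_entropy`. The sketch below assumes "entropy" means the Shannon entropy of the input's character distribution in bits per character, which is low for repetitive input such as `!!!!!!!!`. `char_entropy` and `should_block` are hypothetical helpers for illustration, not part of the oxideshield API.

```rust
use std::collections::HashMap;

/// Shannon entropy of the character distribution, in bits per character.
/// Assumption: this is the kind of entropy the `min_entropy` threshold targets.
fn char_entropy(text: &str) -> f64 {
    let mut counts: HashMap<char, usize> = HashMap::new();
    for c in text.chars() {
        *counts.entry(c).or_insert(0) += 1;
    }
    let n = text.chars().count();
    if n == 0 {
        return 0.0; // empty input carries no information
    }
    counts
        .values()
        .map(|&k| {
            let p = k as f64 / n as f64;
            -p * p.log2()
        })
        .sum()
}

/// Hypothetical decision rule: either threshold alone is enough to block.
fn should_block(perplexity: f64, entropy: f64, max_perplexity: f64, min_entropy: f64) -> bool {
    perplexity > max_perplexity || entropy < min_entropy
}
```

A string of one repeated character has entropy 0, while four distinct characters appearing once each give exactly 2 bits per character.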