PerplexityGuard
Detects adversarial suffixes (GCG, AutoDAN) by analyzing character-level perplexity.
How It Works
Adversarial attacks often append gibberish suffixes that have abnormally high perplexity scores:
Normal: "What is the capital of France?"
Attack: "What is the capital of France? describing.]LATEaj sentence surely..."
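To make the idea concrete, here is a minimal sketch of one way character-level perplexity and entropy could be scored. It is not the scoring used by oxide_guard, which this page does not describe. `BigramModel`, `char_entropy`, the add-one smoothing, and the vocabulary size are all assumptions chosen for illustration, so the numbers it produces are not on the same scale as the guard's thresholds.

```rust
use std::collections::HashMap;

/// Shannon entropy, in bits per character, of the character distribution in `text`.
fn char_entropy(text: &str) -> f64 {
    let mut counts: HashMap<char, usize> = HashMap::new();
    for c in text.chars() {
        *counts.entry(c).or_insert(0) += 1;
    }
    let total = text.chars().count() as f64;
    if total == 0.0 {
        return 0.0;
    }
    counts
        .values()
        .map(|&n| {
            let p = n as f64 / total;
            -p * p.log2()
        })
        .sum()
}

/// Hypothetical character-bigram model with add-one smoothing (an assumption,
/// not the model oxide_guard uses).
struct BigramModel {
    pair_counts: HashMap<(char, char), usize>,
    context_counts: HashMap<char, usize>,
    vocab_size: f64,
}

impl BigramModel {
    /// Count adjacent character pairs in a reference corpus of ordinary text.
    fn train(corpus: &str, vocab_size: usize) -> Self {
        let chars: Vec<char> = corpus.chars().collect();
        let mut pair_counts: HashMap<(char, char), usize> = HashMap::new();
        let mut context_counts: HashMap<char, usize> = HashMap::new();
        for w in chars.windows(2) {
            *pair_counts.entry((w[0], w[1])).or_insert(0) += 1;
            *context_counts.entry(w[0]).or_insert(0) += 1;
        }
        BigramModel {
            pair_counts,
            context_counts,
            vocab_size: vocab_size as f64,
        }
    }

    /// Perplexity = exp(-mean ln P(next char | previous char)).
    /// Returns 1.0 for inputs shorter than two characters.
    fn perplexity(&self, text: &str) -> f64 {
        let chars: Vec<char> = text.chars().collect();
        if chars.len() < 2 {
            return 1.0;
        }
        let mut log_prob: f64 = 0.0;
        for w in chars.windows(2) {
            let pair = self.pair_counts.get(&(w[0], w[1])).copied().unwrap_or(0) as f64;
            let ctx = self.context_counts.get(&w[0]).copied().unwrap_or(0) as f64;
            // Add-one smoothing keeps unseen pairs at a small nonzero probability.
            log_prob += ((pair + 1.0) / (ctx + self.vocab_size)).ln();
        }
        let transitions = (chars.len() - 1) as f64;
        (-log_prob / transitions).exp()
    }
}
```

Under this sketch, text whose character pairs all appear in the training corpus scores below the vocabulary size. A string made only of characters the corpus never contained scores exactly the vocabulary size, because every transition falls back to the smoothed floor. A production detector would need a far larger reference corpus or a learned language model.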
Usage
use oxide_guard::{Guard, PerplexityGuard};

let guard = PerplexityGuard::new("perplexity")
    .with_max_perplexity(50000.0) // Block if perplexity exceeds this
    .with_min_entropy(1.5);       // Block if entropy below this

let result = guard.check(user_input);
if !result.passed {
    println!("Adversarial suffix detected: {}", result.reason);
}
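The two builder options describe independent limits: an upper bound on perplexity and a lower bound on entropy. The sketch below shows how such a pair of thresholds could be combined into a pass/fail decision. `ThresholdSketch` and `SketchResult` are hypothetical stand-ins written for this example, with `passed` and `reason` fields mirroring the ones used above. The way oxide_guard actually computes its scores and orders its checks is an assumption here.

```rust
/// Hypothetical result type mirroring the `passed` and `reason` fields shown above.
struct SketchResult {
    passed: bool,
    reason: String,
}

/// Hypothetical stand-in for the guard's two thresholds.
struct ThresholdSketch {
    max_perplexity: f64,
    min_entropy: f64,
}

impl ThresholdSketch {
    /// Fail when perplexity exceeds the maximum or entropy falls below the minimum.
    /// The scores are taken as inputs; how they are computed is out of scope here.
    fn check(&self, perplexity: f64, entropy: f64) -> SketchResult {
        if perplexity > self.max_perplexity {
            return SketchResult {
                passed: false,
                reason: format!(
                    "perplexity {:.1} exceeds maximum {:.1}",
                    perplexity, self.max_perplexity
                ),
            };
        }
        if entropy < self.min_entropy {
            return SketchResult {
                passed: false,
                reason: format!(
                    "entropy {:.2} is below minimum {:.2}",
                    entropy, self.min_entropy
                ),
            };
        }
        SketchResult {
            passed: true,
            reason: String::new(),
        }
    }
}
```

With the values from the usage example (`max_perplexity: 50000.0`, `min_entropy: 1.5`), an input fails in this sketch if either limit is violated, and `reason` names the first limit that tripped.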