PerplexityGuard

Detects adversarial suffixes (GCG, AutoDAN) by analyzing character-level perplexity.

How It Works

Adversarial attacks often append gibberish suffixes that have abnormally high perplexity scores:

Normal: "What is the capital of France?"
Attack: "What is the capital of France? describing.]LATEaj sentence surely..."
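
To make the idea concrete, here is a minimal sketch of character-level scoring. It is not PerplexityGuard's actual implementation: the bigram model, the add-one smoothing, the tiny `REFERENCE_CORPUS`, and the function names are all assumptions chosen for illustration. Scores from this toy model are not on the same scale as the library's thresholds.

```python
import math
from collections import Counter

# Hypothetical reference text; a real detector would rely on a far larger
# corpus or a trained language model.
REFERENCE_CORPUS = (
    "What is the capital of France? What is the largest city in Spain?"
)


def train_bigram(corpus: str):
    """Count character bigrams; return (pair_counts, first_counts, vocab_size)."""
    pair_counts = Counter(zip(corpus, corpus[1:]))
    first_counts = Counter(corpus[:-1])
    vocab_size = len(set(corpus)) + 1  # one extra slot for unseen characters
    return pair_counts, first_counts, vocab_size


def char_perplexity(text: str, model) -> float:
    """Perplexity of `text` under an add-one-smoothed character bigram model."""
    pair_counts, first_counts, vocab_size = model
    if len(text) < 2:
        return 1.0
    log_prob = 0.0
    for a, b in zip(text, text[1:]):
        # Unseen pairs still get a small non-zero probability.
        p = (pair_counts[(a, b)] + 1) / (first_counts[a] + vocab_size)
        log_prob += math.log(p)
    # exp of the average negative log-likelihood per transition
    return math.exp(-log_prob / (len(text) - 1))


def char_entropy(text: str) -> float:
    """Shannon entropy (bits per character) of the text's character distribution."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


model = train_bigram(REFERENCE_CORPUS)
normal = "What is the capital of France?"
attack = normal + " describing.]LATEaj sentence surely..."
for label, text in (("normal", normal), ("attack", attack)):
    print(label, round(char_perplexity(text, model), 2), round(char_entropy(text), 2))
```

Character transitions the model has rarely or never seen receive low probability, which raises the average negative log-likelihood and therefore the perplexity. The entropy function covers the opposite failure mode: highly repetitive input has a low-entropy character distribution.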

Usage

Rust:

use oxide_guard::{Guard, PerplexityGuard};

let guard = PerplexityGuard::new("perplexity")
    .with_max_perplexity(50000.0)  // Block if perplexity exceeds this
    .with_min_entropy(1.5);        // Block if entropy below this

let result = guard.check(user_input);
if !result.passed {
    println!("Adversarial suffix detected: {}", result.reason);
}

Python:

from oxideshield import perplexity_guard, PerplexityGuard

# Default: max_perplexity=50000.0, min_entropy=1.5
guard = perplexity_guard()
result = guard.check(user_input)

# Custom thresholds (use class directly)
guard = PerplexityGuard("perplexity", max_perplexity=30000.0, min_entropy=2.0)
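
The Python snippet stops at `guard.check(...)`. The sketch below shows one way the result might be consumed when screening a batch of prompts. It assumes the Python result object exposes `passed` and `reason` like the Rust one, which this page does not confirm. `filter_inputs`, `StubGuard`, and `_StubResult` are hypothetical names; the stub exists only so the example runs without oxideshield installed.

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol


class CheckResult(Protocol):
    # Assumed to mirror the fields used in the Rust example.
    passed: bool
    reason: str


class SupportsCheck(Protocol):
    def check(self, text: str) -> CheckResult: ...


def filter_inputs(guard: SupportsCheck, inputs: Iterable[str]) -> List[str]:
    """Return only the inputs that the guard lets through."""
    accepted = []
    for text in inputs:
        result = guard.check(text)
        if result.passed:
            accepted.append(text)
        else:
            print(f"Blocked: {result.reason}")
    return accepted


@dataclass
class _StubResult:
    passed: bool
    reason: str = ""


class StubGuard:
    """Hypothetical stand-in for a real guard; flags any text containing ']'."""

    def check(self, text: str) -> _StubResult:
        if "]" in text:
            return _StubResult(False, "suspicious suffix")
        return _StubResult(True)


safe = filter_inputs(
    StubGuard(),
    ["What is the capital of France?", "What is the capital of France? ]LATEaj"],
)
```

Because `filter_inputs` depends only on a `check` method, a guard built with `perplexity_guard()` could be passed in place of `StubGuard`, provided the result fields match the assumption above.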