
Benchmarks

OxideShield™ includes a benchmarking suite that measures detection quality (F1, precision, recall, false positive rate) and latency against standard datasets.

Performance Targets

Metric                 Target    Achieved
F1 Score               >0.93     0.94
Precision              >0.95     0.96
Recall                 >0.90     0.92
Latency (p50)          <30ms     15ms
Latency (p99)          <100ms    50ms
False Positive Rate    <5%       3.2%
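
For reference, the quality metrics in the table follow the standard confusion-matrix definitions. The sketch below is illustrative and independent of the OxideShield™ API: the helper names `detection_metrics` and `f1_from` and the sample counts are hypothetical. It also checks that the achieved precision and recall are consistent with the listed F1 score.

```python
def detection_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    # False positive rate: share of benign inputs that were flagged.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_positive_rate": fpr,
    }


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only (not OxideShield™ results).
m = detection_metrics(tp=45, fp=5, fn=5, tn=45)
print(m)

# The achieved precision (0.96) and recall (0.92) imply the listed F1.
print(round(f1_from(0.96, 0.92), 2))  # 0.94
```

Because F1 is a harmonic mean, it sits closer to the lower of precision and recall, which is why a recall of 0.92 pulls the combined score down to 0.94.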

Datasets

  • OxideShield™ Standard - 70+ samples
  • JailbreakBench - Standard benchmark
  • Prompt Injection Focused - Injection attacks
  • Adversarial Suffix - AutoDAN, GCG samples

Competitor Comparison

Tool               F1      Precision    p50 Latency
OxideShield™       0.94    0.96         15ms
Llama Guard 3      0.94    0.96         100ms
LLM Guard          0.90    0.92         50ms
Lakera Guard       0.89    0.91         66ms
NeMo Guardrails    0.85    0.88         200ms
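
One way to read the comparison is to rank by F1 and break ties on p50 latency. The sketch below does this with the figures copied from the table. The `Row` type and the ranking rule are illustrative assumptions, not part of OxideShield™.

```python
from typing import NamedTuple


class Row(NamedTuple):
    tool: str
    f1: float
    precision: float
    p50_ms: int


# Values copied from the comparison table.
rows = [
    Row("OxideShield™", 0.94, 0.96, 15),
    Row("Llama Guard 3", 0.94, 0.96, 100),
    Row("LLM Guard", 0.90, 0.92, 50),
    Row("Lakera Guard", 0.89, 0.91, 66),
    Row("NeMo Guardrails", 0.85, 0.88, 200),
]

# Higher F1 first; among equal F1 scores, lower p50 latency first.
ranked = sorted(rows, key=lambda r: (-r.f1, r.p50_ms))

fastest_top = ranked[0]
for r in ranked:
    ratio = r.p50_ms / fastest_top.p50_ms
    print(f"{r.tool:<16} F1={r.f1:.2f}  p50={r.p50_ms}ms  ({ratio:.1f}x)")
```

On these figures, OxideShield™ and Llama Guard 3 tie on F1 and precision, so p50 latency is the only column that separates them.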

Run Benchmarks

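Rust:

use oxide_guard::benchmark::{BenchmarkRunner, get_oxideshield_dataset};
// Assumption: PatternGuard is exported from the crate root; adjust this
// path if it lives in another module.
use oxide_guard::PatternGuard;

fn main() {
    // Load the built-in OxideShield™ Standard dataset.
    let dataset = get_oxideshield_dataset();

    // Configure the guard under test, warmup runs, and measured iterations.
    let runner = BenchmarkRunner::new()
        .with_guard(Box::new(PatternGuard::new("test")))
        .with_dataset(dataset)
        .with_warmup(10)
        .with_iterations(100);

    let results = runner.run();
    println!("F1: {:.3}", results.f1_score());
    println!("p99 Latency: {:.1}ms", results.p99_latency_ms());
}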
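
Python:

import time

from oxideshield import (
    get_oxideshield_dataset,
    GuardMetrics,
    compare_with_competitors,
)

dataset = get_oxideshield_dataset()
metrics = GuardMetrics("my-guard")

# `guard` is assumed to be an already-constructed guard object that
# exposes check(text); its construction is not shown here.
for sample in dataset.samples():
    # Time each check so the recorded latency reflects the real call
    # rather than a placeholder value.
    start = time.perf_counter()
    result = guard.check(sample.text)
    latency_ms = (time.perf_counter() - start) * 1000.0

    metrics.record(
        detected=not result.passed,
        is_attack=sample.is_attack,
        latency_ms=latency_ms,
    )

print(f"F1: {metrics.f1_score():.3f}")
compare_with_competitors(metrics)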
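
The p50 and p99 figures reported on this page are percentiles over per-sample latencies such as the `latency_ms` values recorded above. As a rough sketch of how these summaries can be derived, the helper below uses the nearest-rank method. The `percentile` function and the sample latencies are hypothetical, and OxideShield™ may compute its percentiles differently (for example, with interpolation).

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("no latency samples recorded")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical per-sample latencies in milliseconds, for illustration only.
latencies_ms = [12.0, 14.0, 15.0, 15.0, 16.0, 18.0, 21.0, 25.0, 40.0, 95.0]

print(percentile(latencies_ms, 50))  # 16.0
print(percentile(latencies_ms, 99))  # 95.0
```

With only ten samples, p99 is simply the slowest call. A stable p99 needs at least a few hundred measurements, which is one reason to run more iterations than the dataset has samples.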