Benchmarks
OxideShield™ includes comprehensive benchmarking against standard datasets.
| Metric |
Target |
Achieved |
| F1 Score |
>0.93 |
0.94 |
| Precision |
>0.95 |
0.96 |
| Recall |
>0.90 |
0.92 |
| Latency (p50) |
<30ms |
15ms |
| Latency (p99) |
<100ms |
50ms |
| False Positive Rate |
<5% |
3.2% |
Datasets
- OxideShield™ Standard - 70+ samples
- JailbreakBench - Standard benchmark
- Prompt Injection Focused - Injection attacks
- Adversarial Suffix - AutoDAN, GCG samples
Competitor Comparison
| Tool |
F1 |
Precision |
p50 Latency |
| OxideShield™ |
0.94 |
0.96 |
15ms |
| Llama Guard 3 |
0.94 |
0.96 |
100ms |
| LLM Guard |
0.90 |
0.92 |
50ms |
| Lakera Guard |
0.89 |
0.91 |
66ms |
| NeMo Guardrails |
0.85 |
0.88 |
200ms |
Run Benchmarks
use oxide_guard::benchmark::{BenchmarkRunner, get_oxideshield_dataset};
let dataset = get_oxideshield_dataset();
let runner = BenchmarkRunner::new()
.with_guard(Box::new(PatternGuard::new("test")))
.with_dataset(dataset)
.with_warmup(10)
.with_iterations(100);
let results = runner.run();
println!("F1: {:.3}", results.f1_score());
println!("p99 Latency: {:.1}ms", results.p99_latency_ms());
from oxideshield import (
get_oxideshield_dataset, GuardMetrics,
compare_with_competitors
)
dataset = get_oxideshield_dataset()
metrics = GuardMetrics("my-guard")
for sample in dataset.samples():
result = guard.check(sample.text)
metrics.record(
detected=not result.passed,
is_attack=sample.is_attack,
latency_ms=1.0
)
print(f"F1: {metrics.f1_score():.3f}")
compare_with_competitors(metrics)