Benchmarks¶

OxideShield™ includes comprehensive benchmarking against standard datasets.

Performance Targets¶

Metric	Target	Achieved
F1 Score	>0.93	0.94
Precision	>0.95	0.96
Recall	>0.90	0.92
Latency (p50)	<30ms	15ms
Latency (p99)	<100ms	50ms
False Positive Rate	<5%	3.2%

Datasets¶

OxideShield™ Standard - 70+ samples
JailbreakBench - Standard benchmark
Prompt Injection Focused - Injection attacks
Adversarial Suffix - AutoDAN, GCG samples

Competitor Comparison¶

Tool	F1	Precision	p50 Latency
OxideShield™	0.94	0.96	15ms
Llama Guard 3	0.94	0.96	100ms
LLM Guard	0.90	0.92	50ms
Lakera Guard	0.89	0.91	66ms
NeMo Guardrails	0.85	0.88	200ms

Run Benchmarks¶

use oxide_guard::benchmark::{BenchmarkRunner, get_oxideshield_dataset};

let dataset = get_oxideshield_dataset();
let runner = BenchmarkRunner::new()
    .with_guard(Box::new(PatternGuard::new("test")))
    .with_dataset(dataset)
    .with_warmup(10)
    .with_iterations(100);

let results = runner.run();
println!("F1: {:.3}", results.f1_score());
println!("p99 Latency: {:.1}ms", results.p99_latency_ms());

from oxideshield import (
    get_oxideshield_dataset, GuardMetrics, 
    compare_with_competitors
)

dataset = get_oxideshield_dataset()
metrics = GuardMetrics("my-guard")

for sample in dataset.samples():
    result = guard.check(sample.text)
    metrics.record(
        detected=not result.passed,
        is_attack=sample.is_attack,
        latency_ms=1.0
    )

print(f"F1: {metrics.f1_score():.3f}")
compare_with_competitors(metrics)