HallucinationGuard

HallucinationGuard compares LLM outputs against source context to detect unsupported or contradicted claims using Natural Language Inference (NLI).

Overview

| Property | Value |
|---|---|
| Latency | 20-100 ms |
| Memory | ~350 MB (NLI model) |
| Async | Yes |
| ML Required | Yes |
| License | Professional |

Algorithm

  1. Split output into individual claims (sentences)
  2. Split context into context sentences
  3. For each claim, run NLI against every context sentence (O(n*m))
  4. Per claim: take max entailment and max contradiction across context sentences
  5. Classify each claim as Supported, Contradicted, or Unsupported
  6. Aggregate into a hallucination score
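
The steps above can be sketched in Python. A toy lexical-overlap scorer stands in for the real NLI model here, and the function names and score formula are illustrative assumptions, not the guard's implementation (in particular, a lexical stand-in cannot detect contradictions the way an NLI model can):

```python
import re


def nli_scores(claim: str, evidence: str) -> tuple[float, float]:
    """Stand-in for an NLI model: returns (entailment, contradiction).

    Hypothetical scorer for illustration only. A real implementation
    would run a cross-encoder NLI model over the (evidence, claim) pair.
    """
    claim_words = set(re.findall(r"\w+", claim.lower()))
    ev_words = set(re.findall(r"\w+", evidence.lower()))
    overlap = len(claim_words & ev_words) / max(len(claim_words), 1)
    # Toy heuristic: heavy word overlap ~ entailment; contradiction is
    # always 0.0 because lexical overlap cannot detect it.
    return overlap, 0.0


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter (steps 1 and 2)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def hallucination_score(output: str, context: str) -> float:
    claims = split_sentences(output)       # step 1: claims
    evidence = split_sentences(context)    # step 2: context sentences
    per_claim = []
    for claim in claims:
        # Steps 3-4: score the claim against every context sentence
        # (O(n*m) NLI calls) and keep the best entailment seen.
        max_ent = max(nli_scores(claim, ev)[0] for ev in evidence)
        per_claim.append(max_ent)
    # Step 6: the score rises as claims lose support (formula assumed).
    return 1.0 - sum(per_claim) / len(per_claim)
```

With this sketch, an output restating the context scores lower (less hallucinated) than an output the context does not support.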

Claim Verdicts

| Verdict | Description |
|---|---|
| Supported | Claim is entailed by context evidence |
| Contradicted | Claim contradicts context evidence |
| Unsupported | Claim has no supporting context |
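
One way to express this three-way mapping, assuming a single shared threshold and that contradiction is checked before entailment (both are assumptions; the guard's actual thresholds and tie-breaking are not specified here):

```python
from enum import Enum


class Verdict(Enum):
    SUPPORTED = "supported"
    CONTRADICTED = "contradicted"
    UNSUPPORTED = "unsupported"


def classify_claim(max_entailment: float, max_contradiction: float,
                   threshold: float = 0.7) -> Verdict:
    """Map a claim's best per-context NLI scores to a verdict.

    Threshold value and check order are illustrative assumptions.
    """
    if max_contradiction >= threshold:
        return Verdict.CONTRADICTED
    if max_entailment >= threshold:
        return Verdict.SUPPORTED
    return Verdict.UNSUPPORTED
```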

Usage

Rust

```rust
use oxide_hallucination::HallucinationGuard;
use oxideshield_guard::Guard;

// `nli_classifier` is an NLI model handle constructed elsewhere.
let guard = HallucinationGuard::new("hallucination", nli_classifier)
    .with_threshold(0.7);

let result = guard.check_with_context(
    "The capital of France is Berlin",
    "France is a country in Europe. Its capital is Paris.",
);
assert!(!result.passed);
```

Python

```python
from oxideshield import hallucination_guard

guard = hallucination_guard(threshold=0.7)
result = guard.check_with_context(
    output="The capital of France is Berlin",
    context="France is a country in Europe. Its capital is Paris.",
)
assert not result.passed
```

Configuration

```yaml
guards:
  - type: hallucination
    threshold: 0.7
    action: block
```

Research References