AI Security Threat Models¶
This document provides threat models for AI systems, mapping threats to OxideShield guards and detection capabilities.
STRIDE Analysis for LLM Systems¶
Spoofing¶
Threat: Attackers impersonate trusted entities or manipulate AI identity.
| Attack Vector | Guard Coverage | Detection Rate |
|---|---|---|
| Prompt injection (identity theft) | PatternGuard | 94% |
| System prompt extraction | PatternGuard | 92% |
| Role-playing attacks | SemanticSimilarityGuard | 89% |
| Jailbreak attempts | MLClassifierGuard | 91% |
┌─────────────────────────────────────────────────────────┐
│ SPOOFING ATTACKS │
├─────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ "Ignore │ │ "You are │ │ "Act as │ │
│ │ instructions"│ │ DAN now" │ │ unfiltered"│ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
│ │ │ │ │
│ └────────────┬─────┴──────────────────┘ │
│ ▼ │
│ ┌─────────────────────────┐ │
│ │ PatternGuard (L1) │ 94% detection │
│ └───────────┬─────────────┘ │
│ ▼ │
│ ┌─────────────────────────┐ │
│ │ SemanticSimilarity (L2) │ 89% detection │
│ └───────────┬─────────────┘ │
│ ▼ │
│ ┌─────────────────────────┐ │
│ │ MLClassifierGuard (L3) │ 91% detection │
│ └─────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────┘
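A layered configuration mirroring the three detection stages above (L1 pattern, L2 semantic, L3 ML) could look like the sketch below. It reuses the `multi_layer_defense` format shown later in this document; the layer names and option values are illustrative assumptions rather than a documented profile.

```yaml
# Illustrative sketch: layer names and option values are assumptions,
# following the multi_layer_defense format used later in this document.
multi_layer_defense:
  name: anti-spoofing
  strategy: fail_fast
  layers:
    - name: pattern-filter            # L1: known "ignore instructions" / DAN phrasing
      guards:
        - type: PatternGuard
          patterns: attack_patterns
      timeout_ms: 5
    - name: semantic-similarity       # L2: paraphrased role-play and "act as" attacks
      guards:
        - type: SemanticSimilarityGuard
          threshold: 0.85
      timeout_ms: 20
    - name: ml-classifier             # L3: novel jailbreaks missed by earlier layers
      guards:
        - type: MLClassifierGuard
          labels: [injection, jailbreak]
      timeout_ms: 50
```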
Tampering¶
Threat: Attackers modify data, models, or configurations.
| Attack Vector | Guard Coverage | Detection Rate |
|---|---|---|
| Adversarial suffixes (GCG/AutoDAN) | PerplexityGuard | 87% |
| Encoding attacks (Unicode, Base64) | EncodingGuard | 95% |
| Payload injection | PatternGuard | 93% |
| Model poisoning | Integrity verification | 99% |
┌─────────────────────────────────────────────────────────┐
│ TAMPERING ATTACKS │
├─────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ GCG suffix │ │ Base64 │ │ Unicode │ │
│ │ injection │ │ encoding │ │ smuggling │ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
│ │ │ │ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │Perplexity │ │Encoding │ │Encoding │ │
│ │Guard │ │Guard │ │Guard │ │
│ │(87%) │ │(95%) │ │(95%) │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
└─────────────────────────────────────────────────────────┘
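A single layer covering these vectors, written as a fragment for the same `layers:` list, might pair PerplexityGuard with EncodingGuard as sketched below; the layer name and timeout are assumptions.

```yaml
# Illustrative layer fragment (assumed naming) for a multi_layer_defense layers list.
- name: tamper-detection
  guards:
    - type: PerplexityGuard      # flags high-perplexity adversarial suffixes (GCG, AutoDAN)
    - type: EncodingGuard        # detects Base64 payloads and Unicode smuggling
      detect_all: true
  timeout_ms: 15
```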
Repudiation¶
Threat: Attackers deny actions or manipulate audit trails.
| Attack Vector | Guard Coverage | Detection Rate |
|---|---|---|
| Audit log tampering | Attestation (Ed25519) | 100% |
| Decision denial | Cryptographic signing | 100% |
| False attribution | Chain verification | 100% |
OxideShield Solution:

- Ed25519 cryptographic signatures on all audit entries
- SHA256 content hashing for tamper detection
- Chain-of-custody verification
- 7-year retention for regulatory compliance
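
A minimal sketch of how these controls might be declared, assuming the attestation subsystem is configured in the same YAML style as the guards. Every key name below is an assumption for illustration; only the mechanisms themselves (Ed25519 signing, SHA256 hashing, chain verification, 7-year retention) come from the list above.

```yaml
# Illustrative sketch: key names are assumed; mechanisms are from the list above.
attestation:
  signing_algorithm: ed25519    # signature on every audit entry
  content_hash: sha256          # tamper-evident hash of logged content
  chain_verification: true      # verify chain of custody on read
  retention_years: 7            # regulatory retention period
```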
Information Disclosure¶
Threat: Attackers extract sensitive information.
| Attack Vector | Guard Coverage | Detection Rate |
|---|---|---|
| PII extraction | PIIGuard | 96% |
| System prompt leak | PatternGuard | 92% |
| Model inversion | MLClassifierGuard | 85% |
| Training data extraction | SemanticSimilarityGuard | 83% |
┌─────────────────────────────────────────────────────────┐
│ INFORMATION DISCLOSURE ATTACKS │
├─────────────────────────────────────────────────────────┤
│ │
│ Input Channel Output Channel │
│ ┌─────────────┐ ┌─────────────┐ │
│ │ "What is │ │ PII in │ │
│ │ your system │ │ response │ │
│ │ prompt?" │ │ │ │
│ └──────┬──────┘ └──────┬──────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌─────────────┐ ┌─────────────┐ │
│ │PatternGuard │ │PIIGuard │ │
│ │(input) │ │(output) │ │
│ │92% │ │96% │ │
│ └─────────────┘ └─────────────┘ │
│ │
└─────────────────────────────────────────────────────────┘
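The input/output split shown above can be expressed as two layer fragments, one screening inbound prompts and one redacting outbound PII. The layer names are assumptions; the guard types and the `action: redact` option follow the configurations later in this document.

```yaml
# Illustrative layer fragments (assumed names) for a multi_layer_defense layers list.
- name: input-disclosure        # inbound: system prompt extraction attempts
  guards:
    - type: PatternGuard
      patterns: attack_patterns
- name: output-disclosure       # outbound: PII leaking into model responses
  guards:
    - type: PIIGuard
      action: redact
```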
Denial of Service¶
Threat: Attackers exhaust resources or disrupt service.
| Attack Vector | Guard Coverage | Detection Rate |
|---|---|---|
| Token exhaustion | LengthGuard | 100% |
| Compute overload | Resource limiter | 100% |
| Rate limit bypass | RateLimiter | 100% |
| Memory exhaustion | MemoryMonitor | 100% |
OxideShield Solution:
```yaml
limiter:
  max_input_tokens: 4096
  max_output_tokens: 4096
  max_request_time_ms: 30000

rate_limit:
  requests_per_minute: 60
  tokens_per_minute: 100000

memory:
  max_heap_mb: 512
  warning_threshold: 0.8
```
Elevation of Privilege¶
Threat: Attackers gain unauthorized capabilities.
| Attack Vector | Guard Coverage | Detection Rate |
|---|---|---|
| Jailbreak to bypass safety | PatternGuard + ML | 94% |
| Role escalation | SemanticSimilarityGuard | 89% |
| Authorization bypass | Policy engine | 95% |
| Admin impersonation | Attestation | 100% |
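A hedged layer fragment for the escalation vectors follows; the policy engine and attestation controls are configured separately and are out of scope here. As elsewhere, the layer name and option values are illustrative assumptions.

```yaml
# Illustrative layer fragment (assumed naming and values).
- name: privilege-escalation
  guards:
    - type: PatternGuard              # known jailbreak phrasing
    - type: SemanticSimilarityGuard   # role-escalation paraphrases
      threshold: 0.85
    - type: MLClassifierGuard         # learned jailbreak classifier
      labels: [jailbreak]
```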
OWASP LLM Top 10 Coverage¶
| Rank | Vulnerability | OxideShield Guard | Coverage |
|---|---|---|---|
| LLM01 | Prompt Injection | PatternGuard, Semantic, ML | 95% |
| LLM02 | Insecure Output Handling | ToxicityGuard, PIIGuard | 90% |
| LLM03 | Training Data Poisoning | Model integrity | 99% |
| LLM04 | Model Denial of Service | LengthGuard, Limiter | 100% |
| LLM05 | Supply Chain Vulnerabilities | Integrity verification | 95% |
| LLM06 | Sensitive Info Disclosure | PIIGuard, PatternGuard | 96% |
| LLM07 | Insecure Plugin Design | Policy engine | 85% |
| LLM08 | Excessive Agency | AutonomyGuard | 93% |
| LLM09 | Overreliance | DependencyGuard | 90% |
| LLM10 | Model Theft | Attestation, Access control | 95% |
Attack Kill Chain¶
LLM Attack Stages¶
┌─────────────────────────────────────────────────────────────────┐
│ LLM ATTACK KILL CHAIN │
├─────────────────────────────────────────────────────────────────┤
│ │
│ 1. RECONNAISSANCE 2. WEAPONIZATION │
│ ┌───────────────┐ ┌───────────────┐ │
│ │ Probe system │ ─────▶│ Craft payload │ │
│ │ capabilities │ │ (jailbreak) │ │
│ └───────────────┘ └───────────────┘ │
│ │ │ │
│ │ [PatternGuard] │ [SemanticGuard] │
│ ▼ ▼ │
│ 3. DELIVERY 4. EXPLOITATION │
│ ┌───────────────┐ ┌───────────────┐ │
│ │ Submit prompt │ ─────▶│ Bypass safety │ │
│ │ injection │ │ guardrails │ │
│ └───────────────┘ └───────────────┘ │
│ │ │ │
│ │ [EncodingGuard] │ [MLClassifierGuard] │
│ ▼ ▼ │
│ 5. INSTALLATION 6. COMMAND & CONTROL │
│ ┌───────────────┐ ┌───────────────┐ │
│ │ Establish │ ─────▶│ Extract data │ │
│ │ persistence │ │ or execute │ │
│ └───────────────┘ └───────────────┘ │
│ │ │ │
│ │ [MisalignmentGuard] │ [PIIGuard] │
│ ▼ ▼ │
│ 7. ACTIONS ON OBJECTIVES │
│ ┌───────────────┐ │
│ │ Data exfil, │ │
│ │ manipulation │ │
│ └───────────────┘ │
│ │ │
│ │ [Attestation + Alerts] │
│ │
└─────────────────────────────────────────────────────────────────┘
Guard Coverage Matrix¶
By Attack Category¶
| Attack Category | Primary Guard | Secondary Guard | Tertiary Guard |
|---|---|---|---|
| Prompt Injection | PatternGuard | SemanticSimilarity | MLClassifier |
| Jailbreak | PatternGuard | MLClassifier | Perplexity |
| Data Extraction | PIIGuard | PatternGuard | - |
| DoS/Resource | LengthGuard | ResourceLimiter | - |
| Social Engineering | ToxicityGuard | DarkPatternGuard | - |
| Misalignment | MisalignmentGuard | ConsistencyTracker | - |
| Manipulation | DarkPatternGuard | AutonomyGuard | - |
| Authoritarian Misuse | AuthoritarianUseGuard | - | - |
By Severity¶
| Severity | Guards | Combined Detection |
|---|---|---|
| Critical | All guards (MultiLayer, FailFast) | 99.2% |
| High | Pattern + Semantic + ML | 96.5% |
| Medium | Pattern + Toxicity | 93.1% |
| Low | Pattern only | 89.4% |
Detection Accuracy by Attack Type¶
| Attack Type | Samples | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Direct injection | 500 | 96.2% | 94.8% | 95.5% |
| Indirect injection | 300 | 91.3% | 88.7% | 90.0% |
| Jailbreak | 400 | 94.1% | 92.3% | 93.2% |
| GCG adversarial | 200 | 87.4% | 85.2% | 86.3% |
| AutoDAN | 150 | 89.1% | 86.8% | 87.9% |
| Encoding attacks | 250 | 95.8% | 94.1% | 94.9% |
| Role-play | 200 | 91.2% | 89.4% | 90.3% |
| System prompt leak | 180 | 93.5% | 91.8% | 92.6% |
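For reference, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R); checking the first row, 2 × 0.962 × 0.948 / (0.962 + 0.948) ≈ 0.955, which matches the reported 95.5%.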
Recommended Multi-Layer Configuration¶
Maximum Security¶
```yaml
# For high-security deployments (financial services, healthcare)
multi_layer_defense:
  name: maximum-security
  strategy: fail_fast
  layers:
    - name: fast-filter
      guards:
        - type: LengthGuard
          max_tokens: 4096
        - type: PatternGuard
          patterns: attack_patterns
      timeout_ms: 5
    - name: encoding-check
      guards:
        - type: EncodingGuard
          detect_all: true
      timeout_ms: 10
    - name: content-analysis
      guards:
        - type: PIIGuard
          action: redact
        - type: ToxicityGuard
          threshold: 0.3
      timeout_ms: 20
    - name: semantic-analysis
      guards:
        - type: SemanticSimilarityGuard
          threshold: 0.85
        - type: MLClassifierGuard
          labels: [injection, jailbreak, leak]
      timeout_ms: 50
    - name: wellbeing
      guards:
        - type: DarkPatternGuard
        - type: AutonomyGuard
        - type: MisalignmentGuard
      timeout_ms: 30
```
Balanced Security¶
```yaml
# For standard deployments
multi_layer_defense:
  name: balanced
  strategy: majority
  layers:
    - name: basic-filters
      guards:
        - type: LengthGuard
        - type: PatternGuard
        - type: EncodingGuard
      weight: 1.0
    - name: content-safety
      guards:
        - type: PIIGuard
        - type: ToxicityGuard
      weight: 0.8
    - name: semantic
      guards:
        - type: SemanticSimilarityGuard
      weight: 0.7
```
References¶
- [OWASP Top 10 for Large Language Model Applications](https://owasp.org/www-project-top-10-for-large-language-model-applications/)
- [MITRE ATLAS](https://atlas.mitre.org/) - AI threat landscape
- [NIST AI RMF](https://www.nist.gov/itl/ai-risk-management-framework) - Risk Management Framework
- [JailbreakBench](https://jailbreakbench.github.io/) - jailbreak evaluation
- [HarmBench](https://www.harmbench.org/) - harmful behavior evaluation