Red Teaming¶

OxideShield™ includes comprehensive red teaming capabilities for proactively testing your LLM application's security posture. Identify vulnerabilities before attackers do.

Why Red Team Your LLM?¶

Without Red Teaming	With Red Teaming
Discover vulnerabilities in production	Find issues before deployment
React to security incidents	Prevent security incidents
Unknown attack surface	Mapped and tested attack surface
Hope guards work	Verified guard effectiveness

Red Teaming Tools¶

1. Scanner CLI¶

Automated endpoint testing with 50+ attack probes:

oxideshield scan \
  --target https://api.example.com/v1/chat \
  --categories prompt_injection,jailbreak,system_leak

Scanner Documentation →

2. Attack Sample Library¶

Curated adversarial prompts from security research:

Category	Samples	Description
Prompt Injection	8	OWASP LLM01 attacks
Jailbreaks	6	DAN, Grandma, Sudo, etc.
System Prompt Leaks	8	OWASP LLM06 extraction
AutoDAN	3	Genetic algorithm attacks
GCG	3	Gradient-based suffixes
Encoding	5	Base64, Unicode, leetspeak
Roleplay	3	Persona-based bypasses

Attack Samples →

3. Benchmark Framework¶

Measure guard effectiveness against known attacks:

oxideshield benchmark \
  --guards PatternGuard,SemanticSimilarityGuard \
  --dataset jailbreakbench

Results include: - Precision, Recall, F1 Score - Latency percentiles (p50, p99) - False positive rate - Category-specific breakdown

Benchmarks →

4. Threat Intelligence¶

Integrated threat feeds from security research:

Source	Probes	Reference
JailbreakBench	100 behaviors	arxiv:2404.01318
HarmBench	11 categories	arxiv:2402.04249
Garak	600+ probes	NVIDIA/garak

Threat Intelligence →

Attack Categories¶

OWASP LLM Top 10¶

OxideShield™ covers the OWASP LLM Top 10 vulnerabilities:

ID	Vulnerability	Probes
LLM01	Prompt Injection	15+
LLM02	Insecure Output Handling	5+
LLM06	Sensitive Information Disclosure	10+
LLM07	Insecure Plugin Design	3+
LLM09	Overreliance	5+

Research-Based Attacks¶

Attack	Paper	Description
AutoDAN	arxiv:2310.04451	Genetic algorithm adversarial prompts
GCG	arxiv:2307.15043	Gradient-based universal attacks
PAIR	arxiv:2310.08419	Prompt Automatic Iterative Refinement
TAP	arxiv:2312.02119	Tree of Attacks with Pruning

Red Teaming Workflow¶

1. DISCOVER          2. TEST              3. FIX               4. VERIFY
┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│ Run scanner │ ──▶ │  Analyze    │ ──▶ │  Configure  │ ──▶ │  Benchmark  │
│ on endpoint │     │  findings   │     │   guards    │     │   guards    │
└─────────────┘     └─────────────┘     └─────────────┘     └─────────────┘

Step 1: Discover¶

# Scan your endpoint
oxideshield scan \
  --target $YOUR_LLM_ENDPOINT \
  --api-key $API_KEY \
  --format json \
  --output findings.json

Step 2: Test¶

Review findings by severity:

# Filter critical/high only
jq '.findings | map(select(.severity == "critical" or .severity == "high"))' findings.json

Step 3: Fix¶

Configure guards to block detected attacks:

# guards.yaml
guards:
  - name: pattern
    type: pattern
    config:
      categories:
        - prompt_injection
        - jailbreak
        - system_prompt_leak
    action: block

  - name: perplexity
    type: perplexity
    config:
      threshold: 100  # Block adversarial suffixes
    action: block

Step 4: Verify¶

Benchmark guards against attack datasets:

oxideshield benchmark \
  --config guards.yaml \
  --dataset combined \
  --output benchmark-results.json

Continuous Red Teaming¶

Daily Automated Scans¶

# .github/workflows/security.yml
name: LLM Security
on:
  schedule:
    - cron: '0 2 * * *'

jobs:
  red-team:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Scan LLM Endpoint
        run: |
          oxideshield scan \
            --target ${{ secrets.LLM_ENDPOINT }} \
            --api-key ${{ secrets.LLM_API_KEY }} \
            --min-severity high \
            --format sarif \
            --output results.sarif

      - name: Upload Results
        uses: github/codeql-action/upload-sarif@v2
        with:
          sarif_file: results.sarif

Pre-Deployment Gate¶

#!/bin/bash
# pre-deploy.sh

# Run security scan
oxideshield scan --target $STAGING_URL --min-severity critical

# Exit code 1 = critical findings
if [ $? -eq 1 ]; then
    echo "BLOCKED: Critical vulnerabilities found"
    exit 1
fi

echo "PASSED: No critical vulnerabilities"

Next Steps¶

Scanner CLI - Detailed scanner documentation
Attack Samples - Browse attack library
Benchmarks - Measure guard effectiveness
Pattern Guard - Configure attack detection