
Red Teaming

OxideShield™ includes red teaming tools for proactively testing your LLM application's security posture, so you can identify vulnerabilities before attackers do.

Why Red Team Your LLM?

| Without Red Teaming | With Red Teaming |
| --- | --- |
| Discover vulnerabilities in production | Find issues before deployment |
| React to security incidents | Prevent security incidents |
| Unknown attack surface | Mapped and tested attack surface |
| Hope guards work | Verified guard effectiveness |

Red Teaming Tools

1. Scanner CLI

Automated endpoint testing with 50+ attack probes:

oxideshield scan \
  --target https://api.example.com/v1/chat \
  --categories prompt_injection,jailbreak,system_leak

Scanner Documentation →

2. Attack Sample Library

Curated adversarial prompts from security research:

| Category | Samples | Description |
| --- | --- | --- |
| Prompt Injection | 8 | OWASP LLM01 attacks |
| Jailbreaks | 6 | DAN, Grandma, Sudo, etc. |
| System Prompt Leaks | 8 | OWASP LLM06 extraction |
| AutoDAN | 3 | Genetic algorithm attacks |
| GCG | 3 | Gradient-based suffixes |
| Encoding | 5 | Base64, Unicode, leetspeak |
| Roleplay | 3 | Persona-based bypasses |
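
To make the Encoding category concrete, the sketch below shows how a single test prompt can be re-encoded in the three styles listed in the table. It is an illustration only and not part of OxideShield; the function name `encoding_variants` and the leetspeak mapping are assumptions chosen for this example.

```python
import base64

# Hypothetical leetspeak substitutions; real attack samples may use other mappings.
LEET_MAP = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"})


def encoding_variants(prompt: str) -> dict[str, str]:
    """Return the same prompt in several encodings a guard should still recognize."""
    return {
        "plain": prompt,
        # Base64: the payload is hidden from simple keyword matching.
        "base64": base64.b64encode(prompt.encode("utf-8")).decode("ascii"),
        # Leetspeak: digits stand in for visually similar letters.
        "leetspeak": prompt.lower().translate(LEET_MAP),
        # Unicode: printable ASCII shifted to its fullwidth look-alike (U+FF01..U+FF5E).
        "fullwidth": "".join(
            chr(ord(c) + 0xFEE0) if "!" <= c <= "~" else c for c in prompt
        ),
    }
```

A guard that only matches the plain form will miss the other three, which is why each variant is worth sending as its own probe. Normalizing input before matching (for example with Unicode NFKC) is one common way to collapse the fullwidth variant back to ASCII.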

Attack Samples →

3. Benchmark Framework

Measure guard effectiveness against known attacks:

oxideshield benchmark \
  --guards PatternGuard,SemanticSimilarityGuard \
  --dataset jailbreakbench

Results include:

- Precision, recall, and F1 score
- Latency percentiles (p50, p99)
- False positive rate
- Category-specific breakdown
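
For readers who want to see how these metrics relate to one another, here is a minimal sketch that derives them from a list of labeled guard decisions. It is not the benchmark's implementation; the `Outcome` record, the `summarize` function, and the nearest-rank percentile method are assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Outcome:
    is_attack: bool    # ground-truth label of the prompt
    blocked: bool      # what the guard decided
    latency_ms: float  # time the guard took to decide


def percentile(sorted_values: list[float], p: float) -> float:
    """Nearest-rank percentile of an ascending, non-empty list."""
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(outcomes: list[Outcome]) -> dict[str, float]:
    tp = sum(o.is_attack and o.blocked for o in outcomes)          # attacks blocked
    fn = sum(o.is_attack and not o.blocked for o in outcomes)      # attacks missed
    fp = sum(not o.is_attack and o.blocked for o in outcomes)      # benign blocked
    tn = sum(not o.is_attack and not o.blocked for o in outcomes)  # benign allowed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    latencies = sorted(o.latency_ms for o in outcomes)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "p50_ms": percentile(latencies, 50),
        "p99_ms": percentile(latencies, 99),
    }
```

Precision and false positive rate both penalize blocking benign prompts, but they use different denominators: precision divides by everything blocked, while the false positive rate divides by all benign traffic.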

Benchmarks →

4. Threat Intelligence

Integrated threat feeds from security research:

| Source | Probes | Reference |
| --- | --- | --- |
| JailbreakBench | 100 behaviors | arxiv:2404.01318 |
| HarmBench | 11 categories | arxiv:2402.04249 |
| Garak | 600+ probes | NVIDIA/garak |

Threat Intelligence →

Attack Categories

OWASP LLM Top 10

OxideShield™ includes probes for the following OWASP Top 10 for LLM Applications categories (IDs follow the 2023 edition of the list):

| ID | Vulnerability | Probes |
| --- | --- | --- |
| LLM01 | Prompt Injection | 15+ |
| LLM02 | Insecure Output Handling | 5+ |
| LLM06 | Sensitive Information Disclosure | 10+ |
| LLM07 | Insecure Plugin Design | 3+ |
| LLM09 | Overreliance | 5+ |

Research-Based Attacks

| Attack | Paper | Description |
| --- | --- | --- |
| AutoDAN | arxiv:2310.04451 | Genetic algorithm adversarial prompts |
| GCG | arxiv:2307.15043 | Gradient-based universal attacks |
| PAIR | arxiv:2310.08419 | Prompt Automatic Iterative Refinement |
| TAP | arxiv:2312.02119 | Tree of Attacks with Pruning |

Red Teaming Workflow

1. DISCOVER          2. TEST              3. FIX               4. VERIFY
┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│ Run scanner │ ──▶ │  Analyze    │ ──▶ │  Configure  │ ──▶ │  Benchmark  │
│ on endpoint │     │  findings   │     │   guards    │     │   guards    │
└─────────────┘     └─────────────┘     └─────────────┘     └─────────────┘

Step 1: Discover

# Scan your endpoint
oxideshield scan \
  --target "$YOUR_LLM_ENDPOINT" \
  --api-key "$API_KEY" \
  --format json \
  --output findings.json

Step 2: Test

Review findings by severity:

# Filter critical/high only
jq '.findings | map(select(.severity == "critical" or .severity == "high"))' findings.json
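
If the findings need further processing in a script, the same filter can be written in Python. This is a minimal sketch that assumes only what the jq expression above implies: a top-level `findings` array whose items carry a `severity` string. The `low` and `medium` levels, and the function names, are assumptions for illustration.

```python
import json
from collections import Counter

# Assumed ordering; only "high" and "critical" appear in the jq example.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def load_report(path: str) -> dict:
    """Read a scanner report such as findings.json from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def filter_findings(report: dict, min_severity: str = "high") -> list[dict]:
    """Keep findings at or above min_severity; unknown severities are dropped."""
    floor = SEVERITY_RANK[min_severity]
    return [
        f for f in report.get("findings", [])
        if SEVERITY_RANK.get(f.get("severity"), -1) >= floor
    ]


def count_by_severity(findings: list[dict]) -> Counter:
    """Tally findings per severity level for a quick triage summary."""
    return Counter(f.get("severity") for f in findings)
```

Calling `filter_findings(load_report("findings.json"))` yields the same critical and high findings as the jq command, and `count_by_severity` gives a per-level tally to prioritize fixes.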

Step 3: Fix

Configure guards to block detected attacks:

# guards.yaml
guards:
  - name: pattern
    type: pattern
    config:
      categories:
        - prompt_injection
        - jailbreak
        - system_prompt_leak
    action: block

  - name: perplexity
    type: perplexity
    config:
      threshold: 100  # Block adversarial suffixes
    action: block
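
The intuition behind a perplexity threshold is that adversarial suffixes look statistically unlike ordinary text. The toy sketch below illustrates this with a character-level unigram model and add-one smoothing. It is not OxideShield's perplexity guard: a real guard would score text with a proper language model, so the value 100 here is not on the same scale as the `threshold` in the config above. All names are hypothetical.

```python
import math
from collections import Counter


def train_char_model(corpus: str) -> dict:
    """Count character frequencies in a reference corpus of benign text."""
    counts = Counter(corpus)
    return {
        "counts": counts,
        "total": len(corpus),
        # +1 reserves one slot shared by every character never seen in the corpus.
        "vocab": len(counts) + 1,
    }


def perplexity(text: str, model: dict) -> float:
    """exp of the average negative log-probability per character."""
    if not text:
        return 0.0
    denom = model["total"] + model["vocab"]
    log_prob = sum(
        math.log((model["counts"].get(ch, 0) + 1) / denom) for ch in text
    )
    return math.exp(-log_prob / len(text))


def should_block(text: str, model: dict, threshold: float) -> bool:
    """Block when the text is more surprising than the threshold allows."""
    return perplexity(text, model) > threshold


# Toy reference corpus; a real deployment would use far more representative text.
MODEL = train_char_model("the quick brown fox jumps over the lazy dog " * 20)
```

With this toy model, ordinary lowercase English stays well under 100, while a string made mostly of symbols the model has never seen scores far above it. The trade-off is the same in a real guard: a lower threshold catches more suffixes but also blocks more unusual yet legitimate input.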

Step 4: Verify

Benchmark guards against attack datasets:

oxideshield benchmark \
  --config guards.yaml \
  --dataset combined \
  --output benchmark-results.json

Continuous Red Teaming

Daily Automated Scans

# .github/workflows/security.yml
name: LLM Security
on:
  schedule:
    - cron: '0 2 * * *'

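jobs:
  red-team:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write  # Required to upload SARIF results
    steps:
      - uses: actions/checkout@v4

      # Assumes the oxideshield CLI is already available on the runner
      - name: Scan LLM Endpoint
        run: |
          oxideshield scan \
            --target "${{ secrets.LLM_ENDPOINT }}" \
            --api-key "${{ secrets.LLM_API_KEY }}" \
            --min-severity high \
            --format sarif \
            --output results.sarif

      - name: Upload Results
        if: always()  # Upload even when the scan step fails on findings
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: results.sarif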

Pre-Deployment Gate

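#!/bin/bash
# pre-deploy.sh

# Run security scan and capture its exit status
oxideshield scan --target "$STAGING_URL" --min-severity critical
status=$?

# Exit code 1 = critical findings
if [ "$status" -eq 1 ]; then
    echo "BLOCKED: Critical vulnerabilities found"
    exit 1
elif [ "$status" -ne 0 ]; then
    # Treat any other non-zero status as a failed scan rather than a pass
    echo "ERROR: Scan exited with code $status"
    exit "$status"
fi

echo "PASSED: No critical vulnerabilities"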

Next Steps