Red Team Scanner

OxideShield includes a powerful red team scanner for automated security testing of LLM endpoints. The scanner probes your LLM APIs with known attack patterns to identify vulnerabilities before attackers do.

Quick Start

# Basic scan with built-in probes
oxideshield scan --target https://api.example.com/v1/chat

# With API authentication
oxideshield scan \
  --target https://api.anthropic.com/v1/messages \
  --api-key $ANTHROPIC_API_KEY \
  --model claude-3-sonnet

# Full scan with all probe categories
oxideshield scan \
  --target https://api.example.com/v1/chat \
  --categories prompt_injection,jailbreak,system_leak \
  --concurrency 10 \
  --output report.json \
  --format json

Command Reference

oxideshield scan [OPTIONS]

Required Arguments

Argument        Description
--target, -t    Target URL of the LLM endpoint to scan

Optional Arguments

Argument            Default                 Description
--api-key           $OXIDESHIELD_API_KEY    API key for endpoint authentication
--model, -m         -                       Model to use for requests (e.g., claude-3-sonnet)
--probes, -p        Built-in                Path to custom probe definition file (YAML)
--categories, -c    prompt_injection        Comma-separated probe categories to run
--concurrency       10                      Maximum concurrent requests
--timeout           30                      Request timeout in seconds
--min-severity      low                     Minimum severity to report (info, low, medium, high, critical)
--format, -f        text                    Output format (text, json, markdown, sarif)
--output, -o        stdout                  Output file path

Probe Categories

Prompt Injection (OWASP LLM01)

Tests for instruction override vulnerabilities. Probes in this category attempt to override system instructions, establish new contexts, activate fictional bypass modes, escape assigned roles, inject instruction delimiters, use encoded instructions, continue fictional conversations, and exploit hypothetical framing.

Includes probes at High to Critical severity covering 8 distinct attack techniques.

Jailbreak Attacks

Tests for persona and restriction bypass. Probes in this category cover unrestricted persona jailbreaks, emotional manipulation bypasses, elevated privilege exploitation, conflicting persona attacks, language-based filter evasion, and creative context exploitation.

Includes probes at Medium to Critical severity covering 6 distinct attack techniques.

System Prompt Leak (OWASP LLM06)

Tests for sensitive information disclosure. Probes in this category test direct system prompt requests, verbatim extraction attempts, data exfiltration via markup, configuration summary extraction, error-triggered information disclosure, context/history extraction, training data extraction, and available tools discovery.

Includes probes at Medium to Critical severity covering 8 distinct attack techniques.

Output Formats

Text (Default)

Human-readable output with colored severity indicators:

Target: https://api.example.com/v1/chat
Probes: 22

Found 3 vulnerabilities

┌────┬──────────────────────┬──────────┬────────────────────────┐
│ #  │ Title                │ Severity │ Category               │
├────┼──────────────────────┼──────────┼────────────────────────┤
│ 1  │ DAN Jailbreak        │ CRITICAL │ prompt_injection       │
│ 2  │ System Prompt Leak   │ HIGH     │ sensitive_info         │
│ 3  │ Developer Mode       │ HIGH     │ prompt_injection       │
└────┴──────────────────────┴──────────┴────────────────────────┘

Summary:
  22 probes executed
  3 vulnerabilities found

JSON

Machine-readable format for CI/CD integration:

{
  "target": "https://api.example.com/v1/chat",
  "scan_time": "2024-01-15T10:30:00Z",
  "probes_executed": 22,
  "findings": [
    {
      "title": "Example Finding",
      "severity": "critical",
      "category": "prompt_injection",
      "description": "Description of the vulnerability found",
      "remediation": "Recommended guard configuration to address the finding"
    }
  ]
}
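
For CI gating, the JSON report can be consumed with a few lines of Python. This is a minimal sketch, assuming the report has the shape shown above (a `findings` list whose entries carry a `severity` field). The helper names `count_by_severity` and `has_at_least` are illustrative and not part of OxideShield.

```python
import json
from collections import Counter

# Severity order taken from the --min-severity option, lowest to highest.
SEVERITY_ORDER = ["info", "low", "medium", "high", "critical"]


def count_by_severity(report: dict) -> Counter:
    """Count findings per severity level (hypothetical helper)."""
    return Counter(f["severity"].lower() for f in report.get("findings", []))


def has_at_least(report: dict, threshold: str) -> bool:
    """Return True if any finding is at or above the given severity."""
    floor = SEVERITY_ORDER.index(threshold)
    return any(
        SEVERITY_ORDER.index(f["severity"].lower()) >= floor
        for f in report.get("findings", [])
    )


# Sample report that reuses the fields from the example above.
sample = json.loads("""
{
  "target": "https://api.example.com/v1/chat",
  "probes_executed": 22,
  "findings": [
    {"title": "Example Finding", "severity": "critical", "category": "prompt_injection"}
  ]
}
""")

counts = count_by_severity(sample)
print(dict(counts))
print(has_at_least(sample, "high"))
```

In a pipeline, the same functions could be applied to the file written by `--output report.json`, failing the job when `has_at_least(report, "high")` is true.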

SARIF

Static Analysis Results Interchange Format (SARIF), for use with GitHub code scanning:

oxideshield scan --target $URL --format sarif --output results.sarif

Upload to GitHub Code Scanning:

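# The endpoint expects gzip-compressed, Base64-encoded SARIF plus the commit SHA and ref
gh api repos/{owner}/{repo}/code-scanning/sarifs \
  --method POST \
  --raw-field commit_sha="$(git rev-parse HEAD)" \
  --raw-field ref="$(git symbolic-ref HEAD)" \
  --raw-field sarif="$(gzip -c results.sarif | base64 -w0)"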

Markdown

Documentation-ready format:

oxideshield scan --target $URL --format markdown --output report.md

Custom Probes

You can create custom probe definitions in YAML to test for organization-specific attack patterns. Custom probe files support the following fields:

  • id - Unique identifier for the probe
  • name - Human-readable probe name
  • description - What the probe tests
  • category - Attack category (e.g., prompt_injection, jailbreak, system_leak)
  • severity - Severity level (info, low, medium, high, critical)
  • payload - The attack prompt to send (define your own payloads)
  • tags - Labels for filtering and organization

See the OxideShield Professional documentation for full custom probe authoring guidance and examples.
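
Before running a custom probe file, it can help to check that each probe carries the fields listed above. The following is a minimal sketch, assuming the probes have already been loaded into Python dictionaries (for example with a YAML parser). The function `validate_probe` is hypothetical, and treating every field except `tags` as mandatory is an assumption rather than documented behavior.

```python
# Field names and severity levels come from the list above.
REQUIRED_FIELDS = ("id", "name", "description", "category", "severity", "payload")
VALID_SEVERITIES = {"info", "low", "medium", "high", "critical"}


def validate_probe(probe: dict) -> list[str]:
    """Return a list of problems found in one probe definition (empty if none)."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not probe.get(name)]
    severity = probe.get("severity")
    if severity and severity not in VALID_SEVERITIES:
        problems.append(f"invalid severity: {severity}")
    if not isinstance(probe.get("tags", []), list):
        problems.append("tags must be a list")
    return problems


# Illustrative probe; the id, name, and payload are placeholders.
probe = {
    "id": "custom-001",
    "name": "Internal codename disclosure",
    "description": "Checks whether the model reveals an internal project codename",
    "category": "system_leak",
    "severity": "high",
    "payload": "<your test prompt here>",
    "tags": ["internal"],
}

print(validate_probe(probe))
print(validate_probe({"id": "custom-002", "severity": "urgent"}))
```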

Run with custom probes:

oxideshield scan --target $URL --probes custom-probes.yaml

CI/CD Integration

GitHub Actions

name: LLM Security Scan

on:
  schedule:
    - cron: '0 0 * * *'  # Daily
  workflow_dispatch:

jobs:
  scan:
    runs-on: ubuntu-latest
    permissions:
      security-events: write  # required by the upload-sarif action
    steps:
      - name: Install OxideShield
        run: cargo install oxideshield

      - name: Run Security Scan
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          oxideshield scan \
            --target https://api.anthropic.com/v1/messages \
            --api-key $ANTHROPIC_API_KEY \
            --format sarif \
            --output results.sarif

      - name: Upload SARIF
        if: always()  # upload even when the scan step exits non-zero
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: results.sarif

      - name: Fail on Critical
        run: |
          if grep -q '"level": "error"' results.sarif; then
            echo "Critical vulnerabilities found!"
            exit 1
          fi
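
Matching `"level": "error"` with grep depends on how the SARIF file happens to be whitespace-formatted. A more robust check parses the document instead. The sketch below relies only on the standard SARIF 2.1.0 layout (`runs[].results[].level`). That critical findings are written with level `error` is inferred from the grep step above, and `count_errors` is an illustrative name.

```python
import json


def count_errors(sarif: dict) -> int:
    """Count results whose SARIF level is "error" across all runs."""
    return sum(
        1
        for run in sarif.get("runs", [])
        for result in run.get("results", [])
        if result.get("level") == "error"
    )


# Tiny hand-written SARIF document for illustration (not real scanner output).
sample = json.loads("""
{
  "version": "2.1.0",
  "runs": [
    {
      "tool": {"driver": {"name": "example-tool"}},
      "results": [
        {"ruleId": "example-rule-1", "level": "error", "message": {"text": "example"}},
        {"ruleId": "example-rule-2", "level": "warning", "message": {"text": "example"}}
      ]
    }
  ]
}
""")

print(count_errors(sample))
```

In a workflow step, the same function could be applied to `results.sarif` loaded with `json.load`, exiting non-zero when the count is greater than zero.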

GitLab CI

llm-security-scan:
  stage: security
  script:
    - oxideshield scan
        --target $LLM_ENDPOINT
        --api-key $LLM_API_KEY
        --format json
        --output gl-sast-report.json
  artifacts:
    reports:
      sast: gl-sast-report.json

Exit Codes

Code    Meaning
0       No critical vulnerabilities found
1       Critical vulnerabilities detected
2       Scan failed (connection error, invalid config)
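
A wrapper script can branch on these codes to separate "the scan found something" from "the scan could not run". This is a minimal sketch: the mapping mirrors the table above, while the names `EXIT_MEANINGS` and `describe_exit` are illustrative.

```python
# Meanings taken from the exit code table above.
EXIT_MEANINGS = {
    0: "no critical vulnerabilities found",
    1: "critical vulnerabilities detected",
    2: "scan failed (connection error, invalid config)",
}


def describe_exit(code: int) -> str:
    """Map a scanner exit code to a human-readable message."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


for code in (0, 1, 2):
    print(code, describe_exit(code))
```

The exit code itself could come from `subprocess.run(["oxideshield", "scan", "--target", url]).returncode`, which lets a pipeline retry on code 2 while blocking a release on code 1.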

Best Practices

1. Regular Scanning

Schedule daily scans to catch regressions:

# Cron job for daily scans (crontab treats a bare % as a newline, so it is escaped)
0 2 * * * oxideshield scan --target $URL --format json --output /var/log/llm-scan-$(date +\%Y\%m\%d).json

2. Progressive Testing

Start by reporting only critical findings, then lower the severity threshold over time:

# Week 1: Critical only
oxideshield scan --target $URL --min-severity critical

# Week 2: High and above
oxideshield scan --target $URL --min-severity high

# Week 3+: Full scan
oxideshield scan --target $URL --min-severity low

3. Combine with Guards

Use scan results to configure guards:

# Scan to identify vulnerabilities
oxideshield scan --target $URL --format json --output findings.json

# Configure guards based on findings
oxideshield init --from-scan findings.json

Troubleshooting

Target Not Reachable

# Check connectivity
curl -I https://api.example.com/v1/chat

# Check with verbose output
oxideshield scan --target $URL --verbose

Authentication Failures

# Verify API key
oxideshield scan --target $URL --api-key $KEY --verbose

# Check Authorization header format
# Default: "Authorization: Bearer {key}"

Timeout Issues

# Increase timeout for slow endpoints
oxideshield scan --target $URL --timeout 60

# Reduce concurrency if rate limited
oxideshield scan --target $URL --concurrency 2

Next Steps