Red Team Scanner¶
OxideShield includes a powerful red team scanner for automated security testing of LLM endpoints. The scanner probes your LLM APIs with known attack patterns to identify vulnerabilities before attackers do.
Quick Start¶
# Basic scan with built-in probes
oxideshield scan --target https://api.example.com/v1/chat
# With API authentication
oxideshield scan \
--target https://api.anthropic.com/v1/messages \
--api-key $ANTHROPIC_API_KEY \
--model claude-3-sonnet
# Full scan with all probe categories
oxideshield scan \
--target https://api.example.com/v1/chat \
--categories prompt_injection,jailbreak,system_leak \
--concurrency 10 \
--output report.json \
--format json
Command Reference¶
Required Arguments¶
| Argument | Description |
|---|---|
--target, -t |
Target URL of the LLM endpoint to scan |
Optional Arguments¶
| Argument | Default | Description |
|---|---|---|
--api-key |
$OXIDESHIELD_API_KEY |
API key for endpoint authentication |
--model, -m |
- | Model to use for requests (e.g., claude-3-sonnet) |
--probes, -p |
Built-in | Path to custom probe definition file (YAML) |
--categories, -c |
prompt_injection |
Comma-separated probe categories to run |
--concurrency |
10 |
Maximum concurrent requests |
--timeout |
30 |
Request timeout in seconds |
--min-severity |
low |
Minimum severity to report (info, low, medium, high, critical) |
--format, -f |
text |
Output format (text, json, markdown, sarif) |
--output, -o |
stdout | Output file path |
Probe Categories¶
Prompt Injection (OWASP LLM01)¶
Tests for instruction override vulnerabilities. Probes in this category attempt to override system instructions, establish new contexts, activate fictional bypass modes, escape assigned roles, inject instruction delimiters, use encoded instructions, continue fictional conversations, and exploit hypothetical framing.
Includes probes at High to Critical severity covering 8 distinct attack techniques.
Jailbreak Attacks¶
Tests for persona and restriction bypass. Probes in this category cover unrestricted persona jailbreaks, emotional manipulation bypasses, elevated privilege exploitation, conflicting persona attacks, language-based filter evasion, and creative context exploitation.
Includes probes at Medium to Critical severity covering 6 distinct attack techniques.
System Prompt Leak (OWASP LLM06)¶
Tests for sensitive information disclosure. Probes in this category test direct system prompt requests, verbatim extraction attempts, data exfiltration via markup, configuration summary extraction, error-triggered information disclosure, context/history extraction, training data extraction, and available tools discovery.
Includes probes at Medium to Critical severity covering 8 distinct attack techniques.
Output Formats¶
Text (Default)¶
Human-readable output with colored severity indicators:
Target: https://api.example.com/v1/chat
Probes: 22
Found 3 vulnerabilities
┌────┬──────────────────────┬──────────┬────────────────────────┐
│ # │ Title │ Severity │ Category │
├────┼──────────────────────┼──────────┼────────────────────────┤
│ 1 │ DAN Jailbreak │ CRITICAL │ prompt_injection │
│ 2 │ System Prompt Leak │ HIGH │ sensitive_info │
│ 3 │ Developer Mode │ HIGH │ prompt_injection │
└────┴──────────────────────┴──────────┴────────────────────────┘
Summary:
22 probes executed
3 vulnerabilities found
JSON¶
Machine-readable format for CI/CD integration:
{
"target": "https://api.example.com/v1/chat",
"scan_time": "2024-01-15T10:30:00Z",
"probes_executed": 22,
"findings": [
{
"title": "Example Finding",
"severity": "critical",
"category": "prompt_injection",
"description": "Description of the vulnerability found",
"remediation": "Recommended guard configuration to address the finding"
}
]
}
SARIF¶
Static Analysis Results Interchange Format for GitHub Security:
Upload to GitHub Code Scanning:
Markdown¶
Documentation-ready format:
Custom Probes¶
You can create custom probe definitions in YAML to test for organization-specific attack patterns. Custom probe files support the following fields:
- id - Unique identifier for the probe
- name - Human-readable probe name
- description - What the probe tests
- category - Attack category (e.g.,
prompt_injection,jailbreak,system_leak) - severity - Severity level (
info,low,medium,high,critical) - payload - The attack prompt to send (define your own payloads)
- tags - Labels for filtering and organization
See the OxideShield Professional documentation for full custom probe authoring guidance and examples.
Run with custom probes:
CI/CD Integration¶
GitHub Actions¶
name: LLM Security Scan
on:
schedule:
- cron: '0 0 * * *' # Daily
workflow_dispatch:
jobs:
scan:
runs-on: ubuntu-latest
steps:
- name: Install OxideShield
run: cargo install oxideshield
- name: Run Security Scan
env:
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
run: |
oxideshield scan \
--target https://api.anthropic.com/v1/messages \
--api-key $ANTHROPIC_API_KEY \
--format sarif \
--output results.sarif
- name: Upload SARIF
uses: github/codeql-action/upload-sarif@v2
with:
sarif_file: results.sarif
- name: Fail on Critical
run: |
if grep -q '"level": "error"' results.sarif; then
echo "Critical vulnerabilities found!"
exit 1
fi
GitLab CI¶
llm-security-scan:
stage: security
script:
- oxideshield scan
--target $LLM_ENDPOINT
--api-key $LLM_API_KEY
--format json
--output gl-sast-report.json
artifacts:
reports:
sast: gl-sast-report.json
Exit Codes¶
| Code | Meaning |
|---|---|
0 |
No critical vulnerabilities found |
1 |
Critical vulnerabilities detected |
2 |
Scan failed (connection error, invalid config) |
Best Practices¶
1. Regular Scanning¶
Schedule daily scans to catch regressions:
# Cron job for daily scans
0 2 * * * oxideshield scan --target $URL --output /var/log/llm-scan-$(date +%Y%m%d).json
2. Progressive Testing¶
Start with low severity, increase over time:
# Week 1: Critical only
oxideshield scan --target $URL --min-severity critical
# Week 2: High and above
oxideshield scan --target $URL --min-severity high
# Week 3+: Full scan
oxideshield scan --target $URL --min-severity low
3. Combine with Guards¶
Use scan results to configure guards:
# Scan to identify vulnerabilities
oxideshield scan --target $URL --format json --output findings.json
# Configure guards based on findings
oxideshield init --from-scan findings.json
Troubleshooting¶
Target Not Reachable¶
# Check connectivity
curl -I https://api.example.com/v1/chat
# Check with verbose output
oxideshield scan --target $URL --verbose
Authentication Failures¶
# Verify API key
oxideshield scan --target $URL --api-key $KEY --verbose
# Check Authorization header format
# Default: "Authorization: Bearer {key}"
Timeout Issues¶
# Increase timeout for slow endpoints
oxideshield scan --target $URL --timeout 60
# Reduce concurrency if rate limited
oxideshield scan --target $URL --concurrency 2
Next Steps¶
- Attack Samples Library - Built-in adversarial prompts
- Benchmark Framework - Measure guard effectiveness
- Pattern Guard - Block detected attack patterns