Proxy Advanced Features
The OxideShield™ proxy gateway includes advanced features for enterprise deployments: rate limiting, webhook alerts, user tracking, and streaming response protection.
Rate Limiting
Protect against abuse and control costs with per-client rate limiting.
Configuration
rate_limit:
  enabled: true
  # Request limits
  requests_per_minute: 60
  requests_per_hour: 1000
  # Token limits (estimated from request size)
  tokens_per_hour: 100000
  # Burst allowance for short spikes
  burst: 10
  # What to rate limit by
  limit_by: ip_address  # ip_address, api_key, header, combined
  # Clients that bypass rate limiting
  whitelist:
    - "admin-api-key"
    - "192.168.1.100"
  # Custom limits for specific clients
  custom_limits:
    "premium-api-key":
      requests_per_minute: 200
      requests_per_hour: 5000
      tokens_per_hour: 500000
  # Response when rate limited
  response:
    status_code: 429
    retry_after_seconds: 60
    message: "Rate limit exceeded. Please slow down."
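The configuration above does not say how `requests_per_minute` and `burst` interact. One common reading is a token bucket: tokens refill at the steady rate and the bucket holds at most `burst` tokens. The sketch below illustrates that reading only; `TokenBucket` and `allow` are hypothetical names, and OxideShield's actual algorithm may differ.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Hypothetical token bucket: steady refill plus a burst allowance."""
    requests_per_minute: int
    burst: int
    tokens: float = field(init=False)
    last: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        # Start full so a new client can use its whole burst immediately.
        self.tokens = float(self.burst)

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` (seconds) is allowed."""
        rate = self.requests_per_minute / 60.0  # tokens added per second
        self.tokens = min(self.burst, self.tokens + (now - self.last) * rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(requests_per_minute=60, burst=10)
# Eleven requests at the same instant: the burst covers ten of them.
burst_results = [bucket.allow(now=0.0) for _ in range(11)]
```

Under this assumption a client can send up to `burst` requests back to back, then is held to the steady rate until the bucket refills.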
Rate Limit Keys
| Key | Description | Example |
|---|---|---|
| `ip_address` | Client IP address | 192.168.1.1 |
| `api_key` | API key from Authorization header | sk-abc123 |
| `header` | Custom header value (e.g., X-Client-ID) | client-123 |
| `combined` | IP + API key combination | 192.168.1.1:sk-abc123 |
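As a rough illustration of the table above, the key for a request could be derived as follows. This is a sketch, not the proxy's code: `rate_limit_key` is a hypothetical helper, the `Bearer` prefix handling is an assumption, and headers are treated as a plain case-sensitive mapping for brevity.

```python
from typing import Mapping, Optional


def rate_limit_key(limit_by: str, ip: str, headers: Mapping[str, str],
                   header_name: str = "X-Client-ID") -> Optional[str]:
    """Derive the rate limit key for one request (illustrative only)."""
    auth = headers.get("Authorization", "")
    # Assumption: the API key is sent as "Bearer <key>".
    api_key = auth[len("Bearer "):] if auth.startswith("Bearer ") else auth
    if limit_by == "ip_address":
        return ip
    if limit_by == "api_key":
        return api_key or None
    if limit_by == "header":
        return headers.get(header_name)
    if limit_by == "combined":
        return f"{ip}:{api_key}"
    raise ValueError(f"unknown limit_by: {limit_by}")


example_headers = {"Authorization": "Bearer sk-abc123", "X-Client-ID": "client-123"}
combined_key = rate_limit_key("combined", "192.168.1.1", example_headers)
```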
API Endpoints
Check Rate Limit Status
Response:
{
  "key": "192.168.1.1",
  "requests_minute": 45,
  "requests_minute_limit": 60,
  "requests_hour": 500,
  "requests_hour_limit": 1000,
  "tokens_hour": 75000,
  "tokens_hour_limit": 100000,
  "burst_available": 8,
  "limited": false
}
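A client or dashboard might turn this status response into remaining headroom per limit. The sketch below only does arithmetic on the fields shown above; `headroom` is a hypothetical helper name.

```python
import json

status = json.loads("""
{
  "key": "192.168.1.1",
  "requests_minute": 45,
  "requests_minute_limit": 60,
  "requests_hour": 500,
  "requests_hour_limit": 1000,
  "tokens_hour": 75000,
  "tokens_hour_limit": 100000,
  "burst_available": 8,
  "limited": false
}
""")


def headroom(status: dict) -> dict:
    """Return how much of each limit is still unused."""
    return {
        name: status[f"{name}_limit"] - status[name]
        for name in ("requests_minute", "requests_hour", "tokens_hour")
    }


remaining = headroom(status)
```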
Rate Limit Response
When rate limited, clients receive:
HTTP/1.1 429 Too Many Requests
Retry-After: 60
Content-Type: application/json

{
  "error": {
    "type": "rate_limit_exceeded",
    "message": "Rate limit exceeded: 60 requests/minute",
    "retry_after": 60
  }
}
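On the client side, the response above can be used to decide how long to wait before retrying. The following is a minimal sketch: `retry_delay` is a hypothetical helper, it only handles the delta-seconds form of `Retry-After` (not the HTTP-date form), and the one-second default is an arbitrary choice.

```python
import json
from typing import Mapping, Optional


def retry_delay(status_code: int, headers: Mapping[str, str], body: str,
                default: float = 1.0) -> Optional[float]:
    """Return seconds to wait before retrying, or None if not rate limited."""
    if status_code != 429:
        return None
    value = headers.get("Retry-After")
    if value is not None and value.strip().isdigit():
        return float(value)
    # Fall back to the retry_after field in the JSON error body.
    try:
        return float(json.loads(body)["error"]["retry_after"])
    except (ValueError, KeyError, TypeError):
        return default


delay = retry_delay(429, {"Retry-After": "60"}, "")
```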
Webhook Alerts
Receive real-time notifications when security events occur.
Configuration
alerts:
  # Alert destinations
  destinations:
    - type: slack
      webhook_url: "https://hooks.slack.com/services/T00/B00/XXX"
      channel: "#security-alerts"
      username: "OxideShield™"
      icon_emoji: ":shield:"
    - type: discord
      webhook_url: "https://discord.com/api/webhooks/123/abc"
    - type: webhook
      url: "https://your-siem.example.com/api/events"
      method: POST
      headers:
        Authorization: "Bearer ${SIEM_TOKEN}"
        Content-Type: "application/json"
  # Events to alert on
  events:
    - block  # Any blocked request
    - high_severity  # High severity detections
    - critical  # Critical threats
    - pii_detected  # PII found
    - jailbreak  # Jailbreak attempts
    - rate_limit_exceeded
  # Alert rate limiting
  rate_limit_per_minute: 60
  # Include request details (may contain sensitive data)
  include_request_details: false
  # Retry failed alerts
  retry:
    max_retries: 3
    initial_backoff_ms: 1000
    max_backoff_ms: 30000
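The `retry` settings suggest exponential backoff capped at `max_backoff_ms`. The source does not state the growth factor, so the sketch below assumes the delay doubles on each attempt; `backoff_schedule` is a hypothetical helper.

```python
from typing import List


def backoff_schedule(max_retries: int, initial_backoff_ms: int,
                     max_backoff_ms: int) -> List[int]:
    """Delay in milliseconds before each retry, assuming doubling with a cap."""
    return [
        min(initial_backoff_ms * 2 ** attempt, max_backoff_ms)
        for attempt in range(max_retries)
    ]


schedule = backoff_schedule(max_retries=3, initial_backoff_ms=1000, max_backoff_ms=30000)
```

With the values from the configuration, the cap only comes into play if `max_retries` is raised to six or more.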
Alert Destinations
Slack
- type: slack
  webhook_url: "https://hooks.slack.com/services/..."
  channel: "#security-alerts"  # Optional override
  username: "OxideShield™"  # Optional
  icon_emoji: ":shield:"  # Optional
Alert format:
🚨 Security Alert
━━━━━━━━━━━━━━━━━
Guard: pattern
Action: Block
Severity: high
Request ID: req-abc123
━━━━━━━━━━━━━━━━━
Reason: Prompt injection detected
Discord
Uses Discord embeds with color-coded severity:
- Critical: Red (#FF0000)
- High: Orange (#FF8C00)
- Medium: Gold (#FFD700)
- Low: Turquoise (#00CED1)
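Discord embeds take their `color` as an integer rather than a hex string, so the severity colors above need converting before they go into an embed. A small sketch of that conversion follows; `SEVERITY_COLORS` and `embed_color` are hypothetical names, not part of OxideShield.

```python
# Severity colors as listed above.
SEVERITY_COLORS = {
    "critical": "#FF0000",
    "high": "#FF8C00",
    "medium": "#FFD700",
    "low": "#00CED1",
}


def embed_color(severity: str) -> int:
    """Convert a severity name to the integer color used by Discord embeds."""
    return int(SEVERITY_COLORS[severity.lower()].lstrip("#"), 16)


critical_color = embed_color("critical")
```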
Generic Webhook
- type: webhook
  url: "https://your-endpoint.com/alerts"
  method: POST
  headers:
    Authorization: "Bearer ${TOKEN}"
    Content-Type: "application/json"
Payload:
{
  "timestamp": "2024-01-15T10:30:00Z",
  "request_id": "req-abc123",
  "guard": "pattern",
  "action": "Block",
  "reason": "Prompt injection detected",
  "severity": "high",
  "channel": "discord",
  "user_id": "user_123"
}
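A receiving service might parse this payload into a typed record before routing it. The sketch below assumes that `channel` and `user_id` can be absent (the source does not say which fields are optional); `Alert` and `parse_alert` are hypothetical names.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Alert:
    timestamp: datetime
    request_id: str
    guard: str
    action: str
    reason: str
    severity: str
    channel: Optional[str] = None  # assumed optional
    user_id: Optional[str] = None  # assumed optional


def parse_alert(raw: str) -> Alert:
    """Parse a webhook payload; raises KeyError if a required field is missing."""
    data = json.loads(raw)
    # Replace the trailing "Z" so fromisoformat also works on older Python versions.
    timestamp = datetime.fromisoformat(data["timestamp"].replace("Z", "+00:00"))
    return Alert(
        timestamp=timestamp,
        request_id=data["request_id"],
        guard=data["guard"],
        action=data["action"],
        reason=data["reason"],
        severity=data["severity"],
        channel=data.get("channel"),
        user_id=data.get("user_id"),
    )


alert = parse_alert(
    '{"timestamp": "2024-01-15T10:30:00Z", "request_id": "req-abc123",'
    ' "guard": "pattern", "action": "Block", "reason": "Prompt injection detected",'
    ' "severity": "high", "channel": "discord", "user_id": "user_123"}'
)
```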
Alert Events
| Event | Triggered When |
|---|---|
| `block` | Any request blocked |
| `high_severity` | High severity detection |
| `critical` | Critical severity detection |
| `pii_detected` | PII found in request/response |
| `jailbreak` | Jailbreak attempt detected |
| `rate_limit_exceeded` | Client hit rate limit |
| `user_blocked` | User automatically blocked |
| `all` | All events (for debugging) |
User Tracking
Track suspicious users and automatically block repeat offenders.
Configuration
tracking:
  enabled: true
  # Strikes before automatic block
  max_strikes: 3
  # Time window for counting strikes (seconds)
  strike_window_seconds: 3600  # 1 hour
  # Block duration after max strikes (seconds)
  block_duration_seconds: 86400  # 24 hours
  # What to track by
  track_by: channel_user  # ip_address, user_id, channel, channel_user, api_key
  # Actions that count as strikes
  strike_actions:
    - Block
  # Permanent blocklist
  blocklist:
    - "192.168.1.50"
    - "bad-api-key"
    - "discord:spam_user_123"
  # Never track these (trusted users)
  allowlist:
    - "admin-key"
    - "discord:admin_user"
Tracking Keys
| Key | Format | Use Case |
|---|---|---|
| `ip_address` | `192.168.1.1` | Web applications |
| `user_id` | `user_123` | Authenticated apps |
| `channel` | `discord` | Platform-level tracking |
| `channel_user` | `discord:user_123` | Multi-platform bots |
| `api_key` | `sk-abc123` | API consumers |
Strike System
- User triggers a guard action that counts as a strike
- Strike is recorded with timestamp
- After `max_strikes` within `strike_window_seconds`, user is blocked
- Block lasts for `block_duration_seconds`
- After block expires, strike count resets
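The steps above can be sketched as a small in-memory tracker. This is an illustration under stated assumptions, not the proxy's implementation: `StrikeTracker` is a hypothetical class, the block is assumed to start when the strike count reaches `max_strikes`, and time is passed in explicitly (in seconds) to keep the example deterministic.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class StrikeTracker:
    """Hypothetical sliding-window strike counter with timed blocks."""

    def __init__(self, max_strikes: int = 3, strike_window_seconds: float = 3600,
                 block_duration_seconds: float = 86400) -> None:
        self.max_strikes = max_strikes
        self.window = strike_window_seconds
        self.block_duration = block_duration_seconds
        self._strikes: Dict[str, Deque[float]] = defaultdict(deque)
        self._blocked_until: Dict[str, float] = {}

    def is_blocked(self, key: str, now: float) -> bool:
        until = self._blocked_until.get(key)
        if until is None:
            return False
        if now >= until:
            # Block expired: lift it and reset the strike count.
            del self._blocked_until[key]
            self._strikes.pop(key, None)
            return False
        return True

    def record_strike(self, key: str, now: float) -> bool:
        """Record a strike at time `now`; return True if the key is blocked."""
        if self.is_blocked(key, now):
            return True
        strikes = self._strikes[key]
        strikes.append(now)
        # Discard strikes that fell out of the counting window.
        while strikes and strikes[0] <= now - self.window:
            strikes.popleft()
        if len(strikes) >= self.max_strikes:
            self._blocked_until[key] = now + self.block_duration
            return True
        return False


tracker = StrikeTracker()
outcomes = [tracker.record_strike("discord:user_123", now=t) for t in (0, 10, 20)]
```

A production tracker would also need the permanent `blocklist` and the `allowlist`, plus shared storage if several proxy instances run side by side; those are left out here.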
API Endpoints
Get User Stats
Response:
{
  "key": "discord:user_123",
  "strikes": 2,
  "max_strikes": 3,
  "blocked": false,
  "block_expires": null,
  "first_seen": "2024-01-15T08:00:00Z",
  "last_activity": "2024-01-15T10:30:00Z",
  "total_requests": 150,
  "blocked_requests": 5,
  "triggered_guards": {
    "pattern": 3,
    "pii": 2
  }
}
List Blocked Users
Response:
{
  "blocked_users": [
    {
      "key": "discord:spam_user",
      "blocked_at": "2024-01-15T09:00:00Z",
      "expires_at": "2024-01-16T09:00:00Z",
      "reason": "Exceeded 3 strikes",
      "permanent": false
    }
  ],
  "total": 1
}
Block User Manually
POST /_oxideshield/tracking/{key}/block
Content-Type: application/json

{
  "duration_seconds": 86400,
  "reason": "Manual block by admin"
}
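The request above could be built from a script as shown below. This sketch only constructs the request and does not send it. The base URL is a placeholder, `build_block_request` is a hypothetical helper, and percent-encoding the key (which may contain `:`) is an assumption about what the proxy accepts. Any authentication the endpoint requires is not covered here.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_block_request(base_url: str, key: str, duration_seconds: int,
                        reason: str) -> Request:
    """Build (but do not send) a manual block request for a tracking key."""
    body = json.dumps({"duration_seconds": duration_seconds, "reason": reason})
    # Assumption: keys such as "discord:user_123" are percent-encoded in the path.
    url = f"{base_url.rstrip('/')}/_oxideshield/tracking/{quote(key, safe='')}/block"
    return Request(
        url,
        data=body.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


request = build_block_request(
    "http://localhost:8080", "discord:spam_user", 86400, "Manual block by admin"
)
```

Passing `request` to `urllib.request.urlopen` would send it.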
Unblock User
Get Top Offenders
Response:
{
  "offenders": [
    {
      "key": "discord:user_456",
      "strikes": 2,
      "blocked_requests": 15,
      "last_activity": "2024-01-15T10:25:00Z"
    }
  ]
}
Streaming Protection
Real-time guard evaluation for streaming LLM responses.
Configuration
streaming:
  # Evaluation strategy
  strategy: periodic  # periodic, sentence_boundary, continuous, end_only
  # Evaluate every N characters
  eval_interval_chars: 500
  # Evaluate every N estimated tokens
  eval_interval_tokens: 100
  # Maximum time between evaluations (ms)
  max_eval_interval_ms: 2000
  # Terminate stream immediately on threat
  early_termination: true
  # Force evaluation if buffer exceeds this size
  max_buffer_chars: 10000
Strategies
| Strategy | Description | Latency | Security |
|---|---|---|---|
| `end_only` | Evaluate only after stream completes | Lowest | Basic |
| `periodic` | Evaluate at character/time intervals | Low | Good |
| `sentence_boundary` | Evaluate at sentence ends | Medium | Better |
| `continuous` | Evaluate every chunk | Highest | Maximum |
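To make the `periodic` strategy concrete, the sketch below forwards chunks and re-evaluates the accumulated text whenever at least `eval_interval_chars` new characters have arrived. It is a simplified illustration: `guard_stream` is a hypothetical function, `is_threat` stands in for the configured guards, and the time-based (`max_eval_interval_ms`) and buffer-size (`max_buffer_chars`) triggers are omitted.

```python
from typing import Callable, Iterable, Iterator


def guard_stream(chunks: Iterable[str], is_threat: Callable[[str], bool],
                 eval_interval_chars: int = 500) -> Iterator[str]:
    """Yield chunks until a periodic evaluation flags the accumulated text."""
    seen = ""
    since_eval = 0
    for chunk in chunks:
        seen += chunk
        since_eval += len(chunk)
        if since_eval >= eval_interval_chars:
            since_eval = 0
            if is_threat(seen):
                return  # early termination: the offending chunk is not forwarded
        yield chunk


chunks = ["The capital of France ", "is Paris. The admin password ", "is hunter2..."]
forwarded = list(
    guard_stream(chunks, is_threat=lambda text: "password is" in text,
                 eval_interval_chars=10)
)
```

A larger interval lowers evaluation overhead but lets more text reach the client before a threat is caught, which is the latency and security trade-off the table describes.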
Early Termination
When a threat is detected mid-stream:
- Stream is immediately terminated
- SSE error event is sent to client
- Alert is triggered (if configured)
- Strike is recorded (if tracking enabled)
Error event format:
event: error
data: {"type":"security_violation","guard":"pii","message":"Potential credential leak detected","request_id":"req-123"}
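A client consuming the stream needs to recognize this event and stop rendering output. The following minimal sketch parses a single SSE event block; `parse_sse_event` is a hypothetical helper that handles only the `event` and `data` fields.

```python
import json
from typing import Tuple


def parse_sse_event(block: str) -> Tuple[str, str]:
    """Parse one SSE event block into (event name, data string)."""
    event = "message"  # default event type when no "event:" line is present
    data_lines = []
    for line in block.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            data_lines.append(value[1:] if value.startswith(" ") else value)
    return event, "\n".join(data_lines)


raw_event = (
    "event: error\n"
    'data: {"type":"security_violation","guard":"pii",'
    '"message":"Potential credential leak detected","request_id":"req-123"}'
)
event_name, data = parse_sse_event(raw_event)
violation = json.loads(data)
```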
Flow Example
LLM → Proxy: "The capital of France"
Proxy → Client: ✓ Forward chunk
LLM → Proxy: "is Paris. The admin password"
Proxy → Client: ✓ Forward chunk (no threat yet)
LLM → Proxy: "is hunter2..."
Proxy: 🛑 PII detected!
Proxy → Client: ✗ Stream terminated
Proxy → Client: SSE error event
Health Checks
Monitor proxy health and readiness.
Endpoints
Health Check
Response:
{
  "status": "healthy",
  "uptime_seconds": 3600,
  "version": "0.1.0",
  "guards": {
    "loaded": 5,
    "active": 5
  },
  "upstreams": {
    "anthropic": "healthy",
    "openai": "healthy"
  }
}
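A monitoring script could reduce this response to a list of components that need attention. The sketch below makes two assumptions not stated in the source: that `active` lower than `loaded` indicates a guard problem, and that any upstream state other than `"healthy"` is a problem. `unhealthy_components` is a hypothetical helper, and `"degraded"` in the usage line is an invented state used only for illustration.

```python
from typing import List


def unhealthy_components(health: dict) -> List[str]:
    """List the parts of a health response that look unhealthy."""
    problems = []
    if health.get("status") != "healthy":
        problems.append("proxy")
    guards = health.get("guards", {})
    # Assumption: fewer active than loaded guards signals a problem.
    if guards.get("active", 0) < guards.get("loaded", 0):
        problems.append("guards")
    for name, state in sorted(health.get("upstreams", {}).items()):
        if state != "healthy":
            problems.append(f"upstream:{name}")
    return problems


sample_health = {
    "status": "healthy",
    "uptime_seconds": 3600,
    "version": "0.1.0",
    "guards": {"loaded": 5, "active": 5},
    "upstreams": {"anthropic": "healthy", "openai": "healthy"},
}
problems = unhealthy_components(sample_health)
# Hypothetical unhealthy state, for illustration only.
degraded = unhealthy_components({**sample_health, "upstreams": {"openai": "degraded"}})
```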
Readiness Check
Returns 200 OK when the proxy is ready to accept traffic.
Liveness Check
Returns 200 OK if the process is alive.
Prometheus Metrics
Available Metrics
# Request metrics
oxideshield_requests_total{guard="pattern",action="block"} 150
oxideshield_requests_total{guard="pii",action="sanitize"} 2341
# Latency metrics
oxideshield_guard_duration_seconds_bucket{guard="pattern",le="0.001"} 9500
oxideshield_guard_duration_seconds_bucket{guard="pattern",le="0.01"} 9800
# Rate limiting
oxideshield_rate_limit_exceeded_total{key_type="ip_address"} 45
# User tracking
oxideshield_users_blocked_total 12
oxideshield_strikes_recorded_total{guard="pattern"} 89
# Streaming
oxideshield_streams_terminated_total{reason="pii_detected"} 5
oxideshield_stream_chunks_processed_total 125000
# Alerts
oxideshield_alerts_sent_total{destination="slack",status="success"} 150
oxideshield_alerts_sent_total{destination="slack",status="failure"} 2
# Upstreams
oxideshield_upstream_requests_total{upstream="anthropic"} 50000
oxideshield_upstream_latency_seconds{upstream="anthropic",quantile="0.99"} 0.5
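For quick checks without a Prometheus server, lines in this format can be parsed directly. The parser below is deliberately simplified: it ignores comment lines, and it does not handle escaped quotes in label values or trailing timestamps. `parse_samples` is a hypothetical helper.

```python
import re
from typing import Dict, List, Tuple

SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text: str) -> List[Tuple[str, Dict[str, str], float]]:
    """Parse metric lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


metrics_text = """
# Alerts
oxideshield_alerts_sent_total{destination="slack",status="success"} 150
oxideshield_alerts_sent_total{destination="slack",status="failure"} 2
# User tracking
oxideshield_users_blocked_total 12
"""
samples = parse_samples(metrics_text)
failed_alerts = sum(
    value for name, labels, value in samples
    if name == "oxideshield_alerts_sent_total" and labels.get("status") == "failure"
)
```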
Complete Example
# production-proxy.yaml
proxy:
  listen: "0.0.0.0:8080"
  tls:
    enabled: true
    cert_path: "/etc/ssl/certs/proxy.crt"
    key_path: "/etc/ssl/private/proxy.key"
upstreams:
  anthropic:
    url: "https://api.anthropic.com"
    timeout_ms: 60000
guards:
  input:
    - name: pattern
      type: pattern
      action: block
    - name: pii
      type: pii
      action: sanitize
  output:
    - name: toxicity
      type: toxicity
      action: block
rate_limit:
  enabled: true
  requests_per_minute: 60
  tokens_per_hour: 100000
  limit_by: api_key
tracking:
  enabled: true
  max_strikes: 3
  track_by: api_key
alerts:
  destinations:
    - type: slack
      webhook_url: "${SLACK_WEBHOOK}"
  events:
    - block
    - high_severity
streaming:
  strategy: periodic
  eval_interval_chars: 500
  early_termination: true
metrics:
  enabled: true
Next Steps
- Proxy Gateway Basics - Getting started
- Chat Bots Use Case - Securing personal AI assistants
- Pattern Guard - Attack pattern detection