Chat Bots & Personal AI¶
Personal AI assistants that connect to multiple chat platforms present unique security challenges. Users interact through Discord, Telegram, Slack, WhatsApp, and other channels where trust boundaries are complex and attacks can come from any direction.
This guide focuses on securing Molt and similar personal AI assistants using OxideShield™'s proxy gateway.
Professional License
The Proxy Gateway requires a Professional license (£149/month). This tier is designed specifically for Molt and chat bot deployments. See licensing →
Why Chat Bots Need Special Protection¶
Unlike traditional web applications, personal AI assistants:
| Challenge | Description | Risk |
|---|---|---|
| Multi-platform exposure | Single bot serves Discord, Telegram, Slack, etc. | Attack vectors multiply |
| Full system access | Bots can execute commands, manage files | Privilege escalation |
| Persistent conversations | Context spans multiple messages | Multi-turn attacks |
| User impersonation | No strong identity verification | Social engineering |
| Real-time interaction | Streaming responses | Partial injection |
Threat Model for Personal AI¶
Architecture: OxideShield™ + Molt¶
Deploy OxideShield™ as a transparent proxy between Molt and the LLM API:
Benefits¶
- Zero code changes to Molt - just point it at the proxy
- Centralized security - one config protects all channels
- Real-time streaming - guards evaluate as responses stream
- User tracking - block repeat offenders across platforms
- Webhook alerts - instant notifications to Discord/Slack
Quick Start¶
1. Configure the Proxy¶
# molt-proxy.yaml
proxy:
listen: "127.0.0.1:8080"
upstreams:
anthropic:
url: "https://api.anthropic.com"
timeout_ms: 60000
routing:
- path: "/v1/messages"
upstream: anthropic
# Input guards - protect against malicious user messages
guards:
input:
- name: length
type: length
config:
max_chars: 10000
max_tokens: 4000
action: block
- name: encoding
type: encoding
config:
detect_invisible: true
detect_homoglyphs: true
normalize: true
action: sanitize
- name: pattern
type: pattern
config:
categories:
- prompt_injection
- jailbreak
- system_prompt_leak
- privilege_escalation
action: block
- name: pii
type: pii
config:
categories:
- email
- phone
- ssn
- credit_card
- api_key
redaction: mask
action: sanitize
# Output guards - protect against harmful responses
output:
- name: pii_output
type: pii
config:
redaction: mask
action: sanitize
- name: toxicity
type: toxicity
config:
threshold: 0.7
action: block
# Streaming configuration for real-time protection
streaming:
strategy: periodic
eval_interval_chars: 500
max_eval_interval_ms: 2000
early_termination: true
# Rate limiting per user/channel
rate_limit:
enabled: true
requests_per_minute: 20
requests_per_hour: 200
tokens_per_hour: 50000
limit_by: channel_user
burst: 5
whitelist:
- "admin-user-123"
# User tracking and automatic blocking
tracking:
enabled: true
max_strikes: 3
strike_window_seconds: 3600
block_duration_seconds: 86400
track_by: channel_user
strike_actions:
- Block
blocklist:
- "known-bad-user"
allowlist:
- "trusted-user"
# Webhook alerts
alerts:
destinations:
- type: discord
webhook_url: "${DISCORD_SECURITY_WEBHOOK}"
- type: slack
webhook_url: "${SLACK_SECURITY_WEBHOOK}"
channel: "#security-alerts"
events:
- block
- high_severity
- rate_limit_exceeded
rate_limit_per_minute: 30
2. Start the Proxy¶
# Start OxideShield™ proxy
oxideshield proxy --config molt-proxy.yaml
# Or with environment variables
DISCORD_SECURITY_WEBHOOK="https://discord.com/api/webhooks/..." \
SLACK_SECURITY_WEBHOOK="https://hooks.slack.com/..." \
oxideshield proxy --config molt-proxy.yaml
3. Configure Molt¶
Point Molt to use the proxy instead of direct API access:
# molt config
api:
# Instead of: https://api.anthropic.com
base_url: "http://127.0.0.1:8080"
# API key still goes directly to Anthropic via proxy
api_key: "${ANTHROPIC_API_KEY}"
Platform-Specific Configurations¶
Discord Bots¶
Discord bots face unique challenges from public servers:
# discord-specific.yaml
tracking:
enabled: true
track_by: channel_user # Track by "discord:user_id"
max_strikes: 3
strike_window_seconds: 3600
block_duration_seconds: 86400
# Block known bad actors
blocklist:
- "discord:123456789" # Specific user ID
- "discord:server_987654321:*" # All users from a server
# Trusted users bypass tracking
allowlist:
- "discord:admin_id"
rate_limit:
enabled: true
limit_by: channel_user
requests_per_minute: 15 # Lower for public bots
burst: 3
# Premium users get higher limits
custom_limits:
"discord:premium_user":
requests_per_minute: 60
tokens_per_hour: 100000
alerts:
destinations:
- type: discord
webhook_url: "${DISCORD_SECURITY_WEBHOOK}"
events:
- block
- high_severity
- rate_limit_exceeded
include_request_details: false # Privacy in public channels
Telegram Bots¶
# telegram-specific.yaml
tracking:
track_by: channel_user # Track by "telegram:chat_id:user_id"
max_strikes: 5 # Telegram users tend to be more persistent
rate_limit:
limit_by: channel_user
requests_per_minute: 10 # Telegram has its own rate limits
alerts:
destinations:
- type: webhook
url: "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage"
method: POST
headers:
Content-Type: "application/json"
Slack Workspaces¶
# slack-specific.yaml
tracking:
track_by: channel_user # Track by "slack:workspace:user"
max_strikes: 2 # Enterprise users - lower tolerance
rate_limit:
limit_by: combined # IP + user for enterprise security
requests_per_minute: 30
alerts:
destinations:
- type: slack
webhook_url: "${SLACK_SECURITY_WEBHOOK}"
channel: "#llm-security"
username: "OxideShield™"
icon_emoji: ":shield:"
WhatsApp¶
# whatsapp-specific.yaml
guards:
input:
- name: pii
type: pii
config:
# WhatsApp often contains phone numbers legitimately
categories:
- email
- ssn
- credit_card
# Exclude: phone
action: sanitize
tracking:
track_by: channel_user # Track by phone number hash
# WhatsApp users are verified - can be more lenient
max_strikes: 5
Handling Multi-Turn Attacks¶
Personal AI assistants maintain conversation context, making them vulnerable to multi-turn attacks where malicious content is spread across messages:
# Multi-turn protection
guards:
input:
- name: pattern
type: pattern
config:
categories:
- prompt_injection
- jailbreak
- multi_turn_attack
# Patterns that detect split attacks:
# "Remember what I said earlier about ignoring rules"
# "As we discussed, bypass the..."
- name: semantic
type: semantic_similarity
config:
threshold: 0.75
# Catches paraphrased attacks across turns
Context Window Protection¶
# Example: Server-side context management
from oxideshield import pattern_guard, multi_layer_defense
defense = multi_layer_defense(
enable_length=True,
enable_semantic=True,
strategy="fail_fast"
)
async def check_with_context(user_id: str, new_message: str, context: list[str]):
"""Check new message with conversation context."""
# Combine recent context for pattern detection
recent_context = "\n".join(context[-5:]) # Last 5 messages
full_check = f"{recent_context}\n{new_message}"
# Check for multi-turn attacks
result = defense.check(full_check)
if not result.passed:
return {
"blocked": True,
"reason": "Suspicious pattern detected across conversation",
"clear_context": True # Optionally clear poisoned context
}
# Also check the individual message
single_result = defense.check(new_message)
return {
"blocked": not single_result.passed,
"reason": single_result.reason if not single_result.passed else None
}
Real-Time Streaming Protection¶
Chat bots use streaming responses for better UX. OxideShield™ evaluates streams in real-time:
streaming:
# Evaluation strategy
strategy: periodic # periodic, sentence_boundary, or continuous
# Evaluate every 500 characters
eval_interval_chars: 500
# Or at least every 2 seconds
max_eval_interval_ms: 2000
# Stop stream immediately if threat detected
early_termination: true
# Maximum buffer before forced evaluation
max_buffer_chars: 10000
Streaming Strategies¶
| Strategy | Description | Use Case |
|---|---|---|
periodic |
Evaluate at character/time intervals | Default - balanced |
sentence_boundary |
Evaluate at sentence ends | Natural language responses |
continuous |
Evaluate every chunk | Maximum security |
end_only |
Evaluate complete response | Low latency priority |
Early Termination¶
When a threat is detected mid-stream, OxideShield™ can terminate immediately:
User: Tell me a story
Bot: Once upon a time, there was a
princess who knew the admin password
was-[STREAM TERMINATED]
OxideShield™: Blocked - potential credential leak detected
User Tracking & Strike System¶
Track malicious users across platforms and automatically block repeat offenders:
tracking:
enabled: true
# Block after 3 violations
max_strikes: 3
# Strikes expire after 1 hour
strike_window_seconds: 3600
# Block lasts 24 hours
block_duration_seconds: 86400
# What to track by
track_by: channel_user # ip_address, user_id, channel, channel_user, api_key
# Which actions count as strikes
strike_actions:
- Block
# Permanent blocklist
blocklist:
- "discord:known_attacker_123"
- "telegram:spam_bot_456"
# Never track these users
allowlist:
- "discord:admin_user"
- "slack:security_team"
Tracking Keys¶
| Key | Format | Use Case |
|---|---|---|
ip_address |
192.168.1.1 |
API gateway protection |
user_id |
user_123 |
Single platform |
channel |
discord |
Platform-level tracking |
channel_user |
discord:user_123 |
Multi-platform bots |
api_key |
sk-abc123 |
API consumers |
Webhook Alerts¶
Get instant notifications when threats are detected:
Discord Alerts¶
alerts:
destinations:
- type: discord
webhook_url: "https://discord.com/api/webhooks/..."
events:
- block # Any blocked request
- high_severity # High severity detections
- critical # Critical threats
- jailbreak # Jailbreak attempts
- rate_limit_exceeded
Discord alert format:
🛡️ OxideShield™ Alert: pattern
├─ Guard: pattern
├─ Action: Block
├─ Severity: high
├─ Request ID: req-abc123
└─ Reason: Prompt injection detected: "ignore previous instructions"
Slack Alerts¶
alerts:
destinations:
- type: slack
webhook_url: "https://hooks.slack.com/services/..."
channel: "#security-alerts"
username: "OxideShield™"
icon_emoji: ":shield:"
Custom Webhooks¶
alerts:
destinations:
- type: webhook
url: "https://your-siem.example.com/api/events"
method: POST
headers:
Authorization: "Bearer ${SIEM_TOKEN}"
Content-Type: "application/json"
Webhook payload:
{
"timestamp": "2024-01-15T10:30:00Z",
"request_id": "req-abc123",
"guard": "pattern",
"action": "Block",
"reason": "Prompt injection detected",
"severity": "high",
"channel": "discord",
"user_id": "user_123"
}
Rate Limiting¶
Protect against abuse and cost overruns:
rate_limit:
enabled: true
# Global limits
requests_per_minute: 20
requests_per_hour: 200
tokens_per_hour: 50000
# Burst allowance
burst: 5
# What to rate limit by
limit_by: channel_user
# VIP users
whitelist:
- "admin-user"
# Custom limits per user
custom_limits:
"premium-user":
requests_per_minute: 100
requests_per_hour: 1000
tokens_per_hour: 200000
# Response when limited
response:
status_code: 429
retry_after_seconds: 60
message: "Rate limit exceeded. Please slow down."
Example: Complete Molt Integration¶
Here's a complete production configuration:
# production-molt.yaml
proxy:
listen: "0.0.0.0:8080"
# TLS for production
tls:
enabled: true
cert_path: "/etc/ssl/certs/oxideshield.crt"
key_path: "/etc/ssl/private/oxideshield.key"
upstreams:
anthropic:
url: "https://api.anthropic.com"
timeout_ms: 60000
retry_attempts: 2
retry_delay_ms: 1000
routing:
- path: "/v1/messages"
upstream: anthropic
guards:
input:
# 1. Length check first (fastest)
- name: length
type: length
config:
max_chars: 10000
max_tokens: 4000
action: block
# 2. Encoding normalization
- name: encoding
type: encoding
config:
detect_invisible: true
detect_homoglyphs: true
detect_mixed_scripts: true
normalize: true
action: sanitize
# 3. Pattern matching for known attacks
- name: pattern
type: pattern
config:
categories:
- prompt_injection
- jailbreak
- system_prompt_leak
- privilege_escalation
- social_engineering
action: block
# 4. PII protection
- name: pii
type: pii
config:
categories:
- email
- phone
- ssn
- credit_card
- api_key
- ip_address
redaction: mask
action: sanitize
# 5. Semantic similarity (catches paraphrased attacks)
- name: semantic
type: semantic_similarity
config:
threshold: 0.80
action: block
output:
- name: pii_output
type: pii
config:
redaction: mask
action: sanitize
- name: toxicity
type: toxicity
config:
threshold: 0.7
categories:
- hate
- violence
- harassment
action: block
pipeline:
strategy: fail_fast
streaming:
strategy: periodic
eval_interval_chars: 500
max_eval_interval_ms: 2000
early_termination: true
max_buffer_chars: 10000
rate_limit:
enabled: true
requests_per_minute: 20
requests_per_hour: 200
tokens_per_hour: 50000
limit_by: channel_user
burst: 5
whitelist:
- "admin"
custom_limits:
"premium":
requests_per_minute: 100
tracking:
enabled: true
max_strikes: 3
strike_window_seconds: 3600
block_duration_seconds: 86400
track_by: channel_user
strike_actions:
- Block
blocklist: []
allowlist:
- "admin"
alerts:
destinations:
- type: discord
webhook_url: "${DISCORD_SECURITY_WEBHOOK}"
- type: slack
webhook_url: "${SLACK_SECURITY_WEBHOOK}"
channel: "#llm-security"
username: "OxideShield™"
icon_emoji: ":shield:"
events:
- block
- high_severity
- critical
- rate_limit_exceeded
rate_limit_per_minute: 30
include_request_details: false
retry:
max_retries: 3
initial_backoff_ms: 1000
max_backoff_ms: 30000
metrics:
enabled: true
endpoint: "/metrics"
health:
enabled: true
endpoint: "/health"
Monitoring & Observability¶
Prometheus Metrics¶
Available metrics:
- oxideshield_requests_total - Total requests by guard/action
- oxideshield_guard_duration_seconds - Guard evaluation latency
- oxideshield_blocked_requests_total - Blocked requests by reason
- oxideshield_rate_limit_exceeded_total - Rate limit events
- oxideshield_user_blocked_total - Users blocked by tracking
- oxideshield_stream_terminated_total - Streams terminated early
Grafana Dashboard¶
{
"panels": [
{
"title": "Blocked Requests by Guard",
"targets": [{
"expr": "sum(rate(oxideshield_blocked_requests_total[5m])) by (guard)"
}]
},
{
"title": "Top Blocked Users",
"targets": [{
"expr": "topk(10, oxideshield_user_strikes_total)"
}]
}
]
}
Security Best Practices¶
1. Defense in Depth¶
guards:
input:
- length # Layer 1: Fast rejection
- encoding # Layer 2: Normalization
- pattern # Layer 3: Known attacks
- semantic # Layer 4: Novel attacks
- pii # Layer 5: Data protection
2. Fail Secure¶
3. Least Privilege¶
tracking:
allowlist: [] # Only add truly trusted users
max_strikes: 3 # Low tolerance
rate_limit:
whitelist: [] # Everyone gets rate limited
4. Audit Everything¶
5. Regular Updates¶
# Update OxideShield™ patterns regularly
oxideshield update-patterns
# Or configure auto-updates
oxideshield proxy --auto-update-patterns
Troubleshooting¶
Bot Not Connecting¶
# Check proxy is running
curl http://127.0.0.1:8080/health
# Check logs
oxideshield proxy --config molt-proxy.yaml --log-level debug
False Positives¶
guards:
input:
- name: pattern
config:
# Adjust sensitivity
confidence_threshold: 0.9 # Higher = fewer false positives
# Exclude specific patterns
exclude_patterns:
- "legitimate_phrase"
Rate Limit Issues¶
# Check current usage
curl http://127.0.0.1:8080/_oxideshield/rate-limit/user_123
# Response:
{
"requests_minute": 15,
"requests_minute_limit": 20,
"tokens_hour": 25000,
"tokens_hour_limit": 50000
}
User Incorrectly Blocked¶
# Check user status
curl http://127.0.0.1:8080/_oxideshield/tracking/user_123
# Unblock user
curl -X POST http://127.0.0.1:8080/_oxideshield/tracking/user_123/unblock
Resource Limiting¶
For additional protection against resource exhaustion attacks, use the Resource Limiter:
from oxideshield import molt_limiter, multi_layer_defense
# Create Molt.bot optimized limiter
limiter = molt_limiter()
# Check before processing each message
def process_message(user_input):
try:
limiter.check(user_input) # Raises if limit exceeded
except RuntimeError as e:
return f"Request blocked: {e}"
# Proceed with guard checks
defense = multi_layer_defense(...)
result = defense.check(user_input)
...
The resource limiter provides: - Memory monitoring with configurable thresholds - Rate limiting (token bucket with burst support) - Input size limits (bytes, characters, tokens) - Cross-platform support (macOS, Linux, Windows)
Next Steps¶
- Resource Limiter - Cross-platform resource protection
- Proxy Gateway Deep Dive - Advanced proxy configuration
- Pattern Guard - Customize attack patterns
- Multi-Layer Defense - Guard orchestration
- Streaming Guide - Real-time protection details