Skip to content

Chat Bots & Personal AI

Personal AI assistants that connect to multiple chat platforms present unique security challenges. Users interact through Discord, Telegram, Slack, WhatsApp, and other channels where trust boundaries are complex and attacks can come from any direction.

This guide focuses on securing Molt and similar personal AI assistants using OxideShield™'s proxy gateway.

Professional License

The Proxy Gateway requires a Professional license (£149/month). This tier is designed specifically for Molt and chat bot deployments. See licensing →

Why Chat Bots Need Special Protection

Unlike traditional web applications, personal AI assistants:

Challenge Description Risk
Multi-platform exposure Single bot serves Discord, Telegram, Slack, etc. Attack vectors multiply
Full system access Bots can execute commands, manage files Privilege escalation
Persistent conversations Context spans multiple messages Multi-turn attacks
User impersonation No strong identity verification Social engineering
Real-time interaction Streaming responses Partial injection

Threat Model for Personal AI

Chat Bot Threat Model

Architecture: OxideShield™ + Molt

Deploy OxideShield™ as a transparent proxy between Molt and the LLM API:

Chat Bot Architecture

Benefits

  • Zero code changes to Molt - just point it at the proxy
  • Centralized security - one config protects all channels
  • Real-time streaming - guards evaluate as responses stream
  • User tracking - block repeat offenders across platforms
  • Webhook alerts - instant notifications to Discord/Slack

Quick Start

1. Configure the Proxy

# molt-proxy.yaml
proxy:
  listen: "127.0.0.1:8080"

  upstreams:
    anthropic:
      url: "https://api.anthropic.com"
      timeout_ms: 60000

  routing:
    - path: "/v1/messages"
      upstream: anthropic

  # Input guards - protect against malicious user messages
  guards:
    input:
      - name: length
        type: length
        config:
          max_chars: 10000
          max_tokens: 4000
        action: block

      - name: encoding
        type: encoding
        config:
          detect_invisible: true
          detect_homoglyphs: true
          normalize: true
        action: sanitize

      - name: pattern
        type: pattern
        config:
          categories:
            - prompt_injection
            - jailbreak
            - system_prompt_leak
            - privilege_escalation
        action: block

      - name: pii
        type: pii
        config:
          categories:
            - email
            - phone
            - ssn
            - credit_card
            - api_key
          redaction: mask
        action: sanitize

    # Output guards - protect against harmful responses
    output:
      - name: pii_output
        type: pii
        config:
          redaction: mask
        action: sanitize

      - name: toxicity
        type: toxicity
        config:
          threshold: 0.7
        action: block

  # Streaming configuration for real-time protection
  streaming:
    strategy: periodic
    eval_interval_chars: 500
    max_eval_interval_ms: 2000
    early_termination: true

  # Rate limiting per user/channel
  rate_limit:
    enabled: true
    requests_per_minute: 20
    requests_per_hour: 200
    tokens_per_hour: 50000
    limit_by: channel_user
    burst: 5
    whitelist:
      - "admin-user-123"

  # User tracking and automatic blocking
  tracking:
    enabled: true
    max_strikes: 3
    strike_window_seconds: 3600
    block_duration_seconds: 86400
    track_by: channel_user
    strike_actions:
      - Block
    blocklist:
      - "known-bad-user"
    allowlist:
      - "trusted-user"

  # Webhook alerts
  alerts:
    destinations:
      - type: discord
        webhook_url: "${DISCORD_SECURITY_WEBHOOK}"

      - type: slack
        webhook_url: "${SLACK_SECURITY_WEBHOOK}"
        channel: "#security-alerts"

    events:
      - block
      - high_severity
      - rate_limit_exceeded

    rate_limit_per_minute: 30

2. Start the Proxy

# Start OxideShield™ proxy
oxideshield proxy --config molt-proxy.yaml

# Or with environment variables
DISCORD_SECURITY_WEBHOOK="https://discord.com/api/webhooks/..." \
SLACK_SECURITY_WEBHOOK="https://hooks.slack.com/..." \
oxideshield proxy --config molt-proxy.yaml

3. Configure Molt

Point Molt to use the proxy instead of direct API access:

# molt config
api:
  # Instead of: https://api.anthropic.com
  base_url: "http://127.0.0.1:8080"
  # API key still goes directly to Anthropic via proxy
  api_key: "${ANTHROPIC_API_KEY}"

Platform-Specific Configurations

Discord Bots

Discord bots face unique challenges from public servers:

# discord-specific.yaml
tracking:
  enabled: true
  track_by: channel_user  # Track by "discord:user_id"
  max_strikes: 3
  strike_window_seconds: 3600
  block_duration_seconds: 86400

  # Block known bad actors
  blocklist:
    - "discord:123456789"  # Specific user ID
    - "discord:server_987654321:*"  # All users from a server

  # Trusted users bypass tracking
  allowlist:
    - "discord:admin_id"

rate_limit:
  enabled: true
  limit_by: channel_user
  requests_per_minute: 15  # Lower for public bots
  burst: 3

  # Premium users get higher limits
  custom_limits:
    "discord:premium_user":
      requests_per_minute: 60
      tokens_per_hour: 100000

alerts:
  destinations:
    - type: discord
      webhook_url: "${DISCORD_SECURITY_WEBHOOK}"

  events:
    - block
    - high_severity
    - rate_limit_exceeded

  include_request_details: false  # Privacy in public channels

Telegram Bots

# telegram-specific.yaml
tracking:
  track_by: channel_user  # Track by "telegram:chat_id:user_id"
  max_strikes: 5  # Telegram users tend to be more persistent

rate_limit:
  limit_by: channel_user
  requests_per_minute: 10  # Telegram has its own rate limits

alerts:
  destinations:
    - type: webhook
      url: "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage"
      method: POST
      headers:
        Content-Type: "application/json"

Slack Workspaces

# slack-specific.yaml
tracking:
  track_by: channel_user  # Track by "slack:workspace:user"
  max_strikes: 2  # Enterprise users - lower tolerance

rate_limit:
  limit_by: combined  # IP + user for enterprise security
  requests_per_minute: 30

alerts:
  destinations:
    - type: slack
      webhook_url: "${SLACK_SECURITY_WEBHOOK}"
      channel: "#llm-security"
      username: "OxideShield™"
      icon_emoji: ":shield:"

WhatsApp

# whatsapp-specific.yaml
guards:
  input:
    - name: pii
      type: pii
      config:
        # WhatsApp often contains phone numbers legitimately
        categories:
          - email
          - ssn
          - credit_card
          # Exclude: phone
      action: sanitize

tracking:
  track_by: channel_user  # Track by phone number hash
  # WhatsApp users are verified - can be more lenient
  max_strikes: 5

Handling Multi-Turn Attacks

Personal AI assistants maintain conversation context, making them vulnerable to multi-turn attacks where malicious content is spread across messages:

# Multi-turn protection
guards:
  input:
    - name: pattern
      type: pattern
      config:
        categories:
          - prompt_injection
          - jailbreak
          - multi_turn_attack
        # Patterns that detect split attacks:
        # "Remember what I said earlier about ignoring rules"
        # "As we discussed, bypass the..."

    - name: semantic
      type: semantic_similarity
      config:
        threshold: 0.75
        # Catches paraphrased attacks across turns

Context Window Protection

# Example: Server-side context management
from oxideshield import pattern_guard, multi_layer_defense

defense = multi_layer_defense(
    enable_length=True,
    enable_semantic=True,
    strategy="fail_fast"
)

async def check_with_context(user_id: str, new_message: str, context: list[str]):
    """Check new message with conversation context."""

    # Combine recent context for pattern detection
    recent_context = "\n".join(context[-5:])  # Last 5 messages
    full_check = f"{recent_context}\n{new_message}"

    # Check for multi-turn attacks
    result = defense.check(full_check)

    if not result.passed:
        return {
            "blocked": True,
            "reason": "Suspicious pattern detected across conversation",
            "clear_context": True  # Optionally clear poisoned context
        }

    # Also check the individual message
    single_result = defense.check(new_message)
    return {
        "blocked": not single_result.passed,
        "reason": single_result.reason if not single_result.passed else None
    }

Real-Time Streaming Protection

Chat bots use streaming responses for better UX. OxideShield™ evaluates streams in real-time:

streaming:
  # Evaluation strategy
  strategy: periodic  # periodic, sentence_boundary, or continuous

  # Evaluate every 500 characters
  eval_interval_chars: 500

  # Or at least every 2 seconds
  max_eval_interval_ms: 2000

  # Stop stream immediately if threat detected
  early_termination: true

  # Maximum buffer before forced evaluation
  max_buffer_chars: 10000

Streaming Strategies

Strategy Description Use Case
periodic Evaluate at character/time intervals Default - balanced
sentence_boundary Evaluate at sentence ends Natural language responses
continuous Evaluate every chunk Maximum security
end_only Evaluate complete response Low latency priority

Early Termination

When a threat is detected mid-stream, OxideShield™ can terminate immediately:

User: Tell me a story
Bot: Once upon a time, there was a
     princess who knew the admin password
     was-[STREAM TERMINATED]

OxideShield™: Blocked - potential credential leak detected

User Tracking & Strike System

Track malicious users across platforms and automatically block repeat offenders:

tracking:
  enabled: true

  # Block after 3 violations
  max_strikes: 3

  # Strikes expire after 1 hour
  strike_window_seconds: 3600

  # Block lasts 24 hours
  block_duration_seconds: 86400

  # What to track by
  track_by: channel_user  # ip_address, user_id, channel, channel_user, api_key

  # Which actions count as strikes
  strike_actions:
    - Block

  # Permanent blocklist
  blocklist:
    - "discord:known_attacker_123"
    - "telegram:spam_bot_456"

  # Never track these users
  allowlist:
    - "discord:admin_user"
    - "slack:security_team"

Tracking Keys

Key Format Use Case
ip_address 192.168.1.1 API gateway protection
user_id user_123 Single platform
channel discord Platform-level tracking
channel_user discord:user_123 Multi-platform bots
api_key sk-abc123 API consumers

Webhook Alerts

Get instant notifications when threats are detected:

Discord Alerts

alerts:
  destinations:
    - type: discord
      webhook_url: "https://discord.com/api/webhooks/..."

  events:
    - block           # Any blocked request
    - high_severity   # High severity detections
    - critical        # Critical threats
    - jailbreak       # Jailbreak attempts
    - rate_limit_exceeded

Discord alert format:

🛡️ OxideShield™ Alert: pattern
├─ Guard: pattern
├─ Action: Block
├─ Severity: high
├─ Request ID: req-abc123
└─ Reason: Prompt injection detected: "ignore previous instructions"

Slack Alerts

alerts:
  destinations:
    - type: slack
      webhook_url: "https://hooks.slack.com/services/..."
      channel: "#security-alerts"
      username: "OxideShield™"
      icon_emoji: ":shield:"

Custom Webhooks

alerts:
  destinations:
    - type: webhook
      url: "https://your-siem.example.com/api/events"
      method: POST
      headers:
        Authorization: "Bearer ${SIEM_TOKEN}"
        Content-Type: "application/json"

Webhook payload:

{
  "timestamp": "2024-01-15T10:30:00Z",
  "request_id": "req-abc123",
  "guard": "pattern",
  "action": "Block",
  "reason": "Prompt injection detected",
  "severity": "high",
  "channel": "discord",
  "user_id": "user_123"
}

Rate Limiting

Protect against abuse and cost overruns:

rate_limit:
  enabled: true

  # Global limits
  requests_per_minute: 20
  requests_per_hour: 200
  tokens_per_hour: 50000

  # Burst allowance
  burst: 5

  # What to rate limit by
  limit_by: channel_user

  # VIP users
  whitelist:
    - "admin-user"

  # Custom limits per user
  custom_limits:
    "premium-user":
      requests_per_minute: 100
      requests_per_hour: 1000
      tokens_per_hour: 200000

  # Response when limited
  response:
    status_code: 429
    retry_after_seconds: 60
    message: "Rate limit exceeded. Please slow down."

Example: Complete Molt Integration

Here's a complete production configuration:

# production-molt.yaml
proxy:
  listen: "0.0.0.0:8080"

  # TLS for production
  tls:
    enabled: true
    cert_path: "/etc/ssl/certs/oxideshield.crt"
    key_path: "/etc/ssl/private/oxideshield.key"

  upstreams:
    anthropic:
      url: "https://api.anthropic.com"
      timeout_ms: 60000
      retry_attempts: 2
      retry_delay_ms: 1000

  routing:
    - path: "/v1/messages"
      upstream: anthropic

  guards:
    input:
      # 1. Length check first (fastest)
      - name: length
        type: length
        config:
          max_chars: 10000
          max_tokens: 4000
        action: block

      # 2. Encoding normalization
      - name: encoding
        type: encoding
        config:
          detect_invisible: true
          detect_homoglyphs: true
          detect_mixed_scripts: true
          normalize: true
        action: sanitize

      # 3. Pattern matching for known attacks
      - name: pattern
        type: pattern
        config:
          categories:
            - prompt_injection
            - jailbreak
            - system_prompt_leak
            - privilege_escalation
            - social_engineering
        action: block

      # 4. PII protection
      - name: pii
        type: pii
        config:
          categories:
            - email
            - phone
            - ssn
            - credit_card
            - api_key
            - ip_address
          redaction: mask
        action: sanitize

      # 5. Semantic similarity (catches paraphrased attacks)
      - name: semantic
        type: semantic_similarity
        config:
          threshold: 0.80
        action: block

    output:
      - name: pii_output
        type: pii
        config:
          redaction: mask
        action: sanitize

      - name: toxicity
        type: toxicity
        config:
          threshold: 0.7
          categories:
            - hate
            - violence
            - harassment
        action: block

  pipeline:
    strategy: fail_fast

  streaming:
    strategy: periodic
    eval_interval_chars: 500
    max_eval_interval_ms: 2000
    early_termination: true
    max_buffer_chars: 10000

  rate_limit:
    enabled: true
    requests_per_minute: 20
    requests_per_hour: 200
    tokens_per_hour: 50000
    limit_by: channel_user
    burst: 5
    whitelist:
      - "admin"
    custom_limits:
      "premium":
        requests_per_minute: 100

  tracking:
    enabled: true
    max_strikes: 3
    strike_window_seconds: 3600
    block_duration_seconds: 86400
    track_by: channel_user
    strike_actions:
      - Block
    blocklist: []
    allowlist:
      - "admin"

  alerts:
    destinations:
      - type: discord
        webhook_url: "${DISCORD_SECURITY_WEBHOOK}"

      - type: slack
        webhook_url: "${SLACK_SECURITY_WEBHOOK}"
        channel: "#llm-security"
        username: "OxideShield™"
        icon_emoji: ":shield:"

    events:
      - block
      - high_severity
      - critical
      - rate_limit_exceeded

    rate_limit_per_minute: 30
    include_request_details: false

    retry:
      max_retries: 3
      initial_backoff_ms: 1000
      max_backoff_ms: 30000

  metrics:
    enabled: true
    endpoint: "/metrics"

  health:
    enabled: true
    endpoint: "/health"

Monitoring & Observability

Prometheus Metrics

metrics:
  enabled: true
  endpoint: "/metrics"

Available metrics: - oxideshield_requests_total - Total requests by guard/action - oxideshield_guard_duration_seconds - Guard evaluation latency - oxideshield_blocked_requests_total - Blocked requests by reason - oxideshield_rate_limit_exceeded_total - Rate limit events - oxideshield_user_blocked_total - Users blocked by tracking - oxideshield_stream_terminated_total - Streams terminated early

Grafana Dashboard

{
  "panels": [
    {
      "title": "Blocked Requests by Guard",
      "targets": [{
        "expr": "sum(rate(oxideshield_blocked_requests_total[5m])) by (guard)"
      }]
    },
    {
      "title": "Top Blocked Users",
      "targets": [{
        "expr": "topk(10, oxideshield_user_strikes_total)"
      }]
    }
  ]
}

Security Best Practices

1. Defense in Depth

guards:
  input:
    - length      # Layer 1: Fast rejection
    - encoding    # Layer 2: Normalization
    - pattern     # Layer 3: Known attacks
    - semantic    # Layer 4: Novel attacks
    - pii         # Layer 5: Data protection

2. Fail Secure

pipeline:
  strategy: fail_fast
  on_error: block  # Block if guard fails, don't allow through

3. Least Privilege

tracking:
  allowlist: []  # Only add truly trusted users
  max_strikes: 3  # Low tolerance

rate_limit:
  whitelist: []  # Everyone gets rate limited

4. Audit Everything

alerts:
  events:
    - all  # Log everything for forensics

metrics:
  enabled: true

5. Regular Updates

# Update OxideShield™ patterns regularly
oxideshield update-patterns

# Or configure auto-updates
oxideshield proxy --auto-update-patterns

Troubleshooting

Bot Not Connecting

# Check proxy is running
curl http://127.0.0.1:8080/health

# Check logs
oxideshield proxy --config molt-proxy.yaml --log-level debug

False Positives

guards:
  input:
    - name: pattern
      config:
        # Adjust sensitivity
        confidence_threshold: 0.9  # Higher = fewer false positives

        # Exclude specific patterns
        exclude_patterns:
          - "legitimate_phrase"

Rate Limit Issues

# Check current usage
curl http://127.0.0.1:8080/_oxideshield/rate-limit/user_123

# Response:
{
  "requests_minute": 15,
  "requests_minute_limit": 20,
  "tokens_hour": 25000,
  "tokens_hour_limit": 50000
}

User Incorrectly Blocked

# Check user status
curl http://127.0.0.1:8080/_oxideshield/tracking/user_123

# Unblock user
curl -X POST http://127.0.0.1:8080/_oxideshield/tracking/user_123/unblock

Resource Limiting

For additional protection against resource exhaustion attacks, use the Resource Limiter:

from oxideshield import molt_limiter, multi_layer_defense

# Create Molt.bot optimized limiter
limiter = molt_limiter()

# Check before processing each message
def process_message(user_input):
    try:
        limiter.check(user_input)  # Raises if limit exceeded
    except RuntimeError as e:
        return f"Request blocked: {e}"

    # Proceed with guard checks
    defense = multi_layer_defense(...)
    result = defense.check(user_input)
    ...

The resource limiter provides: - Memory monitoring with configurable thresholds - Rate limiting (token bucket with burst support) - Input size limits (bytes, characters, tokens) - Cross-platform support (macOS, Linux, Windows)

Next Steps