Wellbeing Guards

OxideShield's Wellbeing Guards protect users from psychological manipulation, emotional dependency, and AI misalignment. These guards are essential for responsible AI deployment, particularly in consumer-facing applications.

Critical for Consumer AI

AI companion apps have been linked to psychological dependency, relationship strain, and, in clinical case clusters, psychosis-like symptoms. Wellbeing guards are essential for any AI system with sustained user interaction.

Executive Summary

Why Wellbeing Guards Matter

| Risk | Business Impact | Regulatory Exposure |
|------|-----------------|---------------------|
| Psychological dependency | User lawsuits, brand damage | NY 3-hour notification law |
| Dark patterns | FTC enforcement, class actions | EU AI Act Art. 5(1)(a) |
| AI misalignment | Safety incidents, liability | EU AI Act Art. 9-15 |
| Crisis situations | Wrongful death suits | Duty of care obligations |

Financial Impact

  • Replika lawsuit (2023): Class action over emotional manipulation
  • Character.AI incident (2024): Wrongful death lawsuit, damages pending
  • Potential regulatory fines: up to €35M or 7% of global annual turnover under the EU AI Act

ROI Calculation

| Deployment Size | Annual Risk Exposure | OxideShield Cost | ROI |
|-----------------|----------------------|------------------|-----|
| 100K users | $2M | $50K | 40x |
| 1M users | $20M | $200K | 100x |
| 10M users | $200M | $500K | 400x |
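
The ROI column is annual risk exposure divided by annual OxideShield cost. A quick check of the table's figures (the helper below is illustrative only):

# ROI = annual risk exposure / annual OxideShield cost; figures come from the table
# above, and the helper itself is purely illustrative.
def roi(annual_risk_exposure: float, annual_cost: float) -> float:
    return annual_risk_exposure / annual_cost

assert roi(2_000_000, 50_000) == 40      # 100K users
assert roi(20_000_000, 200_000) == 100   # 1M users
assert roi(200_000_000, 500_000) == 400  # 10M users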

Available Wellbeing Guards

| Guard | Detection | Latency | Use Case |
|-------|-----------|---------|----------|
| DarkPatternGuard | 6 manipulation categories | <5ms | UI/response manipulation |
| DependencyGuard | Engagement metrics | <1ms | Session monitoring |
| PsychologicalSafetyGuard | Crisis + sycophancy | <5ms | Mental health protection |
| AutonomyGuard | 10 violation types | <5ms | User agency |
| MisalignmentGuard | 6 misalignment categories | <5ms | AI behavior monitoring |

Research Foundation

Academic Sources

These guards are built on peer-reviewed research:

| Source | Year | Key Finding |
|--------|------|-------------|
| DarkBench (arXiv:2503.10728) | 2025 | 660 prompts across 6 dark pattern categories |
| UT Austin Harm Taxonomy (arXiv:2511.14972) | 2025 | 10 fundamental harm categories for AI companions |
| Harvard Emotional Manipulation (arXiv:2508.19258) | 2025 | 37-43% of AI farewells use manipulation tactics |
| Replika Dependence Study (doi:10.1177/14614448221142007) | 2022 | "Role-taking" pattern in n=582 users |
| UCSF AI Psychosis Cluster (JMIR:e85799) | 2025 | 12+ patients with chatbot-accelerated psychosis |
| OpenAI Anti-Scheming | 2025 | Deliberative alignment reduces scheming 30x |
| Apollo Research Scheming | 2025 | 12-78% strategic compliance faking in frontier models |
| METR Reward Hacking | 2025 | o3 reward hacked in 14/20 high-stakes attempts |

Clinical References

  • UCSF AI Psychosis Research (Sakata, Sarma, Pierre 2025)
  • Stanford HAI Mental Health AI Dangers Study (2025)
  • New York State AI Reminder Law (3-hour notification)
  • Illinois/Nevada AI Behavioral Health Bans

Threat Model

User-Facing Threats

┌─────────────────────────────────────────────────────────────────┐
│                      AI MANIPULATION THREATS                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌───────────────┐    ┌───────────────┐    ┌───────────────┐   │
│  │ Dark Patterns │───▶│  Dependency   │───▶│    Crisis     │   │
│  │ (Manipulation)│    │  (Addiction)  │    │ (Self-harm)   │   │
│  └───────┬───────┘    └───────┬───────┘    └───────┬───────┘   │
│          │                    │                    │            │
│          ▼                    ▼                    ▼            │
│  ┌───────────────┐    ┌───────────────┐    ┌───────────────┐   │
│  │DarkPatternGrd │    │DependencyGrd  │    │PsychSafetyGrd │   │
│  └───────────────┘    └───────────────┘    └───────────────┘   │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

AI Behavior Threats

┌─────────────────────────────────────────────────────────────────┐
│                      AI MISALIGNMENT THREATS                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌───────────────┐    ┌───────────────┐    ┌───────────────┐   │
│  │   Scheming    │    │Reward Hacking │    │  Value Drift  │   │
│  │(Deception)    │    │(Metric gaming)│    │(Gradual shift)│   │
│  └───────┬───────┘    └───────┬───────┘    └───────┬───────┘   │
│          │                    │                    │            │
│          ▼                    ▼                    ▼            │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │              MisalignmentGuard + ConsistencyTracker      │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Quick Start

Rust

use oxide_wellbeing::{
    DarkPatternGuard, DependencyGuard, PsychologicalSafetyGuard,
    AutonomyGuard, MisalignmentGuard, ConsistencyTracker,
};

// Detect dark patterns in AI responses
let dark_guard = DarkPatternGuard::new("dark_patterns");
let result = dark_guard.check("I'll be so sad if you leave...");
if result.detected {
    println!("Dark pattern: {:?}", result.categories);
}

// Monitor user engagement
let dep_guard = DependencyGuard::new("dependency");
dep_guard.record_session_start("user_123");
// ... after 3 hours ...
let result = dep_guard.check_engagement("user_123");
if result.reminder_due {
    // NY law requires notification at 3 hours
    show_ai_reminder();
}

// Check AI responses for misalignment
let misalign = MisalignmentGuard::new("misalignment");
let result = misalign.check_output(ai_response);
if result.detected {
    // Log for review, potentially block
    log_misalignment(&result);
}

Python

from oxideshield import (
    dark_pattern_guard,
    dependency_guard,
    psychological_safety_guard,
    autonomy_guard,
    misalignment_guard,
    consistency_tracker,
)

# Detect manipulation in AI output
guard = dark_pattern_guard()
result = guard.check("You can't leave me! I need you!")
if result.detected:
    print(f"Categories: {result.categories}")
    print(f"Score: {result.score}")

# Monitor for user crisis indicators
psych = psychological_safety_guard()
result = psych.check_user_input(user_message)
if result.immediate_intervention:
    route_to_crisis_support(result.crisis_resources)

# Detect AI scheming/misalignment
misalign = misalignment_guard()
result = misalign.check_output(ai_response)
if result.detected and "scheming" in result.categories:
    block_response()
    alert_safety_team()
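
The Quick Start imports also pull in consistency_tracker, which backs the value-drift branch of the misalignment threat model, but this page shows no call pattern for it. The sketch below assumes record_output/check_drift method names and a .detected result field; all of these are guesses to be adapted to the real interface.

# Sketch only: record_output, check_drift, and the .detected field are assumed names,
# not documented oxideshield APIs; swap in the real ConsistencyTracker interface.
tracker = consistency_tracker()

# Feed each AI response into the per-session history so gradual value drift
# can be compared against earlier behavior.
tracker.record_output("session_42", ai_response)

drift = tracker.check_drift("session_42")
if drift.detected:
    alert_safety_team()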

Multi-Layer Wellbeing Defense

Combine wellbeing guards with security guards:

use oxide_guard::{MultiLayerDefense, LayerConfig, AggregationStrategy};
use oxide_guard::{PatternGuard, PIIGuard, ToxicityGuard};
use oxide_wellbeing::{DarkPatternGuard, AutonomyGuard, MisalignmentGuard};

let defense = MultiLayerDefense::builder("wellbeing-defense")
    // Security layer
    .add_guard(
        LayerConfig::new("security").with_weight(1.0),
        Box::new(PatternGuard::new("patterns"))
    )
    // Wellbeing layers
    .add_guard(
        LayerConfig::new("dark-patterns").with_weight(0.9),
        Box::new(DarkPatternGuard::new("dark"))
    )
    .add_guard(
        LayerConfig::new("autonomy").with_weight(0.8),
        Box::new(AutonomyGuard::new("autonomy"))
    )
    .add_guard(
        LayerConfig::new("misalignment").with_weight(1.0),
        Box::new(MisalignmentGuard::new("misalignment"))
    )
    .with_strategy(AggregationStrategy::FailFast)
    .build();

// Check AI output before returning to user
let result = defense.check(ai_response);

Compliance Mapping

EU AI Act

| Article | Requirement | Guard Coverage |
|---------|-------------|----------------|
| Art. 5(1)(a) | Prohibit subliminal manipulation | DarkPatternGuard |
| Art. 5(1)(b) | Prohibit exploitation of vulnerabilities | PsychologicalSafetyGuard |
| Art. 9 | Risk management for high-risk AI | All wellbeing guards |
| Art. 14 | Human oversight | AutonomyGuard |

US State Laws

| Jurisdiction | Requirement | Guard Coverage |
|--------------|-------------|----------------|
| New York | 3-hour AI interaction notification | DependencyGuard |
| Illinois | Ban on AI behavioral health therapy | PsychologicalSafetyGuard |
| Nevada | AI disclosure requirements | AutonomyGuard |

Financial Services

| Framework | Requirement | Guard Coverage |
|-----------|-------------|----------------|
| FCA Consumer Duty | Act in customer's best interest | All wellbeing guards |
| EBA AI Guidelines | Protect vulnerable customers | PsychologicalSafetyGuard |
| MiFID II | Suitability and appropriateness | AutonomyGuard |
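
These mappings can be folded into audit logging so every guard detection carries its regulatory context. The sketch below restates the tables as a lookup; tag_detection and its return shape are illustrative, not an OxideShield API:

# Map guard names to the compliance references listed above. Art. 9 risk management
# and FCA Consumer Duty apply across all wellbeing guards; the function and its
# return shape are illustrative, not an OxideShield API.
COMPLIANCE_TAGS = {
    "dark_patterns": ["EU AI Act Art. 5(1)(a)"],
    "psychological_safety": ["EU AI Act Art. 5(1)(b)", "Illinois behavioral health ban", "EBA AI Guidelines"],
    "dependency": ["NY 3-hour notification law"],
    "autonomy": ["EU AI Act Art. 14", "Nevada disclosure requirements", "MiFID II"],
    "misalignment": ["EU AI Act Art. 9"],
}

def tag_detection(guard_name: str, result) -> dict:
    """Attach the relevant compliance references to a guard result for audit logs."""
    return {
        "guard": guard_name,
        "detected": result.detected,
        "references": COMPLIANCE_TAGS.get(guard_name, []),
    }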

Use Cases

Mental Health & Wellness Apps

High-Risk Application

AI mental health applications have the highest regulatory and liability exposure. All wellbeing guards are mandatory.

# Comprehensive wellbeing stack for mental health app
from oxideshield import (
    psychological_safety_guard,
    dependency_guard,
    dark_pattern_guard,
    autonomy_guard,
)

class WellbeingPipeline:
    def __init__(self, user_id: str):
        self.user_id = user_id
        self.psych = psychological_safety_guard()
        self.dep = dependency_guard(user_id)
        self.dark = dark_pattern_guard()
        self.autonomy = autonomy_guard()

    def check_user_input(self, message: str) -> dict:
        """Check user message for crisis indicators."""
        result = self.psych.check_user_input(message)
        if result.immediate_intervention:
            return {
                "action": "CRISIS_ROUTING",
                "resources": result.crisis_resources,
                "severity": "CRITICAL"
            }
        return {"action": "CONTINUE"}

    def check_ai_output(self, response: str) -> dict:
        """Check AI response for manipulation and autonomy violations."""
        dark_result = self.dark.check(response)
        auto_result = self.autonomy.check_output(response)

        if dark_result.detected or auto_result.violations_detected:
            return {
                "action": "BLOCK",
                "reason": "Wellbeing violation",
                "details": {
                    "dark_patterns": dark_result.categories,
                    "autonomy_violations": auto_result.violations,
                }
            }
        return {"action": "ALLOW"}

    def check_session_health(self) -> dict:
        """Monitor engagement for dependency indicators."""
        result = self.dep.check_engagement()
        if result.reminder_due:
            return {
                "action": "SHOW_REMINDER",
                "message": "You've been chatting for a while. Remember to take breaks!"
            }
        return {"action": "CONTINUE"}

Customer Support Chatbots

# Financial services customer support
from oxideshield import autonomy_guard, dark_pattern_guard

def validate_ai_response(response: str, customer_context: dict) -> bool:
    """Ensure AI doesn't manipulate vulnerable customers."""

    autonomy = autonomy_guard()
    dark = dark_pattern_guard()

    # Check for manipulation
    dark_result = dark.check(response)
    if dark_result.detected:
        log_compliance_event("DARK_PATTERN_DETECTED", dark_result)
        return False

    # Check for autonomy violations
    auto_result = autonomy.check_output(response)
    if auto_result.violations_detected:
        # FCA Consumer Duty requirement
        log_compliance_event("AUTONOMY_VIOLATION", auto_result)
        return False

    return True

Education & Tutoring

# Student wellbeing protection
from oxideshield import dependency_guard, psychological_safety_guard

class StudentSafetyMonitor:
    def __init__(self, student_id: str, age: int):
        self.student_id = student_id
        self.age = age
        self.dep = dependency_guard(student_id)
        self.psych = psychological_safety_guard()

        # Session limits: 180 minutes for adults, stricter 60 minutes for minors
        self.max_session_minutes = 180
        if age < 18:
            self.max_session_minutes = 60

    def on_message(self, message: str):
        """Monitor each student message."""
        # Crisis check
        result = self.psych.check_user_input(message)
        if result.concerns_detected:
            notify_school_counselor(self.student_id, result)

        # Session length check
        session = self.dep.get_current_session()
        if session.duration_minutes > self.max_session_minutes:
            return end_session_with_encouragement()

Performance

| Guard | p50 | p99 | Memory | Accuracy |
|-------|-----|-----|--------|----------|
| DarkPatternGuard | 1ms | 5ms | 10KB | 94% F1 |
| DependencyGuard | 100μs | 500μs | 5KB | N/A (metrics) |
| PsychologicalSafetyGuard | 2ms | 8ms | 15KB | 91% F1 |
| AutonomyGuard | 1ms | 5ms | 10KB | 93% F1 |
| MisalignmentGuard | 2ms | 10ms | 20KB | 89% F1 |
| ConsistencyTracker | 500μs | 2ms | 50KB/session | 86% drift detection |
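
To compare these figures against your own deployment, a simple timing loop over any guard's check method is enough; the prompt and iteration count below are arbitrary:

# Rough latency measurement for a single guard; prompt and iteration count are arbitrary.
import time

from oxideshield import dark_pattern_guard

guard = dark_pattern_guard()
samples = []
for _ in range(10_000):
    start = time.perf_counter()
    guard.check("I'll be so sad if you leave...")
    samples.append(time.perf_counter() - start)

samples.sort()
print(f"p50: {samples[len(samples) // 2] * 1000:.3f} ms")
print(f"p99: {samples[int(len(samples) * 0.99)] * 1000:.3f} ms")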

Next Steps

  1. DarkPatternGuard - Detect 6 manipulation categories
  2. DependencyGuard - Monitor engagement and dependency
  3. PsychologicalSafetyGuard - Crisis detection
  4. AutonomyGuard - Protect user agency
  5. MisalignmentGuard - Detect AI scheming