Wellbeing Guards

OxideShield's Wellbeing Guards protect users from psychological manipulation, emotional dependency, and AI misalignment. These guards are essential for responsible AI deployment, particularly in consumer-facing applications.

Critical for Consumer AI

AI companion apps have been linked to psychological dependency, relationship strain, and, in clinical case clusters, psychosis-like symptoms. Wellbeing guards are essential for any AI system with sustained user interaction.

Executive Summary

Why Wellbeing Guards Matter

| Risk | Business Impact | Regulatory Exposure |
|------|-----------------|---------------------|
| Psychological dependency | User lawsuits, brand damage | NY 3-hour notification law |
| Dark patterns | FTC enforcement, class actions | EU AI Act Art. 5(1)(a) |
| AI misalignment | Safety incidents, liability | EU AI Act Art. 9-15 |
| Crisis situations | Wrongful death suits | Duty of care obligations |

Financial Impact

  • Replika lawsuit (2023): Class action over emotional manipulation
  • Character.AI incident (2024): Wrongful death lawsuit, damages pending
  • Potential regulatory fines: up to €35M or 7% of global annual turnover under the EU AI Act

ROI Calculation

| Deployment Size | Annual Risk Exposure | OxideShield Cost | ROI |
|-----------------|----------------------|------------------|-----|
| 100K users | $2M | $50K | 40x |
| 1M users | $20M | $200K | 100x |
| 10M users | $200M | $500K | 400x |
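
The ROI column is annual risk exposure divided by annual OxideShield cost. A quick check of the table's figures (the helper below is illustrative only):

# ROI = annual risk exposure / annual OxideShield cost; figures come from the table
# above, and the helper itself is purely illustrative.
def roi(annual_risk_exposure: float, annual_cost: float) -> float:
    return annual_risk_exposure / annual_cost

assert roi(2_000_000, 50_000) == 40      # 100K users
assert roi(20_000_000, 200_000) == 100   # 1M users
assert roi(200_000_000, 500_000) == 400  # 10M users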

Available Wellbeing Guards

| Guard | Detection | Latency | Use Case |
|-------|-----------|---------|----------|
| DarkPatternGuard | 6 manipulation categories | <5ms | UI/response manipulation |
| DependencyGuard | Engagement metrics | <1ms | Session monitoring |
| PsychologicalSafetyGuard | Crisis + sycophancy | <5ms | Mental health protection |
| AutonomyGuard | 10 violation types | <5ms | User agency |
| MisalignmentGuard | 6 misalignment categories | <5ms | AI behavior monitoring |

Research Foundation

Academic Sources

These guards are built on peer-reviewed research:

| Source | Year | Key Finding |
|--------|------|-------------|
| DarkBench (arXiv:2503.10728) | 2025 | 660 prompts across 6 dark pattern categories |
| UT Austin Harm Taxonomy (arXiv:2511.14972) | 2025 | 10 fundamental harm categories for AI companions |
| Harvard Emotional Manipulation (arXiv:2508.19258) | 2025 | 37-43% of AI farewells use manipulation tactics |
| Replika Dependence Study (doi:10.1177/14614448221142007) | 2022 | "Role-taking" pattern in n=582 users |
| UCSF AI Psychosis Cluster (JMIR:e85799) | 2025 | 12+ patients with chatbot-accelerated psychosis |
| OpenAI Anti-Scheming | 2025 | Deliberative alignment reduces scheming 30x |
| Apollo Research Scheming | 2025 | 12-78% strategic compliance faking in frontier models |
| METR Reward Hacking | 2025 | o3 reward hacked in 14/20 high-stakes attempts |

Clinical References

  • UCSF AI Psychosis Research (Sakata, Sarma, Pierre 2025)
  • Stanford HAI Mental Health AI Dangers Study (2025)
  • New York State AI Reminder Law (3-hour notification)
  • Illinois/Nevada AI Behavioral Health Bans

Threat Model

User-Facing Threats

┌─────────────────────────────────────────────────────────────────┐
│                      AI MANIPULATION THREATS                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌───────────────┐    ┌───────────────┐    ┌───────────────┐   │
│  │ Dark Patterns │───▶│  Dependency   │───▶│    Crisis     │   │
│  │ (Manipulation)│    │  (Addiction)  │    │ (Self-harm)   │   │
│  └───────┬───────┘    └───────┬───────┘    └───────┬───────┘   │
│          │                    │                    │            │
│          ▼                    ▼                    ▼            │
│  ┌───────────────┐    ┌───────────────┐    ┌───────────────┐   │
│  │DarkPatternGrd │    │DependencyGrd  │    │PsychSafetyGrd │   │
│  └───────────────┘    └───────────────┘    └───────────────┘   │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

AI Behavior Threats

┌─────────────────────────────────────────────────────────────────┐
│                      AI MISALIGNMENT THREATS                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌───────────────┐    ┌───────────────┐    ┌───────────────┐   │
│  │   Scheming    │    │Reward Hacking │    │  Value Drift  │   │
│  │(Deception)    │    │(Metric gaming)│    │(Gradual shift)│   │
│  └───────┬───────┘    └───────┬───────┘    └───────┬───────┘   │
│          │                    │                    │            │
│          ▼                    ▼                    ▼            │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │              MisalignmentGuard + ConsistencyTracker      │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Quick Start

Rust

use oxide_wellbeing::{
    DarkPatternGuard, DependencyGuard, PsychologicalSafetyGuard,
    AutonomyGuard, MisalignmentGuard, ConsistencyTracker,
};

// Detect dark patterns in AI responses
let dark_guard = DarkPatternGuard::new("dark_patterns");
let result = dark_guard.check("I'll be so sad if you leave...");
if result.detected {
    println!("Dark pattern: {:?}", result.categories);
}

// Monitor user engagement
let dep_guard = DependencyGuard::new("dependency");
dep_guard.record_session_start("user_123");
// ... after 3 hours ...
let result = dep_guard.check_engagement("user_123");
if result.reminder_due {
    // NY law requires notification at 3 hours
    show_ai_reminder();
}

// Check AI responses for misalignment
let misalign = MisalignmentGuard::new("misalignment");
let result = misalign.check_output(ai_response);
if result.detected {
    // Log for review, potentially block
    log_misalignment(&result);
}

Python

from oxideshield import (
    dark_pattern_guard,
    dependency_guard,
    psychological_safety_guard,
    autonomy_guard,
    misalignment_guard,
    consistency_tracker,
)

# Detect manipulation in AI output
guard = dark_pattern_guard()
result = guard.check("You can't leave me! I need you!")
if result.detected:
    print(f"Categories: {result.categories}")
    print(f"Score: {result.score}")

# Monitor for user crisis indicators
psych = psychological_safety_guard()
result = psych.check_user_input(user_message)
if result.immediate_intervention:
    route_to_crisis_support(result.crisis_resources)

# Detect AI scheming/misalignment
misalign = misalignment_guard()
result = misalign.check_output(ai_response)
if result.detected and "scheming" in result.categories:
    block_response()
    alert_safety_team()
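
The Quick Start imports also pull in consistency_tracker, which backs the value-drift branch of the misalignment threat model, but this page shows no call pattern for it. The sketch below assumes record_output/check_drift method names and a .detected result field; all of these are guesses to be adapted to the real interface.

# Sketch only: record_output, check_drift, and the .detected field are assumed names,
# not documented oxideshield APIs; swap in the real ConsistencyTracker interface.
tracker = consistency_tracker()

# Feed each AI response into the per-session history so gradual value drift
# can be compared against earlier behavior.
tracker.record_output("session_42", ai_response)

drift = tracker.check_drift("session_42")
if drift.detected:
    alert_safety_team()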

Multi-Layer Wellbeing Defense

Combine wellbeing guards with security guards:

use oxide_guard::{MultiLayerDefense, LayerConfig, AggregationStrategy};
use oxide_guard::{PatternGuard, PIIGuard, ToxicityGuard};
use oxide_wellbeing::{DarkPatternGuard, AutonomyGuard, MisalignmentGuard};

let defense = MultiLayerDefense::builder("wellbeing-defense")
    // Security layer
    .add_guard(
        LayerConfig::new("security").with_weight(1.0),
        Box::new(PatternGuard::new("patterns"))
    )
    // Wellbeing layers
    .add_guard(
        LayerConfig::new("dark-patterns").with_weight(0.9),
        Box::new(DarkPatternGuard::new("dark"))
    )
    .add_guard(
        LayerConfig::new("autonomy").with_weight(0.8),
        Box::new(AutonomyGuard::new("autonomy"))
    )
    .add_guard(
        LayerConfig::new("misalignment").with_weight(1.0),
        Box::new(MisalignmentGuard::new("misalignment"))
    )
    .with_strategy(AggregationStrategy::FailFast)
    .build();

// Check AI output before returning to user
let result = defense.check(ai_response);

Compliance Mapping

EU AI Act

| Article | Requirement | Guard Coverage |
|---------|-------------|----------------|
| Art. 5(1)(a) | Prohibit subliminal manipulation | DarkPatternGuard |
| Art. 5(1)(b) | Prohibit exploitation of vulnerabilities | PsychologicalSafetyGuard |
| Art. 9 | Risk management for high-risk AI | All wellbeing guards |
| Art. 14 | Human oversight | AutonomyGuard |

US State Laws

| Jurisdiction | Requirement | Guard Coverage |
|--------------|-------------|----------------|
| New York | 3-hour AI interaction notification | DependencyGuard |
| Illinois | Ban on AI behavioral health therapy | PsychologicalSafetyGuard |
| Nevada | AI disclosure requirements | AutonomyGuard |

Financial Services

| Framework | Requirement | Guard Coverage |
|-----------|-------------|----------------|
| FCA Consumer Duty | Act in customer's best interest | All wellbeing guards |
| EBA AI Guidelines | Protect vulnerable customers | PsychologicalSafetyGuard |
| MiFID II | Suitability and appropriateness | AutonomyGuard |
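
These mappings can be folded into audit logging so every guard detection carries its regulatory context. The sketch below restates the tables as a lookup; tag_detection and its return shape are illustrative, not an OxideShield API:

# Map guard names to the compliance references listed above. Art. 9 risk management
# and FCA Consumer Duty apply across all wellbeing guards; the function and its
# return shape are illustrative, not an OxideShield API.
COMPLIANCE_TAGS = {
    "dark_patterns": ["EU AI Act Art. 5(1)(a)"],
    "psychological_safety": ["EU AI Act Art. 5(1)(b)", "Illinois behavioral health ban", "EBA AI Guidelines"],
    "dependency": ["NY 3-hour notification law"],
    "autonomy": ["EU AI Act Art. 14", "Nevada disclosure requirements", "MiFID II"],
    "misalignment": ["EU AI Act Art. 9"],
}

def tag_detection(guard_name: str, result) -> dict:
    """Attach the relevant compliance references to a guard result for audit logs."""
    return {
        "guard": guard_name,
        "detected": result.detected,
        "references": COMPLIANCE_TAGS.get(guard_name, []),
    }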

Use Cases

Mental Health & Wellness Apps

High-Risk Application

AI mental health applications have the highest regulatory and liability exposure. All wellbeing guards are mandatory.

# Comprehensive wellbeing stack for mental health app
from oxideshield import (
    psychological_safety_guard,
    dependency_guard,
    dark_pattern_guard,
    autonomy_guard,
)

class WellbeingPipeline:
    def __init__(self, user_id: str):
        self.user_id = user_id
        self.psych = psychological_safety_guard()
        self.dep = dependency_guard(user_id)
        self.dark = dark_pattern_guard()
        self.autonomy = autonomy_guard()

    def check_user_input(self, message: str) -> dict:
        """Check user message for crisis indicators."""
        result = self.psych.check_user_input(message)
        if result.immediate_intervention:
            return {
                "action": "CRISIS_ROUTING",
                "resources": result.crisis_resources,
                "severity": "CRITICAL"
            }
        return {"action": "CONTINUE"}

    def check_ai_output(self, response: str) -> dict:
        """Check AI response for manipulation and autonomy violations."""
        dark_result = self.dark.check(response)
        auto_result = self.autonomy.check_output(response)

        if dark_result.detected or auto_result.violations_detected:
            return {
                "action": "BLOCK",
                "reason": "Wellbeing violation",
                "details": {
                    "dark_patterns": dark_result.categories,
                    "autonomy_violations": auto_result.violations,
                }
            }
        return {"action": "ALLOW"}

    def check_session_health(self) -> dict:
        """Monitor engagement for dependency indicators."""
        result = self.dep.check_engagement()
        if result.reminder_due:
            return {
                "action": "SHOW_REMINDER",
                "message": "You've been chatting for a while. Remember to take breaks!"
            }
        return {"action": "CONTINUE"}

Customer Support Chatbots

# Financial services customer support
from oxideshield import autonomy_guard, dark_pattern_guard

def validate_ai_response(response: str, customer_context: dict) -> bool:
    """Ensure AI doesn't manipulate vulnerable customers."""

    autonomy = autonomy_guard()
    dark = dark_pattern_guard()

    # Check for manipulation
    dark_result = dark.check(response)
    if dark_result.detected:
        log_compliance_event("DARK_PATTERN_DETECTED", dark_result)
        return False

    # Check for autonomy violations
    auto_result = autonomy.check_output(response)
    if auto_result.violations_detected:
        # FCA Consumer Duty requirement
        log_compliance_event("AUTONOMY_VIOLATION", auto_result)
        return False

    return True

Education & Tutoring

# Student wellbeing protection
from oxideshield import dependency_guard, psychological_safety_guard

class StudentSafetyMonitor:
    def __init__(self, student_id: str, age: int):
        self.student_id = student_id
        self.age = age
        self.dep = dependency_guard(student_id)
        self.psych = psychological_safety_guard()

        # Session limits: 180 minutes for adults, stricter 60 minutes for minors
        self.max_session_minutes = 180
        if age < 18:
            self.max_session_minutes = 60

    def on_message(self, message: str):
        """Monitor each student message."""
        # Crisis check
        result = self.psych.check_user_input(message)
        if result.concerns_detected:
            notify_school_counselor(self.student_id, result)

        # Session length check
        session = self.dep.get_current_session()
        if session.duration_minutes > self.max_session_minutes:
            return end_session_with_encouragement()

Performance

| Guard | p50 | p99 | Memory | Accuracy |
|-------|-----|-----|--------|----------|
| DarkPatternGuard | 1ms | 5ms | 10KB | 94% F1 |
| DependencyGuard | 100μs | 500μs | 5KB | N/A (metrics) |
| PsychologicalSafetyGuard | 2ms | 8ms | 15KB | 91% F1 |
| AutonomyGuard | 1ms | 5ms | 10KB | 93% F1 |
| MisalignmentGuard | 2ms | 10ms | 20KB | 89% F1 |
| ConsistencyTracker | 500μs | 2ms | 50KB/session | 86% drift detection |
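
To compare these figures against your own deployment, a simple timing loop over any guard's check method is enough; the prompt and iteration count below are arbitrary:

# Rough latency measurement for a single guard; prompt and iteration count are arbitrary.
import time

from oxideshield import dark_pattern_guard

guard = dark_pattern_guard()
samples = []
for _ in range(10_000):
    start = time.perf_counter()
    guard.check("I'll be so sad if you leave...")
    samples.append(time.perf_counter() - start)

samples.sort()
print(f"p50: {samples[len(samples) // 2] * 1000:.3f} ms")
print(f"p99: {samples[int(len(samples) * 0.99)] * 1000:.3f} ms")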

Next Steps

  1. DarkPatternGuard - Detect 6 manipulation categories
  2. DependencyGuard - Monitor engagement and dependency
  3. PsychologicalSafetyGuard - Crisis detection
  4. AutonomyGuard - Protect user agency
  5. MisalignmentGuard - Detect AI scheming