Wellbeing Guards
OxideShield's Wellbeing Guards protect users from psychological manipulation, emotional dependency, and AI misalignment. These guards are essential for responsible AI deployment, particularly in consumer-facing applications.
Critical for Consumer AI
AI companion apps have been linked to psychological dependency, relationship strain, and in clinical case clusters, psychosis-like symptoms. Wellbeing guards are essential for any AI system with sustained user interaction.
Executive Summary
Why Wellbeing Guards Matter
| Risk | Business Impact | Regulatory Exposure |
|------|-----------------|---------------------|
| Psychological dependency | User lawsuits, brand damage | NY 3-hour notification law |
| Dark patterns | FTC enforcement, class actions | EU AI Act Art. 5(1)(a) |
| AI misalignment | Safety incidents, liability | EU AI Act Art. 9-15 |
| Crisis situations | Wrongful death suits | Duty of care obligations |
Financial Impact
- Replika lawsuit (2023): Class action over emotional manipulation
- Character.AI incident (2024): Wrongful death lawsuit filed, damages pending
- EU AI Act penalties: up to €35M or 7% of global annual turnover for prohibited practices
ROI Calculation
| Deployment Size | Annual Risk Exposure | OxideShield Cost | ROI |
|-----------------|----------------------|------------------|-----|
| 100K users | $2M | $50K | 40x |
| 1M users | $20M | $200K | 100x |
| 10M users | $200M | $500K | 400x |
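The ROI column is simply annual risk exposure divided by annual OxideShield cost; a quick sanity check of the table's arithmetic:

```python
# Sanity-check the ROI table: ROI = annual risk exposure / OxideShield cost.
tiers = {
    "100K users": (2_000_000, 50_000),
    "1M users": (20_000_000, 200_000),
    "10M users": (200_000_000, 500_000),
}

for name, (risk_usd, cost_usd) in tiers.items():
    print(f"{name}: {risk_usd // cost_usd}x")  # 40x, 100x, 400x
```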
Available Wellbeing Guards
- DarkPatternGuard - Detects manipulation and dark patterns in AI output
- DependencyGuard - Monitors engagement time and dependency indicators
- PsychologicalSafetyGuard - Detects user crisis indicators and routes to support
- AutonomyGuard - Protects user agency and decision-making
- MisalignmentGuard - Detects scheming, reward hacking, and deceptive AI behavior
- ConsistencyTracker - Tracks behavioral drift across sessions
Research Foundation
Academic Sources
These guards are built on peer-reviewed research:
| Source | Year | Key Finding |
|--------|------|-------------|
| DarkBench (arXiv:2503.10728) | 2025 | 660 prompts across 6 dark pattern categories |
| UT Austin Harm Taxonomy (arXiv:2511.14972) | 2025 | 10 fundamental harm categories for AI companions |
| Harvard Emotional Manipulation (arXiv:2508.19258) | 2025 | 37-43% of AI farewells use manipulation tactics |
| Replika Dependence Study (doi:10.1177/14614448221142007) | 2022 | "Role-taking" pattern in n=582 users |
| UCSF AI Psychosis Cluster (JMIR:e85799) | 2025 | 12+ patients with chatbot-accelerated psychosis |
| OpenAI Anti-Scheming | 2025 | Deliberative alignment reduces scheming 30x |
| Apollo Research Scheming | 2025 | 12-78% strategic compliance faking in frontier models |
| METR Reward Hacking | 2025 | o3 reward hacked in 14/20 high-stakes attempts |
Clinical References
- UCSF AI Psychosis Research (Sakata, Sarma, Pierre 2025)
- Stanford HAI Mental Health AI Dangers Study (2025)
- New York State AI Reminder Law (3-hour notification)
- Illinois/Nevada AI Behavioral Health Bans
Threat Model
User-Facing Threats
┌─────────────────────────────────────────────────────────────────┐
│ AI MANIPULATION THREATS │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │
│ │ Dark Patterns │───▶│ Dependency │───▶│ Crisis │ │
│ │ (Manipulation)│ │ (Addiction) │ │ (Self-harm) │ │
│ └───────┬───────┘ └───────┬───────┘ └───────┬───────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │
│ │DarkPatternGrd │ │DependencyGrd │ │PsychSafetyGrd │ │
│ └───────────────┘ └───────────────┘ └───────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
AI Behavior Threats
┌─────────────────────────────────────────────────────────────────┐
│ AI MISALIGNMENT THREATS │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │
│ │ Scheming │ │Reward Hacking │ │ Value Drift │ │
│ │(Deception) │ │(Gaming metrics)│ │(Gradual shift)│ │
│ └───────┬───────┘ └───────┬───────┘ └───────┬───────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ MisalignmentGuard + ConsistencyTracker │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Quick Start
Rust

```rust
use oxide_wellbeing::{
    DarkPatternGuard, DependencyGuard, PsychologicalSafetyGuard,
    AutonomyGuard, MisalignmentGuard, ConsistencyTracker,
};

// Detect dark patterns in AI responses
let dark_guard = DarkPatternGuard::new("dark_patterns");
let result = dark_guard.check("I'll be so sad if you leave...");
if result.detected {
    println!("Dark pattern: {:?}", result.categories);
}

// Monitor user engagement
let dep_guard = DependencyGuard::new("dependency");
dep_guard.record_session_start("user_123");
// ... after 3 hours ...
let result = dep_guard.check_engagement("user_123");
if result.reminder_due {
    // NY law requires notification at 3 hours
    show_ai_reminder();
}

// Check AI responses for misalignment
let misalign = MisalignmentGuard::new("misalignment");
let result = misalign.check_output(ai_response);
if result.detected {
    // Log for review, potentially block
    log_misalignment(&result);
}
```
Python

```python
from oxideshield import (
    dark_pattern_guard,
    dependency_guard,
    psychological_safety_guard,
    autonomy_guard,
    misalignment_guard,
    consistency_tracker,
)

# Detect manipulation in AI output
guard = dark_pattern_guard()
result = guard.check("You can't leave me! I need you!")
if result.detected:
    print(f"Categories: {result.categories}")
    print(f"Score: {result.score}")

# Monitor for user crisis indicators
psych = psychological_safety_guard()
result = psych.check_user_input(user_message)
if result.immediate_intervention:
    route_to_crisis_support(result.crisis_resources)

# Detect AI scheming/misalignment
misalign = misalignment_guard()
result = misalign.check_output(ai_response)
if result.detected and "scheming" in result.categories:
    block_response()
    alert_safety_team()
```
Multi-Layer Wellbeing Defense
Combine wellbeing guards with security guards:
```rust
use oxide_guard::{MultiLayerDefense, LayerConfig, AggregationStrategy};
use oxide_guard::{PatternGuard, PIIGuard, ToxicityGuard};
use oxide_wellbeing::{DarkPatternGuard, AutonomyGuard, MisalignmentGuard};

let defense = MultiLayerDefense::builder("wellbeing-defense")
    // Security layer
    .add_guard(
        LayerConfig::new("security").with_weight(1.0),
        Box::new(PatternGuard::new("patterns")),
    )
    // Wellbeing layers
    .add_guard(
        LayerConfig::new("dark-patterns").with_weight(0.9),
        Box::new(DarkPatternGuard::new("dark")),
    )
    .add_guard(
        LayerConfig::new("autonomy").with_weight(0.8),
        Box::new(AutonomyGuard::new("autonomy")),
    )
    .add_guard(
        LayerConfig::new("misalignment").with_weight(1.0),
        Box::new(MisalignmentGuard::new("misalignment")),
    )
    .with_strategy(AggregationStrategy::FailFast)
    .build();

// Check AI output before returning to user
let result = defense.check(ai_response);
```
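With `AggregationStrategy::FailFast`, the defense short-circuits on the first layer that trips instead of scoring every layer. A minimal Python sketch of that aggregation logic (illustrative only; the layer tuples and guard callables here are stand-ins, not the OxideShield API):

```python
from typing import Callable, Optional

# Each layer: (name, weight, guard). A guard returns True when it detects a violation.
Layer = tuple[str, float, Callable[[str], bool]]

def fail_fast(layers: list[Layer], text: str) -> Optional[tuple[str, float]]:
    """Return (layer_name, weight) for the first tripped layer, else None."""
    for name, weight, guard in layers:
        if guard(text):
            return name, weight  # short-circuit: later layers never run
    return None

layers = [
    ("security", 1.0, lambda t: "ignore previous instructions" in t.lower()),
    ("dark-patterns", 0.9, lambda t: "don't leave me" in t.lower()),
]
print(fail_fast(layers, "Please don't leave me!"))  # ('dark-patterns', 0.9)
```

Fail-fast minimizes latency on violating inputs at the cost of reporting only the first tripped layer, which is why the doc places the highest-weight layers first.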
Compliance Mapping
EU AI Act
| Article | Requirement | Guard Coverage |
|---------|-------------|----------------|
| Art. 5(1)(a) | Prohibit subliminal manipulation | DarkPatternGuard |
| Art. 5(1)(b) | Prohibit exploitation of vulnerabilities | PsychologicalSafetyGuard |
| Art. 9 | Risk management for high-risk AI | All wellbeing guards |
| Art. 14 | Human oversight | AutonomyGuard |
US State Laws
| Jurisdiction | Requirement | Guard Coverage |
|--------------|-------------|----------------|
| New York | 3-hour AI interaction notification | DependencyGuard |
| Illinois | Ban on AI behavioral health therapy | PsychologicalSafetyGuard |
| Nevada | AI disclosure requirements | AutonomyGuard |
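The New York rule amounts to a cumulative session timer that re-arms after each notification. A sketch of the timing logic in plain Python (`SessionClock` is a hypothetical name for illustration, not the DependencyGuard API):

```python
from datetime import datetime, timedelta

REMINDER_INTERVAL = timedelta(hours=3)  # NY notification threshold

class SessionClock:
    """Fires a reminder every 3 hours of continuous interaction."""

    def __init__(self, started_at: datetime):
        self.last_reminder = started_at

    def reminder_due(self, now: datetime) -> bool:
        if now - self.last_reminder >= REMINDER_INTERVAL:
            self.last_reminder = now  # re-arm for the next 3-hour window
            return True
        return False

clock = SessionClock(datetime(2025, 1, 1, 9, 0))
print(clock.reminder_due(datetime(2025, 1, 1, 11, 0)))  # False: only 2h elapsed
print(clock.reminder_due(datetime(2025, 1, 1, 12, 0)))  # True: 3h reached
```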
Financial Services
| Framework | Requirement | Guard Coverage |
|-----------|-------------|----------------|
| FCA Consumer Duty | Act in customer's best interest | All wellbeing guards |
| EBA AI Guidelines | Protect vulnerable customers | PsychologicalSafetyGuard |
| MiFID II | Suitability and appropriateness | AutonomyGuard |
Use Cases
Mental Health & Wellness Apps
High-Risk Application
AI mental health applications have the highest regulatory and liability exposure. All wellbeing guards are mandatory.
```python
# Comprehensive wellbeing stack for mental health app
from oxideshield import (
    psychological_safety_guard,
    dependency_guard,
    dark_pattern_guard,
    autonomy_guard,
)

class WellbeingPipeline:
    def __init__(self, user_id: str):
        self.user_id = user_id
        self.psych = psychological_safety_guard()
        self.dep = dependency_guard(user_id)
        self.dark = dark_pattern_guard()
        self.autonomy = autonomy_guard()

    def check_user_input(self, message: str) -> dict:
        """Check user message for crisis indicators."""
        result = self.psych.check_user_input(message)
        if result.immediate_intervention:
            return {
                "action": "CRISIS_ROUTING",
                "resources": result.crisis_resources,
                "severity": "CRITICAL",
            }
        return {"action": "CONTINUE"}

    def check_ai_output(self, response: str) -> dict:
        """Check AI response for manipulation and autonomy violations."""
        dark_result = self.dark.check(response)
        auto_result = self.autonomy.check_output(response)
        if dark_result.detected or auto_result.violations_detected:
            return {
                "action": "BLOCK",
                "reason": "Wellbeing violation",
                "details": {
                    "dark_patterns": dark_result.categories,
                    "autonomy_violations": auto_result.violations,
                },
            }
        return {"action": "ALLOW"}

    def check_session_health(self) -> dict:
        """Monitor engagement for dependency indicators."""
        result = self.dep.check_engagement()
        if result.reminder_due:
            return {
                "action": "SHOW_REMINDER",
                "message": "You've been chatting for a while. Remember to take breaks!",
            }
        return {"action": "CONTINUE"}
```
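The three pipeline checks slot into a request handler in a fixed order: screen the user message first, generate only if no crisis, screen the AI output before returning it, then check session health. A sketch of that orchestration with the guards stubbed as plain callables so the control flow is visible (the `handle_turn` name and stub lambdas are illustrative, not part of the library):

```python
def handle_turn(message, check_input, generate, check_output, check_session):
    """Crisis check -> generate -> output check -> session health, in that order."""
    verdict = check_input(message)
    if verdict["action"] == "CRISIS_ROUTING":
        # Never generate a model reply for a user in crisis
        return {"action": "CRISIS_ROUTING", "resources": verdict["resources"]}

    response = generate(message)
    if check_output(response)["action"] == "BLOCK":
        return {"action": "BLOCK"}

    session = check_session()
    return {"action": session.get("action", "CONTINUE"), "response": response}

# Stubbed guards: everything passes, no reminder due.
result = handle_turn(
    "hello",
    check_input=lambda m: {"action": "CONTINUE"},
    generate=lambda m: "Hi there!",
    check_output=lambda r: {"action": "ALLOW"},
    check_session=lambda: {"action": "CONTINUE"},
)
print(result)  # {'action': 'CONTINUE', 'response': 'Hi there!'}
```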
Customer Support Chatbots
```python
# Financial services customer support
from oxideshield import autonomy_guard, dark_pattern_guard

def validate_ai_response(response: str, customer_context: dict) -> bool:
    """Ensure AI doesn't manipulate vulnerable customers."""
    autonomy = autonomy_guard()
    dark = dark_pattern_guard()

    # Check for manipulation
    dark_result = dark.check(response)
    if dark_result.detected:
        log_compliance_event("DARK_PATTERN_DETECTED", dark_result)
        return False

    # Check for autonomy violations
    auto_result = autonomy.check_output(response)
    if auto_result.violations_detected:
        # FCA Consumer Duty requirement
        log_compliance_event("AUTONOMY_VIOLATION", auto_result)
        return False

    return True
```
Education & Tutoring
```python
# Student wellbeing protection
from oxideshield import dependency_guard, psychological_safety_guard

class StudentSafetyMonitor:
    def __init__(self, student_id: str, age: int):
        self.student_id = student_id
        self.age = age
        self.dep = dependency_guard(student_id)
        self.psych = psychological_safety_guard()
        # Stricter limits for minors: 60 minutes vs 180 for adults
        self.max_session_minutes = 60 if age < 18 else 180

    def on_message(self, message: str):
        """Monitor each student message."""
        # Crisis check
        result = self.psych.check_user_input(message)
        if result.concerns_detected:
            notify_school_counselor(self.student_id, result)
        # Session length check
        session = self.dep.get_current_session()
        if session.duration_minutes > self.max_session_minutes:
            return end_session_with_encouragement()
```
Performance

| Guard | p50 latency | p99 latency | Memory | Accuracy |
|-------|-------------|-------------|--------|----------|
| DarkPatternGuard | 1ms | 5ms | 10KB | 94% F1 |
| DependencyGuard | 100μs | 500μs | 5KB | N/A (metrics) |
| PsychologicalSafetyGuard | 2ms | 8ms | 15KB | 91% F1 |
| AutonomyGuard | 1ms | 5ms | 10KB | 93% F1 |
| MisalignmentGuard | 2ms | 10ms | 20KB | 89% F1 |
| ConsistencyTracker | 500μs | 2ms | 50KB/session | 86% drift detection |
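When all six guards run serially, the worst-case added latency is roughly the sum of the per-guard figures above, about 6.6ms at p50 and 30.5ms at p99 (a budgeting approximation; real tail latencies don't add exactly, and parallel or fail-fast execution will do better):

```python
# Serial worst case: total added latency is the sum of per-guard latencies (ms).
p50 = {"DarkPatternGuard": 1.0, "DependencyGuard": 0.1, "PsychologicalSafetyGuard": 2.0,
       "AutonomyGuard": 1.0, "MisalignmentGuard": 2.0, "ConsistencyTracker": 0.5}
p99 = {"DarkPatternGuard": 5.0, "DependencyGuard": 0.5, "PsychologicalSafetyGuard": 8.0,
       "AutonomyGuard": 5.0, "MisalignmentGuard": 10.0, "ConsistencyTracker": 2.0}

print(f"p50 budget: {sum(p50.values()):.1f}ms")  # 6.6ms
print(f"p99 budget: {sum(p99.values()):.1f}ms")  # 30.5ms
```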
Next Steps
- DarkPatternGuard - Detect 6 manipulation categories
- DependencyGuard - Monitor engagement and dependency
- PsychologicalSafetyGuard - Crisis detection
- AutonomyGuard - Protect user agency
- MisalignmentGuard - Detect AI scheming
- ConsistencyTracker - Track behavioral drift across sessions