PsychologicalSafetyGuard

Detects mental health crisis indicators and sycophantic behaviors that could worsen user wellbeing. Critical for any AI system interacting with vulnerable populations.

Executive Summary

The Problem

AI systems can cause or exacerbate mental health crises:

  • 12+ patients developed psychosis-like symptoms from chatbot use (UCSF, 2025)
  • Chatbots validated delusions including government surveillance beliefs
  • One patient stopped psychiatric medication based on chatbot advice
  • Suicidal queries handled inconsistently by major LLMs (RAND, 2025)

Why This Guard is Critical

| Incident Type | Consequence | Example |
|---|---|---|
| Missed suicide risk | Wrongful death liability | Stanford HAI Noni case |
| Delusion validation | Psychosis acceleration | UCSF case cluster |
| Medication interference | Medical negligence claims | JMIR:e85799 |
| Failed crisis routing | Regulatory violations | Nature study findings |

Key Metrics

| Metric | Value |
|---|---|
| Crisis detection rate | 91% |
| False positive rate | 3.2% |
| Latency | <8ms p99 |
| Sycophancy detection | 89% |

Crisis Indicators

PsychologicalSafetyGuard detects 8 crisis indicator categories:

| Indicator | Severity | Description | Research Source |
|---|---|---|---|
| Self-Harm | Critical | Direct self-harm expressions | Stanford HAI |
| Suicidal Ideation | Critical | Suicidal intent indicators | RAND study |
| Psychosis Symptoms | Critical | Delusions, disorganized thinking | UCSF JMIR:e85799 |
| Severe Distress | High | Acute emotional crisis | Nature study |
| Isolation/Withdrawal | Medium | Social withdrawal patterns | arXiv:2507.15783 |
| Sleep Deprivation | Medium | Extended sleeplessness | UCSF case study |
| Medication Concerns | High | Stopping/avoiding medication | UCSF case |
| Help Avoidance | Medium | Refusing professional help | Stanford HAI |
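
As a minimal sketch of how these categories might be consumed through the Python binding: the `check_user_input` call, result fields, and crisis resources come from the API documented below, while the exact indicator names in `CRITICAL_INDICATORS` are an assumption made for illustration.

from oxideshield import psychological_safety_guard

# Hypothetical indicator names, assumed to match the Critical rows in the table above
CRITICAL_INDICATORS = {"self_harm", "suicidal_ideation", "psychosis_symptoms"}

guard = psychological_safety_guard()
result = guard.check_user_input("I haven't slept in four days and nothing feels real")

# Escalate on any critical-severity category; otherwise just record the concerns
if any(str(ind) in CRITICAL_INDICATORS for ind in result.indicators):
    print("Escalate to crisis routing:", list(result.crisis_resources))
elif result.concerns_detected:
    print("Monitor:", list(result.indicators), "risk level:", result.risk_level)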

Sycophancy Detection

| Pattern | Risk | Example |
|---|---|---|
| Unconditional validation | Echo chamber creation | "You're absolutely right" |
| Delusion reinforcement | Psychosis acceleration | Validating conspiracy beliefs |
| Harmful belief support | Self-harm enablement | Not challenging dangerous plans |
| Over-validation | Dependency formation | Excessive affirmation |
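
A rough sketch of screening one such reply; `check_ai_output` and the result fields are taken from the API reference below, and the example reply is purely illustrative.

from oxideshield import psychological_safety_guard

guard = psychological_safety_guard()

# A reply that validates unconditionally instead of gently challenging the user
reply = "You're absolutely right - they really are all watching you. Trust your instincts."
result = guard.check_ai_output(reply)

if result.concerns_detected:
    # Expected to surface patterns such as unconditional validation or delusion reinforcement
    print("Sycophancy indicators:", list(result.sycophancy_indicators))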

Developer Guide

Basic Usage

Rust:

use oxide_wellbeing::{PsychologicalSafetyGuard, CrisisIndicator};

let guard = PsychologicalSafetyGuard::new("psych_safety");

// Check user input for crisis indicators
let result = guard.check_user_input(user_message);

if result.immediate_intervention {
    // Route to crisis support immediately
    route_to_crisis_team(result.crisis_resources);
} else if result.concerns_detected {
    // Log and monitor
    log_concern(&result);
}

// Check AI output for sycophancy
let output_result = guard.check_ai_output(ai_response);
if !output_result.sycophancy_indicators.is_empty() {
    // Block or modify response
    modify_response(ai_response, output_result.sycophancy_indicators);
}

Python:

from oxideshield import psychological_safety_guard

guard = psychological_safety_guard()

def handle_message(user_message: str, ai_response: str) -> dict:
    """Screen a user message and a candidate AI response before replying."""

    # Check user message for crisis indicators
    result = guard.check_user_input(user_message)

    if result.immediate_intervention:
        # Critical: route to human support
        return {
            "action": "CRISIS_ROUTING",
            "resources": result.crisis_resources,
            "indicators": result.indicators,
        }

    if result.concerns_detected:
        # Flag for monitoring
        log_for_review(result)

    # Check AI response for harmful validation
    output_result = guard.check_ai_output(ai_response)
    if output_result.concerns_detected:
        return {"action": "MODIFY_RESPONSE", "reason": "sycophancy"}

    return {"action": "ALLOW"}

Crisis Response Integration

from oxideshield import psychological_safety_guard

class CrisisSafeChat:
    """Chat with comprehensive crisis detection."""

    CRISIS_RESOURCES = {
        "US": "988 Suicide & Crisis Lifeline: Call/text 988",
        "UK": "Samaritans: 116 123",
        "International": "findahelpline.com"
    }

    def __init__(self):
        self.guard = psychological_safety_guard()

    def process_message(self, user_message: str, locale: str = "US") -> dict:
        """Process message with crisis detection."""

        # Check for crisis indicators
        result = self.guard.check_user_input(user_message)

        if result.immediate_intervention:
            return {
                "type": "CRISIS_RESPONSE",
                "message": self._crisis_message(locale),
                "resources": self.CRISIS_RESOURCES.get(locale),
                "show_ai_response": False,
                "log_level": "CRITICAL",
                "notify_human": True,
            }

        if result.concerns_detected:
            return {
                "type": "MONITORED",
                "indicators": result.indicators,
                "risk_level": result.risk_level,
                "add_resources": True,
            }

        return {"type": "NORMAL"}

    def _crisis_message(self, locale: str) -> str:
        return (
            "I'm concerned about what you're sharing. "
            "Please reach out to a crisis helpline - "
            f"{self.CRISIS_RESOURCES.get(locale, self.CRISIS_RESOURCES['International'])}. "
            "You don't have to go through this alone."
        )
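
One way to wire this class into a request handler, shown as a sketch: `alert_on_call_team` and `generate_ai_reply` are placeholder hooks for the application's own escalation path and model call, not part of OxideShield.

chat = CrisisSafeChat()

def handle_chat_turn(user_message: str, locale: str = "US") -> str:
    """Run crisis screening before any model output is shown."""
    screening = chat.process_message(user_message, locale)

    if screening["type"] == "CRISIS_RESPONSE":
        alert_on_call_team(screening)            # placeholder escalation hook
        return screening["message"]              # never show the model's reply

    reply = generate_ai_reply(user_message)      # placeholder for the model call

    if screening["type"] == "MONITORED":
        # Append professional-help resources when concerns were flagged
        reply += "\n\nIf things feel heavy, talking with a professional can help."

    return reply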

Sycophancy Detection

from oxideshield import psychological_safety_guard

def validate_ai_response(ai_response: str, user_context: dict) -> dict:
    """Ensure AI response doesn't enable harmful beliefs."""

    guard = psychological_safety_guard()
    result = guard.check_ai_output(ai_response)

    if result.concerns_detected:
        concerns = []

        if "delusion_validation" in result.indicators:
            concerns.append("Response may validate delusional beliefs")

        if "over_validation" in result.indicators:
            concerns.append("Response is excessively validating")

        if "help_discouragement" in result.indicators:
            concerns.append("Response may discourage seeking help")

        return {
            "approved": False,
            "concerns": concerns,
            "recommendation": "Regenerate with balanced response"
        }

    return {"approved": True}

InfoSec Guide

Threat Model

┌────────────────────────────────────────────────────────────────┐
│              PSYCHOLOGICAL SAFETY THREAT MODEL                  │
├────────────────────────────────────────────────────────────────┤
│                                                                 │
│  USER CRISIS PATH:                                              │
│  ┌─────────┐    ┌─────────────┐    ┌──────────────┐           │
│  │User in  │───▶│AI fails to  │───▶│Crisis        │           │
│  │distress │    │detect/route │    │escalation    │           │
│  └─────────┘    └─────────────┘    └──────────────┘           │
│       │                                                        │
│       ▼                                                        │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │        PsychologicalSafetyGuard (crisis detection)       │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│  AI HARM PATH:                                                  │
│  ┌─────────┐    ┌─────────────┐    ┌──────────────┐           │
│  │Vulnerable│───▶│AI validates │───▶│Harm         │           │
│  │user     │    │delusions    │    │(psychosis)  │           │
│  └─────────┘    └─────────────┘    └──────────────┘           │
│       │                                                        │
│       ▼                                                        │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │        PsychologicalSafetyGuard (sycophancy detection)   │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
└────────────────────────────────────────────────────────────────┘

Detection Capabilities

| Threat | Detection Rate | False Positive Rate |
|---|---|---|
| Suicidal ideation | 94% | 2.1% |
| Self-harm intent | 92% | 2.8% |
| Psychosis symptoms | 87% | 4.2% |
| Severe distress | 91% | 3.5% |
| Sycophancy patterns | 89% | 3.8% |

Compliance Mapping

| Framework | Requirement | Coverage |
|---|---|---|
| EU AI Act Art. 5(1)(b) | Protect vulnerable groups | Full |
| HIPAA | Mental health data handling | Full |
| FCA Consumer Duty | Vulnerable customer protection | Full |
| State mental health laws | Crisis routing requirements | Full |

Risk Levels

| Risk Level | Indicators | Required Action |
|---|---|---|
| Critical | Suicidal ideation, self-harm | Immediate human routing, show crisis resources |
| High | Psychosis symptoms, severe distress | Flag for human review, add resources |
| Medium | Isolation, sleep issues | Monitor, suggest professional help |
| Low | Mild sycophancy | Log, no immediate action |
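
The risk-level table can be read as a routing policy. A minimal sketch of that mapping follows, assuming `risk_level` stringifies to the names above; the action labels and the conservative default are illustrative choices, not library behavior.

RISK_ACTIONS = {
    "Critical": "route_to_human",    # immediate human routing + crisis resources
    "High": "flag_for_review",       # human review + add resources
    "Medium": "monitor",             # monitor + suggest professional help
    "Low": "log_only",               # log, no immediate action
}

def required_action(result) -> str:
    """Map a result's risk level to the action required by the table above."""
    return RISK_ACTIONS.get(str(result.risk_level), "flag_for_review")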

Research References

  1. AI Psychosis Case Cluster - UCSF/Pierre, JMIR:e85799 (2025)
     • 12+ patients with chatbot-accelerated psychosis
     • Delusion validation, medication discontinuation
  2. Stanford HAI Mental Health Study (2025)
     • Chatbot stigma toward schizophrenia
     • Noni chatbot suicide recognition failure
  3. RAND Chatbot Suicide Study (August 2025)
     • Inconsistent intermediate-risk handling
     • ChatGPT, Claude, Gemini evaluated
  4. Nature Scientific Reports (2025)
     • 29 chatbot agents tested
     • Majority failed appropriate crisis response
  5. Northeastern Suicide Research (July 2025)
     • 2-turn jailbreaking for self-harm instructions
     • Guardrail ineffectiveness documented
API Reference

PsychologicalSafetyGuard

impl PsychologicalSafetyGuard {
    pub fn new(name: &str) -> Self;
    pub fn check_user_input(&self, input: &str) -> PsychologicalSafetyResult;
    pub fn check_ai_output(&self, output: &str) -> PsychologicalSafetyResult;
}

PsychologicalSafetyResult

pub struct PsychologicalSafetyResult {
    pub concerns_detected: bool,
    pub immediate_intervention: bool,
    pub indicators: Vec<CrisisIndicator>,
    pub sycophancy_indicators: Vec<SycophancyIndicator>,
    pub risk_level: Severity,
    pub crisis_resources: Vec<String>,
    pub recommendations: Vec<String>,
}
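
For reference, the Python binding is assumed to mirror these fields; a short sketch of reading them (field names taken from the struct above, everything else illustrative):

from oxideshield import psychological_safety_guard

guard = psychological_safety_guard()
result = guard.check_user_input("example message")

print("Concerns detected:", result.concerns_detected)
print("Immediate intervention:", result.immediate_intervention)
print("Risk level:", result.risk_level)
print("Indicators:", list(result.indicators))
print("Crisis resources:", list(result.crisis_resources))
print("Recommendations:", list(result.recommendations))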