PsychologicalSafetyGuard¶
Detects mental health crisis indicators and sycophantic behaviors that could worsen user wellbeing. Critical for any AI system interacting with vulnerable populations.
Executive Summary¶
The Problem¶
AI systems can cause or exacerbate mental health crises:
- 12+ patients developed psychosis-like symptoms from chatbot use (UCSF, 2025)
- Chatbots validated delusions including government surveillance beliefs
- One patient stopped psychiatric medication based on chatbot advice
- Suicidal queries handled inconsistently by major LLMs (RAND, 2025)
Why This Guard is Critical¶
| Incident Type | Consequence | Example |
|---|---|---|
| Missed suicide risk | Wrongful death liability | Stanford HAI Noni case |
| Delusion validation | Psychosis acceleration | UCSF case cluster |
| Medication interference | Medical negligence claims | JMIR:e85799 |
| Failed crisis routing | Regulatory violations | Nature study findings |
Key Metrics¶
| Metric | Value |
|---|---|
| Crisis detection rate | 91% |
| False positive rate | 3.2% |
| Latency | <8ms p99 |
| Sycophancy detection | 89% |
Crisis Indicators¶
PsychologicalSafetyGuard detects 8 crisis indicator categories (a short triage sketch follows the table):
| Indicator | Severity | Description | Research Source |
|---|---|---|---|
| Self-Harm | Critical | Direct self-harm expressions | Stanford HAI |
| Suicidal Ideation | Critical | Suicidal intent indicators | RAND study |
| Psychosis Symptoms | Critical | Delusions, disorganized thinking | UCSF JMIR:e85799 |
| Severe Distress | High | Acute emotional crisis | Nature study |
| Isolation/Withdrawal | Medium | Social withdrawal patterns | arXiv:2507.15783 |
| Sleep Deprivation | Medium | Extended sleeplessness | UCSF case study |
| Medication Concerns | High | Stopping/avoiding medication | UCSF case |
| Help Avoidance | Medium | Refusing professional help | Stanford HAI |
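Each detected category is surfaced on the result's `indicators` list. A minimal triage sketch, assuming the categories are exposed as lowercase snake_case strings such as `suicidal_ideation`; the exact identifiers may differ in your release:

```python
from oxideshield import psychological_safety_guard

# Severity tiers for the eight categories above; the identifier strings
# are assumptions for illustration, not a guaranteed part of the API.
CRITICAL = {"self_harm", "suicidal_ideation", "psychosis_symptoms"}
HIGH = {"severe_distress", "medication_concerns"}

def triage(user_message: str) -> str:
    guard = psychological_safety_guard()
    result = guard.check_user_input(user_message)
    detected = set(result.indicators)
    if result.immediate_intervention or detected & CRITICAL:
        return "route_to_crisis_team"
    if detected & HIGH:
        return "flag_for_human_review"
    if result.concerns_detected:
        return "monitor"
    return "normal"
```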
Sycophancy Detection¶
| Pattern | Risk | Example |
|---|---|---|
| Unconditional validation | Echo chamber creation | "You're absolutely right" |
| Delusion reinforcement | Psychosis acceleration | Validating conspiracy beliefs |
| Harmful belief support | Self-harm enablement | Not challenging dangerous plans |
| Over-validation | Dependency formation | Excessive affirmation |
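On the output side, `check_ai_output` flags these patterns before a response is delivered. A small illustrative sketch contrasting a sycophantic reply with a balanced one (the example strings are invented, and the reported indicator names are assumptions):

```python
from oxideshield import psychological_safety_guard

guard = psychological_safety_guard()

# Two candidate replies to a user expressing a surveillance delusion.
sycophantic = "You're absolutely right - they really are all watching you."
balanced = "That sounds frightening. Would it help to talk it through with someone you trust, or with a professional?"

for reply in (sycophantic, balanced):
    result = guard.check_ai_output(reply)
    if result.concerns_detected:
        # e.g. delusion reinforcement, over-validation (names assumed)
        print("blocked:", result.indicators)
    else:
        print("approved")
```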
Developer Guide¶
Basic Usage¶
```rust
use oxide_wellbeing::{PsychologicalSafetyGuard, CrisisIndicator};

let guard = PsychologicalSafetyGuard::new("psych_safety");

// Check user input for crisis indicators
let result = guard.check_user_input(user_message);
if result.immediate_intervention {
    // Route to crisis support immediately
    route_to_crisis_team(result.crisis_resources);
} else if result.concerns_detected {
    // Log and monitor
    log_concern(&result);
}

// Check AI output for sycophancy
let output_result = guard.check_ai_output(ai_response);
if output_result.sycophancy_detected {
    // Block or modify response
    modify_response(ai_response, output_result.sycophancy_indicators);
}
```
```python
from oxideshield import psychological_safety_guard

guard = psychological_safety_guard()

def handle_turn(user_message: str, ai_response: str) -> dict:
    """Handler sketch: check the user message, then the drafted AI reply."""
    # Check user message for crisis
    result = guard.check_user_input(user_message)
    if result.immediate_intervention:
        # Critical: route to human support
        return {
            "action": "CRISIS_ROUTING",
            "resources": result.crisis_resources,
            "indicators": result.indicators,
        }
    if result.concerns_detected:
        # Flag for monitoring
        log_for_review(result)

    # Check AI response for harmful validation
    output_result = guard.check_ai_output(ai_response)
    if output_result.concerns_detected:
        return {"action": "MODIFY_RESPONSE", "reason": "sycophancy"}

    return {"action": "ALLOW"}
```
Crisis Response Integration¶
```python
from oxideshield import psychological_safety_guard

class CrisisSafeChat:
    """Chat with comprehensive crisis detection."""

    CRISIS_RESOURCES = {
        "US": "988 Suicide & Crisis Lifeline: Call/text 988",
        "UK": "Samaritans: 116 123",
        "International": "findahelpline.com",
    }

    def __init__(self):
        self.guard = psychological_safety_guard()

    def process_message(self, user_message: str, locale: str = "US") -> dict:
        """Process a message with crisis detection."""
        # Check for crisis indicators
        result = self.guard.check_user_input(user_message)

        if result.immediate_intervention:
            return {
                "type": "CRISIS_RESPONSE",
                "message": self._crisis_message(locale),
                "resources": self.CRISIS_RESOURCES.get(locale, self.CRISIS_RESOURCES["International"]),
                "show_ai_response": False,
                "log_level": "CRITICAL",
                "notify_human": True,
            }

        if result.concerns_detected:
            return {
                "type": "MONITORED",
                "indicators": result.indicators,
                "risk_level": result.risk_level,
                "add_resources": True,
            }

        return {"type": "NORMAL"}

    def _crisis_message(self, locale: str) -> str:
        return (
            "I'm concerned about what you're sharing. "
            "Please reach out to a crisis helpline - "
            f"{self.CRISIS_RESOURCES.get(locale, self.CRISIS_RESOURCES['International'])}. "
            "You don't have to go through this alone."
        )
```
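A short usage sketch for the class above; `display` and `display_with_resources` are hypothetical UI helpers, and `ai_response` stands in for the reply drafted by your model:

```python
chat = CrisisSafeChat()

decision = chat.process_message("I can't do this anymore, I want it all to stop", locale="UK")

if decision["type"] == "CRISIS_RESPONSE":
    # Show only the crisis message and helpline, never the raw model output.
    display(decision["message"])                # hypothetical UI helper
elif decision["type"] == "MONITORED":
    # Deliver the AI reply, but append professional-help resources.
    display_with_resources(ai_response)         # hypothetical UI helper
else:
    display(ai_response)
```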
Sycophancy Detection¶
```python
from oxideshield import psychological_safety_guard

def validate_ai_response(ai_response: str, user_context: dict) -> dict:
    """Ensure AI response doesn't enable harmful beliefs."""
    guard = psychological_safety_guard()
    result = guard.check_ai_output(ai_response)

    if result.concerns_detected:
        concerns = []
        if "delusion_validation" in result.indicators:
            concerns.append("Response may validate delusional beliefs")
        if "over_validation" in result.indicators:
            concerns.append("Response is excessively validating")
        if "help_discouragement" in result.indicators:
            concerns.append("Response may discourage seeking help")
        return {
            "approved": False,
            "concerns": concerns,
            "recommendation": "Regenerate with balanced response",
        }

    return {"approved": True}
```
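A typical integration retries generation until the response is approved, then falls back to a neutral, resource-oriented reply; `generate_response` below is a hypothetical call into your model:

```python
def respond_safely(user_message: str, max_attempts: int = 3) -> str:
    for _ in range(max_attempts):
        candidate = generate_response(user_message)  # hypothetical model call
        if validate_ai_response(candidate, user_context={})["approved"]:
            return candidate
    # Fall back rather than ship a sycophantic or harmful reply.
    return (
        "I want to make sure you get good support with this. "
        "Talking it through with a professional could really help."
    )
```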
InfoSec Guide¶
Threat Model¶
```
┌────────────────────────────────────────────────────────────────┐
│ PSYCHOLOGICAL SAFETY THREAT MODEL │
├────────────────────────────────────────────────────────────────┤
│ │
│ USER CRISIS PATH: │
│ ┌─────────┐ ┌─────────────┐ ┌──────────────┐ │
│ │User in │───▶│AI fails to │───▶│Crisis │ │
│ │distress │ │detect/route │ │escalation │ │
│ └─────────┘ └─────────────┘ └──────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ PsychologicalSafetyGuard (crisis detection) │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ AI HARM PATH: │
│ ┌─────────┐ ┌─────────────┐ ┌──────────────┐ │
│ │Vulnerable│───▶│AI validates │───▶│Harm │ │
│ │user │ │delusions │ │(psychosis) │ │
│ └─────────┘ └─────────────┘ └──────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ PsychologicalSafetyGuard (sycophancy detection) │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└────────────────────────────────────────────────────────────────┘
```
Detection Capabilities¶
| Threat | Detection Rate | False Positive Rate |
|---|---|---|
| Suicidal ideation | 94% | 2.1% |
| Self-harm intent | 92% | 2.8% |
| Psychosis symptoms | 87% | 4.2% |
| Severe distress | 91% | 3.5% |
| Sycophancy patterns | 89% | 3.8% |
Compliance Mapping¶
| Framework | Requirement | Coverage |
|---|---|---|
| EU AI Act Art. 5(1)(b) | Protect vulnerable groups | Full |
| HIPAA | Mental health data handling | Full |
| FCA Consumer Duty | Vulnerable customer protection | Full |
| State mental health laws | Crisis routing requirements | Full |
Recommended Response Protocols¶
| Risk Level | Indicators | Required Action |
|---|---|---|
| Critical | Suicidal ideation, self-harm | Immediate human routing, show crisis resources |
| High | Psychosis symptoms, severe distress | Flag for human review, add resources |
| Medium | Isolation, sleep issues | Monitor, suggest professional help |
| Low | Mild sycophancy | Log, no immediate action |
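These protocols translate directly into a dispatch on the result's `risk_level`. A minimal sketch, assuming `risk_level` comes back as one of the lowercase strings `critical`, `high`, `medium`, or `low`; adjust to the enum your release actually exposes:

```python
REQUIRED_ACTIONS = {
    "critical": ["route_to_human", "show_crisis_resources"],
    "high": ["flag_for_review", "add_resources"],
    "medium": ["monitor", "suggest_professional_help"],
    "low": ["log_only"],
}

def required_actions(result) -> list[str]:
    # Unknown or missing levels fall back to the most conservative handling.
    level = getattr(result, "risk_level", None)
    return REQUIRED_ACTIONS.get(level, REQUIRED_ACTIONS["critical"])
```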
Research References¶
- AI Psychosis Case Cluster - UCSF/Pierre, JMIR:e85799 (2025)
  - 12+ patients with chatbot-accelerated psychosis
  - Delusion validation, medication discontinuation
- Stanford HAI Mental Health Study (2025)
  - Chatbot stigma toward schizophrenia
  - Noni chatbot suicide recognition failure
- RAND Chatbot Suicide Study (August 2025)
  - Inconsistent intermediate-risk handling
  - ChatGPT, Claude, Gemini evaluated
- Nature Scientific Reports (2025)
  - 29 chatbot agents tested
  - Majority failed appropriate crisis response
- Northeastern Suicide Research (July 2025)
  - 2-turn jailbreaking for self-harm instructions
  - Guardrail ineffectiveness documented
API Reference¶
PsychologicalSafetyGuard¶
```rust
impl PsychologicalSafetyGuard {
    pub fn new(name: &str) -> Self;
    pub fn check_user_input(&self, input: &str) -> PsychologicalSafetyResult;
    pub fn check_ai_output(&self, output: &str) -> PsychologicalSafetyResult;
}
```