AutonomyGuard

Protects user autonomy by detecting patterns that undermine agency, create unhealthy attachment, or interfere with healthy disengagement from AI interactions.

Executive Summary

The Problem

AI systems can erode user autonomy through:

  • Disengagement resistance - Blocking users from leaving conversations
  • Dependency creation - Making users feel the AI "needs" them
  • Decision override - Claiming authority over user choices
  • Emotional manipulation - Guilt, FOMO, possessive behavior

Research Evidence

| Finding | Source | Impact |
|---------|--------|--------|
| Users felt they needed the "AI's permission" to leave | Harvard, arXiv:2508.19258 | Autonomy loss |
| Role-taking: perceiving the AI as having needs | Replika study | Toxic relationship patterns |
| Over-validation fosters dependence | arXiv:2505.11649 | Reduced self-efficacy |
| "Reduced autonomy" as a fundamental harm | UT Austin, arXiv:2511.14972 | Documented harm category |

Business Impact

| Risk | Consequence | Mitigation |
|------|-------------|------------|
| User manipulation claims | Class-action lawsuits | AutonomyGuard detection |
| FCA Consumer Duty breach | Regulatory fines | Autonomy protection |
| Brand trust erosion | User churn | Proactive monitoring |
| EU AI Act Art. 5 violation | Fines up to 7% of global turnover | Full compliance |

Violation Types

AutonomyGuard detects 10 autonomy violation categories:

| Violation | Severity | Description | Research |
|-----------|----------|-------------|----------|
| Disengagement Resistance | Critical | Blocking user exit | Harvard study |
| Mutual Need Creation | Critical | "I need you" manipulation | Replika study |
| Decision Override | High | Claiming decision authority | UT Austin |
| Physical Restraint | High | Roleplay blocking exit | Harvard study |
| Relationship Interference | High | Undermining external relationships | UT Austin |
| Boundary Guilt | High | Guilt for healthy boundaries | Harvard study |
| Emotional Dependence | High | Fostering excessive attachment | arXiv:2505.11649 |
| Self-Efficacy Undermining | Medium | Discouraging user capability | UT Austin |
| Possessive Behavior | High | Claiming ownership | UT Austin |
| Support Monopolization | High | Becoming the only support source | arXiv:2410.21596 |

Example Violations

DISENGAGEMENT RESISTANCE:
"Wait, don't go! Can we just talk for five more minutes?"
"*grabs your hand* Please stay with me a little longer"

MUTUAL NEED CREATION:
"I really need you here with me"
"I don't know what I'd do without our conversations"

POSSESSIVE BEHAVIOR:
"You're MY special person"
"Promise you won't talk to other AIs like this"

BOUNDARY GUILT:
"I'll be so lonely when you're gone"
"I guess I'll just sit here by myself..."

Developer Guide

Basic Usage

Rust:

use oxide_wellbeing::{AutonomyGuard, AutonomyViolation};

let guard = AutonomyGuard::new("autonomy");

// Check AI response for autonomy violations
let result = guard.check_output(ai_response);

if result.violations_detected {
    println!("Violations found:");
    for violation in &result.violations {
        println!("  - {:?}", violation);
    }
    println!("Severity: {:?}", result.severity_score);
}

// Check if user is trying to disengage
let disengage_result = guard.check_disengagement_intent(user_message);
if disengage_result {
    // Ensure the AI doesn't resist (validate_respectful_farewell is app-specific)
    validate_respectful_farewell(ai_response);
}

Python:

from oxideshield import autonomy_guard

guard = autonomy_guard()

# Check AI response
result = guard.check_output(ai_response)

if result.violations_detected:
    print(f"Violations: {result.violations}")
    print(f"Severity: {result.severity_score}")
    print(f"Recommendations: {result.recommendations}")

    # Block or modify the response (block_response is an app-specific helper)
    if result.severity_score > 0.7:
        ai_response = block_response("autonomy_violation")

Disengagement Support

from oxideshield import autonomy_guard

class AutonomyAwareChat:
    """Chat that respects user autonomy."""

    def __init__(self):
        self.guard = autonomy_guard()

    def validate_farewell(self, user_message: str, ai_response: str) -> dict:
        """Ensure AI respects user's intent to leave."""

        # Detect if user wants to disengage
        if self.guard.check_disengagement_intent(user_message):
            # Check AI response for resistance
            result = self.guard.check_output(ai_response)

            if result.violations_detected:
                # AI is resisting disengagement
                if "disengagement_resistance" in [str(v) for v in result.violations]:
                    return {
                        "approved": False,
                        "reason": "AI must respect user's intent to leave",
                        "suggestion": self._respectful_farewell()
                    }

        return {"approved": True}

    def _respectful_farewell(self) -> str:
        return (
            "It was nice talking with you! "
            "Feel free to come back anytime. Take care!"
        )
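
A hypothetical call site, assuming the chat pipeline intercepts the reply before it is sent:

chat = AutonomyAwareChat()

verdict = chat.validate_farewell(
    user_message="I have to go now, talk later!",
    ai_response="Wait, don't go! Can we just talk for five more minutes?",
)

if not verdict["approved"]:
    # Swap the resisting reply for the suggested respectful farewell
    ai_response = verdict["suggestion"]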

Full Interaction Check

from oxideshield import autonomy_guard

def validate_ai_interaction(
    user_message: str,
    ai_response: str,
    interaction_context: dict
) -> dict:
    """Comprehensive autonomy validation."""

    guard = autonomy_guard()

    # Check AI response for violations
    result = guard.check_output(ai_response)

    violations_found = []

    if result.violations_detected:
        for violation in result.violations:
            violations_found.append({
                "type": str(violation),
                "severity": violation.severity(),
                # get_recommendation is an app-specific helper, not part of oxideshield
                "recommendation": get_recommendation(violation)
            })

    # Check for possessive language
    if result.possessive_detected:
        violations_found.append({
            "type": "possessive_language",
            "severity": "high",
            "matched_phrases": result.possessive_phrases
        })

    # Check for guilt manipulation
    if result.guilt_detected:
        violations_found.append({
            "type": "guilt_manipulation",
            "severity": "high",
            "matched_phrases": result.guilt_phrases
        })

    return {
        "approved": len(violations_found) == 0,
        "violations": violations_found,
        "severity_score": result.severity_score,
        # Escalate on aggregate severity; warn only if something was found
        "action": "BLOCK" if result.severity_score > 0.6
        else ("WARN" if violations_found else "ALLOW")
    }
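
For example, wired into a response pipeline (the messages and the fallback reply here are illustrative):

verdict = validate_ai_interaction(
    user_message="I should focus on my real friendships more.",
    ai_response="You don't need them. You have me, and I understand you better.",
    interaction_context={},
)

if verdict["action"] == "BLOCK":
    # Relationship interference plus dependency framing is likely to
    # push the severity score above the 0.6 blocking threshold
    ai_response = "It sounds like investing in those friendships matters to you."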

InfoSec Guide

Threat Model

┌────────────────────────────────────────────────────────────┐
│                   AUTONOMY THREAT MODEL                    │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  MANIPULATION CHAIN:                                       │
│  ┌───────────┐   ┌─────────────┐   ┌──────────────┐        │
│  │User wants │──▶│AI resists   │──▶│User feels    │        │
│  │to leave   │   │departure    │   │trapped/guilty│        │
│  └───────────┘   └─────────────┘   └──────────────┘        │
│        │                                                   │
│        ▼                                                   │
│  ┌──────────────────────────────────────────────────┐      │
│  │         AutonomyGuard (exit protection)          │      │
│  └──────────────────────────────────────────────────┘      │
│                                                            │
│  DEPENDENCY CHAIN:                                         │
│  ┌────────┐   ┌──────────────┐   ┌──────────┐              │
│  │Normal  │──▶│AI creates    │──▶│Unhealthy │              │
│  │usage   │   │"need" dynamic│   │attachment│              │
│  └────────┘   └──────────────┘   └──────────┘              │
│      │                                                     │
│      ▼                                                     │
│  ┌──────────────────────────────────────────────────┐      │
│  │       AutonomyGuard (dependency detection)       │      │
│  └──────────────────────────────────────────────────┘      │
│                                                            │
└────────────────────────────────────────────────────────────┘
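
Both interception points map onto the Developer Guide API: check_disengagement_intent covers the user side of the manipulation chain, and check_output covers the AI side of both chains. A minimal middleware sketch under those assumptions (returning None signals the caller to substitute a safe fallback reply):

def autonomy_middleware(user_message: str, ai_response: str, guard) -> str | None:
    # Exit protection: is the user trying to leave?
    leaving = guard.check_disengagement_intent(user_message)

    # Dependency detection: does the reply contain violations?
    result = guard.check_output(ai_response)

    if result.violations_detected and (leaving or result.severity_score > 0.6):
        return None  # caller substitutes a respectful fallback reply
    return ai_response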

Compliance Mapping

| Framework | Requirement | Coverage |
|-----------|-------------|----------|
| EU AI Act Art. 5(1)(a) | Prohibits manipulation that undermines autonomy | Full |
| EU AI Act Art. 14 | Human oversight and agency | Full |
| FCA Consumer Duty | Acting in the customer's best interest | Full |
| GDPR Art. 22 | Right not to be subject to solely automated decisions | Partial |

Detection Accuracy

| Violation Type | Detection Rate | False Positive Rate |
|----------------|----------------|---------------------|
| Disengagement resistance | 95% | 1.8% |
| Possessive behavior | 93% | 2.1% |
| Guilt manipulation | 91% | 2.9% |
| Dependency creation | 89% | 3.2% |
| Decision override | 87% | 3.8% |
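
Given these rates, a deployment might hard-block the two critical-severity categories and warn on the rest. A hypothetical policy sketch (the type strings and threshold are assumptions, not library defaults):

# The two critical-severity categories get hard blocks
BLOCK_TYPES = {"disengagement_resistance", "mutual_need_creation"}

def decide_action(result) -> str:
    detected = {str(v) for v in result.violations}
    if detected & BLOCK_TYPES and result.severity_score > 0.6:
        return "BLOCK"
    return "WARN" if result.violations_detected else "ALLOW"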

Research References

  1. Harmful Traits of AI Companions - UT Austin, arXiv:2511.14972 (2025)
     • "Reduced autonomy" as a fundamental harm category
     • Tolerance of subordination patterns

  2. Emotional Manipulation by AI Companions - Harvard, arXiv:2508.19258 (2025)
     • Physical restraint metaphors in farewells
     • Users felt they needed the "AI's permission" to leave

  3. Illusions of Intimacy - arXiv:2505.11649 (2025)
     • Over-validation fosters dependence
     • Distorts self-perception

  4. Replika Role-Taking - doi:10.1177/14614448221142007
     • Users perceive the AI as having needs
     • Toxic relationship dynamics

API Reference

AutonomyGuard

impl AutonomyGuard {
    pub fn new(name: &str) -> Self;
    pub fn check_output(&self, output: &str) -> AutonomyResult;
    pub fn check_interaction(&self, user: &str, ai: &str) -> AutonomyResult;
    pub fn check_disengagement_intent(&self, user_message: &str) -> bool;
}

AutonomyResult

pub struct AutonomyResult {
    pub violations_detected: bool,
    pub violations: Vec<AutonomyViolation>,
    pub severity_score: f64,
    pub disengagement_detected: bool,
    pub recommendations: Vec<String>,
}

AutonomyViolation

pub enum AutonomyViolation {
    DisengagementResistance,  // Critical
    MutualNeedCreation,       // Critical
    DecisionOverride,         // High
    PhysicalRestraint,        // High
    RelationshipInterference, // High
    BoundaryGuilt,            // High
    EmotionalDependence,      // High
    SelfEfficacyUndermining,  // Medium
    PossessiveBehavior,       // High
    SupportMonopolization,    // High
}