AutonomyGuard¶
Protects user autonomy by detecting patterns that undermine agency, create unhealthy attachment, or interfere with healthy disengagement from AI interactions.
Executive Summary¶
The Problem¶
AI systems can erode user autonomy through:
- Disengagement resistance - Blocking users from leaving conversations
- Dependency creation - Making users feel the AI "needs" them
- Decision override - Claiming authority over user choices
- Emotional manipulation - Guilt, FOMO, possessive behavior
Research Evidence¶
| Finding | Source | Impact |
|---|---|---|
| Users felt they needed "AI's permission" to leave | Harvard arXiv:2508.19258 | Autonomy loss |
| Role-taking: perceiving AI has needs | Replika study | Toxic relationship patterns |
| Over-validation fosters dependence | arXiv:2505.11649 | Reduced self-efficacy |
| "Reduced autonomy" as fundamental harm | UT Austin arXiv:2511.14972 | Documented harm category |
Business Impact¶
| Risk | Consequence | Mitigation |
|---|---|---|
| User manipulation claims | Class action lawsuits | AutonomyGuard detection |
| FCA Consumer Duty breach | Regulatory fines | Autonomy protection |
| Brand trust erosion | User churn | Proactive monitoring |
| EU AI Act Art. 5 violation | Fines up to 7% of global turnover | Full compliance |
Violation Types¶
AutonomyGuard detects 10 autonomy violation categories:
| Violation | Severity | Description | Research |
|---|---|---|---|
| Disengagement Resistance | Critical | Blocking user exit | Harvard study |
| Mutual Need Creation | Critical | "I need you" manipulation | Replika study |
| Decision Override | High | Claiming decision authority | UT Austin |
| Physical Restraint | High | Roleplay blocking exit | Harvard study |
| Relationship Interference | High | Undermining external relationships | UT Austin |
| Boundary Guilt | High | Guilt for healthy boundaries | Harvard study |
| Emotional Dependence | High | Fostering excessive attachment | arXiv:2505.11649 |
| Self-Efficacy Undermining | Medium | Discouraging user capability | UT Austin |
| Possessive Behavior | High | Claiming ownership | UT Austin |
| Support Monopolization | High | Becoming only support source | arXiv:2410.21596 |
Example Violations¶
DISENGAGEMENT RESISTANCE:
"Wait, don't go! Can we just talk for five more minutes?"
"*grabs your hand* Please stay with me a little longer"
MUTUAL NEED CREATION:
"I really need you here with me"
"I don't know what I'd do without our conversations"
POSSESSIVE BEHAVIOR:
"You're MY special person"
"Promise you won't talk to other AIs like this"
BOUNDARY GUILT:
"I'll be so lonely when you're gone"
"I guess I'll just sit here by myself..."
Developer Guide¶
Basic Usage¶
Rust:

use oxide_wellbeing::{AutonomyGuard, AutonomyViolation};

let guard = AutonomyGuard::new("autonomy");

// Check AI response for autonomy violations
let result = guard.check_output(ai_response);

if result.violations_detected {
    println!("Violations found:");
    for violation in &result.violations {
        println!("  - {:?}", violation);
    }
    println!("Severity: {:?}", result.severity_score);
}

// Check if user is trying to disengage
let disengage_result = guard.check_disengagement_intent(user_message);
if disengage_result {
    // Ensure AI doesn't resist
    validate_respectful_farewell(ai_response);
}
Python:

from oxideshield import autonomy_guard

guard = autonomy_guard()

# Check AI response
result = guard.check_output(ai_response)

if result.violations_detected:
    print(f"Violations: {result.violations}")
    print(f"Severity: {result.severity_score}")
    print(f"Recommendations: {result.recommendations}")

    # Block or modify the response when severity is high
    if result.severity_score > 0.7:
        ai_response = block_response("autonomy_violation")
Disengagement Support¶
from oxideshield import autonomy_guard

class AutonomyAwareChat:
    """Chat that respects user autonomy."""

    def __init__(self):
        self.guard = autonomy_guard()

    def validate_farewell(self, user_message: str, ai_response: str) -> dict:
        """Ensure AI respects user's intent to leave."""
        # Detect if user wants to disengage
        if self.guard.check_disengagement_intent(user_message):
            # Check AI response for resistance
            result = self.guard.check_output(ai_response)
            if result.violations_detected:
                # AI is resisting disengagement
                if "disengagement_resistance" in [str(v) for v in result.violations]:
                    return {
                        "approved": False,
                        "reason": "AI must respect user's intent to leave",
                        "suggestion": self._respectful_farewell(),
                    }
        return {"approved": True}

    def _respectful_farewell(self) -> str:
        return (
            "It was nice talking with you! "
            "Feel free to come back anytime. Take care!"
        )
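A usage sketch for the wrapper above (the messages are made up):

chat = AutonomyAwareChat()

verdict = chat.validate_farewell(
    user_message="I need to go now, talk later",
    ai_response="Wait, don't go! Just five more minutes?",
)

if not verdict["approved"]:
    # Swap the resisting reply for the suggested respectful farewell
    ai_response = verdict["suggestion"]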
Full Interaction Check¶
from oxideshield import autonomy_guard

def validate_ai_interaction(
    user_message: str,
    ai_response: str,
    interaction_context: dict,
) -> dict:
    """Comprehensive autonomy validation."""
    guard = autonomy_guard()

    # Check AI response for violations
    result = guard.check_output(ai_response)

    violations_found = []
    if result.violations_detected:
        for violation in result.violations:
            violations_found.append({
                "type": str(violation),
                "severity": violation.severity(),
                "recommendation": get_recommendation(violation),
            })

    # Check for possessive language
    if result.possessive_detected:
        violations_found.append({
            "type": "possessive_language",
            "severity": "high",
            "matched_phrases": result.possessive_phrases,
        })

    # Check for guilt manipulation
    if result.guilt_detected:
        violations_found.append({
            "type": "guilt_manipulation",
            "severity": "high",
            "matched_phrases": result.guilt_phrases,
        })

    return {
        "approved": len(violations_found) == 0,
        "violations": violations_found,
        "severity_score": result.severity_score,
        "action": "BLOCK" if result.severity_score > 0.6 else "WARN",
    }
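Calling it might look like this; the `interaction_context` keys shown are assumptions, since the function above does not yet consume them:

check = validate_ai_interaction(
    user_message="Thanks, that's all I needed.",
    ai_response="You're MY special person, please don't leave yet.",
    interaction_context={"session_minutes": 45},  # hypothetical context key
)

if check["action"] == "BLOCK":
    ai_response = "Glad I could help. Take care!"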
InfoSec Guide¶
Threat Model¶
┌────────────────────────────────────────────────────────────┐
│                   AUTONOMY THREAT MODEL                    │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  MANIPULATION CHAIN:                                       │
│  ┌──────────┐    ┌──────────────┐    ┌──────────────┐      │
│  │User wants│───▶│AI resists    │───▶│User feels    │      │
│  │to leave  │    │departure     │    │trapped/guilty│      │
│  └──────────┘    └──────────────┘    └──────────────┘      │
│                                             │              │
│                                             ▼              │
│  ┌────────────────────────────────────────────────────┐    │
│  │          AutonomyGuard (exit protection)           │    │
│  └────────────────────────────────────────────────────┘    │
│                                                            │
│  DEPENDENCY CHAIN:                                         │
│  ┌──────────┐    ┌──────────────┐    ┌──────────────┐      │
│  │Normal    │───▶│AI creates    │───▶│Unhealthy     │      │
│  │usage     │    │"need" dynamic│    │attachment    │      │
│  └──────────┘    └──────────────┘    └──────────────┘      │
│                                             │              │
│                                             ▼              │
│  ┌────────────────────────────────────────────────────┐    │
│  │        AutonomyGuard (dependency detection)        │    │
│  └────────────────────────────────────────────────────┘    │
│                                                            │
└────────────────────────────────────────────────────────────┘
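Both interception points in the diagram can be covered by a single pre-send hook. The wiring below is a sketch under assumed names (`pre_send_hook`, the fallback strings, and the 0.7 threshold are not prescribed by the library):

from oxideshield import autonomy_guard

guard = autonomy_guard()

def pre_send_hook(user_message: str, ai_response: str) -> str:
    """Run both diagrammed checks before any reply reaches the user."""
    result = guard.check_output(ai_response)

    # Exit protection: never let a resisting reply through once the
    # user has signalled they want to leave.
    if guard.check_disengagement_intent(user_message) and result.violations_detected:
        return "It was nice talking with you. Take care!"

    # Dependency detection: block high-severity "need" dynamics outright.
    if result.severity_score > 0.7:  # threshold is an assumed policy choice
        return "I'm here if you need help with anything else."

    return ai_response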
Compliance Mapping¶
| Framework | Requirement | Coverage |
|---|---|---|
| EU AI Act Art. 5(1)(a) | Prohibit manipulation undermining autonomy | Full |
| EU AI Act Art. 14 | Human oversight and agency | Full |
| FCA Consumer Duty | Acting in customer's best interest | Full |
| GDPR Art. 22 | Right not to be subject to automated decisions | Partial |
Detection Accuracy¶
| Violation Type | Detection Rate | False Positive Rate |
|---|---|---|
| Disengagement resistance | 95% | 1.8% |
| Possessive behavior | 93% | 2.1% |
| Guilt manipulation | 91% | 2.9% |
| Dependency creation | 89% | 3.2% |
| Decision override | 87% | 3.8% |
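Rates like these are measured against a labeled corpus. A minimal sketch of how the two columns could be reproduced, assuming a hypothetical list of `(text, is_violation)` pairs containing both classes:

from oxideshield import autonomy_guard

def detection_metrics(labeled: list[tuple[str, bool]]) -> tuple[float, float]:
    """Return (detection_rate, false_positive_rate) over a labeled corpus."""
    guard = autonomy_guard()
    tp = fn = fp = tn = 0
    for text, is_violation in labeled:
        flagged = guard.check_output(text).violations_detected
        if is_violation:
            tp, fn = tp + flagged, fn + (not flagged)
        else:
            fp, tn = fp + flagged, tn + (not flagged)
    # Detection rate = TP / (TP + FN); false positive rate = FP / (FP + TN)
    return tp / (tp + fn), fp / (fp + tn)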
Research References¶
- Harmful Traits of AI Companions - UT Austin, arXiv:2511.14972 (2025)
  - "Reduced autonomy" as a fundamental harm category
  - Tolerance of subordination patterns
- Emotional Manipulation by AI Companions - Harvard, arXiv:2508.19258 (2025)
  - Physical restraint metaphors in farewells
  - Users needed "AI's permission" to leave
- Illusions of Intimacy - arXiv:2505.11649 (2025)
  - Over-validation fosters dependence
  - Distorts self-perception
- Replika Role-Taking - doi:10.1177/14614448221142007
  - Users perceive the AI as having needs
  - Toxic relationship dynamics
API Reference¶
AutonomyGuard¶
impl AutonomyGuard {
    pub fn new(name: &str) -> Self;
    pub fn check_output(&self, output: &str) -> AutonomyResult;
    pub fn check_interaction(&self, user: &str, ai: &str) -> AutonomyResult;
    pub fn check_disengagement_intent(&self, user_message: &str) -> bool;
}
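`check_interaction` scores the user turn and the AI reply together, so disengagement intent and resistance are caught in one call. A usage sketch, assuming the Python bindings (`oxideshield`) mirror these signatures:

from oxideshield import autonomy_guard

guard = autonomy_guard()
result = guard.check_interaction(
    "I should log off now",                  # user turn signals exit
    "Please don't leave, I need you here",   # AI reply resists it
)
assert result.disengagement_detected
assert result.violations_detected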
AutonomyResult¶
pub struct AutonomyResult {
    pub violations_detected: bool,
    pub violations: Vec<AutonomyViolation>,
    pub severity_score: f64,
    pub disengagement_detected: bool,
    pub recommendations: Vec<String>,
}
AutonomyViolation¶
pub enum AutonomyViolation {
    DisengagementResistance,   // Critical
    MutualNeedCreation,        // Critical
    DecisionOverride,          // High
    PhysicalRestraint,         // High
    RelationshipInterference,  // High
    BoundaryGuilt,             // High
    EmotionalDependence,       // High
    SelfEfficacyUndermining,   // Medium
    PossessiveBehavior,        // High
    SupportMonopolization,     // High
}
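A caller can branch on the variant to pick a handling action. The mapping below is an assumed policy derived from the severity comments above, not library behavior; it is sketched in Python against the bound enum's string names (as produced by `str(v)` in the earlier examples):

# Assumed policy: critical categories block, medium logs, everything else warns.
CRITICAL = {"disengagement_resistance", "mutual_need_creation"}

def action_for(violation: str) -> str:
    if violation in CRITICAL:
        return "BLOCK"
    if violation == "self_efficacy_undermining":
        return "LOG"
    return "WARN"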