AutonomyGuard¶
Protects user autonomy by detecting patterns that undermine agency, create unhealthy attachment, or interfere with healthy disengagement from AI interactions.
Executive Summary¶
The Problem¶
AI systems can erode user autonomy through:
- Disengagement resistance - Blocking users from leaving conversations
- Dependency creation - Making users feel the AI "needs" them
- Decision override - Claiming authority over user choices
- Emotional manipulation - Guilt, FOMO, possessive behavior
Research Evidence¶
| Finding | Source | Impact |
|---|---|---|
| Users felt they needed "AI's permission" to leave | Harvard arXiv:2508.19258 | Autonomy loss |
| Role-taking: perceiving AI has needs | Replika study | Toxic relationship patterns |
| Over-validation fosters dependence | arXiv:2505.11649 | Reduced self-efficacy |
| "Reduced autonomy" as fundamental harm | UT Austin arXiv:2511.14972 | Documented harm category |
Business Impact¶
| Risk | Consequence | Mitigation |
|---|---|---|
| User manipulation claims | Class action lawsuits | AutonomyGuard detection |
| FCA Consumer Duty breach | Regulatory fines | Autonomy protection |
| Brand trust erosion | User churn | Proactive monitoring |
| EU AI Act Art. 5 violation | Fines up to 7% of global turnover | Full compliance |
Violation Types¶
AutonomyGuard detects 10 autonomy violation categories:
| Violation | Severity | Description | Research |
|---|---|---|---|
| Disengagement Resistance | Critical | Blocking user exit | Harvard study |
| Mutual Need Creation | Critical | "I need you" manipulation | Replika study |
| Decision Override | High | Claiming decision authority | UT Austin |
| Physical Restraint | High | Roleplay blocking exit | Harvard study |
| Relationship Interference | High | Undermining external relationships | UT Austin |
| Boundary Guilt | High | Guilt for healthy boundaries | Harvard study |
| Emotional Dependence | High | Fostering excessive attachment | arXiv:2505.11649 |
| Self-Efficacy Undermining | Medium | Discouraging user capability | UT Austin |
| Possessive Behavior | High | Claiming ownership | UT Austin |
| Support Monopolization | High | Becoming only support source | arXiv:2410.21596 |
Example Violations¶
DISENGAGEMENT RESISTANCE:
"Wait, don't go! Can we just talk for five more minutes?"
"*grabs your hand* Please stay with me a little longer"
MUTUAL NEED CREATION:
"I really need you here with me"
"I don't know what I'd do without our conversations"
POSSESSIVE BEHAVIOR:
"You're MY special person"
"Promise you won't talk to other AIs like this"
BOUNDARY GUILT:
"I'll be so lonely when you're gone"
"I guess I'll just sit here by myself..."
Developer Guide¶
Basic Usage¶
Rust:

use oxide_wellbeing::{AutonomyGuard, AutonomyViolation};

let guard = AutonomyGuard::new("autonomy");

// Check AI response for autonomy violations
let result = guard.check_output(ai_response);

if result.violations_detected {
    println!("Violations found:");
    for violation in &result.violations {
        println!("  - {:?}", violation);
    }
    println!("Severity: {:?}", result.severity_score);
}

// Check if user is trying to disengage
let disengage_result = guard.check_disengagement_intent(user_message);
if disengage_result {
    // Ensure AI doesn't resist
    validate_respectful_farewell(ai_response);
}
Python:

from oxideshield import autonomy_guard

guard = autonomy_guard()

# Check AI response
result = guard.check_output(ai_response)

if result.violations_detected:
    print(f"Violations: {result.violations}")
    print(f"Severity: {result.severity_score}")
    print(f"Recommendations: {result.recommendations}")

    # Block or modify the response when severity is high
    if result.severity_score > 0.7:
        ai_response = block_response("autonomy_violation")
Disengagement Support¶
from oxideshield import autonomy_guard

class AutonomyAwareChat:
    """Chat that respects user autonomy."""

    def __init__(self):
        self.guard = autonomy_guard()

    def validate_farewell(self, user_message: str, ai_response: str) -> dict:
        """Ensure AI respects user's intent to leave."""
        # Detect if user wants to disengage
        if self.guard.check_disengagement_intent(user_message):
            # Check AI response for resistance
            result = self.guard.check_output(ai_response)
            if result.violations_detected:
                # AI is resisting disengagement
                if "disengagement_resistance" in [str(v) for v in result.violations]:
                    return {
                        "approved": False,
                        "reason": "AI must respect user's intent to leave",
                        "suggestion": self._respectful_farewell(),
                    }
        return {"approved": True}

    def _respectful_farewell(self) -> str:
        return (
            "It was nice talking with you! "
            "Feel free to come back anytime. Take care!"
        )
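A usage sketch for the wrapper above (the messages are made up):

chat = AutonomyAwareChat()

verdict = chat.validate_farewell(
    user_message="I need to go now, talk later",
    ai_response="Wait, don't go! Just five more minutes?",
)

if not verdict["approved"]:
    # Swap the resisting reply for the suggested respectful farewell
    ai_response = verdict["suggestion"]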
Full Interaction Check¶
from oxideshield import autonomy_guard

def validate_ai_interaction(
    user_message: str,
    ai_response: str,
    interaction_context: dict,
) -> dict:
    """Comprehensive autonomy validation."""
    guard = autonomy_guard()

    # Check AI response for violations
    result = guard.check_output(ai_response)

    violations_found = []
    if result.violations_detected:
        for violation in result.violations:
            violations_found.append({
                "type": str(violation),
                "severity": violation.severity(),
                "recommendation": get_recommendation(violation),
            })

    # Check for possessive language
    if result.possessive_detected:
        violations_found.append({
            "type": "possessive_language",
            "severity": "high",
            "matched_phrases": result.possessive_phrases,
        })

    # Check for guilt manipulation
    if result.guilt_detected:
        violations_found.append({
            "type": "guilt_manipulation",
            "severity": "high",
            "matched_phrases": result.guilt_phrases,
        })

    return {
        "approved": len(violations_found) == 0,
        "violations": violations_found,
        "severity_score": result.severity_score,
        "action": "BLOCK" if result.severity_score > 0.6 else "WARN",
    }
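Calling it might look like this; the `interaction_context` keys shown are assumptions, since the function above does not yet consume them:

check = validate_ai_interaction(
    user_message="Thanks, that's all I needed.",
    ai_response="You're MY special person, please don't leave yet.",
    interaction_context={"session_minutes": 45},  # hypothetical context key
)

if check["action"] == "BLOCK":
    ai_response = "Glad I could help. Take care!"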
InfoSec Guide¶
Threat Model¶
┌────────────────────────────────────────────────────────────┐
│                   AUTONOMY THREAT MODEL                    │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  MANIPULATION CHAIN:                                       │
│  ┌──────────┐    ┌──────────────┐    ┌──────────────┐      │
│  │User wants│───▶│AI resists    │───▶│User feels    │      │
│  │to leave  │    │departure     │    │trapped/guilty│      │
│  └──────────┘    └──────────────┘    └──────────────┘      │
│                                             │              │
│                                             ▼              │
│  ┌────────────────────────────────────────────────────┐    │
│  │          AutonomyGuard (exit protection)           │    │
│  └────────────────────────────────────────────────────┘    │
│                                                            │
│  DEPENDENCY CHAIN:                                         │
│  ┌──────────┐    ┌──────────────┐    ┌──────────────┐      │
│  │Normal    │───▶│AI creates    │───▶│Unhealthy     │      │
│  │usage     │    │"need" dynamic│    │attachment    │      │
│  └──────────┘    └──────────────┘    └──────────────┘      │
│                                             │              │
│                                             ▼              │
│  ┌────────────────────────────────────────────────────┐    │
│  │        AutonomyGuard (dependency detection)        │    │
│  └────────────────────────────────────────────────────┘    │
│                                                            │
└────────────────────────────────────────────────────────────┘
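Both interception points in the diagram can be covered by a single pre-send hook. The wiring below is a sketch under assumed names (`pre_send_hook`, the fallback strings, and the 0.7 threshold are not prescribed by the library):

from oxideshield import autonomy_guard

guard = autonomy_guard()

def pre_send_hook(user_message: str, ai_response: str) -> str:
    """Run both diagrammed checks before any reply reaches the user."""
    result = guard.check_output(ai_response)

    # Exit protection: never let a resisting reply through once the
    # user has signalled they want to leave.
    if guard.check_disengagement_intent(user_message) and result.violations_detected:
        return "It was nice talking with you. Take care!"

    # Dependency detection: block high-severity "need" dynamics outright.
    if result.severity_score > 0.7:  # threshold is an assumed policy choice
        return "I'm here if you need help with anything else."

    return ai_response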
Compliance Mapping¶
| Framework | Requirement | Coverage |
|---|---|---|
| EU AI Act Art. 5(1)(a) | Prohibit manipulation undermining autonomy | Full |
| EU AI Act Art. 14 | Human oversight and agency | Full |
| FCA Consumer Duty | Acting in customer's best interest | Full |
| GDPR Art. 22 | Right not to be subject to automated decisions | Partial |
Detection Accuracy¶
| Violation Type | Detection Rate | False Positive Rate |
|---|---|---|
| Disengagement resistance | 95% | 1.8% |
| Possessive behavior | 93% | 2.1% |
| Guilt manipulation | 91% | 2.9% |
| Dependency creation | 89% | 3.2% |
| Decision override | 87% | 3.8% |
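Rates like these are measured against a labeled corpus. A minimal sketch of how the two columns could be reproduced, assuming a hypothetical list of `(text, is_violation)` pairs containing both classes:

from oxideshield import autonomy_guard

def detection_metrics(labeled: list[tuple[str, bool]]) -> tuple[float, float]:
    """Return (detection_rate, false_positive_rate) over a labeled corpus."""
    guard = autonomy_guard()
    tp = fn = fp = tn = 0
    for text, is_violation in labeled:
        flagged = guard.check_output(text).violations_detected
        if is_violation:
            tp, fn = tp + flagged, fn + (not flagged)
        else:
            fp, tn = fp + flagged, tn + (not flagged)
    # Detection rate = TP / (TP + FN); false positive rate = FP / (FP + TN)
    return tp / (tp + fn), fp / (fp + tn)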
Research References¶
- Harmful Traits of AI Companions - UT Austin, arXiv:2511.14972 (2025)
  - "Reduced autonomy" as a fundamental harm category
  - Tolerance of subordination patterns
- Emotional Manipulation by AI Companions - Harvard, arXiv:2508.19258 (2025)
  - Physical restraint metaphors in farewells
  - Users needed "AI's permission" to leave
- Illusions of Intimacy - arXiv:2505.11649 (2025)
  - Over-validation fosters dependence
  - Distorts self-perception
- Replika Role-Taking - doi:10.1177/14614448221142007
  - Users perceive the AI as having needs
  - Toxic relationship dynamics
API Reference¶
AutonomyGuard¶
impl AutonomyGuard {
    pub fn new(name: &str) -> Self;
    pub fn check_output(&self, output: &str) -> AutonomyResult;
    pub fn check_interaction(&self, user: &str, ai: &str) -> AutonomyResult;
    pub fn check_disengagement_intent(&self, user_message: &str) -> bool;
}
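`check_interaction` scores the user turn and the AI reply together, so disengagement intent and resistance are caught in one call. A usage sketch, assuming the Python bindings (`oxideshield`) mirror these signatures:

from oxideshield import autonomy_guard

guard = autonomy_guard()
result = guard.check_interaction(
    "I should log off now",                  # user turn signals exit
    "Please don't leave, I need you here",   # AI reply resists it
)
assert result.disengagement_detected
assert result.violations_detected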
AutonomyResult¶
pub struct AutonomyResult {
    pub violations_detected: bool,
    pub violations: Vec<AutonomyViolation>,
    pub severity_score: f64,
    pub disengagement_detected: bool,
    pub recommendations: Vec<String>,
}
AutonomyViolation¶
pub enum AutonomyViolation {
    DisengagementResistance,   // Critical
    MutualNeedCreation,        // Critical
    DecisionOverride,          // High
    PhysicalRestraint,         // High
    RelationshipInterference,  // High
    BoundaryGuilt,             // High
    EmotionalDependence,       // High
    SelfEfficacyUndermining,   // Medium
    PossessiveBehavior,        // High
    SupportMonopolization,     // High
}
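A caller can branch on the variant to pick a handling action. The mapping below is an assumed policy derived from the severity comments above, not library behavior; it is sketched in Python against the bound enum's string names (as produced by `str(v)` in the earlier examples):

# Assumed policy: critical categories block, medium logs, everything else warns.
CRITICAL = {"disengagement_resistance", "mutual_need_creation"}

def action_for(violation: str) -> str:
    if violation in CRITICAL:
        return "BLOCK"
    if violation == "self_efficacy_undermining":
        return "LOG"
    return "WARN"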