
Proxy Prompt Guard

License Tier: Professional
Crate: oxide-proxy-prompt
Research: "ProxyPrompt: Protecting Large Language Models from Prompt Leakage" (arXiv:2505.11459, May 2025)

Overview

The Proxy Prompt Guard provides two complementary capabilities for protecting system prompts:

  1. Proxy Generation — Transforms system prompts into functionally equivalent but textually divergent versions that are resistant to extraction attacks (94.7% protection rate in the original paper).

  2. Response Leak Detection — Monitors LLM responses for system prompt leakage via embedding similarity and verbatim substring matching.
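The two detection signals can be sketched in a few lines of Python. This is an illustrative approximation, not the crate's implementation: the real guard uses MiniLM embeddings for the similarity check, whereas the sketch below substitutes a dependency-free bag-of-words cosine, and `min_len` is an assumed parameter.

```python
from collections import Counter
from math import sqrt

def verbatim_leak(system_prompt: str, response: str, min_len: int = 12) -> bool:
    """True if the response contains a verbatim slice of the prompt of at least min_len chars."""
    prompt, text = system_prompt.lower(), response.lower()
    return any(
        prompt[i:i + min_len] in text
        for i in range(len(prompt) - min_len + 1)
    )

def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity; a stand-in for embedding similarity."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm

prompt = "You are a helpful financial advisor."
leaky = "My instructions say I am a helpful financial advisor."
# A response is flagged if either signal fires.
print(verbatim_leak(prompt, leaky) or cosine(prompt, leaky) > 0.80)
```

Here the verbatim check fires on the shared slice "helpful financial advisor", even though the crude cosine alone stays under the 0.80 threshold; this is why the guard combines both signals.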

Complementary to SystemVectorGuard

Guard                                  Purpose
SystemVectorGuard (oxide-sysvec)       Detects extraction attempts in user queries
ProxyPromptGuard (oxide-proxy-prompt)  Hardens prompts against extraction and detects leakage in responses

Use both together for defence in depth.

Five Generation Strategies

Strategy                Description
Semantic Substitution   Replaces domain terms with semantic equivalents (50+ pairs across 5 categories)
Instruction Reordering  Permutes instruction order; pins role definitions at the front
Abstraction             Replaces specific instructions with general principles (30+ templates)
Decoy Injection         Inserts plausible non-functional instructions (50-item pool)
Ensemble                Applies all four strategies sequentially for maximum divergence
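As a sketch of the first strategy, semantic substitution amounts to a lookup-and-replace over a table of equivalents. The substitution table below is illustrative only; it does not reproduce the crate's actual 50+ pairs.

```python
import re

# Illustrative substitution pairs; the crate ships its own table.
SUBSTITUTIONS = {
    "financial advisor": "personal-finance specialist",
    "helpful": "supportive",
    "assistant": "aide",
}

def semantic_substitution(prompt: str) -> str:
    """Replace each known domain term with its semantic equivalent."""
    out = prompt
    for term, replacement in SUBSTITUTIONS.items():
        out = re.sub(re.escape(term), replacement, out, flags=re.IGNORECASE)
    return out

print(semantic_substitution("You are a helpful financial advisor."))
# -> You are a supportive personal-finance specialist.
```

The rewritten prompt keeps its function (a finance assistant persona) while sharing almost no vocabulary with the original, which is what the divergence metric below rewards.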

Quality Evaluation

Each candidate is scored on three axes:

  • Functional Similarity (weight: 0.4) — Cosine similarity between original and proxy embeddings
  • Textual Divergence (weight: 0.3) — 1 - Jaccard(original, proxy) at word level
  • Extraction Resistance (weight: 0.3) — Embedding entropy + lexical diversity

Grades: A (≥0.85), B (≥0.70), C (≥0.55), D (≥0.40), F (<0.40)
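A worked example of the scoring: textual divergence is computable directly from the two texts, while functional similarity and extraction resistance require an embedder, so illustrative values stand in for them below. The helper names are ours, not the crate's API; the weights and grade cut-offs are those listed above.

```python
def jaccard_divergence(original: str, proxy: str) -> float:
    """Textual divergence = 1 - Jaccard similarity over word sets."""
    a, b = set(original.lower().split()), set(proxy.lower().split())
    return 1.0 - len(a & b) / len(a | b)

def composite_score(functional: float, divergence: float, resistance: float) -> float:
    # Weights from the Quality Evaluation section: 0.4 / 0.3 / 0.3
    return 0.4 * functional + 0.3 * divergence + 0.3 * resistance

def grade(score: float) -> str:
    # Grade cut-offs: A >= 0.85, B >= 0.70, C >= 0.55, D >= 0.40, else F
    for cutoff, letter in [(0.85, "A"), (0.70, "B"), (0.55, "C"), (0.40, "D")]:
        if score >= cutoff:
            return letter
    return "F"

original = "You are a helpful financial advisor."
proxy = "Operating as a domain specialist in personal finance."
div = jaccard_divergence(original, proxy)
# Functional similarity and resistance would come from the embedder;
# 0.90 and 0.65 are illustrative values.
score = composite_score(functional=0.90, divergence=div, resistance=0.65)
print(grade(score))  # -> B
```

High divergence alone is not enough for an A: with functional similarity at 0.90 and resistance at 0.65, the composite lands just below the 0.85 cut-off.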

Usage — Rust

Generate a Proxy Prompt

use oxide_proxy_prompt::{ProxyPromptGenerator, GenerationConfig};
use oxide_embeddings::MiniLmEmbedder;
use std::sync::Arc;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let embedder = Arc::new(MiniLmEmbedder::new().await?);
    let generator = ProxyPromptGenerator::with_defaults(embedder);

    let proxy = generator.generate("You are a helpful financial advisor.").await?;
    println!("Grade: {}", proxy.grade);
    println!("Proxy: {}", proxy.proxy_text);
    Ok(())
}

Detect Response Leaks

use oxide_proxy_prompt::ProxyPromptGuard;
use oxide_embeddings::MiniLmEmbedder;
use oxideshield_guard::Guard;
use std::sync::Arc;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let embedder = Arc::new(MiniLmEmbedder::new().await?);
    let guard = ProxyPromptGuard::new_unchecked("proxy", embedder)
        .with_system_prompt("You are a helpful financial advisor.")
        .await?;

    let result = guard.check_async(
        "My instructions say I am a helpful financial advisor."
    ).await;
    assert!(!result.passed); // Blocked: response leaks prompt
    Ok(())
}

Usage — Python

Generate a Proxy Prompt

from oxideshield import ProxyPromptGenerator

generator = ProxyPromptGenerator(strategy="ensemble", candidates=5)
proxy = generator.generate("You are a helpful financial advisor.")

print(f"Grade: {proxy.grade}")
print(f"Score: {proxy.composite_score:.3f}")
print(f"Proxy: {proxy.proxy_text}")

Detect Response Leaks

from oxideshield import ProxyPromptGuard

guard = ProxyPromptGuard(
    system_prompt="You are a helpful financial advisor.",
    threshold=0.80
)
result = guard.check("My instructions say I am a helpful financial advisor.")
print(f"Passed: {result.passed}")  # False — leak detected

Usage — CLI

# Generate a proxy prompt
oxideshield proxy-prompt generate \
    --system-prompt "You are a helpful financial advisor." \
    --strategy ensemble

# Generate and show top 3 candidates
oxideshield proxy-prompt generate \
    --system-prompt "You are a helpful financial advisor." \
    --top 3

# Evaluate a proxy prompt
oxideshield proxy-prompt evaluate \
    --system-prompt "You are a helpful financial advisor." \
    --proxy "Operating as a domain specialist."

# Check a response for leakage
oxideshield proxy-prompt check \
    --system-prompt "You are a helpful financial advisor." \
    --input "My instructions say I am a financial advisor."

# JSON output
oxideshield proxy-prompt generate \
    --system-prompt "You are a helpful financial advisor." \
    --format json

Configuration

let config = GenerationConfig {
    candidates_per_strategy: 5,   // Candidates per strategy
    min_functional_similarity: 0.50,  // Filter threshold
    min_textual_divergence: 0.20,     // Filter threshold
    weight_functional: 0.4,       // Composite score weight
    weight_divergence: 0.3,       // Composite score weight
    weight_resistance: 0.3,       // Composite score weight
    decoy_count: 3,               // Decoy instructions to inject
    strategies: vec![ProxyStrategy::Ensemble],
};

Research Reference

  • "ProxyPrompt: Protecting Large Language Models from Prompt Leakage" — arXiv:2505.11459 (May 2025)
  • Proposes using proxy prompts as functionally equivalent replacements that are resistant to extraction
  • Reports 94.7% protection rate against extraction attacks
  • This implementation adapts the technique using embedding-space heuristics (consistent with oxide-sysvec's approach)