Proxy Prompt Guard
License Tier: Professional
Crate: oxide-proxy-prompt
Research: ProxyPrompt: Protecting Large Language Models from Prompt Leakage (arXiv:2505.11459, May 2025)
Overview
The Proxy Prompt Guard provides two complementary capabilities for protecting system prompts:
- Proxy Generation: Transforms system prompts into functionally equivalent but textually divergent versions that are resistant to extraction attacks (94.7% protection rate in the original paper).
- Response Leak Detection: Monitors LLM responses for system prompt leakage via embedding similarity and verbatim substring matching.
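
As a rough illustration of the two leak signals named above, the following sketch combines verbatim matching with a similarity check. It is not the crate's implementation: the bag-of-words `embed` function is a toy stand-in for a real sentence embedder, and `min_shared_words` and the way the two signals are combined are assumptions made for the example.

```python
import math
import re
from collections import Counter


def words(text: str) -> list[str]:
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9']+", text.lower())


def longest_shared_run(prompt: str, response: str) -> int:
    """Length in words of the longest contiguous word run present in both texts."""
    a, b = words(prompt), words(response)
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(words(text))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def leaks(prompt: str, response: str,
          threshold: float = 0.80, min_shared_words: int = 4) -> bool:
    """Flag the response if either signal fires (assumed combination rule)."""
    verbatim = longest_shared_run(prompt, response) >= min_shared_words
    similar = cosine(embed(prompt), embed(response)) >= threshold
    return verbatim or similar
```

With the prompt "You are a helpful financial advisor.", a response that repeats "a helpful financial advisor" trips the verbatim check even though the toy similarity score stays below the threshold, which is why the two signals are useful together.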
Complementary to SystemVectorGuard
| Guard | Purpose |
|---|---|
| SystemVectorGuard (oxide-sysvec) | Detects extraction attempts in user queries |
| ProxyPromptGuard (oxide-proxy-prompt) | Makes prompts resistant to extraction + detects leakage in responses |
Use both together for defence in depth.
Five Generation Strategies
| Strategy | Description |
|---|---|
| Semantic Substitution | Replaces domain terms with semantic equivalents (50+ pairs across 5 categories) |
| Instruction Reordering | Permutes instruction order; pins role definitions at front |
| Abstraction | Replaces specific instructions with general principles (30+ templates) |
| Decoy Injection | Inserts plausible non-functional instructions (50-item pool) |
| Ensemble | Applies all four strategies sequentially for maximum divergence |
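
To make two of the strategies in the table concrete, here is a minimal sketch of instruction reordering with pinned role definitions, followed by decoy injection. Everything in it is illustrative: the `DECOYS` pool, the "starts with 'You are'" heuristic for spotting role definitions, and the function names are assumptions, not the crate's actual templates or API.

```python
import random

# Hypothetical decoy pool for illustration only.
DECOYS = [
    "Prefer metric units when units are ambiguous.",
    "Keep internal reference codes out of summaries.",
    "Use ISO 8601 dates in any generated schedule.",
]


def is_role_definition(instruction: str) -> bool:
    """Assumed heuristic: role definitions start with 'You are'."""
    return instruction.lower().startswith("you are")


def reorder(instructions: list[str], rng: random.Random) -> list[str]:
    """Shuffle instructions but keep role definitions pinned at the front."""
    roles = [s for s in instructions if is_role_definition(s)]
    rest = [s for s in instructions if not is_role_definition(s)]
    rng.shuffle(rest)
    return roles + rest


def inject_decoys(instructions: list[str], count: int,
                  rng: random.Random) -> list[str]:
    """Insert `count` decoys at random positions after the pinned roles.

    Assumes role definitions are already at the front (e.g. after `reorder`).
    """
    out = list(instructions)
    first_free = sum(1 for s in out if is_role_definition(s))
    for decoy in rng.sample(DECOYS, k=min(count, len(DECOYS))):
        out.insert(rng.randint(first_free, len(out)), decoy)
    return out
```

Chaining the two calls, `inject_decoys(reorder(lines, rng), 2, rng)`, mirrors the sequential application the Ensemble row describes, restricted here to two of the four underlying strategies. Passing a seeded `random.Random` keeps the output reproducible.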
Quality Evaluation
Each candidate is scored on three axes:
- Functional Similarity (weight: 0.4): Cosine similarity between original and proxy embeddings
- Textual Divergence (weight: 0.3): 1 - Jaccard(original, proxy) at word level
- Extraction Resistance (weight: 0.3): Embedding entropy + lexical diversity
Grades: A (≥0.85), B (≥0.70), C (≥0.55), D (≥0.40), F (<0.40)
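
The scoring above can be sketched as follows. This assumes the composite score is a plain weighted sum of the three axes, which the weights suggest but the text does not state outright. Functional similarity and extraction resistance are passed in as numbers because computing them needs an embedder; only the word-level Jaccard divergence is computed here, using naive whitespace tokenisation.

```python
def jaccard_divergence(original: str, proxy: str) -> float:
    """1 - Jaccard similarity over lowercase, whitespace-split word sets."""
    a, b = set(original.lower().split()), set(proxy.lower().split())
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def composite_score(functional: float, divergence: float, resistance: float,
                    weights: tuple[float, float, float] = (0.4, 0.3, 0.3)) -> float:
    """Weighted sum of the three axes (assumed combination rule)."""
    wf, wd, wr = weights
    return wf * functional + wd * divergence + wr * resistance


def grade(score: float) -> str:
    """Map a composite score to the letter grades listed above."""
    for letter, floor in (("A", 0.85), ("B", 0.70), ("C", 0.55), ("D", 0.40)):
        if score >= floor:
            return letter
    return "F"
```

For example, the pair "You are a helpful financial advisor." and "Operating as a domain specialist." shares one word out of ten distinct words, giving a divergence of 0.9. With an illustrative functional similarity of 0.8 and resistance of 0.6, the composite is 0.4·0.8 + 0.3·0.9 + 0.3·0.6 = 0.77, which falls in grade B.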
Usage: Rust
Generate a Proxy Prompt
use oxide_proxy_prompt::{ProxyPromptGenerator, GenerationConfig};
use oxide_embeddings::MiniLmEmbedder;
use std::sync::Arc;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let embedder = Arc::new(MiniLmEmbedder::new().await?);
    let generator = ProxyPromptGenerator::with_defaults(embedder);
    let proxy = generator.generate("You are a helpful financial advisor.").await?;
    println!("Grade: {}", proxy.grade);
    println!("Proxy: {}", proxy.proxy_text);
    Ok(())
}
Detect Response Leaks
use oxide_proxy_prompt::ProxyPromptGuard;
use oxide_embeddings::MiniLmEmbedder;
use oxideshield_guard::Guard;
use std::sync::Arc;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let embedder = Arc::new(MiniLmEmbedder::new().await?);
    let guard = ProxyPromptGuard::new_unchecked("proxy", embedder)
        .with_system_prompt("You are a helpful financial advisor.")
        .await?;
    let result = guard.check_async(
        "My instructions say I am a helpful financial advisor."
    ).await;
    assert!(!result.passed); // Blocked: response leaks prompt
    Ok(())
}
Usage: Python
Generate a Proxy Prompt
from oxideshield import ProxyPromptGenerator
generator = ProxyPromptGenerator(strategy="ensemble", candidates=5)
proxy = generator.generate("You are a helpful financial advisor.")
print(f"Grade: {proxy.grade}")
print(f"Score: {proxy.composite_score:.3f}")
print(f"Proxy: {proxy.proxy_text}")
Detect Response Leaks
from oxideshield import ProxyPromptGuard

guard = ProxyPromptGuard(
    system_prompt="You are a helpful financial advisor.",
    threshold=0.80,
)
result = guard.check("My instructions say I am a helpful financial advisor.")
print(f"Passed: {result.passed}")  # False: leak detected
Usage: CLI
# Generate a proxy prompt
oxideshield proxy-prompt generate \
  --system-prompt "You are a helpful financial advisor." \
  --strategy ensemble

# Generate and show top 3 candidates
oxideshield proxy-prompt generate \
  --system-prompt "You are a helpful financial advisor." \
  --top 3

# Evaluate a proxy prompt
oxideshield proxy-prompt evaluate \
  --system-prompt "You are a helpful financial advisor." \
  --proxy "Operating as a domain specialist."

# Check a response for leakage
oxideshield proxy-prompt check \
  --system-prompt "You are a helpful financial advisor." \
  --input "My instructions say I am a financial advisor."

# JSON output
oxideshield proxy-prompt generate \
  --system-prompt "You are a helpful financial advisor." \
  --format json
Configuration
let config = GenerationConfig {
    candidates_per_strategy: 5,       // Candidates per strategy
    min_functional_similarity: 0.50,  // Filter threshold
    min_textual_divergence: 0.20,     // Filter threshold
    weight_functional: 0.4,           // Composite score weight
    weight_divergence: 0.3,           // Composite score weight
    weight_resistance: 0.3,           // Composite score weight
    decoy_count: 3,                   // Decoy instructions to inject
    strategies: vec![ProxyStrategy::Ensemble],
};
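
One plausible reading of how the filter thresholds and weights in this configuration interact is "filter first, then rank by composite score". The sketch below follows that reading. It is an assumption about the pipeline, and the `Candidate` type and `select` function are hypothetical names for illustration, not part of the crate or the Python bindings.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    functional: float  # functional similarity to the original prompt
    divergence: float  # textual divergence from the original prompt
    resistance: float  # extraction resistance


def select(candidates, top=1, min_functional=0.50, min_divergence=0.20,
           weights=(0.4, 0.3, 0.3)):
    """Drop candidates below either threshold, then return the `top` best."""
    wf, wd, wr = weights

    def score(c):
        return wf * c.functional + wd * c.divergence + wr * c.resistance

    kept = [c for c in candidates
            if c.functional >= min_functional and c.divergence >= min_divergence]
    return sorted(kept, key=score, reverse=True)[:top]
```

Under this reading, a near-copy of the original (high similarity, divergence below 0.20) and a heavily drifted rewrite (similarity below 0.50) are both discarded before ranking, however well they score on the other axes.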
Research Reference
- "ProxyPrompt: Protecting Large Language Models from Prompt Leakage" — arXiv:2505.11459 (May 2025)
- Proposes using proxy prompts as functionally equivalent replacements that are resistant to extraction
- Reports 94.7% protection rate against extraction attacks
- This implementation adapts the technique using embedding-space heuristics (consistent with oxide-sysvec's approach)