How It Works: Technical Deep Dive

A comprehensive look at our client-side sanitization architecture, privacy guarantees, and detection algorithms.

🏗️ Architecture Overview

Clean My Prompt is built on a zero-trust, client-only architecture. All sanitization code executes in your browser's JavaScript sandbox, and nothing you paste is sent to a server.

Core Principle:

"Don't trust servers with sensitive data. Process it locally where the user has full control."

Technology Stack

⚙️ Processing Pipeline

When you paste text, it goes through a multi-stage sanitization pipeline:

Stage 1: Memory Allocation

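// User pastes or types text into the textarea
const textarea = document.querySelector('textarea'); // selector assumed for illustration
textarea.addEventListener('input', (event) => {
    const rawText = event.target.value; // Held in the JavaScript heap (RAM)
    sanitizeText(rawText);
});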

The text is held in the browser's JavaScript heap (RAM). The tool writes nothing to disk, so the data exists only in volatile memory.

Stage 2: NLP Analysis (Smart Detection)

// Compromise.js analyzes text structure
import nlp from 'compromise'; // assumes the npm package; a script-tag build exposes `nlp` globally
const doc = nlp(rawText);
const people = doc.people().out('array');      // e.g. ['John Smith', 'Dr. Chen']
const places = doc.places().out('array');      // e.g. ['Seattle', 'New York']
const orgs = doc.organizations().out('array'); // e.g. ['Microsoft', 'Apple Inc.']

Natural Language Processing runs first to detect contextual entities. Compromise.js uses part-of-speech tagging and entity recognition without any network calls.
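Once the entity strings are known, they still have to be replaced in the text. The following is a minimal sketch of that step, not the tool's actual code: `redactEntities`, `escapeRegExp`, and the `[LABEL]` placeholder format are hypothetical names chosen for illustration. Entities are escaped before being compiled into a regular expression, and longer entities are replaced first so that "John Smith" is handled before "John".

```javascript
// Hypothetical helper: replace each detected entity string with a placeholder.
// The `[LABEL]` placeholder format is an assumption, not the tool's documented output.
function escapeRegExp(value) {
    return value.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function redactEntities(text, entities, label) {
    // Deduplicate, drop empty strings, and sort longest first.
    const ordered = [...new Set(entities)]
        .filter((entity) => entity.length > 0)
        .sort((a, b) => b.length - a.length);

    let result = text;
    for (const entity of ordered) {
        const matcher = new RegExp(escapeRegExp(entity), 'g');
        // A replacer function avoids special handling of `$` in the placeholder.
        result = result.replace(matcher, () => `[${label}]`);
    }
    return result;
}

const sample = 'Dr. Chen met John Smith in Seattle. John approved.';
let redacted = redactEntities(sample, ['John', 'John Smith', 'Dr. Chen'], 'PERSON');
redacted = redactEntities(redacted, ['Seattle'], 'PLACE');
console.log(redacted); // [PERSON] met [PERSON] in [PLACE]. [PERSON] approved.
```

In practice the entity arrays would come from the `people`, `places`, and `orgs` results shown above.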

Stage 3: Regex Pattern Matching

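// Sequential pattern application
const patterns = [
    { name: 'email', regex: /\b[\w.+-]+@[\w.-]+\.[a-z]{2,}\b/gi },
    { name: 'phone', regex: /\b(?:\+?\d{1,3}[-.\s]?)?...\b/gi },
    { name: 'apiKey', regex: /\b(?:sk|pk|api_key)[-_][a-zA-Z0-9_-]{12,}\b/g },
    // ... 15+ more patterns
];

// Start from the text produced by the earlier stages (rawText here for simplicity)
let sanitizedText = rawText;
patterns.forEach((pattern) => {
    // Placeholder format shown here is illustrative
    const placeholder = `[${pattern.name.toUpperCase()}]`;
    sanitizedText = sanitizedText.replace(pattern.regex, placeholder);
});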

After NLP, regex patterns scan for structured data: emails, API keys, IP addresses, credit cards, IBANs, phone numbers (US/EU), passwords, and URLs.
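To make the sequential replacement concrete, here is a small runnable sketch that uses only the two complete patterns from the snippet above (the phone pattern is elided in the source, so it is left out). The function name `sanitizeWithPatterns`, the `[NAME]` placeholder format, and the per-category counts are assumptions for illustration, not the tool's confirmed behavior.

```javascript
// Illustrative only: two of the patterns shown above, applied in order.
// The placeholder format `[EMAIL]` / `[APIKEY]` is an assumption.
const demoPatterns = [
    { name: 'email', regex: /\b[\w.+-]+@[\w.-]+\.[a-z]{2,}\b/gi },
    { name: 'apiKey', regex: /\b(?:sk|pk|api_key)[-_][a-zA-Z0-9_-]{12,}\b/g },
];

function sanitizeWithPatterns(text, patterns) {
    const counts = {};
    let sanitized = text;
    for (const { name, regex } of patterns) {
        const placeholder = `[${name.toUpperCase()}]`;
        counts[name] = 0;
        // A replacer function lets us count matches while substituting.
        sanitized = sanitized.replace(regex, () => {
            counts[name] += 1;
            return placeholder;
        });
    }
    return { sanitized, counts };
}

const { sanitized, counts } = sanitizeWithPatterns(
    'Mail john.doe+tag@company.io, key sk_live_abcdef123456',
    demoPatterns,
);
console.log(sanitized); // Mail [EMAIL], key [APIKEY]
console.log(counts);    // { email: 1, apiKey: 1 }
```

Because each pattern runs over the output of the previous one, placeholders should not themselves match a later pattern; bracketed uppercase labels are one simple way to keep that true.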

Stage 4: Output Generation

Sanitized text is displayed in real time. Two modes are available:

Stage 5: Garbage Collection

// When you close the tab or navigate away:
window.addEventListener('beforeunload', () => {
    // No explicit cleanup code is needed here: once the page is gone,
    // the browser releases its JavaScript heap automatically.
    // No data persists. No traces remain.
});

🔍 Detection Capabilities

NLP-Based Detection (Compromise.js)

Our NLP engine uses linguistic analysis to detect contextual entities such as people, places, and organizations.

Why NLP First?

Names like "John Smith" can contain common words. Running NLP before regex prevents false positives from word-boundary patterns.

Regex-Based Detection

15+ regex patterns detect structured sensitive data:

Category      | Examples Detected                               | Regions
--------------|-------------------------------------------------|------------
Emails        | user@example.com, john.doe+tag@company.io       | Universal
Phone Numbers | (555) 123-4567, +49 176 1234567, 0176265124     | US, EU, DE
API Keys      | sk_live_abc..., AKIAIOSFODNN7..., ghp_xyz...    | Universal
IP Addresses  | 192.168.1.1, 2001:0db8:85a3::8a2e:0370:7334     | IPv4, IPv6
Credit Cards  | 4532-1234-5678-9010, 5425 2334 3010 9903        | Universal
IBANs         | DE89370400440532013000, FR14 2004...            | EU
Credentials   | password: abc123, username: admin               | Universal
URLs          | https://api.example.com, www.site.com           | Universal

🔒 Privacy Guarantees

Zero Network Communication

After initial page load, no network requests are made. You can verify this:

  1. Open browser DevTools (F12)
  2. Go to Network tab
  3. Paste sensitive text and sanitize
  4. Observe: Zero new requests logged
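The same check can be approximated from the browser console with the standard Resource Timing API. This is a rough sketch under stated assumptions: `countResourceRequests` and `requestsSince` are hypothetical helper names, the `perf` parameter exists only so the functions can be exercised outside a browser, and the Network tab remains the more complete view (for example, WebSocket traffic does not appear as resource entries, and the browser caps how many entries it buffers).

```javascript
// Sketch: count resource requests recorded by the Resource Timing API.
// `perf` defaults to the browser's global `performance` object.
function countResourceRequests(perf = globalThis.performance) {
    return perf.getEntriesByType('resource').length;
}

// Number of resource entries recorded since an earlier snapshot.
function requestsSince(baseline, perf = globalThis.performance) {
    return countResourceRequests(perf) - baseline;
}

// Usage in the browser console:
//   const baseline = countResourceRequests();
//   ...paste sensitive text and sanitize...
//   console.log(requestsSince(baseline)); // number of new resource entries
```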

Offline Mode Test:

Disconnect from the internet after loading the page. The tool continues to work, which shows that sanitization does not depend on any server.

RAM-Only Processing

Your sensitive data never touches:

Automatic Memory Cleanup

JavaScript's automatic garbage collection ensures:

⚑ Performance Characteristics

Real-Time Processing

Sanitization happens as you type. Typical performance:

Optimization Techniques

🔧 Extensibility: Custom Patterns

You can add custom regex patterns for domain-specific sensitive data:

Example: Social Security Numbers (US)

Name: SSN
Regex: \b\d{3}-\d{2}-\d{4}\b
Placeholder: SSN

Example: UK National Insurance Number

Name: UK_NIN
Regex: \b[A-Z]{2}\d{6}[A-D]\b
Placeholder: UK_NATIONAL_INSURANCE
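As a quick way to try these two patterns before adding them, the sketch below compiles each regex string and applies it to a sample. The object shape mirrors the Name / Regex / Placeholder fields above, but the field names, the `g` flag, the `applyCustomPatterns` helper, and the bracketed output format are assumptions rather than the tool's documented behavior.

```javascript
// Hypothetical shape for custom pattern entries (field names are assumed).
const customPatterns = [
    { name: 'SSN', regex: '\\b\\d{3}-\\d{2}-\\d{4}\\b', placeholder: 'SSN' },
    { name: 'UK_NIN', regex: '\\b[A-Z]{2}\\d{6}[A-D]\\b', placeholder: 'UK_NATIONAL_INSURANCE' },
];

function applyCustomPatterns(text, patterns) {
    return patterns.reduce((current, { regex, placeholder }) => {
        // Compile from the string the user typed; an invalid regex throws SyntaxError.
        const compiled = new RegExp(regex, 'g');
        // Bracketed output format is an assumption.
        return current.replace(compiled, () => `[${placeholder}]`);
    }, text);
}

console.log(applyCustomPatterns('SSN 123-45-6789, NI number AB123456C', customPatterns));
// SSN [SSN], NI number [UK_NATIONAL_INSURANCE]
```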

Custom patterns are stored in sessionStorage and lost when you close the tab (privacy by design).
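A minimal sketch of how such storage could work with the standard Web Storage API follows. The key name `customPatterns`, the JSON layout, and the helper names `savePatterns` and `loadPatterns` are assumptions; the actual implementation may differ. The `storage` parameter accepts any object with `getItem` and `setItem`, which is `sessionStorage` in a browser.

```javascript
// Sketch of keeping custom patterns for the lifetime of the tab.
// STORAGE_KEY and the JSON layout are assumptions for illustration.
const STORAGE_KEY = 'customPatterns';

function savePatterns(patterns, storage = globalThis.sessionStorage) {
    storage.setItem(STORAGE_KEY, JSON.stringify(patterns));
}

function loadPatterns(storage = globalThis.sessionStorage) {
    const raw = storage.getItem(STORAGE_KEY);
    if (raw === null) return []; // Nothing saved in this tab yet.
    try {
        const parsed = JSON.parse(raw);
        return Array.isArray(parsed) ? parsed : [];
    } catch {
        return []; // Corrupt entry: fall back to no custom patterns.
    }
}

// Usage in the browser:
//   savePatterns([{ name: 'SSN', regex: '\\b\\d{3}-\\d{2}-\\d{4}\\b', placeholder: 'SSN' }]);
//   const patterns = loadPatterns();
```

Storing the regex as a string keeps the entry JSON-serializable; it would be compiled with `new RegExp` when the pattern is applied.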

🛡️ Security Model

Threat Model

We protect against:

Limitations

We cannot protect against:

⚠️ Security Best Practice:

Use a clean, updated browser on a trusted device. Disable untrusted extensions when working with sensitive data.

📖 Open Source & Transparency

Clean My Prompt is fully open source under the MIT License. You can:

GitHub: github.com/Eulex0x/cleanmyprompt