How It Works - Clean My Prompt | Technical Deep Dive

🏗️ Architecture Overview

Clean My Prompt is built on a zero-trust, client-only architecture. Every line of code executes in your browser's JavaScript sandbox, and no server communication takes place after the page has loaded.

Core Principle:

"Don't trust servers with sensitive data. Process it locally where the user has full control."

Technology Stack

Frontend: Vanilla JavaScript (ES6+), Tailwind CSS
NLP Engine: Compromise.js (client-side natural language processing)
Pattern Matching: Native JavaScript Regex Engine
Storage: localStorage (preferences), sessionStorage (temporary patterns)
Deployment: Static HTML (can run from file:// protocol)

⚙️ Processing Pipeline

When you paste text, it goes through a multi-stage sanitization pipeline:

Stage 1: Memory Allocation

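// User pastes text into the page's textarea
const textarea = document.querySelector('textarea');
textarea.addEventListener('input', (event) => {
    const rawText = event.target.value; // Stored in JavaScript heap (RAM)
    sanitizeText(rawText);              // Runs the pipeline stages described below
});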

The text is held in the browser's JavaScript heap memory (RAM). The tool never writes it to disk; it exists only in the page's volatile memory.

Stage 2: NLP Analysis (Smart Detection)

// Compromise.js analyzes text structure
// (`nlp` is the entry-point function provided by the Compromise.js library)
const doc = nlp(rawText);
const people = doc.people().out('array');      // ['John Smith', 'Dr. Chen']
const places = doc.places().out('array');      // ['Seattle', 'New York']
const orgs = doc.organizations().out('array'); // ['Microsoft', 'Apple Inc.']

Natural Language Processing runs first to detect contextual entities. Compromise.js uses part-of-speech tagging and entity recognition without any network calls.
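Compromise.js only reports which entities it found; the text still has to be rewritten with placeholders. The sketch below shows one way to do that, assuming the entity lists have already been produced by calls such as `doc.people().out('array')` (they are passed in as plain arrays so the example runs without the library). The helper names `replaceEntities` and `escapeRegExp` and the `[PERSON_1]`-style labels are illustrative, not the tool's actual code.

```javascript
// Hypothetical helper (not the tool's code): swaps NLP-detected entities for placeholders.
// `entities` maps a label such as 'PERSON' to the strings Compromise.js returned for it.
function escapeRegExp(s) {
  // Escape regex metacharacters so "Dr. Chen" matches literally
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function replaceEntities(text, entities) {
  // Flatten to [label, value] pairs, skipping duplicates so each entity is handled once
  const seen = new Set();
  const items = [];
  for (const [label, values] of Object.entries(entities)) {
    for (const value of values) {
      if (value && !seen.has(value)) {
        seen.add(value);
        items.push([label, value]);
      }
    }
  }
  // Longest first, so "Dr. Emily Chen" is replaced before a shorter overlap like "Emily"
  items.sort((a, b) => b[1].length - a[1].length);

  const counters = {};
  let result = text;
  for (const [label, value] of items) {
    counters[label] = (counters[label] || 0) + 1;
    const placeholder = `[${label}_${counters[label]}]`;
    result = result.replace(new RegExp(escapeRegExp(value), 'g'), placeholder);
  }
  return result;
}

const sample = 'John Smith met Dr. Chen in Seattle. John Smith left.';
const redacted = replaceEntities(sample, {
  PERSON: ['John Smith', 'Dr. Chen'], // e.g. from doc.people().out('array')
  PLACE: ['Seattle'],                 // e.g. from doc.places().out('array')
});
console.log(redacted);
// → [PERSON_1] met [PERSON_2] in [PLACE_1]. [PERSON_1] left.
```

Replacing the longest entities first matters whenever one detected name contains another; otherwise the shorter match would break up the longer one before it could be replaced.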

Stage 3: Regex Pattern Matching

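// Sequential pattern application (the phone regex is abbreviated here)
const patterns = [
    { name: 'email', regex: /\b[\w.+-]+@[\w.-]+\.[a-z]{2,}\b/gi },
    { name: 'phone', regex: /\b(?:\+?\d{1,3}[-.\s]?)?...\b/gi },
    { name: 'apiKey', regex: /\b(?:sk|pk|api_key)[-_][a-zA-Z0-9_-]{12,}\b/g },
    // ... 15+ more patterns
];

let sanitizedText = rawText; // in the full pipeline, this is the text after the NLP stage
patterns.forEach(pattern => {
    // Simplified label; placeholder mode numbers matches, e.g. [EMAIL_1]
    const placeholder = `[${pattern.name.toUpperCase()}]`;
    sanitizedText = sanitizedText.replace(pattern.regex, placeholder);
});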

After NLP, regex patterns scan for structured data: emails, API keys, IP addresses, credit cards, IBANs, phone numbers (US/EU), passwords, and URLs.
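A runnable version of the sequential loop, using the two complete regexes from the list above. The phone pattern is omitted because it is abbreviated in the original, and the `URL` regex is a simplified, hypothetical stand-in since the tool's real URL pattern is not shown. The per-category counter is likewise an assumption about how numbered placeholders could be produced.

```javascript
// Patterns run one after another; each sees the output of the previous one.
// EMAIL and API_KEY regexes are copied from the list above; URL is a hypothetical stand-in.
const patterns = [
  { name: 'EMAIL', regex: /\b[\w.+-]+@[\w.-]+\.[a-z]{2,}\b/gi },
  { name: 'API_KEY', regex: /\b(?:sk|pk|api_key)[-_][a-zA-Z0-9_-]{12,}\b/g },
  { name: 'URL', regex: /\b(?:https?:\/\/|www\.)[^\s]+/gi },
];

function applyPatterns(text, patternList) {
  const counts = {}; // matches replaced per category
  let sanitized = text;
  for (const { name, regex } of patternList) {
    // A replacer function lets each match get its own number
    sanitized = sanitized.replace(regex, () => {
      counts[name] = (counts[name] || 0) + 1;
      return `[${name}_${counts[name]}]`;
    });
  }
  return { sanitized, counts };
}

const { sanitized, counts } = applyPatterns(
  'Mail jane@corp.io, key sk_live_abcdef123456, docs at https://api.example.com',
  patterns
);
console.log(sanitized);
// → Mail [EMAIL_1], key [API_KEY_1], docs at [URL_1]
```

Because each pattern runs on the already-sanitized text, a span replaced by an earlier pattern can no longer be matched by a later one.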

Stage 4: Output Generation

Sanitized text is displayed in real time. Two modes are available:

Placeholder Mode: [EMAIL_1], [PHONE_2], [API_KEY_3]
Realistic Mode: user@company.com, (555) 123-4567, api_key_prod_xyz123
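The two modes differ only in what each match is replaced with. The sketch below assumes that a repeated value should always map to the same replacement, so the AI still sees that two mentions refer to the same thing; the `makeReplacer` name and the fake-value generators are hypothetical, and the tool's actual scheme may differ.

```javascript
// Hypothetical sketch of the two output modes; names and generators are illustrative.
// A Map remembers each original value, so repeats get the same replacement.
const FAKE_VALUES = {
  EMAIL: (n) => `user${n}@company.com`,
  PHONE: (n) => `(555) 123-${4566 + n}`,
};

function makeReplacer(mode) {
  const seen = new Map(); // "CATEGORY:original" -> replacement
  const counters = {};    // next number per category
  return (category, original) => {
    const key = `${category}:${original}`;
    if (!seen.has(key)) {
      counters[category] = (counters[category] || 0) + 1;
      const n = counters[category];
      seen.set(key, mode === 'realistic' ? FAKE_VALUES[category](n) : `[${category}_${n}]`);
    }
    return seen.get(key);
  };
}

const EMAIL_REGEX = /\b[\w.+-]+@[\w.-]+\.[a-z]{2,}\b/gi;

function sanitizeEmails(text, mode) {
  const replace = makeReplacer(mode); // fresh numbering for each run
  return text.replace(EMAIL_REGEX, (match) => replace('EMAIL', match));
}

const input = 'Ask a@x.org, then b@y.org, then a@x.org again.';
const placeholderOut = sanitizeEmails(input, 'placeholder');
const realisticOut = sanitizeEmails(input, 'realistic');
console.log(placeholderOut);
// → Ask [EMAIL_1], then [EMAIL_2], then [EMAIL_1] again.
console.log(realisticOut);
// → Ask user1@company.com, then user2@company.com, then user1@company.com again.
```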

Stage 5: Garbage Collection

// When you close the tab or navigate away:
window.addEventListener('beforeunload', () => {
    // No cleanup code is needed: the tool never persisted the text,
    // and the browser releases the page's heap memory once the page is unloaded.
});

🔍 Detection Capabilities

NLP-Based Detection (Compromise.js)

Our NLP engine uses linguistic analysis to detect:

People: Personal names (John Smith, Dr. Emily Chen, Max Müller)
Places: Cities, states, countries (Seattle, Berlin, Bavaria)
Organizations: Companies and institutions (Microsoft, SAP, Apple Inc.)

Why NLP First?

Names like "John Smith" can contain common words. Running NLP before regex prevents false positives from word-boundary patterns.

Regex-Based Detection

15+ regex patterns detect structured sensitive data:

Category	Examples Detected	Regions / Formats
Emails	`user@example.com`, `john.doe+tag@company.io`	Universal
Phone Numbers	`(555) 123-4567`, `+49 176 1234567`, `0176265124`	US, EU, DE
API Keys	`sk_live_abc...`, `AKIAIOSFODNN7...`, `ghp_xyz...`	Universal
IP Addresses	`192.168.1.1`, `2001:0db8:85a3::8a2e:0370:7334`	IPv4, IPv6
Credit Cards	`4532-1234-5678-9010`, `5425 2334 3010 9903`	Universal
IBANs	`DE89370400440532013000`, `FR14 2004...`	EU
Credentials	`password: abc123`, `username: admin`	Universal
URLs	`https://api.example.com`, `www.site.com`	Universal
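IBANs carry ISO 13616 check digits (the "mod 97" check), which can tell a real IBAN apart from a random alphanumeric string that merely has the right shape. This page does not say whether Clean My Prompt validates check digits; the sketch below, with a hypothetical `isValidIban` function, shows how a regex hit could be confirmed before redaction, using the `DE89370400440532013000` example from the table.

```javascript
// Hypothetical post-filter for IBAN regex matches: ISO 13616 mod-97 check.
function isValidIban(input) {
  const iban = input.replace(/\s+/g, '').toUpperCase();
  // Country code, two check digits, then 11-30 alphanumeric BBAN characters
  if (!/^[A-Z]{2}\d{2}[A-Z0-9]{11,30}$/.test(iban)) return false;

  // Move the first four characters to the end, then read letters as numbers (A=10 ... Z=35)
  const rearranged = iban.slice(4) + iban.slice(0, 4);
  let remainder = 0;
  for (const ch of rearranged) {
    if (ch >= 'A' && ch <= 'Z') {
      const value = ch.charCodeAt(0) - 55; // letters expand to two decimal digits
      remainder = (remainder * 100 + value) % 97;
    } else {
      remainder = (remainder * 10 + Number(ch)) % 97;
    }
  }
  // Reducing digit by digit avoids building an integer too large for a JS number
  return remainder === 1;
}

console.log(isValidIban('DE89370400440532013000')); // true
console.log(isValidIban('DE89370400440532013001')); // false (one digit changed)
```

A single changed digit always breaks the mod-97 check, so typos and look-alike numbers are rejected while real IBANs, with or without spaces, pass.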

🔒 Privacy Guarantees

Zero Network Communication

After initial page load, no network requests are made. You can verify this:

Open browser DevTools (F12)
Go to Network tab
Paste sensitive text and sanitize
Observe: Zero new requests logged

Offline Mode Test:

Disconnect from the internet after loading the page. The tool keeps working normally, which confirms that all processing is local.

RAM-Only Processing

Your sensitive data never touches:

❌ Disk storage (no file system writes)
❌ Server storage (no uploads)
❌ Cloud processing (no API calls)
❌ Browser extensions (the tool sends nothing to them; extensions with page access are covered under Limitations)
✅ Only: Browser JavaScript heap memory (volatile RAM)

Automatic Memory Cleanup

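Because the tool keeps your text only in page memory and relies on the browser's automatic garbage collection:

When you close the tab, the page's memory is released
When you refresh, the page's memory is reset
When you navigate away, the tool keeps no copy of your text
The tool writes nothing to disk, so there is no saved copy to recover later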

⚡ Performance Characteristics

Real-Time Processing

Sanitization happens as you type. Typical performance:

Small text (<1KB): <10ms
Medium text (1-10KB): 10-50ms
Large text (10-100KB): 50-200ms
Very large text (>100KB): 200-500ms
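Timings like these depend on the device and browser. To check them on your own machine, a rough harness like the one below times a single pass with `performance.now()` (available in browsers and in Node 16+). The `sanitize` function here is a hypothetical stand-in that only replaces emails; the real pipeline also runs NLP and many more patterns, so it will be slower.

```javascript
// Rough timing harness; `sanitize` is a simplified stand-in, not the tool's pipeline.
const EMAIL = /\b[\w.+-]+@[\w.-]+\.[a-z]{2,}\b/gi;
const sanitize = (text) => text.replace(EMAIL, '[EMAIL]');

// Builds ASCII input of roughly `approxChars` characters (= bytes for ASCII) and times one pass
function timeSanitize(approxChars) {
  const line = 'Contact jane.doe@example.com about the report.\n';
  const input = line.repeat(Math.ceil(approxChars / line.length));
  const start = performance.now();
  const output = sanitize(input);
  const elapsedMs = performance.now() - start;
  return { elapsedMs, output };
}

for (const size of [1_000, 10_000, 100_000]) {
  const { elapsedMs } = timeSanitize(size);
  console.log(`${size} chars: ${elapsedMs.toFixed(2)} ms`);
}
```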

Optimization Techniques

Pattern Caching: Regex patterns compiled once on page load
Sequential Processing: Patterns applied in order of frequency
Early Exit: If no sensitive data detected, processing stops
NLP Deduplication: Each entity replaced only once
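Pattern caching and early exit can be sketched together: the regexes are built once at module load rather than on every keystroke, and a quick pre-scan decides whether any replacement is needed. The names below are hypothetical. One pitfall worth showing: `RegExp.prototype.test` on a regex with the `g` flag resumes from `lastIndex`, so a cached global regex must be reset before each test or it can miss matches.

```javascript
// Hypothetical sketch: regexes compiled once, plus an early-exit pre-scan.
const COMPILED_PATTERNS = [
  { name: 'EMAIL', regex: /\b[\w.+-]+@[\w.-]+\.[a-z]{2,}\b/gi },
  { name: 'API_KEY', regex: /\b(?:sk|pk|api_key)[-_][a-zA-Z0-9_-]{12,}\b/g },
];

function containsSensitiveData(text) {
  return COMPILED_PATTERNS.some(({ regex }) => {
    regex.lastIndex = 0; // /g regexes keep state between test() calls
    return regex.test(text);
  });
}

function sanitize(text) {
  if (!containsSensitiveData(text)) return text; // early exit: nothing to replace
  let result = text;
  for (const { name, regex } of COMPILED_PATTERNS) {
    result = result.replace(regex, `[${name}]`); // replace() itself resets lastIndex
  }
  return result;
}

console.log(sanitize('nothing to hide here')); // unchanged
console.log(sanitize('mail a@b.io'));          // mail [EMAIL]
```

A pre-scan like this mainly pays off when it lets the pipeline skip more expensive work, such as the NLP stage; for the regex stage alone, `replace` on text with no matches is already cheap.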

🔧 Extensibility: Custom Patterns

You can add custom regex patterns for domain-specific sensitive data:

Example: Social Security Numbers (US)

Name: SSN
Regex: \b\d{3}-\d{2}-\d{4}\b
Placeholder: SSN

Example: UK National Insurance Number

Name: UK_NIN
Regex: \b[A-Z]{2}\d{6}[A-D]\b
Placeholder: UK_NATIONAL_INSURANCE

Custom patterns are stored in sessionStorage and lost when you close the tab (privacy by design).
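A custom pattern arrives as three strings (name, regex, placeholder), so it has to be compiled with `new RegExp` and rejected if the syntax is invalid. RegExp objects do not survive `JSON.stringify`, so storing the source strings and recompiling on load is a natural fit for sessionStorage. The sketch below is a hypothetical illustration: the storage key, function names, and `[SSN]` output format are assumptions, and it accepts any object with `getItem`/`setItem` (pass `window.sessionStorage` in the browser) so it also runs outside one.

```javascript
// Hypothetical sketch: compile user-entered patterns and keep them in Web Storage.
const STORAGE_KEY = 'customPatterns'; // illustrative key, not the tool's actual one

function compileCustomPattern({ name, regex, placeholder }) {
  try {
    // 'g' so every occurrence is replaced, not just the first
    return { name, placeholder, regex: new RegExp(regex, 'g') };
  } catch (err) {
    throw new Error(`Invalid regex for ${name}: ${err.message}`);
  }
}

function saveCustomPattern(storage, pattern) {
  compileCustomPattern(pattern); // reject invalid input before storing it
  const list = JSON.parse(storage.getItem(STORAGE_KEY) || '[]');
  list.push(pattern); // store source strings; RegExp objects don't survive JSON
  storage.setItem(STORAGE_KEY, JSON.stringify(list));
}

function loadCustomPatterns(storage) {
  return JSON.parse(storage.getItem(STORAGE_KEY) || '[]').map(compileCustomPattern);
}

// Minimal in-memory stand-in for sessionStorage
const memoryStorage = {
  data: new Map(),
  getItem(key) { return this.data.has(key) ? this.data.get(key) : null; },
  setItem(key, value) { this.data.set(key, String(value)); },
};

saveCustomPattern(memoryStorage, { name: 'SSN', regex: '\\b\\d{3}-\\d{2}-\\d{4}\\b', placeholder: 'SSN' });
const [ssn] = loadCustomPatterns(memoryStorage);
const masked = 'SSN 123-45-6789 on file'.replace(ssn.regex, `[${ssn.placeholder}]`);
console.log(masked); // SSN [SSN] on file
```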

🛡️ Security Model

Threat Model

We protect against:

✅ Server breaches: Your text never reaches a server, so a server breach cannot expose it
✅ Network interception: No transmission = nothing to intercept
✅ Data leaks: No persistence = no leakage
✅ Third-party tracking: No analytics SDKs or tracking pixels

Limitations

We cannot protect against:

❌ Compromised browsers: If your browser is infected with malware, it can read memory
❌ Screen recording: If malware records your screen, it can see input
❌ Keyloggers: Hardware/software keyloggers capture typing
❌ Browser extensions: Malicious extensions with page access

⚠️ Security Best Practice:

Use a clean, updated browser on a trusted device. Disable untrusted extensions when working with sensitive data.

📖 Open Source & Transparency

Clean My Prompt is fully open source under the MIT License. You can:

Inspect the source code on GitHub
Run it locally (file:// protocol)
Audit security claims yourself
Fork and modify for your needs
Contribute improvements via pull requests

GitHub: github.com/Eulex0x/cleanmyprompt
