Obfuscation Detection
Detect disguised profanity that uses character substitution, symbol replacement, and repeated letters. The detection system catches common obfuscation patterns such as sh1t, f*ck, a$$hole, and daaaamn.
Obfuscation detection automatically disables word boundaries to catch partial matches within words and handles complex character substitution patterns.
Character Substitution Mapping
The obfuscation system normalizes disguised text with a character replacement pass before running profanity detection.
Substitution Rules
| Character | Rule | Example |
|---|---|---|
| `1` | Substituted (`1` → `i`) | sh1t → shit |
| `@` | Substituted (`@` → `a`) | d@mn → damn |
| `$` | Substituted (`$` → `s`) | a$$hole → asshole |
| `!` | Substituted (`!` → `i`) | sh!t → shit |
| `*` | Removed | f*ck → fck (then fuzzy matched) |
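The substitution pass can be sketched as a simple character map applied before dictionary lookup. This is a minimal illustration of the rules in the table above; the map and the function name are hypothetical, not the library's internals:

```javascript
// Hypothetical sketch of the substitution pass described above;
// glin-profanity's actual implementation may differ.
const SUBSTITUTIONS = { '1': 'i', '@': 'a', '$': 's', '!': 'i' };

function normalizeObfuscation(text) {
  return text
    .toLowerCase()
    // Replace mapped symbols with their letter equivalents
    .replace(/[1@$!]/g, (ch) => SUBSTITUTIONS[ch])
    // Asterisks are removed entirely; the result is fuzzy matched
    .replace(/\*/g, '');
}

console.log(normalizeObfuscation('bull$h1t')); // 'bullshit'
console.log(normalizeObfuscation('f*ck'));     // 'fck'
```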
Repeated Character Normalization
The system also handles excessive character repetition by normalizing repeated letters:
```
// Repeated character patterns
"daaaamn" → "daamn"  // runs of 'a' reduced to two
"shiiiit" → "shiit"  // runs of 'i' reduced to two
"fuuuuck" → "fuuck"  // runs of 'u' reduced to two
"helllll" → "helll"  // runs of 'l' reduced to two

// Algorithm: /([a-zA-Z])\1{2,}/g → '$1$1'
// Collapses runs of three or more identical letters to exactly two;
// the result is then fuzzy matched against the dictionary.
```

Configuration Options
| Prop | Type | Default |
|---|---|---|
| `allowObfuscatedMatch?` | `boolean` | `false` |
| `wordBoundaries?` | `boolean` | `!allowObfuscatedMatch` |
| `fuzzyToleranceLevel?` | `number` | `0.8` |
When allowObfuscatedMatch is enabled, wordBoundaries is automatically set to false to allow partial word matching within longer strings.
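The interaction between these options can be pictured as a small defaulting step. This is a sketch of the behavior described above, assuming an explicit `wordBoundaries` value takes precedence; `resolveOptions` is a hypothetical name, not part of the public API:

```javascript
// Hypothetical sketch of how the effective options could be derived.
function resolveOptions(userOptions = {}) {
  const allowObfuscatedMatch = userOptions.allowObfuscatedMatch ?? false;
  return {
    allowObfuscatedMatch,
    // An explicit wordBoundaries value wins; otherwise it defaults to
    // the inverse of allowObfuscatedMatch, per the table above.
    wordBoundaries: userOptions.wordBoundaries ?? !allowObfuscatedMatch,
    fuzzyToleranceLevel: userOptions.fuzzyToleranceLevel ?? 0.8,
  };
}

console.log(resolveOptions({ allowObfuscatedMatch: true }));
// → { allowObfuscatedMatch: true, wordBoundaries: false, fuzzyToleranceLevel: 0.8 }
```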
Implementation Examples
JavaScript Implementation
```javascript
import { Filter } from 'glin-profanity';

// Enable obfuscation detection
const filter = new Filter({
  allowObfuscatedMatch: true,
  languages: ['english'],
  // wordBoundaries automatically set to false
});

// Test various obfuscation patterns
console.log(filter.isProfane('sh1t'));    // true - number substitution
console.log(filter.isProfane('f*ck'));    // true - asterisk removal
console.log(filter.isProfane('d@mn'));    // true - symbol substitution
console.log(filter.isProfane('a$$hole')); // true - dollar sign substitution
console.log(filter.isProfane('sh!t'));    // true - exclamation substitution
console.log(filter.isProfane('daaaamn')); // true - repeated characters

// Detailed analysis
const result = filter.checkProfanity('This sh1t is f*cking annoying!');
console.log(result.containsProfanity); // true
console.log(result.profaneWords);      // ['sh1t', 'f*cking']
console.log(result.processedText);     // 'This **** is ****ing annoying!'
```

```javascript
import { Filter } from 'glin-profanity';

// Advanced obfuscation with fuzzy matching
const advancedFilter = new Filter({
  allowObfuscatedMatch: true,
  fuzzyToleranceLevel: 0.6,  // Lower threshold for more aggressive matching
  severityLevels: true,      // Track EXACT vs FUZZY matches
  replaceWith: '[CENSORED]', // Custom replacement text
  languages: ['english', 'spanish'],
});

// Test complex obfuscation patterns
const testCases = [
  'This is bull$h1t!',      // Mixed substitution
  'F*cking @$$holes!',      // Multiple patterns
  'Daaaamn th@t $ucks!',    // Repeated + substitution
  'Such a b1tch move',      // Number + partial match
  'Holy $h!t that was bad', // Multiple symbol substitutions
];

testCases.forEach((text) => {
  const result = advancedFilter.checkProfanity(text);
  console.log(`Text: "${text}"`);
  console.log(`- Contains profanity: ${result.containsProfanity}`);
  console.log(`- Detected words: ${result.profaneWords.join(', ')}`);
  console.log(`- Severity map:`, result.severityMap);
  console.log('---');
});
```

Python Implementation
```python
from glin_profanity import Filter

# Enable obfuscation detection (note snake_case)
filter_instance = Filter({
    "allow_obfuscated_match": True,
    "languages": ["english"],
    # word_boundaries automatically set to False
})

# Test identical patterns to JavaScript
print(filter_instance.is_profane('sh1t'))     # True - number substitution
print(filter_instance.is_profane('f*ck'))     # True - asterisk removal
print(filter_instance.is_profane('d@mn'))     # True - symbol substitution
print(filter_instance.is_profane('a$$hole'))  # True - dollar substitution
print(filter_instance.is_profane('sh!t'))     # True - exclamation substitution
print(filter_instance.is_profane('daaaamn'))  # True - repeated characters

# Detailed analysis with snake_case results
result = filter_instance.check_profanity('This sh1t is f*cking annoying!')
print(result["contains_profanity"])  # True
print(result["profane_words"])       # ['sh1t', 'f*cking']
print(result["processed_text"])      # 'This **** is ****ing annoying!'
```

```python
from glin_profanity import Filter

# Verify identical behavior between JS and Python
def test_obfuscation_parity():
    # Same configuration as the JavaScript examples
    py_filter = Filter({
        "allow_obfuscated_match": True,
        "fuzzy_tolerance_level": 0.6,
        "severity_levels": True,
        "languages": ["english"],
    })

    # Test cases from cross-language parity tests
    test_cases = [
        "This is d*mn annoying",   # Asterisk removal
        "This is d4mn bad",        # Number substitution (not in the standard mapping)
        "This is d@mn terrible",   # Symbol substitution
        "This is daaaammmn bad",   # Repeated characters
        "What the f*ck is this?",  # Real-world usage
    ]

    for text in test_cases:
        result = py_filter.check_profanity(text)
        print(f'Text: "{text}"')
        print(f'- Detected: {result["contains_profanity"]}')
        print(f'- Words: {result["profane_words"]}')
        if "severity_map" in result:
            print(f'- Severity: {result["severity_map"]}')
        print('---')

test_obfuscation_parity()
```

Detection Algorithm
The obfuscation detection follows a two-step normalization process before dictionary matching:

1. Character normalization - mapped symbols (`1`, `@`, `$`, `!`) are replaced with their letter equivalents, and asterisks are removed.
2. Repetition collapse - runs of three or more identical letters are reduced to exactly two.

The normalized text is then matched against the profanity dictionary, with fuzzy matching covering any remaining distance (for example, `f*ck` → `fck` → `fuck`).
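The two steps above, combined with a simple similarity check, can be sketched end to end. This is a simplified illustration, not the library's implementation: the helper names and the tiny word list are assumptions, and the similarity metric here is a plain Levenshtein ratio.

```javascript
// Simplified sketch: two-step normalization plus fuzzy matching.
const WORDS = ['damn', 'shit', 'fuck']; // tiny illustrative word list

function normalize(text) {
  return text
    .toLowerCase()
    .replace(/[1!]/g, 'i').replace(/@/g, 'a').replace(/\$/g, 's') // step 1: substitution
    .replace(/\*/g, '')                                           // step 1: removal
    .replace(/([a-z])\1{2,}/g, '$1$1');                           // step 2: collapse repeats
}

// Similarity as 1 - editDistance / maxLength (plain Levenshtein).
function similarity(a, b) {
  const d = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0)));
  for (let i = 1; i <= a.length; i++)
    for (let j = 1; j <= b.length; j++)
      d[i][j] = Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                         d[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1));
  return 1 - d[a.length][b.length] / Math.max(a.length, b.length);
}

function isProfaneSketch(text, tolerance = 0.8) {
  return normalize(text).split(/\s+/).some((token) =>
    WORDS.some((word) => similarity(token, word) >= tolerance));
}

console.log(isProfaneSketch('sh1t'));       // true: normalizes to an exact match
console.log(isProfaneSketch('daaaamn'));    // true: 'daamn' vs 'damn' ≈ 0.8
console.log(isProfaneSketch('f*ck', 0.7));  // true: 'fck' vs 'fuck' = 0.75
```

Note that with this metric, `'fck'` vs `'fuck'` scores 0.75, below the default 0.8 tolerance, which is one reason a lower `fuzzyToleranceLevel` catches more aggressive obfuscation.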
Word Boundary Behavior
Obfuscation detection automatically disables word boundaries to catch disguised profanity within larger words.
Automatic Configuration
When allowObfuscatedMatch is enabled, the system automatically adjusts related settings:
```javascript
// JavaScript automatic configuration
const filter = new Filter({
  allowObfuscatedMatch: true,
  // wordBoundaries: automatically set to false
  // fuzzyToleranceLevel: default 0.8 works well with obfuscation
});
```

```python
# Python automatic configuration
filter_instance = Filter({
    "allow_obfuscated_match": True,
    # "word_boundaries": automatically set to False
    # "fuzzy_tolerance_level": default 0.8 works well
})
```

Impact on Detection
```javascript
const filter = new Filter({
  allowObfuscatedMatch: false, // Default
  wordBoundaries: true,        // Default
});

// Only detects whole words
console.log(filter.isProfane('damn'));    // true - whole word
console.log(filter.isProfane('damnit'));  // false - part of a larger word
console.log(filter.isProfane('goddamn')); // false - part of a compound word
```

```javascript
const filter = new Filter({
  allowObfuscatedMatch: true, // Enables obfuscation detection
  // wordBoundaries: false    // Automatically disabled
});

// Detects partial matches and obfuscated patterns
console.log(filter.isProfane('damn'));    // true - whole word
console.log(filter.isProfane('d@mn'));    // true - obfuscated
console.log(filter.isProfane('damnit'));  // true - partial match
console.log(filter.isProfane('goddamn')); // true - partial match
console.log(filter.isProfane('godd@mn')); // true - partial + obfuscated
```

Common Obfuscation Patterns
Symbol Substitution
```javascript
// Common patterns caught by the system
const patterns = [
  // @ substitution
  'd@mn', 'b@stard', 'f@ck', '@ss', '@sshole',
  // $ substitution
  '$hit', 'bull$hit', 'a$$', 'a$$hole', 'ba$tard',
  // ! substitution
  'sh!t', 'b!tch', 'p!ss', 'damn!t',
  // Number substitution
  'sh1t', 'b1tch', 'h3ll', '4ss', 'f4ck',
  // Mixed patterns
  'bull$h1t', 'b@$t@rd', '$h1t', 'a$$h0le',
];

const filter = new Filter({ allowObfuscatedMatch: true });
patterns.forEach((pattern) => {
  console.log(`${pattern}: ${filter.isProfane(pattern)}`); // All return true
});
```

Repeated Characters
```javascript
// Excessive repetition patterns
const repeatedPatterns = [
  'daaaamn', // damn with extra a's
  'shiiiit', // shit with extra i's
  'fuuuuck', // fuck with extra u's
  'helllll', // hell with extra l's
  'bitttch', // bitch with extra t's
  'asssss',  // ass with extra s's
];

const filter = new Filter({ allowObfuscatedMatch: true });
repeatedPatterns.forEach((pattern) => {
  const result = filter.checkProfanity(pattern);
  console.log(`"${pattern}" → normalized and detected: ${result.containsProfanity}`);
});
```

Asterisk Removal
```javascript
// Asterisk patterns (removed, then fuzzy matched)
const asteriskPatterns = [
  'f*ck',    // fuck with a middle asterisk
  'f**k',    // fuck with multiple asterisks
  'sh*t',    // shit with an asterisk
  'b*tch',   // bitch with an asterisk
  'd*mn',    // damn with an asterisk
  'a**hole', // asshole with asterisks
];

const filter = new Filter({
  allowObfuscatedMatch: true,
  severityLevels: true, // Track FUZZY vs EXACT matches
});

asteriskPatterns.forEach((pattern) => {
  const result = filter.checkProfanity(pattern);
  console.log(`"${pattern}" → ${result.containsProfanity} (severity: FUZZY)`);
});
```

Performance Considerations
Processing Overhead
Obfuscation detection adds processing overhead due to text normalization and fuzzy matching. Enable only when disguised profanity is a concern.
```javascript
// Performance impact (approximate)
const basicFilter = new Filter();     // ~0.1ms per check
const obfuscatedFilter = new Filter({ // ~0.3-0.8ms per check
  allowObfuscatedMatch: true,
});

// Optimization strategies
const optimizedFilter = new Filter({
  allowObfuscatedMatch: true,
  fuzzyToleranceLevel: 0.8, // Higher = less aggressive fuzzy matching
  languages: ['english'],   // Limit to specific languages
});
```

Memory Usage
- Character Map: ~0.1KB for substitution rules
- Normalization Buffer: ~2x input text size during processing
- Regex Compilation: Additional patterns for fuzzy matching
Total Overhead: Minimal static memory, temporary allocation during processing
Best Practices
When to Enable Obfuscation Detection
Enable allowObfuscatedMatch when user-generated content is likely to contain deliberately disguised profanity (chat, comments, usernames). Keep it disabled on performance-sensitive paths where plain dictionary matching is sufficient, since normalization and fuzzy matching add per-check overhead.
Cross-Language Implementation
Both JavaScript and Python implementations use identical normalization algorithms and character mappings, ensuring consistent behavior across platforms.
Implementation Parity
```javascript
// JavaScript
const jsFilter = new Filter({ allowObfuscatedMatch: true });
```

```python
# Python
filter_instance = Filter({"allow_obfuscated_match": True})
```

Both implementations return identical results for:

```javascript
const testCases = [
  'sh1t', 'f*ck', 'd@mn', 'a$$hole', 'daaaamn',
  'bull$h1t', 'f**king', 'sh!thead', 'b1tch',
];
```

Cross-language parity tests verify that both implementations agree on these cases.

Cross-References
- Context Analysis - Combine with context awareness for better accuracy
- Filter Class - Object-oriented API with obfuscation support
- Configuration - Complete configuration options
- Python API - Cross-language implementation details