TypeWeaver

Context Analysis

Advanced context-aware filtering with sentiment analysis and domain-specific whitelisting


Context-aware profanity filtering analyzes the text surrounding each match to reduce false positives and infer intent. The analyzer examines sentiment, phrase patterns, and domain-specific context to decide whether a flagged word is actually offensive in its setting.

Context analysis dramatically improves accuracy by distinguishing between "This movie is fucking amazing!" (positive context) vs "You fucking idiot!" (negative context).

How Context Analysis Works

Text Tokenization and Window Extraction

The analyzer first tokenizes the input text and extracts a configurable window of words around each profanity match.

// Example: "This movie is fucking amazing and incredible!"
// Match: "fucking" at word index 3
// contextWindow: 3
// Extracted context: ["this", "movie", "is", "fucking", "amazing", "and", "incredible"]

Implementation Details:

  • Tokenization removes punctuation and normalizes case
  • Window size is configurable (default: 3 words before/after)
  • Character-to-word mapping handles complex text structures
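The tokenization and window-extraction step can be sketched as follows. This is a simplified illustration; the function names (`tokenize`, `extractContextWindow`) are assumptions for this example, not the library's actual API.

```javascript
// Lowercase, strip punctuation, and split into word tokens
function tokenize(text) {
  return text
    .toLowerCase()
    .replace(/[^\w\s']/g, '')
    .split(/\s+/)
    .filter(Boolean);
}

// Extract up to `windowSize` words before and after the match index,
// clamped to the bounds of the token array
function extractContextWindow(words, matchIndex, windowSize = 3) {
  const start = Math.max(0, matchIndex - windowSize);
  const end = Math.min(words.length, matchIndex + windowSize + 1);
  return words.slice(start, end);
}

const words = tokenize("This movie is fucking amazing and incredible!");
// words[3] === 'fucking'
const context = extractContextWindow(words, 3, 3);
// → ["this", "movie", "is", "fucking", "amazing", "and", "incredible"]
```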

Phrase Pattern Detection

The system checks for predefined positive and negative phrase patterns that contain profanity but have clear contextual meaning.

// Positive phrases (high confidence scores)
const POSITIVE_PHRASES = new Map([
  ['the bomb', 0.9],        // "this movie is the bomb"
  ['da bomb', 0.9],         // slang for "the best" 
  ['photo bomb', 0.8],      // photography term
  ['bath bomb', 0.8],       // cosmetic product
]);

// Negative phrases (low confidence scores)
const NEGATIVE_PHRASES = new Map([
  ['you are', 0.1],         // "you are [profanity]"
  ['ur a', 0.1],           // "ur a [profanity]"
  ['such a', 0.2],         // "such a [profanity]"
]);

Pattern Matching Process:

  • Exact phrase matching takes precedence over sentiment analysis
  • Phrases return predetermined confidence scores (0.0-1.0)
  • Covers common expressions, slang, and technical terms
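The matching process above can be sketched roughly like this (a simplified illustration; word-boundary handling is omitted, and `matchPhrase` is a hypothetical helper name):

```javascript
const POSITIVE_PHRASES = new Map([['the bomb', 0.9], ['photo bomb', 0.8]]);
const NEGATIVE_PHRASES = new Map([['you are', 0.1], ['such a', 0.2]]);

// Return the predetermined confidence score if any known phrase
// appears in the context window, or null if none matches.
// Positive phrases are checked first, so they take precedence.
function matchPhrase(contextWords) {
  const text = contextWords.join(' ');
  for (const [phrase, score] of [...POSITIVE_PHRASES, ...NEGATIVE_PHRASES]) {
    if (text.includes(phrase)) return score;
  }
  return null;
}

matchPhrase(['this', 'movie', 'is', 'the', 'bomb']);  // → 0.9
matchPhrase(['you', 'are', 'an', 'idiot']);           // → 0.1
```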

Domain-Specific Whitelisting

The context analyzer checks for domain-specific positive indicators that suggest non-offensive usage.

// Gaming context indicators
const GAMING_POSITIVE = new Set([
  'player', 'gamer', 'team', 'squad', 'boss', 'raid', 'quest',
  'achievement', 'skill', 'build', 'strategy', 'level', 'match'
]);

// Custom domain whitelists
const domainWhitelists = {
  english: ['enemy', 'character', 'weapon', 'item', 'spell']
};

Whitelist Logic:

  • If ANY whitelist word appears in context window → confidence score 0.8
  • Combines built-in gaming terms with custom domain words
  • Immediate positive classification without sentiment analysis
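The whitelist check reduces to a set-membership test. Below is a minimal sketch under the assumptions above; `checkWhitelist` is an illustrative name, and `0.8` is the fixed confidence score the whitelist logic describes.

```javascript
const GAMING_POSITIVE = new Set(['player', 'boss', 'raid', 'quest', 'level']);

// If ANY whitelist word appears in the context window, return the
// fixed confidence of 0.8; otherwise null (fall through to sentiment)
function checkWhitelist(contextWords, customWords = []) {
  const whitelist = new Set([...GAMING_POSITIVE, ...customWords]);
  return contextWords.some(w => whitelist.has(w)) ? 0.8 : null;
}

checkWhitelist(['that', 'boss', 'fight', 'was', 'badass']);  // → 0.8
checkWhitelist(['you', 'are', 'terrible']);                  // → null
checkWhitelist(['my', 'enemy', 'spawned'], ['enemy']);       // → 0.8
```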

Distance-Weighted Sentiment Scoring

Advanced sentiment analysis with proximity-based weighting for nuanced context understanding.

// Sentiment calculation algorithm
function calculateSentimentScore(contextWords, matchPosition) {
  let positiveCount = 0;
  let negativeCount = 0;
  
  // Weight words by distance from profanity match
  for (let i = 0; i < contextWords.length; i++) {
    const word = contextWords[i];
    const distance = Math.abs(i - matchPosition);
    const weight = Math.max(0.1, 1 - (distance * 0.2));
    
    if (POSITIVE_INDICATORS.has(word)) {
      positiveCount += weight;
    } else if (NEGATIVE_INDICATORS.has(word)) {
      negativeCount += weight;
    }
  }
  
  // Neutral score when no sentiment indicators are found
  if (positiveCount + negativeCount === 0) return 0.5;
  return positiveCount / (positiveCount + negativeCount);
}

Scoring Features:

  • Distance Weighting: Words closer to profanity have higher influence
  • Personal Pronoun Detection: "you", "your" reduce positive scores
  • Object Reference Detection: "movie", "game", "this" boost positive scores
  • Neutral Handling: Returns 0.5 when no sentiment indicators found

Configuration Options

contextWindow

Controls how many words before and after the profanity match to analyze for context.

const config = {
  enableContextAware: true,
  contextWindow: 3,  // Default: 3 words before/after
};

// contextWindow: 2
// "This damn movie" → Context: ["this", "damn", "movie"]
// contextWindow: 5  
// "This damn good movie rocks" → Context: ["this", "damn", "good", "movie", "rocks"]

Recommendations:

  • Small (1-2): Fast processing, less context
  • Medium (3-4): Balanced accuracy and performance
  • Large (5-7): Maximum context, slower processing

confidenceThreshold

Sets the minimum confidence score required to override profanity detection.

const config = {
  enableContextAware: true,
  confidenceThreshold: 0.7,  // Default: 0.7 (70% confidence)
};

// Example scoring:
// "This movie is fucking amazing!" → contextScore: 0.85 → NOT flagged (>0.7)
// "You fucking idiot!" → contextScore: 0.15 → FLAGGED (<0.7)
// "That's damn good" → contextScore: 0.65 → FLAGGED (<0.7)

Threshold Guidelines:

  • Strict (0.8-0.9): Overrides detection only on high confidence; fewer false negatives
  • Balanced (0.6-0.8): Good trade-off between accuracy and permissiveness
  • Lenient (0.4-0.6): Overrides detection more readily; more false negatives possible
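Once a context score has been computed, the threshold check itself is a single comparison (illustrative only; `shouldFlag` is a hypothetical helper name):

```javascript
// A match is flagged unless its context score clears the threshold
function shouldFlag(contextScore, confidenceThreshold = 0.7) {
  return contextScore < confidenceThreshold;
}

shouldFlag(0.85, 0.7);  // → false (positive context overrides detection)
shouldFlag(0.15, 0.7);  // → true  (flagged as profanity)
shouldFlag(0.65);       // → true  (just under the default 0.7 threshold)
```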

Sentiment Indicators

Positive Indicators

Words that suggest positive context and reduce profanity likelihood:

const POSITIVE_ENTHUSIASM = [
  'amazing', 'awesome', 'excellent', 'fantastic', 'great', 
  'wonderful', 'brilliant', 'perfect', 'incredible', 'outstanding',
  'superb', 'magnificent', 'spectacular', 'phenomenal', 'terrific'
];
const POSITIVE_QUALITY = [
  'best', 'good', 'nice', 'cool', 'sweet', 'epic', 'legendary',
  'godlike', 'insane', 'crazy', 'wild', 'beast', 'fire', 'lit'
];
const POSITIVE_SLANG = [
  'rad', 'sick', 'dope', 'fire', 'lit', 'epic', 'legendary',
  'godlike', 'insane', 'crazy', 'wild', 'beast'
];
const POSITIVE_OBJECTS = [
  'movie', 'film', 'show', 'song', 'music', 'game', 'book',
  'restaurant', 'food', 'dish', 'meal', 'place', 'experience'
];

Negative Indicators

Words that suggest negative context and increase profanity likelihood:

const NEGATIVE_INDICATORS = [
  // Direct insults
  'hate', 'stupid', 'idiot', 'moron', 'loser', 'worthless', 'pathetic',
  
  // Quality descriptors
  'terrible', 'awful', 'horrible', 'disgusting', 'garbage', 'trash',
  'worst', 'bad', 'ugly', 'gross', 'nasty', 'lame', 'weak',
  
  // Personal pronouns (context-dependent)
  'you', 'your', 'yourself', 'u', 'ur', 'ure', 'youre'
];

Real-World Examples

Positive Context Detection

const filter = new Filter({
  enableContextAware: true,
  contextWindow: 4,
  confidenceThreshold: 0.7
});

// Entertainment context
const result1 = filter.checkProfanity("This movie is fucking amazing!");
console.log(result1.containsProfanity);  // false
console.log(result1.contextScore);       // 0.85
console.log(result1.reason);             // "Positive context detected"

// Gaming context with whitelist
const result2 = filter.checkProfanity("That boss fight was badass!");
console.log(result2.containsProfanity);  // false  
console.log(result2.contextScore);       // 0.8
console.log(result2.reason);             // "Domain-specific whitelist match"

// Product review context
const result3 = filter.checkProfanity("This product is the bomb!");
console.log(result3.containsProfanity);  // false
console.log(result3.contextScore);       // 0.9
console.log(result3.reason);             // "Positive phrase detected: 'the bomb'"

Negative Context Detection

// Personal attack context
const result4 = filter.checkProfanity("You fucking idiot!");
console.log(result4.containsProfanity);  // true
console.log(result4.contextScore);       // 0.15
console.log(result4.reason);             // "Negative context detected"

// Hostile language
const result5 = filter.checkProfanity("That's such a damn stupid idea");
console.log(result5.containsProfanity);  // true
console.log(result5.contextScore);       // 0.25
console.log(result5.reason);             // "Negative context detected"

Context Score Interpretation

| Score Range | Interpretation          | Action               | Example                            |
|-------------|-------------------------|----------------------|------------------------------------|
| 0.0 - 0.3   | Strong negative context | Flag as profanity    | "You fucking moron!"               |
| 0.3 - 0.6   | Weak/neutral context    | Depends on threshold | "This damn thing broke"            |
| 0.6 - 0.8   | Positive context        | Likely not profanity | "Damn good coffee!"                |
| 0.8 - 1.0   | Strong positive context | Not profanity        | "This movie is fucking brilliant!" |
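The bands in the table above map directly onto a small lookup helper (illustrative naming, not part of the library's API):

```javascript
// Map a context score to the interpretation bands from the table above
function interpretScore(score) {
  if (score < 0.3) return 'strong negative context';
  if (score < 0.6) return 'weak/neutral context';
  if (score < 0.8) return 'positive context';
  return 'strong positive context';
}

interpretScore(0.15);  // → 'strong negative context'
interpretScore(0.85);  // → 'strong positive context'
```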

Performance Considerations

Context analysis adds computational overhead. Enable only when accuracy is more important than speed.

Performance Impact

// Performance comparison (approximate)
const basicFilter = new Filter();  // ~0.1ms per check
const contextFilter = new Filter({
  enableContextAware: true,        // ~0.5-2ms per check
  contextWindow: 3
});

// Optimization strategies
const optimizedFilter = new Filter({
  enableContextAware: true,
  contextWindow: 2,                // Smaller window = faster
  confidenceThreshold: 0.8,        // Higher threshold = fewer analyses
  domainWhitelists: {              // Targeted whitelists
    english: ['game', 'movie', 'show']  // Only essential terms
  }
});

Memory Usage

  • Sentiment Indicators: ~2KB for positive/negative word sets
  • Gaming Whitelist: ~1KB for gaming-specific terms
  • Phrase Patterns: ~0.5KB for positive/negative phrases
  • Per Analysis: ~0.1KB temporary context extraction

Total Memory Overhead: ~4KB static + minimal per-analysis allocation

Cross-References