Context Analysis
Advanced context-aware filtering with sentiment analysis and domain-specific whitelisting
This NLP-driven system examines the sentiment, phrase patterns, and domain-specific context surrounding each profanity match to reduce false positives and understand intent.
Context analysis dramatically improves accuracy by distinguishing between "This movie is fucking amazing!" (positive context) vs "You fucking idiot!" (negative context).
How Context Analysis Works
Text Tokenization and Window Extraction
The analyzer first tokenizes the input text and extracts a configurable window of words around each profanity match.
// Example: "This movie is fucking amazing and incredible!"
// Match: "fucking" at word index 3
// contextWindow: 3
// Extracted context: ["this", "movie", "is", "fucking", "amazing", "and", "incredible"]
Implementation Details:
- Tokenization removes punctuation and normalizes case
- Window size is configurable (default: 3 words before/after)
- Character-to-word mapping handles complex text structures
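The tokenize-and-window step above can be sketched roughly as follows; `extractContext` and its parameters are illustrative names, not the library's actual API:

```javascript
// Hypothetical sketch of tokenization + context-window extraction.
function extractContext(text, matchWord, contextWindow = 3) {
  // Tokenize: lowercase, strip punctuation, split on whitespace
  const words = text
    .toLowerCase()
    .replace(/[^\w\s]/g, '')
    .split(/\s+/)
    .filter(Boolean);
  const matchIndex = words.indexOf(matchWord);
  if (matchIndex === -1) return null;
  // Extract up to `contextWindow` words on each side of the match
  const start = Math.max(0, matchIndex - contextWindow);
  const end = Math.min(words.length, matchIndex + contextWindow + 1);
  return { words: words.slice(start, end), matchIndex: matchIndex - start };
}
```

Running this on the example sentence reproduces the context array shown above, with the match at relative index 3.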
Phrase Pattern Detection
The system checks for predefined positive and negative phrase patterns that contain profanity but have clear contextual meaning.
// Positive phrases (high confidence scores)
const POSITIVE_PHRASES = new Map([
  ['the bomb', 0.9],   // "this movie is the bomb"
  ['da bomb', 0.9],    // slang for "the best"
  ['photo bomb', 0.8], // photography term
  ['bath bomb', 0.8],  // cosmetic product
]);
// Negative phrases (low confidence scores)
const NEGATIVE_PHRASES = new Map([
  ['you are', 0.1], // "you are [profanity]"
  ['ur a', 0.1],    // "ur a [profanity]"
  ['such a', 0.2],  // "such a [profanity]"
]);
Pattern Matching Process:
- Exact phrase matching takes precedence over sentiment analysis
- Phrases return predetermined confidence scores (0.0-1.0)
- Covers common expressions, slang, and technical terms
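A minimal sketch of this precedence rule, reusing a subset of the phrase maps excerpted above (`lookupPhraseScore` is a hypothetical helper, not part of the public API):

```javascript
const POSITIVE_PHRASES = new Map([
  ['the bomb', 0.9],
  ['da bomb', 0.9],
]);
const NEGATIVE_PHRASES = new Map([
  ['you are', 0.1],
  ['such a', 0.2],
]);

// Exact phrase hits short-circuit with their predetermined score;
// null means "no phrase matched, fall through to sentiment analysis".
function lookupPhraseScore(text) {
  const normalized = text.toLowerCase();
  for (const [phrase, score] of POSITIVE_PHRASES) {
    if (normalized.includes(phrase)) return score;
  }
  for (const [phrase, score] of NEGATIVE_PHRASES) {
    if (normalized.includes(phrase)) return score;
  }
  return null;
}
```

Because the lookup returns before any sentiment scoring runs, a phrase hit always wins, which is what gives these patterns precedence.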
Domain-Specific Whitelisting
The context analyzer also checks for domain-specific positive indicators that suggest non-offensive usage.
// Gaming context indicators
const GAMING_POSITIVE = new Set([
  'player', 'gamer', 'team', 'squad', 'boss', 'raid', 'quest',
  'achievement', 'skill', 'build', 'strategy', 'level', 'match'
]);
// Custom domain whitelists
const domainWhitelists = {
  english: ['enemy', 'character', 'weapon', 'item', 'spell']
};
Whitelist Logic:
- If ANY whitelist word appears in context window → confidence score 0.8
- Combines built-in gaming terms with custom domain words
- Immediate positive classification without sentiment analysis
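The whitelist short-circuit described above might look like this; the trimmed `GAMING_POSITIVE` set and the `whitelistScore` helper are illustrative:

```javascript
// Illustrative subset of the gaming whitelist from the section above
const GAMING_POSITIVE = new Set(['player', 'gamer', 'boss', 'raid', 'quest', 'match']);

// If ANY whitelist term appears in the context window, return the fixed
// 0.8 confidence immediately; null means "no hit, run sentiment analysis".
function whitelistScore(contextWords, customWhitelist = []) {
  const whitelist = new Set([...GAMING_POSITIVE, ...customWhitelist]);
  return contextWords.some(word => whitelist.has(word)) ? 0.8 : null;
}
```

Merging the built-in set with the custom domain words in one `Set` keeps the membership check O(1) per context word.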
Distance-Weighted Sentiment Scoring
Advanced sentiment analysis with proximity-based weighting for nuanced context understanding.
// Sentiment calculation algorithm
function calculateSentimentScore(contextWords, matchPosition) {
  let positiveCount = 0;
  let negativeCount = 0;
  // Weight words by distance from profanity match
  for (let i = 0; i < contextWords.length; i++) {
    const word = contextWords[i];
    const distance = Math.abs(i - matchPosition);
    const weight = Math.max(0.1, 1 - (distance * 0.2));
    if (POSITIVE_INDICATORS.has(word)) {
      positiveCount += weight;
    } else if (NEGATIVE_INDICATORS.has(word)) {
      negativeCount += weight;
    }
  }
  // No sentiment indicators found: return neutral score
  if (positiveCount + negativeCount === 0) return 0.5;
  return positiveCount / (positiveCount + negativeCount);
}
Scoring Features:
- Distance Weighting: Words closer to profanity have higher influence
- Personal Pronoun Detection: "you", "your" reduce positive scores
- Object Reference Detection: "movie", "game", "this" boost positive scores
- Neutral Handling: Returns 0.5 when no sentiment indicators found
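The distance weighting alone makes a compact worked example, using the formula from the algorithm above:

```javascript
// Distance weight: influence decays by 0.2 per word of distance
// from the profanity match and floors at 0.1 (reached at distance 5).
const weight = distance => Math.max(0.1, 1 - distance * 0.2);
```

So a word directly adjacent to the match contributes at weight 0.8, while words five or more positions away still contribute at the 0.1 floor rather than being ignored entirely.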
Configuration Options
contextWindow
Controls how many words before and after the profanity match to analyze for context.
const config = {
  enableContextAware: true,
  contextWindow: 3, // Default: 3 words before/after
};
// contextWindow: 2
// "This damn movie" → Context: ["this", "damn", "movie"]
// contextWindow: 5
// "This damn good movie rocks" → Context: ["this", "damn", "good", "movie", "rocks"]Recommendations:
- Small (1-2): Fast processing, less context
- Medium (3-4): Balanced accuracy and performance
- Large (5-7): Maximum context, slower processing
confidenceThreshold
Sets the minimum confidence score required to override profanity detection.
const config = {
  enableContextAware: true,
  confidenceThreshold: 0.7, // Default: 0.7 (70% confidence)
};
// Example scoring:
// "This movie is fucking amazing!" → contextScore: 0.85 → NOT flagged (>0.7)
// "You fucking idiot!" → contextScore: 0.15 → FLAGGED (<0.7)
// "That's damn good" → contextScore: 0.65 → FLAGGED (<0.7)
Threshold Guidelines:
- Strict (0.8-0.9): High confidence required to override detection; fewer false negatives, more false positives
- Balanced (0.6-0.8): Good trade-off between accuracy and permissiveness
- Lenient (0.4-0.6): More permissive; more false negatives possible
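Inferred from the scoring examples above, the gating logic reduces to a single comparison (`shouldFlag` is an illustrative name, not the library's API):

```javascript
// Assumed gating rule: the profanity flag is suppressed only when the
// context score reaches the configured confidence threshold.
function shouldFlag(contextScore, confidenceThreshold = 0.7) {
  return contextScore < confidenceThreshold;
}
```

This matches the worked examples: 0.85 passes the default 0.7 threshold and is not flagged, while 0.15 and 0.65 fall below it and are flagged.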
Sentiment Indicators
Positive Indicators
Words that suggest positive context and reduce profanity likelihood:
const POSITIVE_ENTHUSIASM = [
  'amazing', 'awesome', 'excellent', 'fantastic', 'great',
  'wonderful', 'brilliant', 'perfect', 'incredible', 'outstanding',
  'superb', 'magnificent', 'spectacular', 'phenomenal', 'terrific'
];
const POSITIVE_QUALITY = [
  'best', 'good', 'nice', 'cool', 'sweet', 'epic', 'legendary',
  'godlike', 'insane', 'crazy', 'wild', 'beast', 'fire', 'lit'
];
const POSITIVE_SLANG = [
  'rad', 'sick', 'dope', 'fire', 'lit', 'epic', 'legendary',
  'godlike', 'insane', 'crazy', 'wild', 'beast'
];
const POSITIVE_OBJECTS = [
  'movie', 'film', 'show', 'song', 'music', 'game', 'book',
  'restaurant', 'food', 'dish', 'meal', 'place', 'experience'
];
Negative Indicators
Words that suggest negative context and increase profanity likelihood:
const NEGATIVE_INDICATORS = [
  // Direct insults
  'hate', 'stupid', 'idiot', 'moron', 'loser', 'worthless', 'pathetic',
  // Quality descriptors
  'terrible', 'awful', 'horrible', 'disgusting', 'garbage', 'trash',
  'worst', 'bad', 'ugly', 'gross', 'nasty', 'lame', 'weak',
  // Personal pronouns (context-dependent)
  'you', 'your', 'yourself', 'u', 'ur', 'ure', 'youre'
];
Real-World Examples
Positive Context Detection
const filter = new Filter({
  enableContextAware: true,
  contextWindow: 4,
  confidenceThreshold: 0.7
});
// Entertainment context
const result1 = filter.checkProfanity("This movie is fucking amazing!");
console.log(result1.containsProfanity); // false
console.log(result1.contextScore); // 0.85
console.log(result1.reason); // "Positive context detected"
// Gaming context with whitelist
const result2 = filter.checkProfanity("That boss fight was badass!");
console.log(result2.containsProfanity); // false
console.log(result2.contextScore); // 0.8
console.log(result2.reason); // "Domain-specific whitelist match"
// Product review context
const result3 = filter.checkProfanity("This product is the bomb!");
console.log(result3.containsProfanity); // false
console.log(result3.contextScore); // 0.9
console.log(result3.reason); // "Positive phrase detected: 'the bomb'"
Negative Context Detection
// Personal attack context
const result4 = filter.checkProfanity("You fucking idiot!");
console.log(result4.containsProfanity); // true
console.log(result4.contextScore); // 0.15
console.log(result4.reason); // "Negative context detected"
// Hostile language
const result5 = filter.checkProfanity("That's such a damn stupid idea");
console.log(result5.containsProfanity); // true
console.log(result5.contextScore); // 0.25
console.log(result5.reason); // "Negative context detected"
Context Score Interpretation
| Score Range | Interpretation | Action | Example |
|---|---|---|---|
| 0.0 - 0.3 | Strong negative context | Flag as profanity | "You fucking moron!" |
| 0.3 - 0.6 | Weak/neutral context | Depends on threshold | "This damn thing broke" |
| 0.6 - 0.8 | Positive context | Likely not profanity | "Damn good coffee!" |
| 0.8 - 1.0 | Strong positive context | Not profanity | "This movie is fucking brilliant!" |
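The table's bands can be expressed as a small helper (hypothetical; boundary handling at 0.3/0.6/0.8 is assumed to be lower-bound inclusive):

```javascript
// Maps a context score to the interpretation bands from the table above.
function interpretScore(score) {
  if (score < 0.3) return 'strong negative context';
  if (score < 0.6) return 'weak/neutral context';
  if (score < 0.8) return 'positive context';
  return 'strong positive context';
}
```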
Performance Considerations
Context analysis adds computational overhead. Enable only when accuracy is more important than speed.
Performance Impact
// Performance comparison (approximate)
const basicFilter = new Filter(); // ~0.1ms per check
const contextFilter = new Filter({
  enableContextAware: true, // ~0.5-2ms per check
  contextWindow: 3
});
// Optimization strategies
const optimizedFilter = new Filter({
  enableContextAware: true,
  contextWindow: 2,         // Smaller window = faster
  confidenceThreshold: 0.8, // Higher threshold = fewer analyses
  domainWhitelists: {       // Targeted whitelists
    english: ['game', 'movie', 'show'] // Only essential terms
  }
});
Memory Usage
- Sentiment Indicators: ~2KB for positive/negative word sets
- Gaming Whitelist: ~1KB for gaming-specific terms
- Phrase Patterns: ~0.5KB for positive/negative phrases
- Per Analysis: ~0.1KB temporary context extraction
Total Memory Overhead: ~4KB static + minimal per-analysis allocation
Cross-References
- Filter Class - Object-oriented API with context analysis
- Core Functions - enableContextAware configuration
- Configuration - Complete context-aware options
- Python API - Cross-language context analysis implementation