Toxicity Labels
Understanding the 7 toxicity categories detected by ML analysis
The ML toxicity model detects 7 distinct categories of harmful content. Understanding these labels helps you configure detection thresholds and handle different types of toxicity appropriately.
Overview
| Label | Description |
|---|---|
| `toxicity` | Catch-all for harmful content |
| `insult` | Direct attacks on individuals |
| `threat` | Physical or implied threats |
| `identity_attack` | Attacks based on race, religion, gender, etc. |
| `obscene` | Explicit language without a specific target |
| `severe_toxicity` | Extreme cases of harmful content |
| `sexual_explicit` | Adult content |
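A minimal usage sketch, built only from the constructor options (`labels`, `threshold`) and result fields (`isToxic`, `matchedCategories`) shown elsewhere on this page: restrict detection to the labels you care about and inspect which of them matched.

```typescript
import { ToxicityDetector } from 'glin-profanity/ml';

// Only the categories listed here are scored against the threshold.
const detector = new ToxicityDetector({
  labels: ['insult', 'identity_attack', 'threat'],
  threshold: 0.85,
});

async function reportMatches(text: string) {
  const result = await detector.analyze(text);
  if (result.isToxic) {
    // matchedCategories lists which configured labels crossed the threshold.
    console.log('Matched categories:', result.matchedCategories.join(', '));
  } else {
    console.log('No configured category exceeded the threshold.');
  }
}
```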
Choosing Labels for Your Use Case
Chat/Gaming Moderation
```typescript
const detector = new ToxicityDetector({
  labels: ['insult', 'threat', 'toxicity'],
  threshold: 0.8,
});
```
Professional/Workplace
```typescript
const detector = new ToxicityDetector({
  labels: ['insult', 'identity_attack', 'sexual_explicit', 'threat'],
  threshold: 0.85,
});
```
Family-Friendly Platform
```typescript
const detector = new ToxicityDetector({
  labels: ToxicityDetector.ALL_LABELS, // All 7 categories
  threshold: 0.8,
});
```
Safety-Critical (Threats Focus)
```typescript
const detector = new ToxicityDetector({
  labels: ['threat', 'severe_toxicity'],
  threshold: 0.7, // Lower threshold for safety
});
```
Handling Different Categories
```typescript
import { ToxicityDetector } from 'glin-profanity/ml';

async function handleByCategory(text: string) {
  const detector = new ToxicityDetector({ threshold: 0.8 });
  const result = await detector.analyze(text);

  if (!result.isToxic) {
    return { action: 'allow' };
  }

  // Handle by severity
  if (
    result.matchedCategories.includes('threat') ||
    result.matchedCategories.includes('severe_toxicity')
  ) {
    return {
      action: 'block_and_report',
      escalate: true,
      reason: 'Potential threat detected',
    };
  }

  if (result.matchedCategories.includes('identity_attack')) {
    return {
      action: 'block',
      warn: true,
      reason: 'Hate speech policy violation',
    };
  }

  if (result.matchedCategories.includes('insult')) {
    return {
      action: 'warn',
      reason: 'Please keep discussions respectful',
    };
  }

  // Default handling
  return {
    action: 'flag_for_review',
    categories: result.matchedCategories,
  };
}
```
Threshold Guidelines
| Category | Conservative | Balanced | Sensitive |
|---|---|---|---|
| toxicity | 0.9 | 0.85 | 0.75 |
| insult | 0.9 | 0.85 | 0.8 |
| threat | 0.8 | 0.7 | 0.6 |
| identity_attack | 0.85 | 0.8 | 0.75 |
| obscene | 0.9 | 0.85 | 0.8 |
| severe_toxicity | 0.95 | 0.9 | 0.85 |
| sexual_explicit | 0.9 | 0.85 | 0.8 |
Start with the balanced thresholds and adjust them based on the false-positive and false-negative rates you observe in production.
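The examples on this page pass a single `threshold` per detector, so one way to apply per-category values is to group labels that share a threshold and run one detector per group. The sketch below uses the balanced column from the table above; it is a workaround under that single-threshold assumption, not a documented per-label option.

```typescript
import { ToxicityDetector } from 'glin-profanity/ml';

// Balanced thresholds from the table, grouped so labels that share a
// threshold share one detector instance.
const balancedGroups: Array<{ threshold: number; labels: string[] }> = [
  { threshold: 0.7, labels: ['threat'] },
  { threshold: 0.8, labels: ['identity_attack'] },
  { threshold: 0.85, labels: ['toxicity', 'insult', 'obscene', 'sexual_explicit'] },
  { threshold: 0.9, labels: ['severe_toxicity'] },
];

const detectors = balancedGroups.map(
  ({ threshold, labels }) => new ToxicityDetector({ threshold, labels })
);

async function analyzeWithBalancedThresholds(text: string) {
  const results = await Promise.all(detectors.map((d) => d.analyze(text)));
  return {
    isToxic: results.some((r) => r.isToxic),
    matchedCategories: results.flatMap((r) => r.matchedCategories),
  };
}
```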
Model Limitations
- English-focused: The model is trained primarily on English text
- Context gaps: May miss sarcasm or context-dependent meaning
- Cultural bias: Training data reflects Western internet culture
- New slang: May not catch recently coined terms
For comprehensive coverage, combine ML with rule-based detection using HybridFilter.
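As a rough illustration of that combined approach, here is a hypothetical sketch; the HybridFilter import path, constructor options, and result fields below are assumptions for illustration, so check the HybridFilter page for the real API.

```typescript
// Hypothetical sketch: the HybridFilter options and result shape below are
// assumptions; see the HybridFilter documentation for the actual API.
import { HybridFilter } from 'glin-profanity/ml';

const filter = new HybridFilter({
  // ML side: context-dependent toxicity in English.
  ml: { labels: ['toxicity', 'insult', 'threat'], threshold: 0.8 },
  // Rule-based side: known terms and new slang the model may miss.
  customWords: ['example-slang-term'],
});

async function moderate(text: string) {
  const result = await filter.check(text);
  return result.isToxic ? { action: 'review', result } : { action: 'allow' };
}
```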
Cross-References
- ToxicityDetector API — API reference
- HybridFilter — Combined ML + rules
- ML Integration Guide — Best practices