
Toxicity Labels

Understanding the 7 toxicity categories detected by ML analysis

The ML toxicity model detects 7 distinct categories of harmful content. Understanding these labels helps you configure detection thresholds and handle different types of toxicity appropriately.

Overview

| Label | Description |
| --- | --- |
| toxicity | General catch-all for harmful content |
| insult | Direct attacks on individuals |
| threat | Physical or implied threats |
| identity_attack | Attacks based on race, religion, gender, or other identity |
| obscene | Explicit language without a specific target |
| severe_toxicity | Extreme cases of toxicity |
| sexual_explicit | Adult content |
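
Each label can be enabled independently, and analysis results report which labels crossed the detection threshold. Here is a minimal sketch using the same analyze() result shape (isToxic, matchedCategories) as the handling example later on this page:

import { ToxicityDetector } from 'glin-profanity/ml';

// Report which of the 7 labels crossed the threshold for a given message.
async function inspectLabels(text: string) {
  const detector = new ToxicityDetector({ threshold: 0.8 });
  const result = await detector.analyze(text);

  if (result.isToxic) {
    console.log('Matched labels:', result.matchedCategories);
  }
}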

Choosing Labels for Your Use Case

Chat/Gaming Moderation

import { ToxicityDetector } from 'glin-profanity/ml';

// Flag insults, threats, and general toxicity; obscenity alone is tolerated.
const detector = new ToxicityDetector({
  labels: ['insult', 'threat', 'toxicity'],
  threshold: 0.8,
});

Professional/Workplace

const detector = new ToxicityDetector({
  labels: ['insult', 'identity_attack', 'sexual_explicit', 'threat'],
  threshold: 0.85,
});

Family-Friendly Platform

const detector = new ToxicityDetector({
  labels: ToxicityDetector.ALL_LABELS, // All 7 categories
  threshold: 0.8,
});

Safety-Critical (Threats Focus)

const detector = new ToxicityDetector({
  labels: ['threat', 'severe_toxicity'],
  threshold: 0.7, // Lower threshold for safety
});
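
A lower threshold trades more false positives for fewer missed detections, which is usually the right trade-off when an overlooked threat carries real-world risk.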

Handling Different Categories

import { ToxicityDetector } from 'glin-profanity/ml';

async function handleByCategory(text: string) {
  const detector = new ToxicityDetector({ threshold: 0.8 });
  const result = await detector.analyze(text);

  if (!result.isToxic) {
    return { action: 'allow' };
  }

  // Handle by severity
  if (result.matchedCategories.includes('threat') ||
      result.matchedCategories.includes('severe_toxicity')) {
    return {
      action: 'block_and_report',
      escalate: true,
      reason: 'Potential threat detected',
    };
  }

  if (result.matchedCategories.includes('identity_attack')) {
    return {
      action: 'block',
      warn: true,
      reason: 'Hate speech policy violation',
    };
  }

  if (result.matchedCategories.includes('insult')) {
    return {
      action: 'warn',
      reason: 'Please keep discussions respectful',
    };
  }

  // Default handling
  return {
    action: 'flag_for_review',
    categories: result.matchedCategories,
  };
}
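
A possible call site, assuming a message object with text and authorId fields; sendWarning and reportToTrustAndSafety are hypothetical helpers standing in for your own moderation plumbing:

const verdict = await handleByCategory(message.text);

if (verdict.action === 'block_and_report') {
  await reportToTrustAndSafety(message, verdict.reason); // hypothetical helper
} else if (verdict.action === 'warn') {
  await sendWarning(message.authorId, verdict.reason); // hypothetical helper
}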

Threshold Guidelines

| Category | Conservative | Balanced | Sensitive |
| --- | --- | --- | --- |
| toxicity | 0.9 | 0.85 | 0.75 |
| insult | 0.9 | 0.85 | 0.8 |
| threat | 0.8 | 0.7 | 0.6 |
| identity_attack | 0.85 | 0.8 | 0.75 |
| obscene | 0.9 | 0.85 | 0.8 |
| severe_toxicity | 0.95 | 0.9 | 0.85 |
| sexual_explicit | 0.9 | 0.85 | 0.8 |

Start with balanced thresholds and adjust based on your false positive/negative rates in production.
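
The constructor shown on this page takes a single threshold, so one way to apply per-category values is to run one single-label detector per category. The following is a sketch under that assumption; the BALANCED map simply copies the table's middle column, and it assumes matchedCategories reports the label that fired, as in the handling example above:

// One single-label detector per category, each with its own balanced threshold.
const BALANCED: Record<string, number> = {
  toxicity: 0.85,
  insult: 0.85,
  threat: 0.7,
  identity_attack: 0.8,
  obscene: 0.85,
  severe_toxicity: 0.9,
  sexual_explicit: 0.85,
};

const detectors = Object.entries(BALANCED).map(
  ([label, threshold]) => new ToxicityDetector({ labels: [label], threshold }),
);

async function matchedLabels(text: string): Promise<string[]> {
  const results = await Promise.all(detectors.map((d) => d.analyze(text)));
  // Each detector watches one label, so isToxic means that label crossed its threshold.
  return results.flatMap((r) => (r.isToxic ? r.matchedCategories : []));
}

Before shipping this pattern, verify whether multiple detector instances share one underlying model; if they don't, the memory cost of seven instances may matter.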

Model Limitations

  • English-focused: The model is trained primarily on English text
  • Context gaps: May miss sarcasm or context-dependent meaning
  • Cultural bias: Training data reflects Western internet culture
  • New slang: May not catch recently coined terms

For comprehensive coverage, combine ML with rule-based detection using HybridFilter.
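
HybridFilter packages that combination for you. As an illustration of the idea only (not HybridFilter's actual API), a hand-rolled version might look like this, with the blocklist as a placeholder:

import { ToxicityDetector } from 'glin-profanity/ml';

// Illustrative only: a word list catches newly coined terms the model misses,
// while the ML model catches toxic phrasing no word list can enumerate.
const BLOCKLIST = new Set(['examplebadword']); // placeholder; maintain your own

const detector = new ToxicityDetector({ threshold: 0.8 });

async function hybridCheck(text: string): Promise<boolean> {
  const ruleHit = text.toLowerCase().split(/\W+/).some((w) => BLOCKLIST.has(w));
  if (ruleHit) return true;

  const result = await detector.analyze(text);
  return result.isToxic;
}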

Cross-References