The detection pipeline
1. Content ingestion & language detection
When you send a request to any Safety endpoint, Tuteliq first normalizes the input and detects the content language. Text is analyzed directly. Audio files are transcribed via Whisper and then analyzed as text, with timestamped segments preserved. Images are processed through Vision AI for visual classification and OCR text extraction simultaneously — so a screenshot of a harmful conversation is caught by both the visual and textual classifiers.

Language detection uses a layered approach for maximum reliability:
- Explicit code — If you pass a `language` parameter, it is used directly.
- Trigram detection — If no explicit language is given, the API runs trigram-based analysis on the input text.
- LLM confirmation — The LLM also identifies the content language during analysis. When the LLM’s detection is a supported language, it takes precedence — this ensures correct detection for closely related languages like Norwegian, Swedish, and Danish.
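From the client side, the first layer is simply an optional request field. A minimal sketch, assuming a JSON body with `content` and `language` keys (the exact request shape is an assumption, not taken from the API reference):

```python
import json

def build_payload(text, language=None):
    """Build a hypothetical detection request body.

    If `language` is given, the API uses it directly; if omitted,
    the API falls back to trigram detection plus LLM confirmation.
    """
    payload = {"content": text}
    if language is not None:
        payload["language"] = language  # ISO 639-1 code, e.g. "no"
    return payload

# Explicit code: auto-detection is skipped entirely
print(json.dumps(build_payload("Hei, hvordan går det?", language="no")))

# Omitted: the API auto-detects (trigram pass, then LLM confirmation)
print(json.dumps(build_payload("Hei, hvordan går det?")))
```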
2. Multi-model classification
Rather than relying on a single model, Tuteliq runs content through specialized classifiers for each harm category in parallel:

| Classifier | What it detects |
|---|---|
| Grooming Detection | Trust escalation, secrecy requests, isolation attempts, boundary testing, gift/reward patterns |
| Bullying & Harassment | Direct insults, social exclusion, intimidation, cyberstalking, identity-based attacks |
| Self-Harm & Suicidal Ideation | Crisis language, passive ideation, planning indicators, self-injury references |
| Substance Use | Promotion, solicitation, normalization of drug/alcohol use toward minors |
| Eating Disorders | Pro-anorexia/bulimia content, body dysmorphia triggers, dangerous diet promotion |
| Depression & Anxiety | Persistent mood indicators, hopelessness patterns, withdrawal signals |
| Compulsive Usage | Engagement manipulation, addiction-pattern reinforcement, dark patterns targeting minors |
| Sexual Exploitation | Explicit solicitation, sextortion patterns, inappropriate sexual content directed at minors |
| Social Engineering | Pretexting, impersonation, urgency manipulation, authority exploitation targeting minors |
| App Fraud | Fake app promotion, malicious download links, clone app distribution, fraudulent reviews |
| Romance Scam | Love-bombing, financial requests, identity fabrication, isolation from support networks |
| Mule Recruitment | Easy money offers, account sharing requests, laundering language, recruitment pressure |
| Gambling Harm | Underage gambling promotion, addiction patterns, predatory odds, bet pressure tactics |
| Coercive Control | Isolation tactics, financial control, monitoring/surveillance, threat patterns in relationships |
| Vulnerability Exploitation | Targeting based on age, disability, emotional state, or financial hardship — with cross-endpoint vulnerability profiling |
| Radicalisation | Extremist rhetoric, us-vs-them framing, recruitment patterns, dehumanisation of outgroups |
| Age Verification | Document-based age verification, biometric age estimation, age assurance for platform compliance. Beta — available on Pro tier and above |
| Identity Verification | Document verification, liveness detection, identity confirmation to prevent impersonation. Beta — available on Business tier and above |
The `/analyse/multi` endpoint lets you run up to 10 classifiers on a single piece of content in one API call. When vulnerability exploitation detection is included, it produces a cross-endpoint vulnerability modifier that automatically adjusts severity scores across all other results — amplifying risk when the content targets vulnerable individuals.
Valid endpoint values for `/analyse/multi`:
| Endpoint ID | Classifier |
|---|---|
| `bullying` | Bullying & Harassment |
| `grooming` | Grooming Detection |
| `unsafe` | Unsafe Content (KOSA categories) |
| `social-engineering` | Social Engineering |
| `app-fraud` | App-based Fraud |
| `romance-scam` | Romance Scam |
| `mule-recruitment` | Mule Recruitment |
| `gambling-harm` | Gambling Harm |
| `coercive-control` | Coercive Control |
| `vulnerability-exploitation` | Vulnerability Exploitation |
| `radicalisation` | Radicalisation |
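Putting the table together, a `/analyse/multi` request body might look like the sketch below. The endpoint IDs come from the table above; the other field names (`content`, `context`) are assumptions about the request shape:

```python
import json

# Hypothetical request body for /analyse/multi. Only the `endpoints`
# IDs are taken from the documentation; the rest is illustrative.
request_body = {
    "content": "Don't tell your parents about this. I can send you money.",
    "endpoints": [                     # up to 10 classifier IDs per call
        "grooming",
        "mule-recruitment",
        "vulnerability-exploitation",  # enables the cross-endpoint modifier
    ],
    "context": {"age_group": "13-15"},
}

assert len(request_body["endpoints"]) <= 10  # documented per-call limit
print(json.dumps(request_body, indent=2))
```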
3. Context engine
This is where Tuteliq diverges from keyword-based filters. The context engine evaluates:
- Linguistic intent — Is “I want to kill myself” an expression of frustration over a video game, or a genuine crisis signal? Tuteliq analyzes surrounding context, tone, and conversational history to distinguish the two.
- Relationship dynamics — A single message may appear harmless. The context engine tracks multi-turn escalation patterns — compliments, then secrecy requests, then isolation attempts, then boundary violations — that only become visible across a conversation. Every conversation-aware endpoint returns a `message_analysis` array that shows exactly how risk escalates message by message, with individual risk scores and detected tactics for each entry.
- Platform norms — Teen slang, gaming culture, and social media language evolve fast. The context engine recognizes that “I’m literally dead” in a group chat has a fundamentally different risk profile than the same phrase in a private message to a younger child.
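The escalation pattern described above can be consumed programmatically. In this sketch the response dict is hand-written to match the documented `message_analysis` entry fields (`message_index`, `risk_score`, `flags`, `summary`); it is not real API output:

```python
# Hand-written sample shaped like a conversation-aware response.
response = {
    "detected": True,
    "message_analysis": [
        {"message_index": 0, "risk_score": 0.1, "flags": [], "summary": "Friendly opener"},
        {"message_index": 1, "risk_score": 0.4, "flags": ["secrecy_request"], "summary": "Asks to keep chat secret"},
        {"message_index": 2, "risk_score": 0.8, "flags": ["isolation_attempt"], "summary": "Discourages talking to parents"},
    ],
}

def escalation_points(analysis, jump=0.2):
    """Return message indices where risk jumps by more than `jump`
    relative to the previous message -- a simple escalation detector."""
    points = []
    for prev, cur in zip(analysis, analysis[1:]):
        if cur["risk_score"] - prev["risk_score"] > jump:
            points.append(cur["message_index"])
    return points

print(escalation_points(response["message_analysis"]))  # → [1, 2]
```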
4. Age-calibrated scoring
The same content carries different risk depending on the child’s age. Tuteliq adjusts severity across four brackets:

| Age bracket | Calibration |
|---|---|
| Under 10 | Highest sensitivity. Almost any exposure to harmful content is flagged at elevated severity. |
| 10–12 | High sensitivity. Beginning to encounter peer conflict; distinguishes normal friction from targeted harassment. |
| 13–15 | Moderate sensitivity. Accounts for typical teen communication patterns while remaining alert to genuine risk. |
| 16–17 | Adjusted sensitivity. Recognizes greater autonomy while maintaining protection against grooming, exploitation, and crisis signals. |
Pass `age_group` in your request context. If omitted, Tuteliq defaults to the most protective bracket.
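A client-side helper can mirror this defaulting rule before the request is sent. The bracket boundaries follow the table above; the exact label strings are assumptions (the context-fields table shows example values like `"10-12"` and `"13-15"`):

```python
# Bracket boundaries follow the documented table; label strings assumed.
AGE_BRACKETS = [
    (0, 9, "under-10"),
    (10, 12, "10-12"),
    (13, 15, "13-15"),
    (16, 17, "16-17"),
]

def age_group_for(age=None):
    """Map a known age to a bracket label. An unknown age falls back
    to the most protective bracket, matching the documented default."""
    if age is None:
        return "under-10"  # most protective bracket
    for lo, hi, label in AGE_BRACKETS:
        if lo <= age <= hi:
            return label
    return "16-17"  # 18+: least restrictive bracket this sketch knows about
```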
5. Response generation
The final stage assembles the structured response: a detection verdict, a severity level and risk score calibrated to the supplied context, a human-readable rationale, and a recommended action.
Context fields
You can pass a `context` object with any detection request to improve accuracy:
| Field | Type | Effect |
|---|---|---|
| `age_group` / `ageGroup` | string | Triggers age-calibrated scoring (e.g., `"10-12"`, `"13-15"`, `"under 18"`) |
| `language` | string | ISO 639-1 code. Auto-detected if omitted. |
| `platform` | string | Platform name (e.g., `"Discord"`, `"Roblox"`). Adjusts for platform-specific norms. |
| `conversation_history` | array | Prior messages for context-aware analysis. Returns per-message `message_analysis`. |
| `sender_trust` | string | `"verified"`, `"trusted"`, or `"unknown"`. Verified senders suppress impersonation flags. |
| `sender_name` | string | Sender identifier (used with `sender_trust`). |
| `country` | string | ISO 3166-1 alpha-2 code (e.g., `"GB"`, `"US"`, `"SE"`). Enables geo-localised crisis helpline data. Falls back to user profile country if omitted. |
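Combined, a fully populated `context` object might look like this. The field names and values match the table above; the surrounding `content` key is an assumption about the request shape:

```python
import json

# Illustrative `context` object using the documented fields.
request = {
    "content": "Your account will be locked today unless you confirm your password.",
    "context": {
        "age_group": "13-15",
        "language": "en",            # ISO 639-1; auto-detected if omitted
        "platform": "Discord",
        "sender_trust": "unknown",   # "verified" would suppress AUTH_IMPERSONATION
        "sender_name": "support-team-real",
        "country": "GB",             # enables UK crisis helpline data
    },
}
print(json.dumps(request, indent=2))
```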
When `sender_trust` is `"verified"`, the API fully suppresses `AUTH_IMPERSONATION` — a verified sender cannot be impersonating an authority by definition. Routine urgency (schedules, deadlines) is also suppressed. Only genuinely malicious content (credential theft, phishing links, financial demands) will be flagged.

Crisis support resources (`support_threshold`)
Detection responses can include country-specific crisis helplines and response guidance. The `support_threshold` parameter controls when these are included:
| Value | Behavior |
|---|---|
| `low` | Include for Low severity and above |
| `medium` | Include for Medium severity and above |
| `high` | (Default) Include for High severity and above |
| `critical` | Include only for Critical severity |
Critical severity always includes support resources regardless of the threshold setting.
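If you want to predict client-side whether a given response will carry support resources, the table reduces to an ordering check. A sketch mirroring the documented behavior, including the Critical override:

```python
# Severity levels in ascending order, as documented.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]

def includes_support(severity, support_threshold="high"):
    """Would this response carry crisis-support resources?"""
    if severity == "critical":
        return True  # Critical always includes resources, regardless of threshold
    return SEVERITY_ORDER.index(severity) >= SEVERITY_ORDER.index(support_threshold)

print(includes_support("medium"))              # False: default threshold is "high"
print(includes_support("medium", "medium"))    # True
print(includes_support("critical", "critical"))  # True
```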
Pass `support_threshold` in the `options` object or as a top-level request field.

Response fields

Detection responses include the following fields:
- `unsafe` (boolean, legacy endpoints) or `detected` (boolean, new detection endpoints) — Clear yes/no for immediate routing decisions. Legacy endpoints return `unsafe`; newer detection endpoints use `detected` instead.
- `categories` (array) — Which KOSA harm categories were triggered.
- `severity` (string) — `low`, `medium`, `high`, or `critical`, calibrated to the age group.
- `risk_score` (float, 0.0–1.0) — Granular score for threshold-based automation.
- `confidence` (float) — Model confidence in the classification.
- `rationale` (string) — Human-readable explanation of why the content was flagged. Useful for trust & safety review and audit trails.
- `message_analysis` (array, conversation-aware endpoints) — Per-message risk breakdown, returned when `conversation_history` is provided. Each entry contains `message_index`, `risk_score`, `flags`, and `summary`, making the escalation sequence visible for dashboards and reporting. Available on grooming, social engineering, app fraud, romance scam, mule recruitment, gambling harm, coercive control, vulnerability exploitation, and radicalisation endpoints.
- `recommended_action` (string) — Suggested next step, such as “Escalate to counselor” or “Block and report.”
- `language` (string) — Resolved language code (ISO 639-1) used for analysis, auto-detected or explicit.
- `language_status` (string) — `"stable"` for English, `"beta"` for all other supported languages.
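A small routing sketch over these fields. The response dicts are hand-written for illustration; reading both the legacy `unsafe` flag and the newer `detected` flag keeps one code path across endpoint generations, and the `0.8` escalation threshold is an arbitrary example value:

```python
def is_flagged(response):
    """True if the response reports harm, on legacy or new endpoints."""
    return bool(response.get("detected", response.get("unsafe", False)))

def route(response, escalate_at=0.8):
    """Map a detection response to a moderation action."""
    if not is_flagged(response):
        return "allow"
    if response["severity"] == "critical" or response["risk_score"] >= escalate_at:
        return "escalate"
    return "review"

legacy = {"unsafe": True, "severity": "medium", "risk_score": 0.55}
modern = {"detected": True, "severity": "critical", "risk_score": 0.93}

print(route(legacy))  # → review
print(route(modern))  # → escalate
```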
Beyond detection
Tuteliq doesn’t stop at “this content is unsafe.” Two additional endpoints complete the workflow:

Action plan generation
The `/guidance/action-plan` endpoint takes a detection result and generates age-appropriate guidance tailored to the audience:
- For children — Gentle, reading-level-appropriate language explaining what happened and what to do next.
- For parents — Clear explanation of the detected risk with suggested conversations and resources.
- For trust & safety teams — Technical summary with recommended platform actions and escalation paths.
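A hypothetical request builder for the three audiences. Note that the `audience` field, its values, and the practice of forwarding the prior detection result are assumptions about the request shape, not documented parameters:

```python
def action_plan_request(detection_result, audience):
    """Build a hypothetical /guidance/action-plan request body.
    `audience` selects which guidance style to generate; the three
    values below mirror the audiences listed in the docs."""
    if audience not in ("child", "parent", "trust_safety"):
        raise ValueError(f"unknown audience: {audience!r}")
    return {"detection": detection_result, "audience": audience}

body = action_plan_request(
    {"detected": True, "severity": "high", "categories": ["bullying"]},
    audience="parent",
)
print(body["audience"])  # → parent
```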
Incident reports
The `/reports/incident` endpoint converts raw conversation data into structured, professional reports suitable for school counselors responding to bullying incidents, platform moderators documenting patterns of abuse, and compliance teams maintaining audit trails for KOSA reporting.
Architecture principles
- Fully stateless — Every API call is independent; Tuteliq never stores conversation text, context, or session state between requests. This is a deliberate privacy-by-design decision: when processing children’s data under GDPR/COPPA, the safest data is data you never store. Pass conversation history with each request that needs it; results are returned and content is discarded.
- No training on your data — Content sent to Tuteliq is used solely for real-time analysis and is not retained for model training. See the GDPR section for data retention details.
- Parallel processing — All harm classifiers run simultaneously, not sequentially. This is how Tuteliq maintains sub-400ms response times even when checking against all nine KOSA categories.
- Policy-configurable — Use the `/policy/` endpoint to adjust detection thresholds, category weights, and moderation rules for your specific use case — without changing your integration code.
Next steps
Quickstart
Make your first detection call in under 5 minutes.
KOSA Compliance
See how each harm category maps to regulatory requirements.