Supported Languages
| Code | Language | Status | Notes |
|---|---|---|---|
| en | English | Stable | Full production support |
| es | Spanish | Beta | Including Latin American variants |
| pt | Portuguese | Beta | Including Brazilian Portuguese |
| uk | Ukrainian | Beta | |
| sv | Swedish | Beta | |
| no | Norwegian | Beta | Bokmål and Nynorsk |
| da | Danish | Beta | |
| fi | Finnish | Beta | |
| de | German | Beta | |
| fr | French | Beta | |
| nl | Dutch | Beta | Including Flemish |
| pl | Polish | Beta | |
| it | Italian | Beta | |
| tr | Turkish | Beta | |
| ro | Romanian | Beta | |
| el | Greek | Beta | Greeklish (Latin-alphabet) also recognized |
| cs | Czech | Beta | |
| hu | Hungarian | Beta | |
| bg | Bulgarian | Beta | Cyrillic and Shlyokavitsa (Latin-alphabet) |
| hr | Croatian | Beta | Also covers Serbian and Bosnian content |
| sk | Slovak | Beta | |
| lt | Lithuanian | Beta | |
| lv | Latvian | Beta | |
| et | Estonian | Beta | |
| sl | Slovenian | Beta | |
| mt | Maltese | Beta | Heavy English-Maltese code-switching supported |
| ga | Irish | Beta | Explicit `language: "ga"` recommended for best results |
Stable means fully validated with comprehensive test coverage. Beta languages are production-ready but may have slightly lower accuracy on edge cases. All beta languages include culture-specific analysis guidelines.
How Detection Works
Language detection uses a three-layer approach for maximum reliability:

- Explicit code: If you pass a `language` parameter in the request context, it is used directly. This is the fastest path and guarantees that the language you specify is the one used.
- Trigram detection: If no explicit language is given, the API runs trigram-based analysis on the input text to infer the language. This works well for most languages and adds no noticeable latency.
- LLM confirmation: The LLM also identifies the content language during its analysis, within the same API call and at no extra cost. When the LLM's detection is a supported language, it takes precedence over the trigram result. This ensures correct detection for closely related languages such as Norwegian, Swedish, and Danish.
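
The precedence between the three layers can be sketched as follows. This is a minimal illustration, not the service's actual code: `resolve_language` and its parameter names are hypothetical, and the set of codes is copied from the Supported Languages table. The behavior when no layer yields a result is not specified in this document, so the sketch simply returns whatever the trigram layer produced.

```python
from typing import Optional

# Codes copied from the Supported Languages table.
SUPPORTED = {
    "en", "es", "pt", "uk", "sv", "no", "da", "fi", "de", "fr", "nl", "pl",
    "it", "tr", "ro", "el", "cs", "hu", "bg", "hr", "sk", "lt", "lv", "et",
    "sl", "mt", "ga",
}


def resolve_language(
    explicit: Optional[str],
    trigram_guess: Optional[str],
    llm_guess: Optional[str],
) -> Optional[str]:
    """Hypothetical helper: pick the final language code from the three layers."""
    # Explicit code: used directly when the caller supplies one.
    if explicit:
        return explicit
    # LLM confirmation: overrides the trigram result only when the
    # LLM reports a supported language.
    if llm_guess in SUPPORTED:
        return llm_guess
    # Trigram detection: the fallback when the LLM result is unusable.
    return trigram_guess
```

For example, `resolve_language(None, "sv", "no")` returns `"no"`, which mirrors the Norwegian/Swedish case described above: the trigram layer guessed Swedish, and the LLM's supported-language result replaced it.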
Response Fields
Every safety endpoint response includes language information:

| Field | Type | Description |
|---|---|---|
| language | string | Final resolved language code (ISO 639-1) used for analysis |
| language_status | string | "stable" for English, "beta" for all other supported languages |
| detected_language | string | Language code reported by the LLM |
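
A client might read these fields along the following lines. This is a sketch under assumptions: the three field names come from the table above, but the `LanguageInfo` class, the `parse_language_info` helper, and the sample payload are illustrative, and all other response fields are omitted. Treating `detected_language` as optional is a defensive choice in the sketch, not documented behavior.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class LanguageInfo:
    language: str                     # final resolved ISO 639-1 code
    language_status: str              # "stable" or "beta"
    detected_language: Optional[str]  # code reported by the LLM


def parse_language_info(body: str) -> LanguageInfo:
    """Hypothetical helper: extract the language fields from a JSON response body."""
    data = json.loads(body)
    return LanguageInfo(
        language=data["language"],
        language_status=data["language_status"],
        detected_language=data.get("detected_language"),
    )


# Illustrative payload only; not captured from a real response.
sample = '{"language": "no", "language_status": "beta", "detected_language": "no"}'
info = parse_language_info(sample)
if info.language_status == "beta":
    print(f"{info.language}: beta language, accuracy on edge cases may be slightly lower")
```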
Culture-Aware Analysis
Each supported language includes culture-specific guidelines that are injected into the classification prompt:

- Local slang and idioms: Ensures teen slang and cultural expressions are correctly interpreted rather than triggering false positives.
- Harmful terms: Language-specific lists of slurs, hate speech, and harmful terminology.
- Grooming indicators: Language-specific grooming patterns, including pronoun formality shifts, culturally specific pet names, and platform preferences by region.
- Self-harm coded vocabulary: Coded phrases for self-harm and suicidal ideation in each language, beyond literal translations.
- Filter evasion techniques: Language-specific evasion patterns, including diacritic omission, Cyrillic-Latin homoglyph mixing (Bulgarian), Greeklish/Shlyokavitsa (Greek/Bulgarian Latin-alphabet writing), and code-switching between related languages.
- Cultural context: For example, Finnish profanity (e.g., “perkele”) is culturally common and treated differently from targeted insults. Norwegian analysis accounts for the janteloven cultural norm. Danish analysis is calibrated for sarcastic and self-deprecating communication styles. Dutch analysis flags disease-based swearing (e.g., “kanker-” prefix). Turkish analysis considers honor-based dynamics. Baltic and Balkan languages include parental emigration context for vulnerability assessment.
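
Two of the evasion patterns listed above, diacritic omission and Cyrillic-Latin homoglyph mixing, can be illustrated with Python's standard `unicodedata` module. This is only a sketch of the general idea: the document says these patterns are handled through guidelines in the classification prompt, so the helper functions below are hypothetical and do not describe how the API works internally.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Fold accented letters to their base form, e.g. 'Bokmål' -> 'Bokmal'."""
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks that NFD splits off from base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def scripts_in(word: str) -> set[str]:
    """Return the script of each letter, taken from its Unicode character name."""
    scripts = set()
    for ch in word:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split(" ")[0])  # e.g. "LATIN", "CYRILLIC", "GREEK"
    return scripts


def has_mixed_script(word: str) -> bool:
    """Flag a single word that mixes Latin and Cyrillic letters."""
    found = scripts_in(word)
    return "LATIN" in found and "CYRILLIC" in found
```

Comparing `strip_diacritics(a) == strip_diacritics(b)` matches a word against its diacritic-free spelling, and `has_mixed_script("p\u0430ss")` returns `True` because the second letter is Cyrillic "а" rather than Latin "a". Letters such as "ø" and "ł" have no Unicode decomposition, so `strip_diacritics` leaves them unchanged; a fuller approach would need an explicit mapping for those.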