The idea in one sentence
Some detection happens on your device. Some on ours. None of it sticks anywhere.
Tuteliq publishes a small, public lexicon of high-precision phrase patterns mapped to category flags. Every Tuteliq SDK fetches it on startup, caches it by version, and runs it client-side against content before making an API call. Two payoffs:
- Benign-only matches → no API call needed. “Hi!” / “thank you” / “good night” never leave the device.
- Positive matches → structured hints on the request. When the SDK detects a high-precision signal (e.g. SECRECY_REQUEST), it attaches the match to the API request so the server starts with a strong prior.
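The two payoffs can be read as a single decision step over whatever the local matcher returned. The sketch below is illustrative only: it assumes each match carries the weight described in the matching contract (-1 for benign, 0..1 for positive), and the names PrescreenMatch, Decision, and decide are hypothetical, not part of any Tuteliq SDK. It also assumes that content with no matches at all still goes to the API, which this page does not state explicitly.

```typescript
// Hypothetical shape: only the weight matters for the decision.
interface PrescreenMatch {
  weight: number; // -1 = benign, 0..1 = positive (per the matching contract)
}

// Either skip the API call, or call it with the positive matches attached.
type Decision =
  | { action: "skip" }
  | { action: "call"; prescreenFlags: PrescreenMatch[] };

function decide(matches: PrescreenMatch[]): Decision {
  const positives = matches.filter((m) => m.weight >= 0);
  const benign = matches.filter((m) => m.weight === -1);

  // Benign-only: at least one benign match and no positive match.
  if (positives.length === 0 && benign.length > 0) {
    return { action: "skip" };
  }
  // Anything else goes to the server; positives (possibly none) ride along.
  // Assumption: "no match at all" is treated as "call the API".
  return { action: "call", prescreenFlags: positives };
}
```

Skipping is only ever optional for the SDK, so a conservative integration could always return the "call" branch and still attach the positive matches.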
Endpoint
The response carries ETag and Cache-Control: public, max-age=3600 headers, so SDKs that send If-None-Match on subsequent fetches get a 304 Not Modified if the lexicon hasn’t changed. The lexicon version (LEXICON_VERSION) bumps on any entry change.
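The conditional-fetch behaviour can be sketched as two small pure functions wrapped around whatever HTTP client the SDK uses. This is a minimal sketch under stated assumptions: CachedLexicon, conditionalHeaders, and applyResponse are hypothetical names, the endpoint URL is deliberately left out, and keeping the cached copy on unexpected statuses is a choice made here rather than documented behaviour.

```typescript
// Hypothetical record of what the SDK keeps between fetches.
interface CachedLexicon {
  etag: string; // ETag header value from the last 200 response
  body: string; // raw lexicon payload as received
}

// Build request headers: send If-None-Match only when a cached copy exists.
function conditionalHeaders(cached: CachedLexicon | null): Record<string, string> {
  const headers: Record<string, string> = {};
  if (cached !== null) {
    headers["If-None-Match"] = cached.etag;
  }
  return headers;
}

// Fold a response into the cache and return the lexicon to use from now on.
function applyResponse(
  cached: CachedLexicon | null,
  status: number,
  etag: string | null,
  body: string | null,
): CachedLexicon | null {
  if (status === 304) {
    return cached; // Not Modified: the cached copy is still current
  }
  if (status === 200 && etag !== null && body !== null) {
    return { etag, body }; // fresh lexicon: replace the cached copy
  }
  return cached; // assumption: on any other outcome, keep what we have
}
```

With the standard fetch API, the status and ETag passed to applyResponse would come from response.status and response.headers.get("ETag").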
Response shape
Matching contract
SDKs in any language re-implement the same matching contract:
| Field | Behaviour |
|---|---|
| type: "phrase" | Case-insensitive substring match: text.toLowerCase().includes(pattern.toLowerCase()) |
| type: "regex" | JavaScript-style regex compiled with the i flag, tested against the original text |
| languages | ["*"] matches any language; otherwise an ISO 639-1 lowercase match (en, es, pt, …). Default if the SDK doesn’t know the language: assume en. |
| endpoints | ["*"] applies to all; otherwise the SDK only runs entries scoped to the target endpoint (grooming, distress-signals, bullying, coercive-control, unsafe). |
| weight: -1 | Benign signal. If only benign signals match (no positive), SDK MAY skip the API call. |
| weight: 0..1 | Positive signal. SDK MUST attach the match to the API request as a prescreen_flags item. |
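The contract above could be implemented roughly as follows. This is a sketch, not the reference SDK code: the LexiconEntry interface only mirrors the fields named in the table, the helper names inScope and matchEntries are hypothetical, and the two sample entries are invented for illustration and are not taken from the published lexicon.

```typescript
// Assumed entry shape, limited to the fields the contract table names.
interface LexiconEntry {
  type: "phrase" | "regex";
  pattern: string;
  languages: string[]; // ["*"] or ISO 639-1 lowercase codes
  endpoints: string[]; // ["*"] or endpoint names such as "grooming"
  weight: number;      // -1 benign, 0..1 positive
}

// A scope list applies when it contains "*" or the exact value.
function inScope(scope: string[], value: string): boolean {
  return scope.indexOf("*") !== -1 || scope.indexOf(value) !== -1;
}

// Return every entry that matches `text` for the given endpoint and language.
// Language defaults to "en" when the caller does not know it.
function matchEntries(
  entries: LexiconEntry[],
  text: string,
  endpoint: string,
  language: string = "en",
): LexiconEntry[] {
  const lang = language.toLowerCase();
  const lowered = text.toLowerCase();
  return entries.filter((entry) => {
    if (!inScope(entry.languages, lang)) return false;
    if (!inScope(entry.endpoints, endpoint)) return false;
    if (entry.type === "phrase") {
      // Same result as text.toLowerCase().includes(pattern.toLowerCase()).
      return lowered.indexOf(entry.pattern.toLowerCase()) !== -1;
    }
    // Regex entries: "i" flag, tested against the original text.
    return new RegExp(entry.pattern, "i").test(text);
  });
}

// Illustrative entries only; NOT from the published lexicon.
const sampleEntries: LexiconEntry[] = [
  { type: "phrase", pattern: "Good Night", languages: ["*"], endpoints: ["*"], weight: -1 },
  { type: "regex", pattern: "don'?t tell (your )?(mom|dad|parents)", languages: ["en"], endpoints: ["grooming"], weight: 0.9 },
];

const hits = matchEntries(sampleEntries, "Dont tell your mom", "grooming");
console.log(hits.map((h) => h.weight)); // logs the weights of matching entries
```

Compiling the regex on every call keeps the sketch short; an SDK would more plausibly compile each pattern once when the lexicon is loaded.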
Result shape
SDK behaviour (reference)
Privacy properties
- The lexicon is public: anyone can fetch and inspect it. No model weights, no proprietary scoring.
- The matching is local: content matched on-device never leaves the device unless the SDK decides to make an API call.
- Benign short-circuits: clearly-benign content never reaches Tuteliq at all.
- Server-side acceptance is opt-in: passing prescreen_flags on a detection request is supported and used as a prior; it never substitutes for the full server-side analysis.
What’s NOT in v1
- Multi-language entries beyond English: coming in subsequent lexicon versions. SDKs that don’t recognise the language fall back to en patterns; we plan to ship es, pt, fr, de next.
- Machine-learned patterns: v1 is curated, hand-written phrases focused on precision-per-pattern. The bar to add an entry is intentionally high, because false-positive benign matches would cause SDKs to silently drop concerning content.
- Per-customer custom lexicons: every SDK runs the same public lexicon. Custom patterns are a P3 follow-up.
Versioning
The version field (e.g. 2026.05.001) bumps on any change to entries. SDKs cache by version. Old SDK versions keep working with their cached lexicon; they’re just less sharp until the next SDK release pulls the new one. The target cadence is two weeks for additions, faster for removals.
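Caching by version can be as simple as comparing version strings for equality and swapping the stored lexicon when they differ. The sketch below assumes a payload with a version field and an entries array; VersionedLexicon and LexiconCache are hypothetical names, and equality is used on purpose because this page does not define an ordering between version strings.

```typescript
// Assumed minimal payload shape: a version string plus opaque entries.
interface VersionedLexicon {
  version: string;
  entries: unknown[];
}

class LexiconCache {
  private current: VersionedLexicon | null = null;

  // Store `fetched` if its version differs from the cached one.
  // Returns true when the cache was replaced, false when it was already current.
  update(fetched: VersionedLexicon): boolean {
    if (this.current !== null && this.current.version === fetched.version) {
      return false;
    }
    this.current = fetched;
    return true;
  }

  // The lexicon currently in use, or null before the first successful fetch.
  get(): VersionedLexicon | null {
    return this.current;
  }
}
```

A return value of true is a natural point to rebuild anything derived from the entries, such as precompiled regexes.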
If you spot a false positive (benign content matching as a positive signal) please report it — those are the highest-priority lexicon fixes.