Tuteliq provides a WebSocket-based streaming endpoint that transcribes audio in real time, analyzes video frames, and emits safety alerts as they are detected. This allows you to moderate voice chat, video calls, and other live media without waiting for the full recording to finish.
Endpoint
wss://api.tuteliq.ai/api/v1/safety/voice/stream?api_key=YOUR_API_KEY
Authentication is handled via the api_key query parameter or the Authorization: Bearer YOUR_API_KEY header. The connection will be rejected with a 4001 close code if the key is invalid or expired.
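As a rough illustration of the two authentication options, the sketch below builds the connection URL for the query-parameter form and a header object for the Bearer form. The helper names `buildStreamUrl` and `bearerHeaders` are hypothetical and not part of any Tuteliq SDK. With the Node `ws` package, the header object would be passed as the `headers` option of the `WebSocket` constructor; browser WebSocket clients cannot set custom headers, so they would need the query-parameter form.

```javascript
// Hypothetical helpers for illustration; not part of any Tuteliq SDK.
const STREAM_ENDPOINT = "wss://api.tuteliq.ai/api/v1/safety/voice/stream";

// Option 1: pass the key as the api_key query parameter.
function buildStreamUrl(apiKey) {
  const url = new URL(STREAM_ENDPOINT);
  url.searchParams.set("api_key", apiKey);
  return url.toString();
}

// Option 2: pass the key in an Authorization header.
// With the `ws` package: new WebSocket(STREAM_ENDPOINT, { headers: bearerHeaders(key) })
function bearerHeaders(apiKey) {
  return { Authorization: `Bearer ${apiKey}` };
}

console.log(buildStreamUrl("YOUR_API_KEY"));
console.log(bearerHeaders("YOUR_API_KEY"));
```

The header form keeps the key out of URLs, which tend to end up in proxy and access logs.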
Connection Limits
Concurrent WebSocket connections are limited by your plan tier:
| Plan | Max Connections |
|---|---|
| Starter (Free) | 1 |
| Indie | 3 |
| Pro | 10 |
| Business | 50 |
| Enterprise | Unlimited |
If you exceed your limit, the connection is closed with code 4029.
Binary Protocol
Send media data as binary WebSocket frames. A prefix byte discriminates between audio and video:
| Prefix Byte | Meaning |
|---|---|
| (none) | Audio chunk (backward compatible) |
| 0x00 | Audio chunk (explicit tag) |
| 0x01 | Video frame (JPEG/PNG) |
The recommended audio format is:
| Parameter | Value |
|---|---|
| Encoding | PCM 16-bit LE |
| Sample Rate | 16 kHz |
| Channels | 1 (mono) |
| Chunk Size | 4096–32768 bytes |
Other sample rates (8 kHz, 44.1 kHz, 48 kHz) are accepted but will be resampled server-side, which adds latency. 16 kHz mono gives the best balance of accuracy and speed.
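To make the recommended format concrete: 16 kHz, 16-bit, mono PCM is 32,000 bytes per second, so a 4096-byte chunk carries 128 ms of audio and a 32768-byte chunk carries 1024 ms. The sketch below splits a PCM buffer into chunks and prepends the explicit 0x00 audio tag. The function names are hypothetical, and it assumes the prefix byte does not count toward the chunk size, which this page does not specify.

```javascript
const SAMPLE_RATE_HZ = 16000;
const BYTES_PER_SAMPLE = 2; // 16-bit PCM
const CHANNELS = 1; // mono
const BYTES_PER_SECOND = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS; // 32000

// Duration of audio carried by a chunk of the given size, in milliseconds.
function chunkDurationMs(chunkBytes) {
  return (chunkBytes * 1000) / BYTES_PER_SECOND;
}

// Split a PCM buffer into chunks and prepend the explicit 0x00 audio tag.
// The final chunk may be shorter than chunkBytes.
function toTaggedAudioChunks(pcm, chunkBytes = 4096) {
  if (chunkBytes < 4096 || chunkBytes > 32768 || chunkBytes % BYTES_PER_SAMPLE !== 0) {
    throw new RangeError("chunkBytes must be an even number between 4096 and 32768");
  }
  const frames = [];
  for (let offset = 0; offset < pcm.length; offset += chunkBytes) {
    const payload = pcm.subarray(offset, offset + chunkBytes);
    frames.push(Buffer.concat([Buffer.from([0x00]), payload]));
  }
  return frames;
}

console.log(chunkDurationMs(4096), chunkDurationMs(32768));
console.log(toTaggedAudioChunks(Buffer.alloc(10000)).map((f) => f.length));
```

Smaller chunks reduce capture-side latency at the cost of more WebSocket frames; the audio is still analyzed once per flush interval either way.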
Video Frames
Send JPEG or PNG frames with a 0x01 prefix byte. Frames are analyzed at the configured frame_interval_seconds (default 5s, minimum 3s). Each frame must be under 5MB.
// Tag a JPEG frame with the 0x01 prefix
const tagged = Buffer.concat([Buffer.from([0x01]), jpegBuffer]);
ws.send(tagged);
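Because frames are only analyzed once per frame interval, a client may want to avoid uploading frames that cannot be analyzed. The sketch below is one possible client-side gate: it skips oversized frames and frames that arrive before the interval has elapsed, and tags the rest with 0x01. `createFrameGate` is a hypothetical helper, and reading "5MB" as 5,000,000 bytes is a conservative assumption. How the server treats frames that arrive more often than the interval is not stated here.

```javascript
// Assumption: "5MB" is read conservatively as 5,000,000 bytes.
const MAX_FRAME_BYTES = 5 * 1000 * 1000;
const MIN_FRAME_INTERVAL_SECONDS = 3; // documented minimum

// Returns a function that tags a frame with 0x01 when it is due,
// or returns null when the frame should be skipped.
function createFrameGate(frameIntervalSeconds = 5) {
  const intervalMs = Math.max(frameIntervalSeconds, MIN_FRAME_INTERVAL_SECONDS) * 1000;
  let lastSentAt = -Infinity;
  return function tagIfDue(frame, nowMs = Date.now()) {
    if (frame.length >= MAX_FRAME_BYTES) return null; // frame must be under the size limit
    if (nowMs - lastSentAt < intervalMs) return null; // too soon after the previous frame
    lastSentAt = nowMs;
    return Buffer.concat([Buffer.from([0x01]), frame]);
  };
}

const gate = createFrameGate(5);
const fakeJpeg = Buffer.from([0xff, 0xd8, 0xff]); // stand-in bytes, not a real image
console.log(gate(fakeJpeg, 0) !== null); // first frame passes
console.log(gate(fakeJpeg, 1000) !== null); // skipped: only 1s since the last frame
```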
Connection Lifecycle
Connect
Open a WebSocket connection to the streaming endpoint with your API key.
Receive Ready
The server sends a ready event with your session_id and default config.
Configure (optional)
Send a JSON config message to adjust analysis settings.
Stream Media
Send binary audio chunks (and optionally video frames) continuously. The server transcribes audio and analyzes video at configured intervals.
Receive Events
The server sends JSON text frames containing transcriptions, safety alerts, and video frame analysis results.
End Session
Send a {"type": "end"} message or close the connection. The server flushes remaining data and sends a session_summary event.
Client Messages (JSON)
Config
Update session settings at any time. The server responds with a config_updated event.
{
  "type": "config",
  "interval_seconds": 10,
  "analysis_types": ["bullying", "unsafe", "grooming", "emotions"],
  "enable_video": true,
  "frame_interval_seconds": 5,
  "context": {
    "age_group": "11-13",
    "platform": "Discord",
    "language": "en"
  }
}
| Field | Type | Default | Description |
|---|---|---|---|
| interval_seconds | number | 10 | Audio flush interval in seconds (5–30). |
| analysis_types | string[] | all | Safety categories to analyze: bullying, unsafe, grooming, emotions. |
| enable_video | boolean | false | Enable video frame analysis. |
| frame_interval_seconds | number | 5 | Minimum interval between video frame analyses (min 3s). |
| context | object | none | Analysis context: age_group, platform, language, child_age. |
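The field constraints in the table can be enforced on the client before sending. The following sketch builds a config message, clamping the two intervals into their documented ranges and rejecting analysis types that are not listed above. `buildConfigMessage` is a hypothetical helper. Clamping rather than rejecting is a design choice made here, and the sketch assumes that omitted fields keep their current server-side values, which this page does not confirm.

```javascript
const KNOWN_ANALYSIS_TYPES = ["bullying", "unsafe", "grooming", "emotions"];

function clamp(value, min, max) {
  return Math.min(Math.max(value, min), max);
}

// Build a JSON config message from a partial options object.
function buildConfigMessage(options = {}) {
  const message = { type: "config" };
  if (options.interval_seconds !== undefined) {
    message.interval_seconds = clamp(options.interval_seconds, 5, 30); // documented range
  }
  if (options.frame_interval_seconds !== undefined) {
    message.frame_interval_seconds = Math.max(options.frame_interval_seconds, 3); // documented minimum
  }
  if (options.analysis_types !== undefined) {
    const unknown = options.analysis_types.filter((t) => !KNOWN_ANALYSIS_TYPES.includes(t));
    if (unknown.length > 0) {
      throw new TypeError(`Unknown analysis type(s): ${unknown.join(", ")}`);
    }
    message.analysis_types = [...options.analysis_types];
  }
  if (options.enable_video !== undefined) {
    message.enable_video = Boolean(options.enable_video);
  }
  if (options.context !== undefined) {
    message.context = { ...options.context };
  }
  return JSON.stringify(message);
}

console.log(buildConfigMessage({ interval_seconds: 2, enable_video: true }));
```

The returned string can be passed directly to `ws.send()`.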
End
Send {"type": "end"} to close the session gracefully. The server flushes remaining data and sends a session_summary event.
Server Events
ready
Sent immediately after authentication succeeds.
{
  "type": "ready",
  "session_id": "abc123",
  "config": {
    "interval_seconds": 10,
    "analysis_types": ["bullying", "unsafe", "grooming", "emotions"],
    "enable_video": false,
    "frame_interval_seconds": 5
  }
}
transcription
Emitted after each audio flush with transcribed text and timestamped segments.
{
  "type": "transcription",
  "text": "hey do you want to come over to my place after school",
  "segments": [
    { "start": 0.0, "end": 2.5, "text": "hey do you want to" },
    { "start": 2.5, "end": 5.1, "text": "come over to my place after school" }
  ],
  "flush_index": 3
}
alert
Safety concern detected in the latest audio flush.
{
  "type": "alert",
  "category": "grooming",
  "severity": "high",
  "risk_score": 0.87,
  "details": {
    "is_grooming": true,
    "flags": ["secrecy_request", "private_meeting"],
    "confidence": 0.91,
    "rationale": "Potential grooming pattern: solicitation of a private meeting with secrecy directive.",
    "recommended_action": "immediate_intervention"
  },
  "flush_index": 3
}
Categories: bullying, unsafe, grooming, emotions, visual.
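One way to act on alert events is to route them by severity and risk score. The sketch below is illustrative only: `triageAlert` is a hypothetical helper, the 0.7 score threshold is arbitrary, and the ordered severity scale is an assumption, since this page only shows "high" in the sample payload, "medium" as an overall risk, and "critical" in the code example.

```javascript
// Assumed ordering of severity labels; the full set is not listed on this page.
const SEVERITY_RANK = { low: 0, medium: 1, high: 2, critical: 3 };

// Decide whether an alert should be escalated or only logged.
function triageAlert(event, { escalateAt = "high", minRiskScore = 0.7 } = {}) {
  const rank = SEVERITY_RANK[event.severity];
  const threshold = SEVERITY_RANK[escalateAt];
  // Unknown severity labels are escalated so that nothing is silently dropped.
  if (rank === undefined) return "escalate";
  if (rank >= threshold || event.risk_score >= minRiskScore) return "escalate";
  return "log";
}

const sampleAlert = { type: "alert", category: "grooming", severity: "high", risk_score: 0.87 };
console.log(triageAlert(sampleAlert));
```

Treating unrecognized severities as escalations is a fail-safe choice; a production policy would likely also consider `category` and `details.recommended_action`.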
frame_analysis
Emitted for each analyzed video frame when enable_video is true.
{
  "type": "frame_analysis",
  "session_id": "abc123",
  "frame_index": 5,
  "timestamp": "2026-02-17T10:30:15.000Z",
  "vision": {
    "extracted_text": "",
    "visual_categories": ["violence"],
    "visual_severity": "high",
    "visual_confidence": 0.88,
    "visual_description": "Frame depicts violent content",
    "contains_text": false,
    "contains_faces": true
  },
  "risk_score": 0.8,
  "severity": "high"
}
session_summary
Sent when the session ends (via end message or connection close).
{
  "type": "session_summary",
  "session_id": "abc123",
  "duration_seconds": 120,
  "overall_risk": "medium",
  "overall_risk_score": 0.55,
  "total_flushes": 12,
  "transcript": "Full concatenated transcript of the session...",
  "video_frames_analyzed": 8
}
config_updated
Confirmation after a config message is processed.
{
  "type": "config_updated",
  "config": { "interval_seconds": 10, "enable_video": true, "..." : "..." }
}
error
Sent when something goes wrong.
{
  "type": "error",
  "code": "WS_10005",
  "message": "Invalid WebSocket message format."
}
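All of the server events above are JSON objects with a `type` field, so a client can validate incoming frames before dispatching on them. The sketch below is a defensive parser under that assumption; `parseServerEvent` is a hypothetical helper, and the list of known types is taken from this page only.

```javascript
const KNOWN_EVENT_TYPES = new Set([
  "ready",
  "transcription",
  "alert",
  "frame_analysis",
  "session_summary",
  "config_updated",
  "error",
]);

// Parse a text frame (string or Buffer) into a server event without throwing.
function parseServerEvent(raw) {
  let event;
  try {
    event = JSON.parse(typeof raw === "string" ? raw : raw.toString("utf8"));
  } catch {
    return { ok: false, reason: "invalid_json" };
  }
  if (event === null || typeof event !== "object" || typeof event.type !== "string") {
    return { ok: false, reason: "missing_type" };
  }
  if (!KNOWN_EVENT_TYPES.has(event.type)) {
    // Keep the payload so that newer event types can still be logged.
    return { ok: false, reason: "unknown_type", event };
  }
  return { ok: true, event };
}

console.log(parseServerEvent('{"type":"error","code":"WS_10005","message":"Invalid WebSocket message format."}'));
console.log(parseServerEvent("not json"));
```

Returning a result object instead of throwing keeps a single malformed frame from taking down the message handler.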
Code Example
import WebSocket from "ws";

const ws = new WebSocket(
  "wss://api.tuteliq.ai/api/v1/safety/voice/stream?api_key=YOUR_API_KEY"
);

ws.on("open", () => {
  // Configure for voice + video monitoring
  ws.send(JSON.stringify({
    type: "config",
    interval_seconds: 10,
    analysis_types: ["bullying", "unsafe", "grooming"],
    enable_video: true,
    frame_interval_seconds: 5,
    context: { age_group: "11-13" },
  }));

  // Stream audio from a source (e.g., microphone, file, or RTC track).
  // getAudioStream() is a placeholder for your own audio source.
  const audioStream = getAudioStream();
  audioStream.on("data", (chunk) => {
    if (ws.readyState === WebSocket.OPEN) {
      ws.send(chunk); // Binary audio (no prefix needed)
    }
  });

  // Stream video frames with 0x01 prefix.
  // getVideoCapture() is a placeholder for your own frame source.
  const videoCapture = getVideoCapture();
  videoCapture.on("frame", (jpegBuffer) => {
    if (ws.readyState === WebSocket.OPEN) {
      const tagged = Buffer.concat([Buffer.from([0x01]), jpegBuffer]);
      ws.send(tagged);
    }
  });
});

ws.on("message", (data) => {
  const event = JSON.parse(data.toString());
  switch (event.type) {
    case "ready":
      console.log(`Session started: ${event.session_id}`);
      break;
    case "transcription":
      console.log(`Transcript: ${event.text}`);
      break;
    case "alert":
      console.log(`[${event.severity.toUpperCase()}] ${event.category}`);
      if (event.severity === "critical") {
        // Trigger immediate moderation
      }
      break;
    case "frame_analysis":
      if (event.risk_score > 0.7) {
        console.log(`Unsafe frame at index ${event.frame_index}`);
      }
      break;
    case "session_summary":
      console.log(`Session ended. Risk: ${event.overall_risk}, Flushes: ${event.total_flushes}, Frames: ${event.video_frames_analyzed}`);
      break;
    case "error":
      console.error(`Server error ${event.code}: ${event.message}`);
      break;
  }
});

// Log transport errors so an unhandled "error" event does not crash the process
ws.on("error", (err) => {
  console.error(`WebSocket error: ${err.message}`);
});

// Log the close code (see Close Codes)
ws.on("close", (code) => {
  console.log(`Connection closed with code ${code}`);
});

// End the session gracefully
function endSession() {
  ws.send(JSON.stringify({ type: "end" }));
}
Credits
| Action | Credits |
|---|---|
| Audio flush (transcription + analysis) | 1 per flush |
| Video frame analysis | 3 per frame |
Credits are deducted as each flush or frame is processed. Use the interval_seconds and frame_interval_seconds settings to control how frequently credits are consumed.
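To see how the two intervals drive cost, the sketch below estimates an upper bound on credits for a session from the rates in the table. `estimateCredits` is a hypothetical helper. It assumes one flush per audio interval and, when video is enabled, one analyzed frame per frame interval; actual usage can be lower if fewer frames are sent.

```javascript
const CREDITS_PER_FLUSH = 1;
const CREDITS_PER_FRAME = 3;

// Upper-bound estimate of credits consumed by one session.
function estimateCredits({
  durationSeconds,
  intervalSeconds = 10,
  enableVideo = false,
  frameIntervalSeconds = 5,
}) {
  const flushes = Math.ceil(durationSeconds / intervalSeconds);
  const frames = enableVideo ? Math.ceil(durationSeconds / frameIntervalSeconds) : 0;
  const audio = flushes * CREDITS_PER_FLUSH;
  const video = frames * CREDITS_PER_FRAME;
  return { flushes, frames, audio, video, total: audio + video };
}

// One-hour session with default intervals and video enabled.
console.log(estimateCredits({ durationSeconds: 3600, enableVideo: true }));
```

For a one-hour session at the defaults, this gives 360 flushes (360 credits) plus 720 frames (2160 credits), or 2520 credits in total. Raising interval_seconds to 30 would reduce the audio portion to 120 credits.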
Session Limits
| Limit | Value |
|---|---|
| Max session duration | 1 hour |
| Audio flush interval | 5–30 seconds (default: 10s) |
| Video frame interval | min 3 seconds (default: 5s) |
| Max audio buffer per flush | 10MB |
| Max video frame size | 5MB |
| Grooming context window | Last 50 messages |
| Max transcript length | 100,000 characters |
Close Codes
| Code | Meaning |
|---|---|
| 1000 | Normal closure |
| 4001 | Authentication failed |
| 4003 | Subscription limit exceeded / message limit reached |
| 4029 | Connection limit exceeded for your tier |
Voice streaming sessions have a maximum duration of 1 hour. When the limit is reached, the server sends a session_summary event and closes the connection. Heartbeat pings are sent every 30 seconds, and connections that miss a pong are terminated as stale.
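A client can map the close codes above to a reconnect decision. In the sketch below, the meanings come from the table, but the `retry` flags are assumptions rather than documented behavior: retrying after 4029 presumes another connection has been freed first, and 4001 and 4003 are treated as needing a credential or plan change. `describeClose` is a hypothetical helper.

```javascript
// Meanings are from the close code table; the retry flags are assumptions.
const CLOSE_CODES = {
  1000: { meaning: "Normal closure", retry: false },
  4001: { meaning: "Authentication failed", retry: false },
  4003: { meaning: "Subscription limit exceeded / message limit reached", retry: false },
  4029: { meaning: "Connection limit exceeded for your tier", retry: true },
};

// Look up a close code; unrecognized codes are treated as retryable.
function describeClose(code) {
  return CLOSE_CODES[code] ?? { meaning: "Unrecognized close code", retry: true };
}

console.log(describeClose(4029));
console.log(describeClose(1006));
```

With the `ws` package, the code is the first argument of the "close" event listener, so this could be called as `ws.on("close", (code) => describeClose(code))`. Any retry should use a delay with backoff rather than reconnecting immediately.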