Tuteliq provides a WebSocket-based streaming endpoint that transcribes audio in real time, analyzes video frames, and emits safety alerts as they are detected. This allows you to moderate voice chat, video calls, and other live media without waiting for the full recording to finish.

Endpoint

wss://api.tuteliq.ai/api/v1/safety/voice/stream?api_key=YOUR_API_KEY
Authentication is handled via the api_key query parameter or the Authorization: Bearer YOUR_API_KEY header. The connection will be rejected with a 4001 close code if the key is invalid or expired.
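
To make the two authentication options concrete, here is a minimal sketch that builds the query-parameter URL and the header form. The helper names `buildStreamUrl` and `buildAuthHeaders` are hypothetical and not part of any Tuteliq SDK. The header variant assumes a client that can set custom handshake headers, such as the `ws` package for Node.js; the browser `WebSocket` API cannot, so browser clients would need the query parameter.

```javascript
// Hypothetical helpers for illustration; not part of an official SDK.
const STREAM_ENDPOINT = "wss://api.tuteliq.ai/api/v1/safety/voice/stream";

// Option 1: pass the key as the api_key query parameter.
function buildStreamUrl(apiKey) {
  const url = new URL(STREAM_ENDPOINT);
  url.searchParams.set("api_key", apiKey); // URL-encodes the key if needed
  return url.toString();
}

// Option 2: pass the key in the Authorization header.
function buildAuthHeaders(apiKey) {
  return { Authorization: `Bearer ${apiKey}` };
}

// With the "ws" package, the header form would look like:
//   new WebSocket(STREAM_ENDPOINT, { headers: buildAuthHeaders(apiKey) });
```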

Connection Limits

Concurrent WebSocket connections are limited by your plan tier:

| Plan | Max Connections |
| --- | --- |
| Starter (Free) | 1 |
| Indie | 3 |
| Pro | 10 |
| Business | 50 |
| Enterprise | Unlimited |

Exceeding your limit returns close code 4029.

Binary Protocol

Send media data as binary WebSocket frames. A prefix byte discriminates between audio and video:

| Prefix Byte | Meaning |
| --- | --- |
| (none) | Audio chunk (backward compatible) |
| 0x00 | Audio chunk (explicit tag) |
| 0x01 | Video frame (JPEG/PNG) |
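
As an illustration of the prefix scheme, the sketch below prepends the tag byte to a payload before it is sent. The names `tagPayload`, `tagAudio`, and `tagVideo` are hypothetical helpers introduced here, and the example uses the explicit 0x00 tag for audio, which the table lists as an alternative to untagged chunks.

```javascript
// Prefix bytes taken from the table above.
const AUDIO_TAG = 0x00;
const VIDEO_TAG = 0x01;

// Prepend a one-byte tag to a binary payload (hypothetical helper).
function tagPayload(tag, payload) {
  return Buffer.concat([Buffer.from([tag]), payload]);
}

const tagAudio = (pcmChunk) => tagPayload(AUDIO_TAG, pcmChunk);
const tagVideo = (imageBytes) => tagPayload(VIDEO_TAG, imageBytes);

// Usage, assuming an open socket named ws:
//   ws.send(tagAudio(pcmChunk));
//   ws.send(tagVideo(jpegBuffer));
```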

Audio Format

The recommended audio format is:

| Parameter | Value |
| --- | --- |
| Encoding | PCM 16-bit LE |
| Sample Rate | 16 kHz |
| Channels | 1 (mono) |
| Chunk Size | 4096–32768 bytes |

Other sample rates (8 kHz, 44.1 kHz, 48 kHz) are accepted but will be resampled server-side, which adds latency. 16 kHz mono gives the best balance of accuracy and speed.
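
To put the chunk sizes in perspective: at 16 kHz, 16-bit mono, one second of audio is 32,000 bytes, so a 4096-byte chunk holds 128 ms of audio and a 32768-byte chunk holds 1024 ms. The sketch below does that arithmetic and splits a PCM buffer into chunks within the recommended range. `chunkDurationMs` and `splitPcm` are hypothetical helper names, and the code assumes the recommended format from the table.

```javascript
// Recommended format from the table above.
const SAMPLE_RATE_HZ = 16000;
const BYTES_PER_SAMPLE = 2; // 16-bit PCM
const CHANNELS = 1;
const BYTES_PER_SECOND = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS;

// Milliseconds of audio represented by a chunk of the given byte length.
function chunkDurationMs(byteLength) {
  return (byteLength * 1000) / BYTES_PER_SECOND;
}

// Split a PCM buffer into chunks of at most chunkSize bytes.
// The final chunk may be shorter than chunkSize.
function splitPcm(pcm, chunkSize = 4096) {
  if (chunkSize < 4096 || chunkSize > 32768 || chunkSize % BYTES_PER_SAMPLE !== 0) {
    throw new RangeError("chunkSize must be an even number between 4096 and 32768");
  }
  const chunks = [];
  for (let offset = 0; offset < pcm.length; offset += chunkSize) {
    chunks.push(pcm.subarray(offset, offset + chunkSize));
  }
  return chunks;
}
```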

Video Frames

Send JPEG or PNG frames with a 0x01 prefix byte. Frames are analyzed at the configured frame_interval_seconds (default 5s, minimum 3s). Each frame must be under 5MB.
// Tag a JPEG frame with the 0x01 prefix
const tagged = Buffer.concat([Buffer.from([0x01]), jpegBuffer]);
ws.send(tagged);
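
Because frames are analyzed no more often than frame_interval_seconds, a client may prefer to drop extra frames locally rather than upload them. The sketch below is one way to do that; it is an assumption about sensible client behavior, not documented server behavior. `createFrameGate` is a hypothetical helper, and "5MB" is read conservatively as 5,000,000 bytes because the exact unit is not stated.

```javascript
// Assumption: "5MB" is treated as 5,000,000 bytes, the stricter reading.
const MAX_FRAME_BYTES = 5 * 1000 * 1000;
const MIN_FRAME_INTERVAL_SECONDS = 3;

// Returns a function that decides whether a frame should be sent now.
function createFrameGate(frameIntervalSeconds = 5) {
  const intervalMs =
    Math.max(frameIntervalSeconds, MIN_FRAME_INTERVAL_SECONDS) * 1000;
  let lastSentAt = -Infinity;

  return function shouldSend(frame, nowMs = Date.now()) {
    if (frame.length >= MAX_FRAME_BYTES) return false; // frame too large
    if (nowMs - lastSentAt < intervalMs) return false; // too soon after last frame
    lastSentAt = nowMs;
    return true;
  };
}

// Usage, assuming an open socket named ws and a JPEG buffer:
//   const shouldSend = createFrameGate(5);
//   if (shouldSend(jpegBuffer)) {
//     ws.send(Buffer.concat([Buffer.from([0x01]), jpegBuffer]));
//   }
```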

Connection Lifecycle

1. Connect: Open a WebSocket connection to the streaming endpoint with your API key.
2. Receive Ready: The server sends a ready event with your session_id and default config.
3. Configure (optional): Send a JSON config message to adjust analysis settings.
4. Stream Media: Send binary audio chunks (and optionally video frames) continuously. The server transcribes audio and analyzes video at configured intervals.
5. Receive Events: The server sends JSON text frames containing transcriptions, safety alerts, and video frame analysis results.
6. End Session: Send a {"type": "end"} message or close the connection. The server flushes remaining data and sends a session_summary event.
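
The first two lifecycle steps can be sketched as a small helper that waits for the ready event before anything else is sent. `waitForReady` is a hypothetical name, and the 10-second timeout is an arbitrary assumption. The helper expects an object with `on` and `off` methods that emits "message" and "close" events, which is the interface the `ws` package's WebSocket exposes.

```javascript
// Resolves with the ready event, or rejects if the socket closes
// or the (assumed) timeout elapses first.
function waitForReady(socket, timeoutMs = 10000) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      cleanup();
      reject(new Error("Timed out waiting for ready event"));
    }, timeoutMs);

    function onMessage(data) {
      let event;
      try {
        event = JSON.parse(data.toString());
      } catch {
        return; // ignore frames that are not valid JSON
      }
      if (event.type === "ready") {
        cleanup();
        resolve(event);
      }
    }

    function onClose(code) {
      cleanup();
      reject(new Error(`Connection closed before ready (code ${code})`));
    }

    // Remove the timer and listeners so the helper leaves no residue.
    function cleanup() {
      clearTimeout(timer);
      socket.off("message", onMessage);
      socket.off("close", onClose);
    }

    socket.on("message", onMessage);
    socket.on("close", onClose);
  });
}

// Usage: const ready = await waitForReady(ws); then send config and media.
```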

Client Messages (JSON)

Config

Update session settings at any time. The server responds with a config_updated event.
{
  "type": "config",
  "interval_seconds": 10,
  "analysis_types": ["bullying", "unsafe", "grooming", "emotions"],
  "enable_video": true,
  "frame_interval_seconds": 5,
  "context": {
    "age_group": "11-13",
    "platform": "Discord",
    "language": "en"
  }
}

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| interval_seconds | number | 10 | Audio flush interval in seconds (5–30). |
| analysis_types | string[] | all | Safety categories to analyze: bullying, unsafe, grooming, emotions. |
| enable_video | boolean | false | Enable video frame analysis. |
| frame_interval_seconds | number | 5 | Minimum interval between video frame analyses (min 3s). |
| context | object | | Analysis context: age_group, platform, language, child_age. |
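
The field constraints above can be enforced on the client before a config message is sent. The following is a minimal sketch under stated assumptions: `buildConfigMessage` is a hypothetical helper, out-of-range intervals are clamped rather than rejected, and only the fields the caller provides are included. Whether the server accepts partial updates or how it treats out-of-range values is not described here.

```javascript
// Categories listed in the table above.
const ALLOWED_ANALYSIS_TYPES = ["bullying", "unsafe", "grooming", "emotions"];

const clamp = (value, min, max) => Math.min(Math.max(value, min), max);

// Build a JSON config message, clamping intervals to the documented ranges.
function buildConfigMessage(options = {}) {
  const message = { type: "config" };

  if (options.interval_seconds !== undefined) {
    message.interval_seconds = clamp(options.interval_seconds, 5, 30);
  }
  if (options.analysis_types !== undefined) {
    const unknown = options.analysis_types.filter(
      (t) => !ALLOWED_ANALYSIS_TYPES.includes(t)
    );
    if (unknown.length > 0) {
      throw new TypeError(`Unknown analysis types: ${unknown.join(", ")}`);
    }
    message.analysis_types = [...options.analysis_types];
  }
  if (options.enable_video !== undefined) {
    message.enable_video = Boolean(options.enable_video);
  }
  if (options.frame_interval_seconds !== undefined) {
    message.frame_interval_seconds = Math.max(options.frame_interval_seconds, 3);
  }
  if (options.context !== undefined) {
    message.context = { ...options.context };
  }
  return JSON.stringify(message);
}

// Usage, assuming an open socket named ws:
//   ws.send(buildConfigMessage({ interval_seconds: 15, enable_video: true }));
```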

End

Gracefully close the session. The server flushes remaining data and sends a summary.
{ "type": "end" }

Server Events

ready

Sent immediately after authentication succeeds.
{
  "type": "ready",
  "session_id": "abc123",
  "config": {
    "interval_seconds": 10,
    "analysis_types": ["bullying", "unsafe", "grooming", "emotions"],
    "enable_video": false,
    "frame_interval_seconds": 5
  }
}

transcription

Emitted after each audio flush with transcribed text and timestamped segments.
{
  "type": "transcription",
  "text": "hey do you want to come over to my place after school",
  "segments": [
    { "start": 0.0, "end": 2.5, "text": "hey do you want to" },
    { "start": 2.5, "end": 5.1, "text": "come over to my place after school" }
  ],
  "flush_index": 3
}
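
A client will often want to log segments and keep a running transcript. The sketch below formats the segments of one transcription event and collects flush texts in flush_index order. `formatSegments` and `createTranscriptLog` are hypothetical helpers, and the code prints the start and end values as given without assuming what they are measured from.

```javascript
// One log line per segment of a transcription event.
function formatSegments(event) {
  return event.segments.map(
    (s) =>
      `[flush ${event.flush_index}] ${s.start.toFixed(1)}s-${s.end.toFixed(1)}s: ${s.text}`
  );
}

// Collects transcription text keyed by flush_index and joins it in order.
function createTranscriptLog() {
  const byFlush = new Map();
  return {
    add(event) {
      byFlush.set(event.flush_index, event.text);
    },
    text() {
      return [...byFlush.keys()]
        .sort((a, b) => a - b)
        .map((index) => byFlush.get(index))
        .join(" ");
    },
  };
}
```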

alert

Safety concern detected in the latest audio flush.
{
  "type": "alert",
  "category": "grooming",
  "severity": "high",
  "risk_score": 0.87,
  "details": {
    "is_grooming": true,
    "flags": ["secrecy_request", "private_meeting"],
    "confidence": 0.91,
    "rationale": "Potential grooming pattern: solicitation of a private meeting with secrecy directive.",
    "recommended_action": "immediate_intervention"
  },
  "flush_index": 3
}
Categories: bullying, unsafe, grooming, emotions, visual.
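
How an alert is acted on is up to the integrating application. As one possible approach, the sketch below escalates when the risk score crosses a threshold or when the details recommend immediate intervention. `triageAlert` is a hypothetical helper, the 0.8 threshold is an arbitrary assumption, and the `details` fields are taken from the grooming example above; other categories may carry different fields.

```javascript
// Decide whether an alert should be escalated and build a short summary.
function triageAlert(alert, { escalateAt = 0.8 } = {}) {
  const escalate =
    alert.risk_score >= escalateAt ||
    alert.details?.recommended_action === "immediate_intervention";

  return {
    escalate,
    summary: `[${alert.severity}] ${alert.category} (risk ${alert.risk_score.toFixed(2)}, flush ${alert.flush_index})`,
  };
}
```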

frame_analysis

Emitted for each analyzed video frame when enable_video is true.
{
  "type": "frame_analysis",
  "session_id": "abc123",
  "frame_index": 5,
  "timestamp": "2026-02-17T10:30:15.000Z",
  "vision": {
    "extracted_text": "",
    "visual_categories": ["violence"],
    "visual_severity": "high",
    "visual_confidence": 0.88,
    "visual_description": "Frame depicts violent content",
    "contains_text": false,
    "contains_faces": true
  },
  "risk_score": 0.8,
  "severity": "high"
}

session_summary

Sent when the session ends (via end message or connection close).
{
  "type": "session_summary",
  "session_id": "abc123",
  "duration_seconds": 120,
  "overall_risk": "medium",
  "overall_risk_score": 0.55,
  "total_flushes": 12,
  "transcript": "Full concatenated transcript of the session...",
  "video_frames_analyzed": 8
}

config_updated

Confirmation after a config message is processed.
{
  "type": "config_updated",
  "config": { "interval_seconds": 10, "enable_video": true, "..." : "..." }
}

error

Sent when something goes wrong.
{
  "type": "error",
  "code": "WS_10005",
  "message": "Invalid WebSocket message format."
}
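
Since every server event arrives as a JSON text frame with a type field, a client can validate frames in one place before dispatching them. The sketch below is a hypothetical helper, `parseServerEvent`, that separates malformed frames, unrecognized types, and server-reported errors from normal events. The result shape it returns is an illustration, not part of the API.

```javascript
// Event types documented in this section.
const KNOWN_EVENT_TYPES = new Set([
  "ready",
  "transcription",
  "alert",
  "frame_analysis",
  "session_summary",
  "config_updated",
  "error",
]);

// Parse a raw text frame into { ok: true, event } or { ok: false, reason, ... }.
function parseServerEvent(data) {
  let event;
  try {
    event = JSON.parse(data.toString());
  } catch {
    return { ok: false, reason: "invalid_json" };
  }
  if (event === null || typeof event !== "object" || !KNOWN_EVENT_TYPES.has(event.type)) {
    return { ok: false, reason: "unknown_type" };
  }
  if (event.type === "error") {
    return { ok: false, reason: "server_error", code: event.code, message: event.message };
  }
  return { ok: true, event };
}
```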

Code Example

import WebSocket from "ws";

const ws = new WebSocket(
  "wss://api.tuteliq.ai/api/v1/safety/voice/stream?api_key=YOUR_API_KEY"
);

ws.on("open", () => {
  // Configure for voice + video monitoring
  ws.send(JSON.stringify({
    type: "config",
    interval_seconds: 10,
    analysis_types: ["bullying", "unsafe", "grooming"],
    enable_video: true,
    frame_interval_seconds: 5,
    context: { age_group: "11-13" },
  }));

  // Stream audio from a source (e.g., microphone, file, or RTC track).
  // getAudioStream() is a placeholder for your own audio source.
  const audioStream = getAudioStream();
  audioStream.on("data", (chunk) => {
    if (ws.readyState === WebSocket.OPEN) {
      ws.send(chunk); // Binary audio (no prefix needed)
    }
  });

  // Stream video frames with 0x01 prefix.
  // getVideoCapture() is a placeholder for your own frame source.
  const videoCapture = getVideoCapture();
  videoCapture.on("frame", (jpegBuffer) => {
    if (ws.readyState === WebSocket.OPEN) {
      const tagged = Buffer.concat([Buffer.from([0x01]), jpegBuffer]);
      ws.send(tagged);
    }
  });
});

ws.on("message", (data) => {
  const event = JSON.parse(data.toString());

  switch (event.type) {
    case "ready":
      console.log(`Session started: ${event.session_id}`);
      break;

    case "transcription":
      console.log(`Transcript: ${event.text}`);
      break;

    case "alert":
      console.log(`[${event.severity.toUpperCase()}] ${event.category}`);
      if (event.severity === "critical") {
        // Trigger immediate moderation
      }
      break;

    case "frame_analysis":
      if (event.risk_score > 0.7) {
        console.log(`Unsafe frame at index ${event.frame_index}`);
      }
      break;

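    case "session_summary":
      console.log(`Session ended. Risk: ${event.overall_risk}, Flushes: ${event.total_flushes}, Frames: ${event.video_frames_analyzed}`);
      break;

    case "error":
      console.error(`Server error ${event.code}: ${event.message}`);
      break;
  }
});

// Without an "error" listener, a socket error would throw as an unhandled event
ws.on("error", (err) => {
  console.error(`WebSocket error: ${err.message}`);
});

ws.on("close", (code) => {
  console.log(`Connection closed with code ${code}`);
});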

// End the session gracefully
function endSession() {
  // Only send if the socket is still open
  if (ws.readyState === WebSocket.OPEN) {
    ws.send(JSON.stringify({ type: "end" }));
  }
}

Credits

| Action | Credits |
| --- | --- |
| Audio flush (transcription + analysis) | 1 per flush |
| Video frame analysis | 3 per frame |

Credits are deducted as each flush or frame is processed. Use the interval_seconds and frame_interval_seconds settings to control how frequently credits are consumed.
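
As a worked example of the pricing above, a session with 12 flushes and 8 analyzed frames costs 12 × 1 + 8 × 3 = 36 credits. The sketch below computes that from a session_summary event and also gives a rough upper-bound estimate for a planned session. `creditsUsed` and `estimateCredits` are hypothetical helpers, and the estimate assumes continuous audio and a frame analyzed at every interval, which real sessions may not reach.

```javascript
// Rates from the table above.
const CREDITS_PER_FLUSH = 1;
const CREDITS_PER_FRAME = 3;

// Credits consumed by a finished session, based on its session_summary event.
function creditsUsed(summary) {
  return (
    summary.total_flushes * CREDITS_PER_FLUSH +
    (summary.video_frames_analyzed ?? 0) * CREDITS_PER_FRAME
  );
}

// Rough upper bound for a planned session of the given length.
function estimateCredits(
  durationSeconds,
  { interval_seconds = 10, enable_video = false, frame_interval_seconds = 5 } = {}
) {
  const flushes = Math.ceil(durationSeconds / interval_seconds);
  const frames = enable_video
    ? Math.floor(durationSeconds / frame_interval_seconds)
    : 0;
  return flushes * CREDITS_PER_FLUSH + frames * CREDITS_PER_FRAME;
}
```

Under these assumptions, a one-hour audio-only session at the default 10-second interval comes to about 360 credits, and enabling video at the default 5-second interval raises the upper bound to 2,520.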

Session Limits

| Limit | Value |
| --- | --- |
| Max session duration | 1 hour |
| Audio flush interval | 5–30 seconds (default: 10s) |
| Video frame interval | min 3 seconds (default: 5s) |
| Max audio buffer per flush | 10 MB |
| Max video frame size | 5 MB |
| Grooming context window | Last 50 messages |
| Max transcript length | 100,000 characters |

Close Codes

| Code | Meaning |
| --- | --- |
| 1000 | Normal closure |
| 4001 | Authentication failed |
| 4003 | Subscription limit exceeded / message limit reached |
| 4029 | Connection limit exceeded for your tier |

Voice streaming sessions have a maximum duration of 1 hour. When the limit is reached, the server sends a session_summary event and closes the connection. The server sends heartbeat pings every 30 seconds; connections that miss a pong are terminated as stale.
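
A client can map these close codes to a reconnect decision. The sketch below is one possible policy, not documented behavior: `describeClose` is a hypothetical helper, and the `retry` flags are assumptions (for example, that a 4029 may succeed later once another connection is released, while a 4001 will not succeed until the key is fixed). On the heartbeat side, the `ws` package replies to pings with pongs automatically, so a Node.js client built on it normally needs no extra code for that.

```javascript
// Meanings from the table above; retry flags are assumptions for illustration.
const CLOSE_CODES = {
  1000: { meaning: "Normal closure", retry: false },
  4001: { meaning: "Authentication failed", retry: false },
  4003: { meaning: "Subscription limit exceeded / message limit reached", retry: false },
  4029: { meaning: "Connection limit exceeded for your tier", retry: true },
};

// Look up a close code, falling back to a retryable "unrecognized" entry.
function describeClose(code) {
  return (
    CLOSE_CODES[code] ?? { meaning: `Unrecognized close code ${code}`, retry: true }
  );
}

// Usage, assuming a socket named ws from the "ws" package:
//   ws.on("close", (code) => {
//     const { meaning, retry } = describeClose(code);
//     console.log(`Closed: ${meaning}${retry ? " (will retry)" : ""}`);
//   });
```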