
# Voice & Video Streaming

> Real-time voice and video moderation via WebSocket

Tuteliq provides a WebSocket-based streaming endpoint that transcribes audio in real time, analyzes video frames, and emits safety alerts as they are detected. This allows you to moderate voice chat, video calls, and other live media without waiting for the full recording to finish.

## Endpoint

```
wss://api.tuteliq.ai/api/v1/safety/voice/stream?api_key=YOUR_API_KEY
```

Authenticate with either the `api_key` query parameter or an `Authorization: Bearer YOUR_API_KEY` header. If the key is invalid or expired, the server rejects the connection with close code `4001`.
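
The two authentication options can be sketched as a pair of small helpers. The function names `buildStreamUrl` and `buildAuthHeaders` are hypothetical; only the endpoint, the `api_key` parameter, and the Bearer header format come from this page.

```javascript
// Hypothetical helpers, not part of any Tuteliq SDK.
const STREAM_ENDPOINT = "wss://api.tuteliq.ai/api/v1/safety/voice/stream";

// Option 1: put the key in the query string.
// Browser WebSocket clients cannot set custom headers, so they need this form.
function buildStreamUrl(apiKey) {
  const url = new URL(STREAM_ENDPOINT);
  url.searchParams.set("api_key", apiKey);
  return url.toString();
}

// Option 2: put the key in the Authorization header.
// With the Node.js `ws` package this object can be passed as `{ headers }`.
function buildAuthHeaders(apiKey) {
  return { Authorization: `Bearer ${apiKey}` };
}

console.log(buildStreamUrl("YOUR_API_KEY"));
console.log(buildAuthHeaders("YOUR_API_KEY"));
```

With the `ws` package, the header form would look like `new WebSocket(STREAM_ENDPOINT, { headers: buildAuthHeaders(key) })`, which keeps the key out of URLs that may end up in logs.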

### Connection Limits

Concurrent WebSocket connections are limited by your plan tier:

| Plan           | Max Connections |
| -------------- | --------------- |
| Starter (Free) | 1               |
| Indie          | 3               |
| Pro            | 10              |
| Business       | 50              |
| Enterprise     | Unlimited       |

If you exceed your limit, the connection is closed with code `4029`.

## Binary Protocol

Send media data as **binary WebSocket frames**. A prefix byte discriminates between audio and video:

| Prefix Byte | Meaning                           |
| ----------- | --------------------------------- |
| *(none)*    | Audio chunk (backward compatible) |
| `0x00`      | Audio chunk (explicit tag)        |
| `0x01`      | Video frame (JPEG/PNG)            |
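
As a minimal sketch of the prefix scheme in the table, the helpers below prepend the tag byte to a payload. The names `tagAudio` and `tagVideo` are illustrative and not part of the API.

```javascript
// Tag values taken from the prefix table; helper names are hypothetical.
const AUDIO_TAG = 0x00;
const VIDEO_TAG = 0x01;

// Prepend a single tag byte to a binary payload.
function tagFrame(tag, payload) {
  return Buffer.concat([Buffer.from([tag]), payload]);
}

const tagAudio = (pcmChunk) => tagFrame(AUDIO_TAG, pcmChunk);
const tagVideo = (imageBytes) => tagFrame(VIDEO_TAG, imageBytes);

// Example: a 4-byte payload becomes a 5-byte frame starting with the tag.
console.log(tagAudio(Buffer.from([1, 2, 3, 4])));
console.log(tagVideo(Buffer.from([1, 2, 3, 4])));
```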

### Audio Format

The recommended audio format is:

| Parameter   | Value            |
| ----------- | ---------------- |
| Encoding    | PCM 16-bit LE    |
| Sample Rate | 16 kHz           |
| Channels    | 1 (mono)         |
| Chunk Size  | 4096–32768 bytes |

<Note>
  Other sample rates (8 kHz, 44.1 kHz, 48 kHz) are accepted but will be resampled server-side, which adds latency. 16 kHz mono gives the best balance of accuracy and speed.
</Note>
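
Many capture APIs produce 32-bit float samples rather than 16-bit PCM. The sketch below shows one way to convert them to PCM 16-bit LE and split the result into chunks within the recommended size range. Both function names and the 8192-byte default are assumptions for illustration, and the input is assumed to already be 16 kHz mono.

```javascript
// Convert float samples in [-1, 1] to signed 16-bit little-endian PCM.
function floatTo16BitPCM(samples) {
  const out = Buffer.alloc(samples.length * 2);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range input cannot overflow the 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    out.writeInt16LE(Math.round(s < 0 ? s * 0x8000 : s * 0x7fff), i * 2);
  }
  return out;
}

// Yield consecutive slices of at most `chunkSize` bytes.
// The final slice may be shorter than the recommended minimum.
function* chunkBuffer(buffer, chunkSize = 8192) {
  for (let offset = 0; offset < buffer.length; offset += chunkSize) {
    yield buffer.subarray(offset, offset + chunkSize);
  }
}

const pcm = floatTo16BitPCM(new Float32Array(16000)); // one second of silence
for (const chunk of chunkBuffer(pcm)) {
  console.log(chunk.length); // each chunk would be passed to ws.send(chunk)
}
```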

### Video Frames

Send JPEG or PNG frames with a `0x01` prefix byte. The server analyzes at most one frame every `frame_interval_seconds` (default 5s, minimum 3s). Each frame must be under 5MB.

```javascript theme={"dark"}
// Tag a JPEG frame with the 0x01 prefix
const tagged = Buffer.concat([Buffer.from([0x01]), jpegBuffer]);
ws.send(tagged);
```
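
Because frames are only analyzed at the configured interval, a client can save bandwidth by skipping frames that arrive too soon or exceed the size limit. The sketch below is one way to do that. `createFrameGate` is a hypothetical helper, and the byte limit reads "5MB" as 5,000,000 bytes, which is an assumption.

```javascript
// Returns a function that decides whether a captured frame should be sent.
// intervalSeconds mirrors frame_interval_seconds; maxBytes is an assumed reading of "5MB".
function createFrameGate(intervalSeconds = 5, maxBytes = 5_000_000) {
  let lastSentMs = -Infinity;
  return function shouldSend(frame, nowMs = Date.now()) {
    if (frame.length >= maxBytes) return false; // frame must be under the limit
    if (nowMs - lastSentMs < intervalSeconds * 1000) return false; // too soon
    lastSentMs = nowMs;
    return true;
  };
}

const shouldSend = createFrameGate(5);
const frame = Buffer.alloc(1024);
console.log(shouldSend(frame, 0));    // first frame passes
console.log(shouldSend(frame, 3000)); // within 5s of the last sent frame
console.log(shouldSend(frame, 5000)); // interval has elapsed
```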

## Connection Lifecycle

<Steps>
  <Step title="Connect">
    Open a WebSocket connection to the streaming endpoint with your API key.
  </Step>

  <Step title="Receive Ready">
    The server sends a `ready` event with your `session_id` and default config.
  </Step>

  <Step title="Configure (optional)">
    Send a JSON `config` message to adjust analysis settings.
  </Step>

  <Step title="Stream Media">
    Send binary audio chunks (and optionally video frames) continuously. The server transcribes audio and analyzes video at configured intervals.
  </Step>

  <Step title="Receive Events">
    The server sends JSON text frames containing transcriptions, safety alerts, and video frame analysis results.
  </Step>

  <Step title="End Session">
    Send a `{"type": "end"}` message or close the connection. The server flushes remaining data and sends a `session_summary` event.
  </Step>
</Steps>
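
The lifecycle above places `ready` before any streaming. This page does not say whether the server accepts media sent earlier, so a conservative client can hold outgoing messages until `ready` arrives. The sketch below shows that idea with a hypothetical `createReadyGate` helper that wraps any send function.

```javascript
// Buffers outgoing messages until markReady() is called, then flushes in order.
// `send` is whatever actually transmits data, e.g. (m) => ws.send(m).
function createReadyGate(send) {
  let ready = false;
  const pending = [];
  return {
    send(message) {
      if (ready) send(message);
      else pending.push(message);
    },
    markReady() {
      ready = true;
      while (pending.length > 0) send(pending.shift());
    },
  };
}

// Demo with an in-memory transport instead of a real socket.
const sent = [];
const gate = createReadyGate((m) => sent.push(m));
gate.send("chunk-1"); // held back
gate.markReady();     // call this when the `ready` event arrives
gate.send("chunk-2"); // sent immediately
console.log(sent);
```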

## Client Messages (JSON)

### Config

Update session settings at any time. The server responds with a `config_updated` event.

```json theme={"dark"}
{
  "type": "config",
  "interval_seconds": 10,
  "analysis_types": ["bullying", "unsafe", "grooming", "emotions"],
  "enable_video": true,
  "frame_interval_seconds": 5,
  "context": {
    "age_group": "11-13",
    "platform": "Discord",
    "language": "en"
  }
}
```

| Field                    | Type      | Default | Description                                                                 |
| ------------------------ | --------- | ------- | --------------------------------------------------------------------------- |
| `interval_seconds`       | number    | 10      | Audio flush interval in seconds (5–30).                                     |
| `analysis_types`         | string\[] | all     | Safety categories to analyze: `bullying`, `unsafe`, `grooming`, `emotions`. |
| `enable_video`           | boolean   | false   | Enable video frame analysis.                                                |
| `frame_interval_seconds` | number    | 5       | Minimum interval between video frame analyses (min 3s).                     |
| `context`                | object    | —       | Analysis context: `age_group`, `platform`, `language`, `child_age`.         |
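
To keep values inside the documented ranges before sending, a client could build the message with a small helper such as the hypothetical `buildConfigMessage` below. It only includes fields the caller provides. How the server treats omitted or out-of-range fields is not specified on this page, so the clamping here is a client-side precaution.

```javascript
const KNOWN_ANALYSIS_TYPES = ["bullying", "unsafe", "grooming", "emotions"];
const clamp = (value, min, max) => Math.min(max, Math.max(min, value));

// Build a `config` message, clamping numeric fields to the documented ranges.
function buildConfigMessage(options = {}) {
  const message = { type: "config" };
  if (options.intervalSeconds !== undefined) {
    message.interval_seconds = clamp(options.intervalSeconds, 5, 30);
  }
  if (options.analysisTypes !== undefined) {
    // Drop anything that is not a documented category.
    message.analysis_types = options.analysisTypes.filter((t) =>
      KNOWN_ANALYSIS_TYPES.includes(t)
    );
  }
  if (options.enableVideo !== undefined) {
    message.enable_video = Boolean(options.enableVideo);
  }
  if (options.frameIntervalSeconds !== undefined) {
    message.frame_interval_seconds = Math.max(3, options.frameIntervalSeconds);
  }
  if (options.context !== undefined) {
    message.context = options.context;
  }
  return message;
}

console.log(JSON.stringify(buildConfigMessage({ intervalSeconds: 2, enableVideo: true })));
```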

### End

Gracefully close the session. The server flushes remaining data and sends a summary.

```json theme={"dark"}
{ "type": "end" }
```

## Server Events

### `ready`

Sent immediately after authentication succeeds.

```json theme={"dark"}
{
  "type": "ready",
  "session_id": "abc123",
  "config": {
    "interval_seconds": 10,
    "analysis_types": ["bullying", "unsafe", "grooming", "emotions"],
    "enable_video": false,
    "frame_interval_seconds": 5
  }
}
```

### `transcription`

Emitted after each audio flush with transcribed text and timestamped segments.

```json theme={"dark"}
{
  "type": "transcription",
  "text": "hey do you want to come over to my place after school",
  "segments": [
    { "start": 0.0, "end": 2.5, "text": "hey do you want to" },
    { "start": 2.5, "end": 5.1, "text": "come over to my place after school" }
  ],
  "flush_index": 3
}
```
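
As an illustration of consuming the `segments` array, the sketch below turns a `transcription` event into timestamped lines. `formatSegments` is a hypothetical helper. Whether `start` and `end` are relative to the flush or to the whole session is not stated here, so treat the offsets accordingly.

```javascript
// Render each segment as "[start-end] text" using the event's own offsets.
function formatSegments(event) {
  return event.segments.map(
    (s) => `[${s.start.toFixed(1)}s-${s.end.toFixed(1)}s] ${s.text}`
  );
}

const example = {
  type: "transcription",
  text: "hey do you want to come over to my place after school",
  segments: [
    { start: 0.0, end: 2.5, text: "hey do you want to" },
    { start: 2.5, end: 5.1, text: "come over to my place after school" },
  ],
  flush_index: 3,
};

for (const line of formatSegments(example)) console.log(line);
```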

### `alert`

Emitted when a safety concern is detected in the latest audio flush.

```json theme={"dark"}
{
  "type": "alert",
  "category": "grooming",
  "severity": "high",
  "risk_score": 0.87,
  "details": {
    "is_grooming": true,
    "flags": ["secrecy_request", "private_meeting"],
    "confidence": 0.91,
    "rationale": "Potential grooming pattern: solicitation of a private meeting with secrecy directive.",
    "recommended_action": "immediate_intervention"
  },
  "flush_index": 3
}
```

Categories: `bullying`, `unsafe`, `grooming`, `emotions`, `visual`.
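
One way to act on alerts is to escalate when either the severity or the risk score crosses a threshold. The sketch below assumes a severity scale of `low`, `medium`, `high`, `critical`. This page only shows some of those values, so the full scale, the helper name `shouldEscalate`, and the default thresholds are all assumptions.

```javascript
// Assumed ordering of severity labels; adjust to the values your integration sees.
const SEVERITY_RANK = { low: 0, medium: 1, high: 2, critical: 3 };

// Escalate if the severity is at least `minSeverity` or the score is at least `minRiskScore`.
function shouldEscalate(alert, { minSeverity = "high", minRiskScore = 0.8 } = {}) {
  const rank = SEVERITY_RANK[alert.severity] ?? -1; // unknown labels never match by rank
  return rank >= SEVERITY_RANK[minSeverity] || alert.risk_score >= minRiskScore;
}

const alert = { type: "alert", category: "grooming", severity: "high", risk_score: 0.87 };
if (shouldEscalate(alert)) {
  console.log(`Escalating ${alert.category} alert`);
}
```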

### `frame_analysis`

Emitted for each analyzed video frame when `enable_video` is true.

```json theme={"dark"}
{
  "type": "frame_analysis",
  "session_id": "abc123",
  "frame_index": 5,
  "timestamp": "2026-02-17T10:30:15.000Z",
  "vision": {
    "extracted_text": "",
    "visual_categories": ["violence"],
    "visual_severity": "high",
    "visual_confidence": 0.88,
    "visual_description": "Frame depicts violent content",
    "contains_text": false,
    "contains_faces": true
  },
  "risk_score": 0.8,
  "severity": "high"
}
```

### `session_summary`

Sent when the session ends (via `end` message or connection close).

```json theme={"dark"}
{
  "type": "session_summary",
  "session_id": "abc123",
  "duration_seconds": 120,
  "overall_risk": "medium",
  "overall_risk_score": 0.55,
  "total_flushes": 12,
  "transcript": "Full concatenated transcript of the session...",
  "video_frames_analyzed": 8
}
```

### `config_updated`

Confirmation after a `config` message is processed.

```json theme={"dark"}
{
  "type": "config_updated",
  "config": { "interval_seconds": 10, "enable_video": true, "..." : "..." }
}
```

### `error`

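Sent when the server cannot process a message or the session encounters a problem. The `code` field identifies the failure and `message` describes it.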

```json theme={"dark"}
{
  "type": "error",
  "code": "WS_10005",
  "message": "Invalid WebSocket message format."
}
```
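
Since every server event is a JSON text frame with a `type` field, a client can parse defensively and route by type. The sketch below uses hypothetical helpers `parseServerEvent` and `dispatch`, which are not part of any SDK.

```javascript
// Parse a raw text frame; return null if it is not a JSON object with a string `type`.
function parseServerEvent(raw) {
  try {
    const event = JSON.parse(raw.toString());
    return event && typeof event.type === "string" ? event : null;
  } catch {
    return null;
  }
}

// Call the handler registered for the event type, if any. Returns true when handled.
function dispatch(event, handlers) {
  const handler = handlers[event.type];
  if (typeof handler !== "function") return false;
  handler(event);
  return true;
}

const handlers = {
  error: (e) => console.error(`Server error ${e.code}: ${e.message}`),
};
const event = parseServerEvent(
  '{"type":"error","code":"WS_10005","message":"Invalid WebSocket message format."}'
);
if (event) dispatch(event, handlers);
```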

## Code Example

<CodeGroup>
  ```javascript Node.js theme={"dark"}
  import WebSocket from "ws";

  const ws = new WebSocket(
    "wss://api.tuteliq.ai/api/v1/safety/voice/stream?api_key=YOUR_API_KEY"
  );

  ws.on("open", () => {
    // Configure for voice + video monitoring
    ws.send(JSON.stringify({
      type: "config",
      interval_seconds: 10,
      analysis_types: ["bullying", "unsafe", "grooming"],
      enable_video: true,
      frame_interval_seconds: 5,
      context: { age_group: "11-13" },
    }));

    // Stream audio from a source (e.g., microphone, file, or RTC track).
    // getAudioStream() is a placeholder for your own capture code.
    const audioStream = getAudioStream();
    audioStream.on("data", (chunk) => {
      if (ws.readyState === WebSocket.OPEN) {
        ws.send(chunk); // Binary audio (no prefix needed)
      }
    });

    // Stream video frames with the 0x01 prefix.
    // getVideoCapture() is a placeholder for your own frame source.
    const videoCapture = getVideoCapture();
    videoCapture.on("frame", (jpegBuffer) => {
      if (ws.readyState === WebSocket.OPEN) {
        const tagged = Buffer.concat([Buffer.from([0x01]), jpegBuffer]);
        ws.send(tagged);
      }
    });
  });

  ws.on("message", (data) => {
    const event = JSON.parse(data.toString());

    switch (event.type) {
      case "ready":
        console.log(`Session started: ${event.session_id}`);
        break;

      case "transcription":
        console.log(`Transcript: ${event.text}`);
        break;

      case "alert":
        console.log(`[${event.severity.toUpperCase()}] ${event.category}`);
        if (event.severity === "critical") {
          // Trigger immediate moderation
        }
        break;

      case "frame_analysis":
        if (event.risk_score > 0.7) {
          console.log(`Unsafe frame at index ${event.frame_index}`);
        }
        break;

      case "session_summary":
        console.log(`Session ended. Risk: ${event.overall_risk}, Flushes: ${event.total_flushes}, Frames: ${event.video_frames_analyzed}`);
        break;

      case "error":
        console.error(`Stream error ${event.code}: ${event.message}`);
        break;
    }
  });

  // End the session gracefully
  function endSession() {
    ws.send(JSON.stringify({ type: "end" }));
  }
  ```
</CodeGroup>

## Credits

| Action                                 | Credits     |
| -------------------------------------- | ----------- |
| Audio flush (transcription + analysis) | 7 per flush |
| Video frame analysis                   | 7 per frame |

Credits are deducted as each flush or frame is processed. Use the `interval_seconds` and `frame_interval_seconds` settings to control how frequently credits are consumed.
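
For budgeting, the per-flush and per-frame costs can be turned into a rough upper bound. The hypothetical `estimateCredits` helper below assumes one flush per `interval_seconds` for the whole session and, with video enabled, one analyzed frame per `frame_interval_seconds`. Actual usage may be lower if fewer frames are sent or analyzed.

```javascript
const CREDITS_PER_FLUSH = 7;
const CREDITS_PER_FRAME = 7;

// Upper-bound estimate of credits for a session of the given length.
function estimateCredits({
  durationSeconds,
  intervalSeconds = 10,
  enableVideo = false,
  frameIntervalSeconds = 5,
}) {
  const flushes = Math.ceil(durationSeconds / intervalSeconds);
  const frames = enableVideo ? Math.ceil(durationSeconds / frameIntervalSeconds) : 0;
  return {
    flushes,
    frames,
    credits: flushes * CREDITS_PER_FLUSH + frames * CREDITS_PER_FRAME,
  };
}

// A 2-minute audio-only session at the default interval: 12 flushes, 84 credits.
console.log(estimateCredits({ durationSeconds: 120 }));
```

Raising `interval_seconds` from 10 to 30 cuts the estimated audio cost for the same session to 4 flushes, or 28 credits.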

## Session Limits

| Limit                      | Value                       |
| -------------------------- | --------------------------- |
| Max session duration       | 1 hour                      |
| Audio flush interval       | 5–30 seconds (default: 10s) |
| Video frame interval       | min 3 seconds (default: 5s) |
| Max audio buffer per flush | 10MB                        |
| Max video frame size       | 5MB                         |
| Grooming context window    | Last 50 messages            |
| Max transcript length      | 100,000 characters          |

## Close Codes

| Code | Meaning                                             |
| ---- | --------------------------------------------------- |
| 1000 | Normal closure                                      |
| 4001 | Authentication failed                               |
| 4003 | Subscription limit exceeded / message limit reached |
| 4029 | Connection limit exceeded for your tier             |
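
A client can map these codes to a reconnect decision. In the sketch below the meanings come from the table, while the `retry` flags and the handling of unlisted codes are an assumed policy, not documented behavior. `describeClose` is a hypothetical helper.

```javascript
// Meanings copied from the close code table; `retry` is an assumed client policy.
const CLOSE_CODES = {
  1000: { meaning: "Normal closure", retry: false },
  4001: { meaning: "Authentication failed", retry: false },
  4003: { meaning: "Subscription limit exceeded / message limit reached", retry: false },
  4029: { meaning: "Connection limit exceeded for your tier", retry: true },
};

// Look up a close code; unlisted codes are treated as retryable network failures.
function describeClose(code) {
  return CLOSE_CODES[code] ?? { meaning: `Unrecognized close code ${code}`, retry: true };
}

console.log(describeClose(4029));
console.log(describeClose(1006));
```

With the `ws` package, this would typically be called from the `close` handler, for example `ws.on("close", (code) => describeClose(code))`, with any retry delayed by a backoff.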

<Warning>
  Voice streaming sessions have a maximum duration of **1 hour**. When the limit is reached, the server sends a `session_summary` event and closes the connection. The server also sends heartbeat pings every 30 seconds; connections that miss a pong are terminated as stale.
</Warning>
