Tuteliq provides a WebSocket-based voice streaming endpoint that transcribes audio in real time and emits safety alerts as they are detected. This allows you to moderate voice chat, calls, and other live audio without waiting for the full recording to finish.
Endpoint
```text
wss://api.tuteliq.ai/safety/voice/stream?token=YOUR_API_KEY
```
Authentication is handled via the token query parameter. The connection will be rejected with a 4001 close code if the key is invalid or expired.
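If you assemble the URL in code, encode the key rather than concatenating it. That way characters such as `+` or `/` are not misread in the query string. The Node.js sketch below uses the standard `URL` and `URLSearchParams` APIs. `buildStreamUrl` and the `TUTELIQ_API_KEY` environment variable are hypothetical names. The sketch also assumes the server decodes `token` like any standard query parameter.

```javascript
// Base endpoint from this page; the API key travels in the `token` query parameter.
const STREAM_ENDPOINT = "wss://api.tuteliq.ai/safety/voice/stream";

// Hypothetical helper: build the connection URL with the key percent-encoded.
function buildStreamUrl(apiKey) {
  if (typeof apiKey !== "string" || apiKey.length === 0) {
    throw new Error("An API key is required");
  }
  const url = new URL(STREAM_ENDPOINT);
  // URLSearchParams encodes reserved characters such as "+", "/" and "=".
  url.searchParams.set("token", apiKey);
  return url.toString();
}

// Read the key from the environment instead of hard-coding it.
// TUTELIQ_API_KEY is a hypothetical variable name.
const streamUrl = buildStreamUrl(process.env.TUTELIQ_API_KEY ?? "YOUR_API_KEY");
```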
Send audio data as binary WebSocket frames. The recommended format is:
| Parameter | Value |
|---|---|
| Encoding | PCM 16-bit LE |
| Sample Rate | 16 kHz |
| Channels | 1 (mono) |
| Chunk Size | 4096–32768 bytes |
Other sample rates (8 kHz, 44.1 kHz, 48 kHz) are accepted but will be resampled server-side, which adds latency. 16 kHz mono gives the best balance of accuracy and speed.
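Browser and Web Audio sources usually deliver Float32 samples in the range [-1, 1], so they must be converted to 16-bit little-endian PCM before sending. At 16 kHz mono, one second of audio is 16,000 samples × 2 bytes = 32,000 bytes. The recommended chunk sizes therefore correspond to 128 ms (4096 bytes) through 1024 ms (32768 bytes) of audio.

The Node.js sketch below assumes the input is already 16 kHz mono. `floatTo16BitPCM`, `chunkPCM`, `chunkDurationMs` and the 8192-byte default chunk size are illustrative choices, not part of the Tuteliq API.

```javascript
// Format from the table above: 16-bit LE PCM, 16 kHz, mono.
const SAMPLE_RATE = 16000;
const BYTES_PER_SAMPLE = 2;

// Hypothetical helper: convert Float32 samples in [-1, 1] to 16-bit LE PCM.
// Out-of-range values are clamped.
function floatTo16BitPCM(samples) {
  const buf = Buffer.alloc(samples.length * BYTES_PER_SAMPLE);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Scale asymmetrically so -1 maps to -32768 and 1 maps to 32767.
    buf.writeInt16LE(Math.round(s < 0 ? s * 0x8000 : s * 0x7fff), i * BYTES_PER_SAMPLE);
  }
  return buf;
}

// Hypothetical helper: split PCM into chunks inside the recommended
// 4096–32768 byte range. Chunks always hold whole samples.
// The 8192-byte default is an arbitrary choice.
function chunkPCM(pcm, chunkBytes = 8192) {
  if (chunkBytes < 4096 || chunkBytes > 32768 || chunkBytes % BYTES_PER_SAMPLE !== 0) {
    throw new RangeError(`chunk size ${chunkBytes} is outside the recommended range`);
  }
  const chunks = [];
  for (let offset = 0; offset < pcm.length; offset += chunkBytes) {
    chunks.push(pcm.subarray(offset, offset + chunkBytes));
  }
  return chunks;
}

// Milliseconds of audio in a chunk: bytes * 1000 / (16000 samples/s * 2 bytes/sample).
function chunkDurationMs(chunkBytes) {
  return (chunkBytes * 1000) / (SAMPLE_RATE * BYTES_PER_SAMPLE);
}
```

The final chunk can be shorter than 4096 bytes when the audio length is not a multiple of the chunk size.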
Connection Lifecycle
1. Connect: Open a WebSocket connection to the streaming endpoint with your API key.
2. Configure (optional): Send a JSON text frame to adjust settings before streaming audio.
3. Send audio: Stream binary audio chunks continuously. The server begins transcription and analysis immediately.
4. Receive alerts: The server sends JSON text frames containing transcriptions (partial and final) and safety alerts as they are detected.
5. Close: Close the connection normally (code `1000`). The server flushes any remaining audio and sends a final summary frame.
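The ordering rules above (configure first, then audio, then a normal close) can be enforced on the client with a small wrapper. This is a sketch, not part of any Tuteliq SDK. `VoiceStreamSession` is a hypothetical class, and it assumes only that the socket exposes `send(data)` and `close(code, reason)`, as the `ws` client in the code example on this page does. Rejecting a late config frame is a client-side choice, since this page does not say how the server treats one.

```javascript
// Hypothetical wrapper enforcing the documented order:
// optional config (text frame) -> binary audio -> normal close.
class VoiceStreamSession {
  constructor(socket) {
    this.socket = socket; // anything with send(data) and close(code, reason)
    this.state = "connected"; // connected -> streaming -> closing
  }

  configure(config) {
    if (this.state !== "connected") {
      throw new Error("Send the config frame before any audio");
    }
    // Strings are sent as text frames.
    this.socket.send(JSON.stringify({ type: "config", ...config }));
  }

  sendAudio(chunk) {
    if (this.state === "closing") {
      throw new Error("Session is closing; no more audio can be sent");
    }
    this.state = "streaming";
    this.socket.send(chunk); // Buffers are sent as binary frames
  }

  finish() {
    if (this.state === "closing") return;
    this.state = "closing";
    // 1000 = normal closure; the server then flushes audio and sends a summary frame.
    this.socket.close(1000, "stream_complete");
  }
}
```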
Configuration
After connecting, you can send a JSON text frame to configure the session:
```json
{
  "type": "config",
  "flush_interval_ms": 2000,
  "categories": ["grooming", "bullying", "self_harm", "substance", "sexual_content"],
  "language": "en",
  "min_severity": "medium"
}
```
| Field | Type | Default | Description |
|---|---|---|---|
| `flush_interval_ms` | number | `3000` | How often (in ms) the server emits transcription results. Lower values give faster feedback but may reduce accuracy. |
| `categories` | string[] | all | Safety categories to monitor. Omit to enable all. |
| `language` | string | `"en"` | Language hint for transcription. |
| `min_severity` | string | `"low"` | Minimum severity level that triggers an alert: `low`, `medium`, `high`, or `critical`. |
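A helper that builds the config frame can catch typos in severity names before they reach the server. In this sketch, `buildConfigFrame` and its camelCase option names are hypothetical. The allowed severities and defaults come from the table above. Unset fields are left out so the server defaults apply. An empty `categories` array is rejected because this page only defines what happens when the field is omitted.

```javascript
// Allowed min_severity values, from the configuration table.
const SEVERITIES = ["low", "medium", "high", "critical"];

// Hypothetical helper: build and validate a config text frame.
// Undefined options are omitted so server defaults apply
// (flush_interval_ms 3000, all categories, language "en", min_severity "low").
function buildConfigFrame({ flushIntervalMs, categories, language, minSeverity } = {}) {
  const frame = { type: "config" };
  if (flushIntervalMs !== undefined) {
    if (!Number.isInteger(flushIntervalMs) || flushIntervalMs <= 0) {
      throw new RangeError("flushIntervalMs must be a positive integer (milliseconds)");
    }
    frame.flush_interval_ms = flushIntervalMs;
  }
  if (categories !== undefined) {
    // Client-side choice: omit the field to monitor all categories.
    if (!Array.isArray(categories) || categories.length === 0 ||
        !categories.every((c) => typeof c === "string")) {
      throw new TypeError("categories must be a non-empty array of strings");
    }
    frame.categories = categories;
  }
  if (language !== undefined) frame.language = language;
  if (minSeverity !== undefined) {
    if (!SEVERITIES.includes(minSeverity)) {
      throw new RangeError(`minSeverity must be one of: ${SEVERITIES.join(", ")}`);
    }
    frame.min_severity = minSeverity;
  }
  return JSON.stringify(frame);
}
```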
Server Messages
Transcription Frame
```json
{
  "type": "transcription",
  "text": "hey do you want to come over to my place after school",
  "is_partial": false,
  "timestamp_ms": 14200
}
```
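Partial frames (`is_partial: true`) are useful for live captions, while final frames are the ones to store. The sketch below keeps both. It assumes each partial frame carries the whole in-progress utterance and is superseded by the next partial or final frame, which this page does not state explicitly. `TranscriptBuffer` is a hypothetical name.

```javascript
// Hypothetical transcript accumulator.
// Assumption: a partial frame holds the full in-progress utterance, and the
// next partial or final frame replaces it.
class TranscriptBuffer {
  constructor() {
    this.finalSegments = []; // finalized text with its timestamp_ms
    this.partial = "";       // latest in-progress text, if any
  }

  handle(frame) {
    if (frame.type !== "transcription") return; // ignore other frame types
    if (frame.is_partial) {
      this.partial = frame.text;
    } else {
      this.finalSegments.push({ text: frame.text, timestampMs: frame.timestamp_ms });
      this.partial = "";
    }
  }

  // Finalized segments followed by the current partial, for a live caption.
  get liveText() {
    return [...this.finalSegments.map((s) => s.text), this.partial]
      .filter(Boolean)
      .join(" ");
  }
}
```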
Safety Alert Frame
```json
{
  "type": "alert",
  "category": "grooming",
  "severity": "high",
  "risk_score": 0.87,
  "text": "hey do you want to come over to my place after school",
  "description": "Potential grooming pattern detected: private meeting solicitation directed at a minor.",
  "timestamp_ms": 14200
}
```
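`min_severity` filters alerts on the server, but you may still want to handle the alerts you receive differently. For example, you might send `high` and `critical` alerts to a human reviewer and only log the rest. The sketch below ranks severities in the order listed for `min_severity`. `routeAlert`, the `escalate`/`log` callbacks and the `high` escalation threshold are hypothetical.

```javascript
// Severity levels in ascending order, as listed for min_severity.
const SEVERITY_RANK = { low: 0, medium: 1, high: 2, critical: 3 };

// True if `severity` is at or above `threshold`.
function isAtLeast(severity, threshold) {
  const a = SEVERITY_RANK[severity];
  const b = SEVERITY_RANK[threshold];
  if (a === undefined || b === undefined) {
    throw new RangeError(`Unknown severity: ${a === undefined ? severity : threshold}`);
  }
  return a >= b;
}

// Hypothetical router: escalate alerts at or above `escalateAt`, log the rest.
function routeAlert(alert, { escalate, log }, escalateAt = "high") {
  const entry = {
    category: alert.category,
    severity: alert.severity,
    riskScore: alert.risk_score,
    atMs: alert.timestamp_ms,
    summary: alert.description,
  };
  if (isAtLeast(alert.severity, escalateAt)) escalate(entry);
  else log(entry);
}
```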
Session Summary Frame
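Sent when the session ends, whether the client closes the connection or the server closes it at the session duration limit:
```json
{
  "type": "summary",
  "duration_ms": 62000,
  "alerts_count": 2,
  "highest_severity": "high",
  "categories_flagged": ["grooming"],
  "transcript_length": 347
}
```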
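Every server message is a JSON text frame with a `type` field, so a single dispatcher can route all three frame types. This sketch is a hypothetical helper. It accepts either a string or a Node.js `Buffer` (the `ws` client delivers a `Buffer` in v8 and later). Unknown types go to `onUnknown` instead of throwing, on the assumption that new frame types could appear.

```javascript
// Hypothetical dispatcher: parse a server text frame and route it by `type`.
// All handlers are optional; missing ones are skipped.
function dispatchServerMessage(raw, handlers) {
  let message;
  try {
    message = JSON.parse(typeof raw === "string" ? raw : raw.toString("utf8"));
  } catch (err) {
    handlers.onError?.(new Error(`Non-JSON frame: ${err.message}`));
    return;
  }
  switch (message.type) {
    case "transcription":
      handlers.onTranscription?.(message);
      break;
    case "alert":
      handlers.onAlert?.(message);
      break;
    case "summary":
      handlers.onSummary?.(message);
      break;
    default:
      // Assumption: tolerate frame types not documented on this page.
      handlers.onUnknown?.(message);
  }
}
```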
Code Example
```javascript
import WebSocket from "ws";

const ws = new WebSocket(
  "wss://api.tuteliq.ai/safety/voice/stream?token=YOUR_API_KEY"
);

ws.on("open", () => {
  // Optional: configure the session before sending any audio
  ws.send(JSON.stringify({
    type: "config",
    flush_interval_ms: 2000,
    categories: ["grooming", "bullying", "self_harm"],
    min_severity: "medium",
  }));

  // Stream audio from a source (e.g., microphone, file, or RTC track).
  // getAudioStream() is a placeholder for your own code: it should return a
  // readable stream that emits PCM 16-bit LE, 16 kHz, mono Buffers.
  const audioStream = getAudioStream();

  audioStream.on("data", (chunk) => {
    if (ws.readyState === WebSocket.OPEN) {
      ws.send(chunk); // Buffers are sent as binary frames
    }
  });

  audioStream.on("end", () => {
    // Normal closure: the server flushes remaining audio and sends a summary frame
    ws.close(1000, "stream_complete");
  });
});

ws.on("message", (data, isBinary) => {
  // Server messages are JSON text frames; ignore anything binary
  if (isBinary) return;

  let message;
  try {
    message = JSON.parse(data.toString());
  } catch {
    console.error("Received a non-JSON frame");
    return;
  }

  if (message.type === "alert") {
    console.log(
      `[${message.severity.toUpperCase()}] ${message.category}: ${message.description}`
    );
    // Trigger your moderation workflow here
  }

  if (message.type === "transcription" && !message.is_partial) {
    console.log(`Transcript: ${message.text}`);
  }

  if (message.type === "summary") {
    console.log(`Session ended. Alerts: ${message.alerts_count}`);
  }
});

ws.on("close", (code, reason) => {
  // In ws v8 and later, `reason` is a Buffer
  console.log(`Connection closed: ${code} ${reason.toString()}`);
});

ws.on("error", (err) => {
  console.error("WebSocket error:", err.message);
});
```
Close Codes
| Code | Meaning |
|---|---|
| 1000 | Normal closure |
| 4001 | Authentication failed |
| 4002 | Rate limit exceeded |
| 4003 | Invalid audio format |
| 4008 | Session duration limit reached |
| 4500 | Internal server error |
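Some close codes call for a retry, while others need a fix before reconnecting. The mapping below is a suggested client policy, not documented server behavior. Each of the following is an assumption:

- retrying `4002` with exponential backoff and full jitter
- retrying `4500`
- opening a fresh session after `4008`
- the names `classifyClose` and `backoffDelayMs`
- the 1 s base and 30 s cap

```javascript
// Close codes from the table above, with a suggested client reaction.
// The retry flags are an assumed policy, not documented server behavior.
const CLOSE_CODES = {
  1000: { meaning: "Normal closure", retry: false },
  4001: { meaning: "Authentication failed", retry: false },          // fix the API key first
  4002: { meaning: "Rate limit exceeded", retry: true },             // back off before reconnecting
  4003: { meaning: "Invalid audio format", retry: false },           // fix the encoder first
  4008: { meaning: "Session duration limit reached", retry: true },  // assumed: start a new session
  4500: { meaning: "Internal server error", retry: true },
};

// Hypothetical helper: look up a close code, tolerating unknown codes.
function classifyClose(code) {
  return CLOSE_CODES[code] ?? { meaning: `Unrecognized close code ${code}`, retry: false };
}

// Hypothetical helper: exponential backoff with full jitter, i.e. a random
// delay in [0, min(capMs, baseMs * 2^attempt)). `random` is injectable for testing.
function backoffDelayMs(attempt, baseMs = 1000, capMs = 30000, random = Math.random) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}
```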
Voice streaming sessions have a maximum duration of 10 minutes on the free tier and 60 minutes on paid tiers. The server will send a summary frame and close the connection with code 4008 when the limit is reached.
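The limits work out to 10 × 60 × 1000 = 600,000 ms on the free tier and 3,600,000 ms on paid tiers. You may prefer to end a long session on your own schedule, for example to hand off to a new session at a quiet moment. To do that, you can track elapsed time on the client. In this sketch, `msUntilLimit`, `scheduleGracefulClose`, the 5-second margin and the `session_rotation` close reason are hypothetical. The client clock is assumed to start when the connection opens, so it can drift slightly from the server's.

```javascript
// Maximum session durations from the paragraph above, in milliseconds.
const SESSION_LIMIT_MS = { free: 10 * 60 * 1000, paid: 60 * 60 * 1000 }; // 600000, 3600000

// Hypothetical helper: milliseconds left before the server would close with 4008.
function msUntilLimit(startedAtMs, nowMs, tier) {
  const limit = SESSION_LIMIT_MS[tier];
  if (limit === undefined) throw new RangeError(`Unknown tier: ${tier}`);
  return Math.max(0, limit - (nowMs - startedAtMs));
}

// Hypothetical helper: close normally `marginMs` before the limit.
// The 5000 ms margin and "session_rotation" reason are arbitrary choices.
// Returns the timer handle so the caller can clearTimeout() it.
function scheduleGracefulClose(closeFn, tier, marginMs = 5000, startedAtMs = Date.now()) {
  const delay = Math.max(0, msUntilLimit(startedAtMs, Date.now(), tier) - marginMs);
  return setTimeout(() => closeFn(1000, "session_rotation"), delay);
}
```

For example, call `scheduleGracefulClose((code, reason) => ws.close(code, reason), "free")` inside the `open` handler. If the stream ends first, pass the returned handle to `clearTimeout`.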