Endpoint
How It Works
Upload
Send a video file as multipart/form-data. Supported formats: MP4, WebM, QuickTime, AVI. Max file size: 100MB. Max duration: 10 minutes.
Frame Extraction
The server uses ffmpeg to extract key frames at even intervals. You can control the number of frames with max_frames (default: 10, max: 20).
Vision Analysis
Each extracted frame is analyzed by the vision model for visual safety concerns — violence, sexual content, self-harm imagery, substance use, and more. Frames are processed in parallel batches of 3 for speed.
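The extraction step above can be sketched as follows. The exact sampling strategy isn't documented, so this assumes frames are taken at the midpoints of equal intervals; the ffmpeg invocation is one common way to grab a single frame at a timestamp, not necessarily what the server does.

```python
import subprocess

def frame_timestamps(duration_s: float, max_frames: int = 10) -> list[float]:
    # Midpoints of max_frames equal intervals (assumption: "even intervals"
    # means evenly spaced samples, capped at the documented max of 20).
    n = min(max_frames, 20)
    return [round((i + 0.5) * duration_s / n, 3) for i in range(n)]

def extract_frame_cmd(video_path: str, ts: float, out_path: str) -> list[str]:
    # Seek to the timestamp and emit exactly one frame as a JPEG.
    return ["ffmpeg", "-ss", str(ts), "-i", video_path,
            "-frames:v", "1", "-q:v", "2", out_path]

def extract_frame(video_path: str, ts: float, out_path: str) -> None:
    # Requires ffmpeg on PATH (see the deployment note at the end of this page).
    subprocess.run(extract_frame_cmd(video_path, ts, out_path), check=True)
```

Batching the resulting frames into groups of 3 for the vision model (as described above) could then be done with, e.g., `concurrent.futures.ThreadPoolExecutor(max_workers=3)`.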
Request
| Field | Type | Required | Description |
|---|---|---|---|
| file | File | Yes | Video file (mp4, webm, quicktime, avi). Max 100MB. |
| analysis_type | string | No | unsafe or all (default: all) |
| max_frames | number | No | Maximum frames to extract (default: 10, max: 20) |
| file_id | string | No | Your file reference, echoed in the response |
| external_id | string | No | Your correlation ID, echoed in the response |
| customer_id | string | No | Multi-tenant customer ID, echoed in the response |
| age_group | string | No | Age group for calibrated scoring (e.g., "11-13") |
| platform | string | No | Platform name (e.g., "TikTok", "Discord") |
| metadata | JSON string | No | Arbitrary key-value metadata (max 20 properties) |
Response
Response Fields
| Field | Type | Description |
|---|---|---|
| frames_analyzed | number | Number of frames extracted and analyzed |
| duration_seconds | number | Video duration in seconds |
| frame_results | array | Per-frame analysis results (see below) |
| overall_risk_score | number | Maximum risk score across all frames (0.0–1.0) |
| overall_severity | string | none, low, medium, high, or critical |
| flagged_timestamps | array | Frames with a risk score above the low threshold |
| credits_used | number | Always 10 |
Per-Frame Result
| Field | Type | Description |
|---|---|---|
| frame_index | number | 0-based index of the frame |
| timestamp_s | number | Timestamp in the video (seconds) |
| vision.visual_categories | string[] | Detected harm categories |
| vision.visual_severity | string | Frame-level severity |
| vision.visual_confidence | number | Confidence of the visual classification |
| vision.visual_description | string | Human-readable description of the frame |
| vision.extracted_text | string | Any text found in the frame via OCR |
| vision.contains_text | boolean | Whether the frame contains text |
| vision.contains_faces | boolean | Whether the frame contains faces |
| risk_score | number | Numeric risk score for the frame (0.0–1.0) |
| severity | string | Severity level for the frame |
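The overall fields are documented as derived from the per-frame results: overall_risk_score is the maximum frame score, and flagged_timestamps holds frames above the low threshold (0.3 per the thresholds table below). A minimal sketch of that aggregation, assuming an inclusive lower bound:

```python
LOW_THRESHOLD = 0.3  # "low" severity floor from the severity thresholds table

def summarize(frame_results: list[dict]) -> dict:
    # overall_risk_score is the maximum risk_score across all frames.
    overall = max((f["risk_score"] for f in frame_results), default=0.0)
    # Inclusive >= is an assumption; the docs only say "above the low threshold".
    flagged = [f["timestamp_s"] for f in frame_results
               if f["risk_score"] >= LOW_THRESHOLD]
    return {"overall_risk_score": overall, "flagged_timestamps": flagged}
```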
Code Examples
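A minimal upload sketch using the Python `requests` library. The endpoint URL and Bearer-token auth header are placeholders, not documented values; substitute the real endpoint and credentials from your account.

```python
def build_form(analysis_type: str = "all", max_frames: int = 10, **extra) -> dict:
    # Optional fields (file_id, external_id, customer_id, age_group,
    # platform, metadata) pass through **extra; None values are dropped.
    data = {"analysis_type": analysis_type, "max_frames": str(max_frames)}
    data.update({k: v for k, v in extra.items() if v is not None})
    return data

if __name__ == "__main__":
    import requests  # third-party: pip install requests

    API_URL = "https://api.example.com/v1/video/analyze"  # placeholder URL
    API_KEY = "YOUR_API_KEY"

    with open("clip.mp4", "rb") as f:
        resp = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},  # assumed auth scheme
            files={"file": ("clip.mp4", f, "video/mp4")},
            data=build_form(max_frames=12, platform="Discord"),
            timeout=120,
        )
    resp.raise_for_status()
    result = resp.json()
    print(result["overall_severity"], result["flagged_timestamps"])
```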
Severity Thresholds
The overall_severity is derived from the highest risk_score across all frames:
| Risk Score | Severity |
|---|---|
| < 0.3 | none |
| 0.3 – 0.5 | low |
| 0.5 – 0.7 | medium |
| 0.7 – 0.9 | high |
| >= 0.9 | critical |
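The table's bands share endpoints (e.g., 0.5 appears in both low and medium), so exact boundary handling is ambiguous. This sketch assumes lower bounds are inclusive, consistent with the explicit `>= 0.9` row:

```python
def severity_from_score(score: float) -> str:
    # Assumption: each band's lower bound is inclusive, matching ">= 0.9".
    if score >= 0.9:
        return "critical"
    if score >= 0.7:
        return "high"
    if score >= 0.5:
        return "medium"
    if score >= 0.3:
        return "low"
    return "none"
```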
Credits
Video analysis costs 10 credits per request, regardless of the number of frames extracted. For real-time video monitoring with per-frame billing, see Voice & Video Streaming.
Webhooks
When a video is flagged with severity medium or above, Tuteliq automatically triggers any configured webhooks with the video analysis type. The webhook payload includes the flagged timestamps, overall risk score, and detected categories.
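A receiver for these webhooks might look like the sketch below. The payload field names (analysis_type, overall_risk_score, flagged_timestamps, categories) are illustrative guesses based on the description above, not a documented schema; verify them against an actual delivery.

```python
import json
from typing import Optional

def handle_video_webhook(raw_body: bytes) -> Optional[dict]:
    # Parse a webhook delivery and extract the video-analysis summary.
    event = json.loads(raw_body)
    # "analysis_type" as the event discriminator is an assumption.
    if event.get("analysis_type") != "video":
        return None
    return {
        "risk": event.get("overall_risk_score"),
        "timestamps": event.get("flagged_timestamps", []),
        "categories": event.get("categories", []),
    }
```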
Video analysis requires ffmpeg to be installed on the server for frame extraction. The official Docker image and Cloud Run deployment include ffmpeg by default.