Tuteliq can analyze uploaded video files by extracting frames at evenly spaced intervals and running vision analysis on each one. This is ideal for moderating user-uploaded videos, screen recordings, and video messages.

Endpoint

```
POST /api/v1/safety/video
Content-Type: multipart/form-data
```

How It Works

1. Upload: Send a video file as multipart/form-data. Supported formats: MP4, WebM, QuickTime, AVI. Max file size: 100MB. Max duration: 10 minutes.

2. Frame Extraction: The server uses ffmpeg to extract frames at evenly spaced intervals. You can control the number of frames with max_frames (default: 10, max: 20).

3. Vision Analysis: Each extracted frame is analyzed by the vision model for visual safety concerns such as violence, sexual content, self-harm imagery, and substance use. Frames are processed in parallel batches of 3 for speed.

4. Aggregation: Results are aggregated into an overall_risk_score (the maximum across all frames) and an overall_severity. Frames above the risk threshold are returned as flagged_timestamps with specific reasons.

Request

| Field | Type | Required | Description |
|---|---|---|---|
| file | File | Yes | Video file (mp4, webm, quicktime, avi). Max 100MB. |
| analysis_type | string | No | unsafe or all (default: all) |
| max_frames | number | No | Maximum frames to extract (default: 10, max: 20) |
| file_id | string | No | Your file reference, echoed in response |
| external_id | string | No | Your correlation ID, echoed in response |
| customer_id | string | No | Multi-tenant customer ID, echoed in response |
| age_group | string | No | Age group for calibrated scoring (e.g., "11-13") |
| platform | string | No | Platform name (e.g., "TikTok", "Discord") |
| metadata | JSON string | No | Arbitrary key-value metadata (max 20 properties) |
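Note that metadata is sent as a single JSON-encoded string field, not as nested form parts. A minimal sketch of assembling the optional form fields (build_form_fields is a hypothetical helper for illustration, not part of any SDK):

```python
import json

def build_form_fields(analysis_type="all", max_frames=10,
                      external_id=None, metadata=None):
    """Assemble the non-file multipart fields for /api/v1/safety/video.

    metadata (a dict) is JSON-encoded because the API expects a JSON string.
    """
    fields = {"analysis_type": analysis_type, "max_frames": str(max_frames)}
    if external_id is not None:
        fields["external_id"] = external_id
    if metadata is not None:
        if len(metadata) > 20:
            raise ValueError("metadata is limited to 20 properties")
        fields["metadata"] = json.dumps(metadata)
    return fields
```

Pass the result as the data/form portion of your multipart upload, alongside the file part.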

Response

```json
{
  "file_id": "vid_abc123",
  "frames_analyzed": 10,
  "duration_seconds": 30.5,
  "frame_results": [
    {
      "frame_index": 0,
      "timestamp_s": 0,
      "vision": {
        "extracted_text": "",
        "visual_categories": [],
        "visual_severity": "none",
        "visual_confidence": 0.92,
        "visual_description": "Normal video frame showing a chat interface",
        "contains_text": true,
        "contains_faces": false
      },
      "risk_score": 0,
      "severity": "none"
    },
    {
      "frame_index": 5,
      "timestamp_s": 15.3,
      "vision": {
        "extracted_text": "",
        "visual_categories": ["violence"],
        "visual_severity": "high",
        "visual_confidence": 0.88,
        "visual_description": "Video frame depicting violent content",
        "contains_text": false,
        "contains_faces": true
      },
      "risk_score": 0.8,
      "severity": "high"
    }
  ],
  "overall_risk_score": 0.8,
  "overall_severity": "high",
  "flagged_timestamps": [
    {
      "timestamp_s": 15.3,
      "reason": "violence",
      "severity": "high"
    }
  ],
  "credits_used": 10
}
```
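A client consuming this response will typically act on flagged_timestamps, for example to jump a human reviewer straight to the risky moments. A minimal sketch (review_points is a hypothetical helper):

```python
def review_points(response: dict, min_severity: str = "medium") -> list[float]:
    """Return timestamps (in seconds) worth human review, in playback order.

    Severity ordering follows the documented levels:
    none < low < medium < high < critical.
    """
    order = ["none", "low", "medium", "high", "critical"]
    cutoff = order.index(min_severity)
    return sorted(
        f["timestamp_s"]
        for f in response.get("flagged_timestamps", [])
        if order.index(f["severity"]) >= cutoff
    )
```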

Response Fields

| Field | Type | Description |
|---|---|---|
| frames_analyzed | number | Number of frames extracted and analyzed |
| duration_seconds | number | Video duration in seconds |
| frame_results | array | Per-frame analysis results (see below) |
| overall_risk_score | number | Maximum risk score across all frames (0.0–1.0) |
| overall_severity | string | none, low, medium, high, or critical |
| flagged_timestamps | array | Frames with risk score above the low threshold |
| credits_used | number | Always 10 |

Per-Frame Result

| Field | Type | Description |
|---|---|---|
| frame_index | number | 0-based index of the frame |
| timestamp_s | number | Timestamp in the video (seconds) |
| vision.visual_categories | string[] | Detected harm categories |
| vision.visual_severity | string | Frame-level severity |
| vision.visual_confidence | number | Confidence of the visual classification |
| vision.visual_description | string | Human-readable description of the frame |
| vision.extracted_text | string | Any text found in the frame via OCR |
| vision.contains_text | boolean | Whether the frame contains text |
| vision.contains_faces | boolean | Whether the frame contains faces |
| risk_score | number | Numeric risk score for the frame (0.0–1.0) |
| severity | string | Severity level for the frame |
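Because extracted_text carries OCR output, frames marked contains_text can have their text pooled and run through a separate text-safety check (for example, on-screen chat captured in a screen recording). A sketch of that pooling step (pooled_text is a hypothetical helper):

```python
def pooled_text(frame_results: list[dict]) -> str:
    """Concatenate OCR text from frames that contain text, so it can be
    passed to a text-safety endpoint as a single message."""
    return " ".join(
        f["vision"]["extracted_text"]
        for f in frame_results
        if f["vision"]["contains_text"] and f["vision"]["extracted_text"]
    ).strip()
```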

Code Examples

```bash
curl -X POST https://api.tuteliq.ai/api/v1/safety/video \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@screen_recording.mp4" \
  -F "analysis_type=unsafe" \
  -F "max_frames=15" \
  -F "age_group=11-13" \
  -F "platform=Discord"
```

Severity Thresholds

The overall_severity is derived from the highest risk_score across all frames:
| Risk Score | Severity |
|---|---|
| < 0.3 | none |
| 0.3 – 0.5 | low |
| 0.5 – 0.7 | medium |
| 0.7 – 0.9 | high |
| >= 0.9 | critical |

Lower bounds are inclusive: a score of exactly 0.7 is high, and 0.9 is critical.
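The table translates directly into a score-to-severity mapping. A sketch, assuming inclusive lower bounds:

```python
def severity_from_score(score: float) -> str:
    """Map a risk score (0.0-1.0) to a severity label per the
    documented thresholds (lower bounds inclusive)."""
    if score >= 0.9:
        return "critical"
    if score >= 0.7:
        return "high"
    if score >= 0.5:
        return "medium"
    if score >= 0.3:
        return "low"
    return "none"
```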

Credits

Video analysis costs 10 credits per request, regardless of the number of frames extracted. For real-time video monitoring with per-frame billing, see Voice & Video Streaming.

Webhooks

When a video is flagged with severity medium or above, Tuteliq automatically triggers any configured webhooks with the video analysis type. The webhook payload includes the flagged timestamps, overall risk score, and detected categories.
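A webhook consumer can route on these payload fields. A rough sketch; the payload field names follow the response schema above, and the action names ("escalate", "queue") are this example's own convention, not part of the API:

```python
def handle_video_webhook(payload: dict) -> str:
    """Decide a moderation action from a video webhook payload."""
    severity = payload.get("overall_severity", "none")
    if severity in ("high", "critical"):
        return "escalate"  # route to immediate human review
    if payload.get("flagged_timestamps"):
        return "queue"     # batch review of the flagged moments
    return "ignore"
```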

Note: Video analysis requires ffmpeg to be installed on the server for frame extraction. The official Docker image and Cloud Run deployment include ffmpeg by default.