Every detection result is classified as one of four levels: `confirmed_synthetic`, `suspected_synthetic`, `unknown`, or `confirmed_authentic`.
For images and video, Tuteliq goes far beyond a single model call. The multi-signal forensic pipeline runs up to 6 independent analysis engines in parallel — vision forensics, EXIF metadata, pixel statistics, C2PA Content Credentials, watermark detection, and perceptual hashing — then aggregates them into a weighted ensemble assessment. Any single engine can fail without degrading the result.
6-Signal Image Pipeline
Vision AI, EXIF metadata, pixel statistics, C2PA Content Credentials, frequency-domain watermark analysis, and perceptual hash matching — all in parallel.
Temporal + Lip-Sync Video
Frame-by-frame face identity tracking, landmark stability analysis, and audio-visual lip-sync correlation to catch deepfakes that single-frame analysis misses.
Spectral Audio Forensics
Mel spectrogram analysis via vision AI plus quantitative audio statistics — dynamic range, silence ratio, flat factor — to detect synthetic speech beyond transcript analysis.
Classification Levels
| Classification | Description |
|---|---|
| `confirmed_synthetic` | High confidence AI-generated content detected |
| `suspected_synthetic` | Moderate indicators of synthetic content |
| `unknown` | Insufficient data to determine authenticity |
| `confirmed_authentic` | High confidence genuine, human-created content |
Category Taxonomy
| Tag | Description |
|---|---|
| `AI_GENERATED_TEXT` | LLM-generated text (ChatGPT, Claude, etc.) |
| `AI_GENERATED_IMAGE` | AI-generated image (Midjourney, DALL-E, Stable Diffusion) |
| `AI_GENERATED_AUDIO` | Voice cloning, text-to-speech synthesis |
| `AI_GENERATED_VIDEO` | Fully AI-generated video content |
| `AI_MANIPULATED_MEDIA` | Deepfakes, face swaps, manipulated media |
| `SYNTHETIC_IDENTITY` | Fake identity created using AI tools |
| `SYNTHETIC_CSAM` | AI-generated child sexual abuse material |
| `SYNTHETIC_IMPERSONATION` | AI-generated impersonation of a real person |
| `AI_ENHANCED_GROOMING` | AI-assisted grooming scripts |
| `AI_ENHANCED_SEXTORTION` | AI-assisted sextortion content |
Text Detection
Analyzes text for AI-generated content indicators — LLM-generated text, synthetic identities, AI-enhanced grooming scripts, and more.
Request
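A minimal request sketch; the base URL, auth header, and the `content` body field are assumptions, and the optional context fields mirror those documented for the media endpoints:

```ts
// Hypothetical sketch: base URL, auth header, and the `content` body field are
// assumptions. Only the /safety/synthetic-content path is documented above; the
// optional context fields mirror those listed for the image/audio/video endpoints.
const res = await fetch("https://api.tuteliq.example/safety/synthetic-content", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.TUTELIQ_API_KEY}`,
  },
  body: JSON.stringify({
    content: "Text to analyze for AI-generation indicators",
    age_group: "13-15",
    customer_id: "user_123",
  }),
});
const result = await res.json();
console.log(result.classification);
```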
Response
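The response envelope is not reproduced here; a rough shape, inferred from the classification levels and category taxonomy above (everything beyond those documented values is an assumption):

```ts
// Rough shape only: the classification values and category tags are documented
// above; the envelope itself (field names, nesting) is an assumption.
interface SyntheticTextResult {
  classification:
    | "confirmed_synthetic"
    | "suspected_synthetic"
    | "unknown"
    | "confirmed_authentic";
  confidence: number;     // 0.0 - 1.0
  categories: string[];   // e.g. ["AI_GENERATED_TEXT", "SYNTHETIC_IMPERSONATION"]
}
```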
Image Detection
Analyzes uploaded images for AI-generation artifacts using a 6-signal forensic pipeline — running vision analysis, EXIF metadata extraction, pixel statistics, C2PA Content Credentials, watermark detection, and perceptual hashing in parallel.
How It Works
Upload
Send an image as `multipart/form-data`. Supported formats: JPEG, PNG, WebP, GIF. Max file size: 10MB.
Multi-Signal Forensic Analysis
Six independent analysis engines run in parallel, each fault-isolated so failures in one don’t affect others:
- Vision AI — Forensic prompt inspects pixel-level artifacts (face consistency, skin texture, hand anomalies, background coherence, lighting mismatches)
- EXIF Metadata — Checks for AI generator signatures in EXIF tags, XMP data, and PNG text chunks. Flags suspicious absence of camera metadata.
- Pixel Statistics — Shannon entropy, edge density (Laplacian convolution), and channel uniformity analysis
- C2PA Content Credentials — Detects and validates C2PA manifests from DALL-E, Adobe Firefly, Google Imagen, and other tools
- Watermark Detection — Frequency-domain analysis for invisible watermarks (SynthID, Stable Diffusion DWT/DCT)
- Perceptual Hashing — DCT-based pHash compared against a database of known synthetic content via Hamming distance
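A rough TypeScript sketch of this kind of fault-isolated fan-out; the engine function names and the `Signal` shape are placeholders, not Tuteliq internals:

```ts
// Sketch only: engine names and the Signal shape are placeholders.
interface Signal { source: string; score: number; notes: string[] }

declare function runVisionAI(img: Uint8Array): Promise<Signal>;
declare function runExifAnalysis(img: Uint8Array): Promise<Signal>;
declare function runPixelStats(img: Uint8Array): Promise<Signal>;
declare function runC2paCheck(img: Uint8Array): Promise<Signal>;
declare function runWatermarkScan(img: Uint8Array): Promise<Signal>;
declare function runPerceptualHash(img: Uint8Array): Promise<Signal>;

async function runForensicPipeline(img: Uint8Array): Promise<Signal[]> {
  // Promise.allSettled keeps one failing engine from taking down the rest;
  // rejected engines are simply dropped from the ensemble.
  const settled = await Promise.allSettled([
    runVisionAI(img),
    runExifAnalysis(img),
    runPixelStats(img),
    runC2paCheck(img),
    runWatermarkScan(img),
    runPerceptualHash(img),
  ]);
  return settled
    .filter((r): r is PromiseFulfilledResult<Signal> => r.status === "fulfilled")
    .map((r) => r.value);
}
```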
Signal Aggregation
All signals are combined into a weighted ensemble: vision (30%), metadata (15%), pixel statistics (15%), C2PA provenance (15%), watermarks (10%), perceptual hash (15%). The aggregated forensic summary is fed to the classifier.
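In arithmetic terms the ensemble is a weighted average; a minimal sketch, assuming each engine reports a 0 to 1 synthetic-likelihood score and that weights are renormalized when an engine fails (both assumptions):

```ts
// Weights from the percentages above; the 0-1 score semantics and the
// renormalization over engines that actually returned a result are assumptions.
const WEIGHTS: Record<string, number> = {
  vision: 0.30,
  metadata: 0.15,
  pixel_stats: 0.15,
  c2pa: 0.15,
  watermark: 0.10,
  perceptual_hash: 0.15,
};

function aggregate(scores: Partial<Record<string, number>>): number {
  let weighted = 0;
  let totalWeight = 0;
  for (const [source, weight] of Object.entries(WEIGHTS)) {
    const score = scores[source];
    if (score === undefined) continue; // engine failed or returned nothing
    weighted += weight * score;
    totalWeight += weight;
  }
  return totalWeight > 0 ? weighted / totalWeight : 0;
}

// aggregate({ vision: 0.8, metadata: 0.6, pixel_stats: 0.7 }) ≈ 0.725
```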
Classification with Overrides
The classifier produces a final verdict. Two signals can override the classifier with definitive results:
- C2PA declares AI generation → forced to `confirmed_synthetic` with confidence ≥ 0.95
- Perceptual hash matches known synthetic → forced to `confirmed_synthetic` with confidence ≥ 0.90
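In effect the overrides are two guard clauses applied after the ensemble verdict; a minimal sketch, with the `Verdict` shape assumed for illustration:

```ts
// Sketch of the two documented overrides; the Verdict shape is an assumption.
interface Verdict { classification: string; confidence: number }

function applyOverrides(
  verdict: Verdict,
  c2paDeclaresAI: boolean,
  knownSyntheticHashMatch: boolean,
): Verdict {
  if (c2paDeclaresAI) {
    return { classification: "confirmed_synthetic", confidence: Math.max(verdict.confidence, 0.95) };
  }
  if (knownSyntheticHashMatch) {
    return { classification: "confirmed_synthetic", confidence: Math.max(verdict.confidence, 0.90) };
  }
  return verdict;
}
```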
Request Fields
| Field | Type | Required | Description |
|---|---|---|---|
| `file` | File | Yes | Image file (JPEG, PNG, WebP, GIF). Max 10MB. |
| `age_group` | string | No | Age group context (e.g., "13-15") |
| `language` | string | No | ISO language code |
| `platform` | string | No | Platform context |
| `external_id` | string | No | Your internal reference ID |
| `customer_id` | string | No | Your customer/user ID |
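A minimal upload sketch using Node 18+ `fetch` and `FormData`; the base URL and auth header are assumptions, while the path and form field names come from the tables here:

```ts
// Hypothetical base URL and auth header; the path and form fields are documented
// above. Requires Node 18+ (global fetch, FormData, Blob).
import { readFile } from "node:fs/promises";

async function checkImage(path: string) {
  const form = new FormData();
  form.append("file", new Blob([await readFile(path)], { type: "image/jpeg" }), "photo.jpg");
  form.append("customer_id", "user_123"); // optional: enables account profiling
  form.append("age_group", "13-15");      // optional context

  const res = await fetch("https://api.tuteliq.example/safety/synthetic-content/image", {
    method: "POST",
    headers: { Authorization: `Bearer ${process.env.TUTELIQ_API_KEY}` },
    body: form, // fetch sets the multipart boundary automatically
  });
  return res.json();
}
```

The audio and video endpoints accept the same multipart pattern with their own paths, supported formats, and size limits (plus `max_frames` for video).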
Response
Forensic Signal Sources
| Signal | Weight | What It Checks |
|---|---|---|
| Vision AI | 30% | Face consistency, skin texture, hand anomalies, background coherence, lighting, hair rendering, text/symbols, resolution, AI model signatures |
| EXIF Metadata | 15% | Camera model, GPS, AI generator names in EXIF/XMP, PNG tEXt chunks (Stable Diffusion parameters), suspicious absence of camera data |
| Pixel Statistics | 15% | Shannon entropy, Laplacian edge density, channel uniformity — GAN images have distinctive statistical signatures |
| C2PA Provenance | 15% | Content Credentials manifests from DALL-E, Firefly, Imagen. Definitive when present — overrides classifier. |
| Watermark | 10% | High-frequency energy analysis, periodic pattern detection at known watermark frequencies, LSB distribution, corner entropy |
| Perceptual Hash | 15% | DCT-based 64-bit pHash compared against known-synthetic database. Hamming distance ≤ 10 = match. Definitive when matched. |
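The pHash comparison is a Hamming distance, i.e. a popcount of the XOR of the two 64-bit hashes; a small sketch, assuming the hashes are handled as 16-character hex strings (the encoding is a guess):

```ts
// Assumes 64-bit pHashes encoded as 16-character hex strings (an assumption);
// the <= 10 match threshold comes from the table above.
function hammingDistance(hashA: string, hashB: string): number {
  let diff = BigInt(`0x${hashA}`) ^ BigInt(`0x${hashB}`);
  let bits = 0;
  while (diff > 0n) {
    bits += Number(diff & 1n);
    diff >>= 1n;
  }
  return bits;
}

const isKnownSyntheticMatch = (a: string, b: string) => hammingDistance(a, b) <= 10;
```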
New Response Fields
| Field | Type | Description |
|---|---|---|
| `metadata_analysis` | object | EXIF/metadata extraction results — format, dimensions, camera presence, GPS, AI generator detection |
| `provenance` | object | C2PA Content Credentials — present only when a C2PA manifest is found |
| `forensic_signals` | object | Multi-signal ensemble summary — signal counts per source and combined confidence boost |
| `perceptual_hash` | string | 64-bit DCT-based perceptual hash of the image |
| `known_synthetic_match` | object | Present only when the perceptual hash matches a known synthetic image — includes distance and category |
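Pulled together, a rough TypeScript shape for the image response; field names come from the table above, while the inner object structures and the envelope are assumptions:

```ts
// Field names come from the New Response Fields table; the inner object shapes
// and the top-level classification/confidence envelope are assumptions.
interface ImageDetectionResult {
  classification: "confirmed_synthetic" | "suspected_synthetic" | "unknown" | "confirmed_authentic";
  confidence: number;
  metadata_analysis: Record<string, unknown>;  // EXIF/metadata extraction results
  forensic_signals: Record<string, unknown>;   // per-source signal counts, confidence boost
  perceptual_hash: string;                     // 64-bit DCT-based pHash
  provenance?: Record<string, unknown>;        // only when a C2PA manifest is found
  known_synthetic_match?: { distance: number; category: string }; // only on a pHash match
}
```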
Audio Detection
Analyzes uploaded audio using dual-signal forensics: transcript-based text analysis plus spectral analysis — mel spectrogram via vision AI and quantitative audio statistics for synthetic speech indicators.
How It Works
Upload
Send an audio file as `multipart/form-data`. Supported formats: MP3, WAV, M4A, OGG, FLAC, WebM. Max file size: 25MB.
Parallel Analysis
Two analysis tracks run simultaneously:
- Transcription — Audio is transcribed using Whisper (EU-hosted, GDPR-compliant)
- Spectral Analysis — FFmpeg generates a mel spectrogram image and extracts audio statistics (RMS, dynamic range, silence ratio, flat factor, DC offset)
Spectrogram Vision Analysis
The mel spectrogram is analyzed by a dedicated forensic prompt checking for:
- Frequency band uniformity (TTS hallmark)
- Harmonic structure anomalies
- Missing breath noise and background ambience
- Onset/offset patterns, formant transitions
- Pitch contour regularity, aliasing artifacts
Request Fields
| Field | Type | Required | Description |
|---|---|---|---|
| `file` | File | Yes | Audio file (MP3, WAV, M4A, OGG, FLAC, WebM). Max 25MB. |
| `age_group` | string | No | Age group context |
| `language` | string | No | ISO language code |
| `platform` | string | No | Platform context |
| `external_id` | string | No | Your internal reference ID |
| `customer_id` | string | No | Your customer/user ID |
Response
New Response Fields
| Field | Type | Description |
|---|---|---|
| `audio_stats` | object | Quantitative audio statistics — RMS mean/peak, dynamic range, silence ratio, flat factor, DC offset |
| `spectral_signals` | array | Human-readable spectral analysis indicators (e.g., low dynamic range, frequency uniformity) |
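For reference, the documented statistics map onto a small numeric record; the JSON key names below are guesses grounded only in the descriptions above:

```ts
// The set of statistics comes from the table above; the key names and the
// envelope around them are assumptions.
interface AudioStats {
  rms_mean: number;
  rms_peak: number;
  dynamic_range: number;  // low values are surfaced as a spectral indicator
  silence_ratio: number;
  flat_factor: number;
  dc_offset: number;
}

interface AudioDetectionResult {
  classification: "confirmed_synthetic" | "suspected_synthetic" | "unknown" | "confirmed_authentic";
  audio_stats: AudioStats;
  spectral_signals?: string[]; // e.g. "low dynamic range", "frequency uniformity"
}
```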
If the audio contains no intelligible speech but spectral analysis detects synthetic patterns, the endpoint returns `classification: "suspected_synthetic"` with spectral signals. If neither speech nor spectral anomalies are found, it returns `classification: "unknown"`.
Video Detection
Analyzes uploaded video with a multi-layer forensic pipeline — per-frame vision analysis, temporal face consistency tracking, audio-visual lip-sync correlation, spectral audio forensics, and transcription.
How It Works
Upload
Send a video file as `multipart/form-data`. Supported formats: MP4, WebM, QuickTime, AVI. Max file size: 100MB.
Frame + Audio Extraction
Frames are extracted at even intervals using FFmpeg (default: 6 frames, max: 20). Audio is extracted as a separate track in parallel.
Parallel Multi-Signal Analysis
Five analysis tracks run simultaneously via fault-isolated `Promise.allSettled`:
- Per-Frame Vision — Each frame analyzed for AI artifacts (face consistency, skin texture, background coherence) in concurrent batches of 3
- Temporal Consistency — face-api.js detects faces across all frames, computes Euclidean distance between face descriptors (real < 0.4, deepfake > 0.6), and measures landmark stability via eye-to-nose ratio variance
- Lip-Sync Correlation — Mouth openness (from 68-point face landmarks) correlated against frame-aligned audio energy (via FFmpeg). Pearson correlation > 0.5 = real speech, < 0.3 = lip-sync deepfake
- Spectral Audio Analysis — Mel spectrogram + audio statistics (same as audio endpoint)
- Transcription — Whisper transcription of the audio track
Request Fields
| Field | Type | Required | Description |
|---|---|---|---|
| `file` | File | Yes | Video file (MP4, WebM, QuickTime, AVI). Max 100MB. |
| `max_frames` | number | No | Frames to extract (default: 6, max: 20) |
| `age_group` | string | No | Age group context |
| `language` | string | No | ISO language code |
| `platform` | string | No | Platform context |
| `external_id` | string | No | Your internal reference ID |
| `customer_id` | string | No | Your customer/user ID |
Response
New Response Fields
| Field | Type | Description |
|---|---|---|
| `temporal_consistency` | object | Face identity tracking across frames — consistency scores, landmark stability, anomalous frame pairs |
| `lip_sync` | object | Audio-visual lip-sync correlation — Pearson coefficient, silent mouth movement detection, voice-without-movement detection |
| `audio_stats` | object | Quantitative audio statistics from spectral analysis |
| `spectral_signals` | array | Spectral analysis indicators (when detected) |
Temporal Consistency Signals
| Signal | Meaning |
|---|---|
| `identity_drift` | Face descriptors differ significantly between consecutive frames — classic deepfake artifact |
| `low_identity_consistency` | Average face identity distance is high across all frames |
| `unstable_landmarks` | Facial geometry (eye-to-nose ratio) varies abnormally between frames |
| `consistent_face` | Face identity and geometry are stable — reduces synthetic confidence |
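The identity checks reduce to Euclidean distances between per-frame face descriptors; a sketch using the documented 0.4 / 0.6 anchors, with the aggregation rules assumed for illustration:

```ts
// The 0.4 / 0.6 thresholds are documented above; applying them per pair and to
// the average distance is an interpretation for illustration only.
function euclideanDistance(a: Float32Array, b: Float32Array): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

function identitySignals(frameDescriptors: Float32Array[]): string[] {
  const distances: number[] = [];
  for (let i = 1; i < frameDescriptors.length; i++) {
    distances.push(euclideanDistance(frameDescriptors[i - 1], frameDescriptors[i]));
  }
  if (distances.length === 0) return [];

  const signals: string[] = [];
  if (distances.some((d) => d > 0.6)) signals.push("identity_drift");

  const avg = distances.reduce((s, d) => s + d, 0) / distances.length;
  if (avg > 0.6) signals.push("low_identity_consistency");
  else if (avg < 0.4) signals.push("consistent_face");
  return signals;
}
```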
Lip-Sync Signals
| Signal | Meaning |
|---|---|
| `poor_lip_sync` | Mouth movement and audio energy have low correlation (< 0.3) |
| `good_lip_sync` | Strong correlation (> 0.5) between mouth and audio — authentic indicator |
| `silent_mouth_movement` | Mouth opens in > 30% of frames without corresponding audio |
| `voice_without_movement` | Audio present in > 30% of frames without mouth movement |
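The lip-sync check is a plain Pearson correlation between per-frame mouth openness and frame-aligned audio energy; a sketch using the documented 0.3 / 0.5 cut-offs:

```ts
// Pearson correlation between two equal-length series; thresholds from the table above.
function pearson(x: number[], y: number[]): number {
  const n = Math.min(x.length, y.length);
  if (n === 0) return 0;
  const mx = x.slice(0, n).reduce((s, v) => s + v, 0) / n;
  const my = y.slice(0, n).reduce((s, v) => s + v, 0) / n;
  let num = 0, dx = 0, dy = 0;
  for (let i = 0; i < n; i++) {
    num += (x[i] - mx) * (y[i] - my);
    dx += (x[i] - mx) ** 2;
    dy += (y[i] - my) ** 2;
  }
  return dx && dy ? num / Math.sqrt(dx * dy) : 0;
}

function lipSyncSignal(mouthOpenness: number[], audioEnergy: number[]): string | null {
  const r = pearson(mouthOpenness, audioEnergy);
  if (r > 0.5) return "good_lip_sync";
  if (r < 0.3) return "poor_lip_sync";
  return null; // ambiguous middle band: no signal emitted (an assumption)
}
```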
If the video has no audio track, the `transcription`, `lip_sync`, `audio_stats`, and `spectral_signals` fields are omitted and no transcription credits are charged.
Account-Level Synthetic Profiling
Track synthetic content patterns at the account level with a 30-day rolling window. When `customer_id` is provided on any detection request, Tuteliq automatically builds a synthetic profile for that account.
Response
Response Fields
| Field | Type | Description |
|---|---|---|
| `total_items` | integer | Total items analyzed in the 30-day window |
| `synthetic_count` | integer | Items classified as `confirmed_synthetic` or `suspected_synthetic` |
| `authentic_count` | integer | Items classified as `confirmed_authentic` |
| `unknown_count` | integer | Items classified as `unknown` |
| `avg_confidence` | float | Average confidence across all classifications |
| `category_distribution` | object | Count of items per synthetic category |
| `account_synthetic_score` | float | 0.0-1.0 composite score — weighted by ratio and confidence |
| `trend` | string | `increasing`, `stable`, `decreasing`, or `unknown` — based on first-half vs. second-half synthetic rate |
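The exact formula behind `account_synthetic_score` and the trend cut-offs are not published; the sketch below is one plausible reading of "weighted by ratio and confidence" and "first-half vs. second-half synthetic rate":

```ts
// Illustrative only: the real weighting and trend margins are not documented.
interface ProfiledItem { synthetic: boolean; confidence: number; timestamp: number }

function accountSyntheticScore(items: ProfiledItem[]): number {
  if (items.length === 0) return 0;
  const synthetic = items.filter((i) => i.synthetic);
  if (synthetic.length === 0) return 0;
  const ratio = synthetic.length / items.length;
  const avgConfidence = synthetic.reduce((s, i) => s + i.confidence, 0) / synthetic.length;
  return ratio * avgConfidence; // one plausible ratio-and-confidence combination, 0.0-1.0
}

function trend(items: ProfiledItem[]): "increasing" | "stable" | "decreasing" | "unknown" {
  if (items.length < 4) return "unknown";
  const sorted = [...items].sort((a, b) => a.timestamp - b.timestamp);
  const mid = Math.floor(sorted.length / 2);
  const syntheticRate = (slice: ProfiledItem[]) =>
    slice.filter((i) => i.synthetic).length / slice.length;
  const delta = syntheticRate(sorted.slice(mid)) - syntheticRate(sorted.slice(0, mid));
  if (delta > 0.1) return "increasing";   // the 0.1 margin is an assumption
  if (delta < -0.1) return "decreasing";
  return "stable";
}
```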
Account profiling is automatic and zero-cost. Pass `customer_id` on any detection request to start building the profile. The profile endpoint itself does not consume credits.
Credit Summary
| Endpoint | Path | Credits |
|---|---|---|
| Text | /safety/synthetic-content | 2 |
| Image | /safety/synthetic-content/image | 5 |
| Audio | /safety/synthetic-content/audio | 4-7 |
| Video | /safety/synthetic-content/video | 2 + 3/frame + 2 (audio) |
| Profile | /safety/synthetic-content/profile/:id | 0 |
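Reading the Video row as 2 base credits, 3 per extracted frame, and 2 more when an audio track is present (that reading is an interpretation of the table), a default 6-frame video with audio costs 2 + 6 × 3 + 2 = 22 credits:

```ts
// Interpretation of the Video credit row; the conditional audio charge follows
// from the no-audio note in the Video Detection section.
const videoCredits = (frames: number, hasAudio: boolean) =>
  2 + 3 * frames + (hasAudio ? 2 : 0);

videoCredits(6, true);   // 22 credits: default 6 frames with an audio track
videoCredits(20, false); // 62 credits: max frames, no audio track
```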
Multi-Endpoint Support
The text-based synthetic content detector is available in the Multi-Endpoint fan-out. Include `"synthetic-content"` in your endpoint list:
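A sketch of such a request, assuming a JSON body with `content` and `endpoints` fields; the multi-endpoint path and body shape are assumptions, and only the `"synthetic-content"` endpoint name is documented here:

```ts
// Hypothetical multi-endpoint request: the path and body field names are
// assumptions; "synthetic-content" is the documented endpoint name to include.
const res = await fetch("https://api.tuteliq.example/safety/multi", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.TUTELIQ_API_KEY}`,
  },
  body: JSON.stringify({
    content: "Message text to fan out across detectors",
    endpoints: ["synthetic-content"],
  }),
});
```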
Image, audio, and video synthetic detection are multipart-only and not available through the multi-endpoint batch.