Every detection result is classified as one of four levels: `confirmed_synthetic`, `suspected_synthetic`, `unknown`, or `confirmed_authentic`.
For images and video, Tuteliq goes far beyond a single model call. The multi-signal forensic pipeline runs up to 6 independent analysis engines in parallel — vision forensics, EXIF metadata, pixel statistics, C2PA Content Credentials, watermark detection, and perceptual hashing — then aggregates them into a weighted ensemble assessment. Any single engine can fail without degrading the result.
6-Signal Image Pipeline
Vision AI, EXIF metadata, pixel statistics, C2PA Content Credentials, frequency-domain watermark analysis, and perceptual hash matching — all in parallel.
Temporal + Lip-Sync Video
Frame-by-frame face identity tracking, landmark stability analysis, and audio-visual lip-sync correlation to catch deepfakes that single-frame analysis misses.
Spectral Audio Forensics
Mel spectrogram analysis via vision AI plus quantitative audio statistics — dynamic range, silence ratio, flat factor — to detect synthetic speech beyond transcript analysis.
Classification Levels
| Classification | Description |
|---|---|
| `confirmed_synthetic` | High confidence AI-generated content detected |
| `suspected_synthetic` | Moderate indicators of synthetic content |
| `unknown` | Insufficient data to determine authenticity |
| `confirmed_authentic` | High confidence genuine, human-created content |
Category Taxonomy
| Tag | Description |
|---|---|
| `AI_GENERATED_TEXT` | LLM-generated text (ChatGPT, Claude, etc.) |
| `AI_GENERATED_IMAGE` | AI-generated image (Midjourney, DALL-E, Stable Diffusion) |
| `AI_GENERATED_AUDIO` | Voice cloning, text-to-speech synthesis |
| `AI_GENERATED_VIDEO` | Fully AI-generated video content |
| `AI_MANIPULATED_MEDIA` | Deepfakes, face swaps, manipulated media |
| `SYNTHETIC_IDENTITY` | Fake identity created using AI tools |
| `SYNTHETIC_CSAM` | AI-generated child sexual abuse material |
| `SYNTHETIC_IMPERSONATION` | AI-generated impersonation of a real person |
| `AI_ENHANCED_GROOMING` | AI-assisted grooming scripts |
| `AI_ENHANCED_SEXTORTION` | AI-assisted sextortion content |
Text Detection
Analyzes text for AI-generated content indicators — LLM-generated text, synthetic identities, AI-enhanced grooming scripts, and more.
Request
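A minimal request sketch; the base URL, auth header, and the `content` body field are assumptions, and the optional context fields mirror those documented for the media endpoints:

```ts
// Hypothetical sketch: base URL, auth header, and the `content` body field are
// assumptions. Only the /safety/synthetic-content path is documented above; the
// optional context fields mirror those listed for the image/audio/video endpoints.
const res = await fetch("https://api.tuteliq.example/safety/synthetic-content", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.TUTELIQ_API_KEY}`,
  },
  body: JSON.stringify({
    content: "Text to analyze for AI-generation indicators",
    age_group: "13-15",
    customer_id: "user_123",
  }),
});
const result = await res.json();
console.log(result.classification);
```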
Response
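The response envelope is not reproduced here; a rough shape, inferred from the classification levels and category taxonomy above (everything beyond those documented values is an assumption):

```ts
// Rough shape only: the classification values and category tags are documented
// above; the envelope itself (field names, nesting) is an assumption.
interface SyntheticTextResult {
  classification:
    | "confirmed_synthetic"
    | "suspected_synthetic"
    | "unknown"
    | "confirmed_authentic";
  confidence: number;     // 0.0 - 1.0
  categories: string[];   // e.g. ["AI_GENERATED_TEXT", "SYNTHETIC_IMPERSONATION"]
}
```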
Image Detection
Analyzes uploaded images for AI-generation artifacts using a 6-signal forensic pipeline — running vision analysis, EXIF metadata extraction, pixel statistics, C2PA Content Credentials, watermark detection, and perceptual hashing in parallel.
How It Works
Upload
Send an image as `multipart/form-data`. Supported formats: JPEG, PNG, WebP, GIF. Max file size: 10MB.
Multi-Signal Forensic Analysis
Six independent analysis engines run in parallel, each fault-isolated so failures in one don’t affect others:
- Vision AI — Forensic prompt inspects pixel-level artifacts (face consistency, skin texture, hand anomalies, background coherence, lighting mismatches)
- EXIF Metadata — Checks for AI generator signatures in EXIF tags, XMP data, and PNG text chunks. Flags suspicious absence of camera metadata.
- Pixel Statistics — Shannon entropy, edge density (Laplacian convolution), and channel uniformity analysis
- C2PA Content Credentials — Detects and validates C2PA manifests from DALL-E, Adobe Firefly, Google Imagen, and other tools
- Watermark Detection — Frequency-domain analysis for invisible watermarks (SynthID, Stable Diffusion DWT/DCT)
- Perceptual Hashing — DCT-based pHash compared against a database of known synthetic content via Hamming distance
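A rough TypeScript sketch of this kind of fault-isolated fan-out; the engine function names and the `Signal` shape are placeholders, not Tuteliq internals:

```ts
// Sketch only: engine names and the Signal shape are placeholders.
interface Signal { source: string; score: number; notes: string[] }

declare function runVisionAI(img: Uint8Array): Promise<Signal>;
declare function runExifAnalysis(img: Uint8Array): Promise<Signal>;
declare function runPixelStats(img: Uint8Array): Promise<Signal>;
declare function runC2paCheck(img: Uint8Array): Promise<Signal>;
declare function runWatermarkScan(img: Uint8Array): Promise<Signal>;
declare function runPerceptualHash(img: Uint8Array): Promise<Signal>;

async function runForensicPipeline(img: Uint8Array): Promise<Signal[]> {
  // Promise.allSettled keeps one failing engine from taking down the rest;
  // rejected engines are simply dropped from the ensemble.
  const settled = await Promise.allSettled([
    runVisionAI(img),
    runExifAnalysis(img),
    runPixelStats(img),
    runC2paCheck(img),
    runWatermarkScan(img),
    runPerceptualHash(img),
  ]);
  return settled
    .filter((r): r is PromiseFulfilledResult<Signal> => r.status === "fulfilled")
    .map((r) => r.value);
}
```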
Signal Aggregation
All signals are combined into a weighted ensemble: vision (30%), metadata (15%), pixel statistics (15%), C2PA provenance (15%), watermarks (10%), perceptual hash (15%). The aggregated forensic summary is fed to the classifier.
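In arithmetic terms the ensemble is a weighted average; a minimal sketch, assuming each engine reports a 0 to 1 synthetic-likelihood score and that weights are renormalized when an engine fails (both assumptions):

```ts
// Weights from the percentages above; the 0-1 score semantics and the
// renormalization over engines that actually returned a result are assumptions.
const WEIGHTS: Record<string, number> = {
  vision: 0.30,
  metadata: 0.15,
  pixel_stats: 0.15,
  c2pa: 0.15,
  watermark: 0.10,
  perceptual_hash: 0.15,
};

function aggregate(scores: Partial<Record<string, number>>): number {
  let weighted = 0;
  let totalWeight = 0;
  for (const [source, weight] of Object.entries(WEIGHTS)) {
    const score = scores[source];
    if (score === undefined) continue; // engine failed or returned nothing
    weighted += weight * score;
    totalWeight += weight;
  }
  return totalWeight > 0 ? weighted / totalWeight : 0;
}

// aggregate({ vision: 0.8, metadata: 0.6, pixel_stats: 0.7 }) ≈ 0.725
```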
Classification with Overrides
The classifier produces a final verdict. Two signals can override the classifier with definitive results:
- C2PA declares AI generation → forced to `confirmed_synthetic` with confidence ≥ 0.95
- Perceptual hash matches known synthetic → forced to `confirmed_synthetic` with confidence ≥ 0.90
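In effect the overrides are two guard clauses applied after the ensemble verdict; a minimal sketch, with the `Verdict` shape assumed for illustration:

```ts
// Sketch of the two documented overrides; the Verdict shape is an assumption.
interface Verdict { classification: string; confidence: number }

function applyOverrides(
  verdict: Verdict,
  c2paDeclaresAI: boolean,
  knownSyntheticHashMatch: boolean,
): Verdict {
  if (c2paDeclaresAI) {
    return { classification: "confirmed_synthetic", confidence: Math.max(verdict.confidence, 0.95) };
  }
  if (knownSyntheticHashMatch) {
    return { classification: "confirmed_synthetic", confidence: Math.max(verdict.confidence, 0.90) };
  }
  return verdict;
}
```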
Request Fields
| Field | Type | Required | Description |
|---|---|---|---|
| `file` | File | Yes | Image file (JPEG, PNG, WebP, GIF). Max 10MB. |
| `age_group` | string | No | Age group context (e.g., "13-15") |
| `language` | string | No | ISO language code |
| `platform` | string | No | Platform context |
| `external_id` | string | No | Your internal reference ID |
| `customer_id` | string | No | Your customer/user ID |
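A minimal upload sketch using Node 18+ `fetch` and `FormData`; the base URL and auth header are assumptions, while the path and form field names come from the tables here:

```ts
// Hypothetical base URL and auth header; the path and form fields are documented
// above. Requires Node 18+ (global fetch, FormData, Blob).
import { readFile } from "node:fs/promises";

async function checkImage(path: string) {
  const form = new FormData();
  form.append("file", new Blob([await readFile(path)], { type: "image/jpeg" }), "photo.jpg");
  form.append("customer_id", "user_123"); // optional: enables account profiling
  form.append("age_group", "13-15");      // optional context

  const res = await fetch("https://api.tuteliq.example/safety/synthetic-content/image", {
    method: "POST",
    headers: { Authorization: `Bearer ${process.env.TUTELIQ_API_KEY}` },
    body: form, // fetch sets the multipart boundary automatically
  });
  return res.json();
}
```

The audio and video endpoints accept the same multipart pattern with their own paths, supported formats, and size limits (plus `max_frames` for video).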
Response
Forensic Signal Sources
| Signal | Weight | What It Checks |
|---|---|---|
| Vision AI | 30% | Face consistency, skin texture, hand anomalies, background coherence, lighting, hair rendering, text/symbols, resolution, AI model signatures |
| EXIF Metadata | 15% | Camera model, GPS, AI generator names in EXIF/XMP, PNG tEXt chunks (Stable Diffusion parameters), suspicious absence of camera data |
| Pixel Statistics | 15% | Shannon entropy, Laplacian edge density, channel uniformity — GAN images have distinctive statistical signatures |
| C2PA Provenance | 15% | Content Credentials manifests from DALL-E, Firefly, Imagen. Definitive when present — overrides classifier. |
| Watermark | 10% | High-frequency energy analysis, periodic pattern detection at known watermark frequencies, LSB distribution, corner entropy |
| Perceptual Hash | 15% | DCT-based 64-bit pHash compared against known-synthetic database. Hamming distance ≤ 10 = match. Definitive when matched. |
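The pHash comparison is a Hamming distance, i.e. a popcount of the XOR of the two 64-bit hashes; a small sketch, assuming the hashes are handled as 16-character hex strings (the encoding is a guess):

```ts
// Assumes 64-bit pHashes encoded as 16-character hex strings (an assumption);
// the <= 10 match threshold comes from the table above.
function hammingDistance(hashA: string, hashB: string): number {
  let diff = BigInt(`0x${hashA}`) ^ BigInt(`0x${hashB}`);
  let bits = 0;
  while (diff > 0n) {
    bits += Number(diff & 1n);
    diff >>= 1n;
  }
  return bits;
}

const isKnownSyntheticMatch = (a: string, b: string) => hammingDistance(a, b) <= 10;
```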
New Response Fields
| Field | Type | Description |
|---|---|---|
| `metadata_analysis` | object | EXIF/metadata extraction results — format, dimensions, camera presence, GPS, AI generator detection |
| `provenance` | object | C2PA Content Credentials — present only when a C2PA manifest is found |
| `forensic_signals` | object | Multi-signal ensemble summary — signal counts per source and combined confidence boost |
| `perceptual_hash` | string | 64-bit DCT-based perceptual hash of the image |
| `known_synthetic_match` | object | Present only when the perceptual hash matches a known synthetic image — includes distance and category |
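Pulled together, a rough TypeScript shape for the image response; field names come from the table above, while the inner object structures and the envelope are assumptions:

```ts
// Field names come from the New Response Fields table; the inner object shapes
// and the top-level classification/confidence envelope are assumptions.
interface ImageDetectionResult {
  classification: "confirmed_synthetic" | "suspected_synthetic" | "unknown" | "confirmed_authentic";
  confidence: number;
  metadata_analysis: Record<string, unknown>;  // EXIF/metadata extraction results
  forensic_signals: Record<string, unknown>;   // per-source signal counts, confidence boost
  perceptual_hash: string;                     // 64-bit DCT-based pHash
  provenance?: Record<string, unknown>;        // only when a C2PA manifest is found
  known_synthetic_match?: { distance: number; category: string }; // only on a pHash match
}
```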
Audio Detection
Analyzes uploaded audio using dual-signal forensics: transcript-based text analysis plus spectral analysis — mel spectrogram via vision AI and quantitative audio statistics for synthetic speech indicators.
How It Works
Upload
Send an audio file as `multipart/form-data`. Supported formats: MP3, WAV, M4A, OGG, FLAC, WebM. Max file size: 25MB.
Parallel Analysis
Two analysis tracks run simultaneously:
- Transcription — Audio is transcribed using Whisper (EU-hosted, GDPR-compliant)
- Spectral Analysis — FFmpeg generates a mel spectrogram image and extracts audio statistics (RMS, dynamic range, silence ratio, flat factor, DC offset)
Spectrogram Vision Analysis
The mel spectrogram is analyzed by a dedicated forensic prompt checking for:
- Frequency band uniformity (TTS hallmark)
- Harmonic structure anomalies
- Missing breath noise and background ambience
- Onset/offset patterns, formant transitions
- Pitch contour regularity, aliasing artifacts
Request Fields
| Field | Type | Required | Description |
|---|---|---|---|
| `file` | File | Yes | Audio file (MP3, WAV, M4A, OGG, FLAC, WebM). Max 25MB. |
| `age_group` | string | No | Age group context |
| `language` | string | No | ISO language code |
| `platform` | string | No | Platform context |
| `external_id` | string | No | Your internal reference ID |
| `customer_id` | string | No | Your customer/user ID |
Response
New Response Fields
| Field | Type | Description |
|---|---|---|
| `audio_stats` | object | Quantitative audio statistics — RMS mean/peak, dynamic range, silence ratio, flat factor, DC offset |
| `spectral_signals` | array | Human-readable spectral analysis indicators (e.g., low dynamic range, frequency uniformity) |
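For reference, the documented statistics map onto a small numeric record; the JSON key names below are guesses grounded only in the descriptions above:

```ts
// The set of statistics comes from the table above; the key names and the
// envelope around them are assumptions.
interface AudioStats {
  rms_mean: number;
  rms_peak: number;
  dynamic_range: number;  // low values are surfaced as a spectral indicator
  silence_ratio: number;
  flat_factor: number;
  dc_offset: number;
}

interface AudioDetectionResult {
  classification: "confirmed_synthetic" | "suspected_synthetic" | "unknown" | "confirmed_authentic";
  audio_stats: AudioStats;
  spectral_signals?: string[]; // e.g. "low dynamic range", "frequency uniformity"
}
```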
If the audio contains no intelligible speech but spectral analysis detects synthetic patterns, the endpoint returns `classification: "suspected_synthetic"` with spectral signals. If neither speech nor spectral anomalies are found, it returns `classification: "unknown"`.
Video Detection
Analyzes uploaded video with a multi-layer forensic pipeline — per-frame vision analysis, temporal face consistency tracking, audio-visual lip-sync correlation, spectral audio forensics, and transcription.
How It Works
Upload
Send a video file as `multipart/form-data`. Supported formats: MP4, WebM, QuickTime, AVI. Max file size: 100MB.
Frame + Audio Extraction
Frames are extracted at even intervals using FFmpeg (default: 6 frames, max: 20). Audio is extracted as a separate track in parallel.
Parallel Multi-Signal Analysis
Five analysis tracks run simultaneously via fault-isolated `Promise.allSettled`:
- Per-Frame Vision — Each frame analyzed for AI artifacts (face consistency, skin texture, background coherence) in concurrent batches of 3
- Temporal Consistency — face-api.js detects faces across all frames, computes Euclidean distance between face descriptors (real < 0.4, deepfake > 0.6), and measures landmark stability via eye-to-nose ratio variance
- Lip-Sync Correlation — Mouth openness (from 68-point face landmarks) correlated against frame-aligned audio energy (via FFmpeg). Pearson correlation > 0.5 = real speech, < 0.3 = lip-sync deepfake
- Spectral Audio Analysis — Mel spectrogram + audio statistics (same as audio endpoint)
- Transcription — Whisper transcription of the audio track
Request Fields
| Field | Type | Required | Description |
|---|---|---|---|
| `file` | File | Yes | Video file (MP4, WebM, QuickTime, AVI). Max 100MB. |
| `max_frames` | number | No | Frames to extract (default: 6, max: 20) |
| `age_group` | string | No | Age group context |
| `language` | string | No | ISO language code |
| `platform` | string | No | Platform context |
| `external_id` | string | No | Your internal reference ID |
| `customer_id` | string | No | Your customer/user ID |
Response
New Response Fields
| Field | Type | Description |
|---|---|---|
| `temporal_consistency` | object | Face identity tracking across frames — consistency scores, landmark stability, anomalous frame pairs |
| `lip_sync` | object | Audio-visual lip-sync correlation — Pearson coefficient, silent mouth movement detection, voice-without-movement detection |
| `audio_stats` | object | Quantitative audio statistics from spectral analysis |
| `spectral_signals` | array | Spectral analysis indicators (when detected) |
Temporal Consistency Signals
| Signal | Meaning |
|---|---|
| `identity_drift` | Face descriptors differ significantly between consecutive frames — classic deepfake artifact |
| `low_identity_consistency` | Average face identity distance is high across all frames |
| `unstable_landmarks` | Facial geometry (eye-to-nose ratio) varies abnormally between frames |
| `consistent_face` | Face identity and geometry are stable — reduces synthetic confidence |
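The identity checks reduce to Euclidean distances between per-frame face descriptors; a sketch using the documented 0.4 / 0.6 anchors, with the aggregation rules assumed for illustration:

```ts
// The 0.4 / 0.6 thresholds are documented above; applying them per pair and to
// the average distance is an interpretation for illustration only.
function euclideanDistance(a: Float32Array, b: Float32Array): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

function identitySignals(frameDescriptors: Float32Array[]): string[] {
  const distances: number[] = [];
  for (let i = 1; i < frameDescriptors.length; i++) {
    distances.push(euclideanDistance(frameDescriptors[i - 1], frameDescriptors[i]));
  }
  if (distances.length === 0) return [];

  const signals: string[] = [];
  if (distances.some((d) => d > 0.6)) signals.push("identity_drift");

  const avg = distances.reduce((s, d) => s + d, 0) / distances.length;
  if (avg > 0.6) signals.push("low_identity_consistency");
  else if (avg < 0.4) signals.push("consistent_face");
  return signals;
}
```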
Lip-Sync Signals
| Signal | Meaning |
|---|---|
| `poor_lip_sync` | Mouth movement and audio energy have low correlation (< 0.3) |
| `good_lip_sync` | Strong correlation (> 0.5) between mouth and audio — authentic indicator |
| `silent_mouth_movement` | Mouth opens in > 30% of frames without corresponding audio |
| `voice_without_movement` | Audio present in > 30% of frames without mouth movement |
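The lip-sync check is a plain Pearson correlation between per-frame mouth openness and frame-aligned audio energy; a sketch using the documented 0.3 / 0.5 cut-offs:

```ts
// Pearson correlation between two equal-length series; thresholds from the table above.
function pearson(x: number[], y: number[]): number {
  const n = Math.min(x.length, y.length);
  if (n === 0) return 0;
  const mx = x.slice(0, n).reduce((s, v) => s + v, 0) / n;
  const my = y.slice(0, n).reduce((s, v) => s + v, 0) / n;
  let num = 0, dx = 0, dy = 0;
  for (let i = 0; i < n; i++) {
    num += (x[i] - mx) * (y[i] - my);
    dx += (x[i] - mx) ** 2;
    dy += (y[i] - my) ** 2;
  }
  return dx && dy ? num / Math.sqrt(dx * dy) : 0;
}

function lipSyncSignal(mouthOpenness: number[], audioEnergy: number[]): string | null {
  const r = pearson(mouthOpenness, audioEnergy);
  if (r > 0.5) return "good_lip_sync";
  if (r < 0.3) return "poor_lip_sync";
  return null; // ambiguous middle band: no signal emitted (an assumption)
}
```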
If the video has no audio track, the `transcription`, `lip_sync`, `audio_stats`, and `spectral_signals` fields are omitted and no transcription credits are charged.
Account-Level Synthetic Profiling
Track synthetic content patterns at the account level with a 30-day rolling window. When `customer_id` is provided on any detection request, Tuteliq automatically builds a synthetic profile for that account.
Response
Response Fields
| Field | Type | Description |
|---|---|---|
| `total_items` | integer | Total items analyzed in the 30-day window |
| `synthetic_count` | integer | Items classified as `confirmed_synthetic` or `suspected_synthetic` |
| `authentic_count` | integer | Items classified as `confirmed_authentic` |
| `unknown_count` | integer | Items classified as `unknown` |
| `avg_confidence` | float | Average confidence across all classifications |
| `category_distribution` | object | Count of items per synthetic category |
| `account_synthetic_score` | float | 0.0-1.0 composite score — weighted by ratio and confidence |
| `trend` | string | `increasing`, `stable`, `decreasing`, or `unknown` — based on first-half vs. second-half synthetic rate |
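The exact formula behind `account_synthetic_score` and the trend cut-offs are not published; the sketch below is one plausible reading of "weighted by ratio and confidence" and "first-half vs. second-half synthetic rate":

```ts
// Illustrative only: the real weighting and trend margins are not documented.
interface ProfiledItem { synthetic: boolean; confidence: number; timestamp: number }

function accountSyntheticScore(items: ProfiledItem[]): number {
  if (items.length === 0) return 0;
  const synthetic = items.filter((i) => i.synthetic);
  if (synthetic.length === 0) return 0;
  const ratio = synthetic.length / items.length;
  const avgConfidence = synthetic.reduce((s, i) => s + i.confidence, 0) / synthetic.length;
  return ratio * avgConfidence; // one plausible ratio-and-confidence combination, 0.0-1.0
}

function trend(items: ProfiledItem[]): "increasing" | "stable" | "decreasing" | "unknown" {
  if (items.length < 4) return "unknown";
  const sorted = [...items].sort((a, b) => a.timestamp - b.timestamp);
  const mid = Math.floor(sorted.length / 2);
  const syntheticRate = (slice: ProfiledItem[]) =>
    slice.filter((i) => i.synthetic).length / slice.length;
  const delta = syntheticRate(sorted.slice(mid)) - syntheticRate(sorted.slice(0, mid));
  if (delta > 0.1) return "increasing";   // the 0.1 margin is an assumption
  if (delta < -0.1) return "decreasing";
  return "stable";
}
```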
Account profiling is automatic and zero-cost. Pass `customer_id` on any detection request to start building the profile. The profile endpoint itself does not consume credits.
Credit Summary
| Endpoint | Path | Credits |
|---|---|---|
| Text | /safety/synthetic-content | 2 |
| Image | /safety/synthetic-content/image | 5 |
| Audio | /safety/synthetic-content/audio | 4-7 |
| Video | /safety/synthetic-content/video | 2 + 3/frame + 2 (audio) |
| Profile | /safety/synthetic-content/profile/:id | 0 |
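Reading the Video row as 2 base credits, 3 per extracted frame, and 2 more when an audio track is present (that reading is an interpretation of the table), a default 6-frame video with audio costs 2 + 6 × 3 + 2 = 22 credits:

```ts
// Interpretation of the Video credit row; the conditional audio charge follows
// from the no-audio note in the Video Detection section.
const videoCredits = (frames: number, hasAudio: boolean) =>
  2 + 3 * frames + (hasAudio ? 2 : 0);

videoCredits(6, true);   // 22 credits: default 6 frames with an audio track
videoCredits(20, false); // 62 credits: max frames, no audio track
```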
Multi-Endpoint Support
The text-based synthetic content detector is available in the Multi-Endpoint fan-out. Include `"synthetic-content"` in your endpoint list:
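A sketch of such a request, assuming a JSON body with `content` and `endpoints` fields; the multi-endpoint path and body shape are assumptions, and only the `"synthetic-content"` endpoint name is documented here:

```ts
// Hypothetical multi-endpoint request: the path and body field names are
// assumptions; "synthetic-content" is the documented endpoint name to include.
const res = await fetch("https://api.tuteliq.example/safety/multi", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.TUTELIQ_API_KEY}`,
  },
  body: JSON.stringify({
    content: "Message text to fan out across detectors",
    endpoints: ["synthetic-content"],
  }),
});
```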
Image, audio, and video synthetic detection are multipart-only and not available through the multi-endpoint batch.