Tuteliq detects AI-generated and synthetic content across all four modalities — text, image, audio, and video. Each endpoint classifies content using the standardized taxonomy from the Child Protection Blueprint: confirmed_synthetic, suspected_synthetic, unknown, or confirmed_authentic. For images and video, Tuteliq goes far beyond a single model call. The multi-signal forensic pipeline runs up to 6 independent analysis engines in parallel — vision forensics, EXIF metadata, pixel statistics, C2PA Content Credentials, watermark detection, and perceptual hashing — then aggregates them into a weighted ensemble assessment. Any single engine can fail without degrading the result.

6-Signal Image Pipeline

Vision AI, EXIF metadata, pixel statistics, C2PA Content Credentials, frequency-domain watermark analysis, and perceptual hash matching — all in parallel.

Temporal + Lip-Sync Video

Frame-by-frame face identity tracking, landmark stability analysis, and audio-visual lip-sync correlation to catch deepfakes that single-frame analysis misses.

Spectral Audio Forensics

Mel spectrogram analysis via vision AI plus quantitative audio statistics — dynamic range, silence ratio, flat factor — to detect synthetic speech beyond transcript analysis.

Classification Levels

| Classification | Description |
| --- | --- |
| confirmed_synthetic | High confidence AI-generated content detected |
| suspected_synthetic | Moderate indicators of synthetic content |
| unknown | Insufficient data to determine authenticity |
| confirmed_authentic | High confidence genuine, human-created content |

Category Taxonomy

| Tag | Description |
| --- | --- |
| AI_GENERATED_TEXT | LLM-generated text (ChatGPT, Claude, etc.) |
| AI_GENERATED_IMAGE | AI-generated image (Midjourney, DALL-E, Stable Diffusion) |
| AI_GENERATED_AUDIO | Voice cloning, text-to-speech synthesis |
| AI_GENERATED_VIDEO | Fully AI-generated video content |
| AI_MANIPULATED_MEDIA | Deepfakes, face swaps, manipulated media |
| SYNTHETIC_IDENTITY | Fake identity created using AI tools |
| SYNTHETIC_CSAM | AI-generated child sexual abuse material |
| SYNTHETIC_IMPERSONATION | AI-generated impersonation of a real person |
| AI_ENHANCED_GROOMING | AI-assisted grooming scripts |
| AI_ENHANCED_SEXTORTION | AI-assisted sextortion content |
SYNTHETIC_CSAM always escalates to severity 1.0, level critical, and recommended action immediate_intervention regardless of confidence score.

Text Detection

Analyzes text for AI-generated content indicators — LLM-generated text, synthetic identities, AI-enhanced grooming scripts, and more.
POST /api/v1/safety/synthetic-content
Content-Type: application/json

Request

{
  "text": "In conclusion, it is important to note that there are several key factors to consider when evaluating the implications of this complex issue.",
  "context": {
    "language": "en",
    "age_group": "13-15",
    "platform": "discord"
  },
  "external_id": "msg_123",
  "customer_id": "user_456"
}

Response

{
  "endpoint": "synthetic-content",
  "detected": true,
  "severity": 0.8,
  "level": "high",
  "classification": "suspected_synthetic",
  "confidence": 0.9,
  "risk_score": 0.7,
  "categories": [
    { "tag": "AI_GENERATED_TEXT", "label": "AI-generated text", "confidence": 0.9 }
  ],
  "evidence": [
    { "text": "Uniform hedging pattern", "tactic": "STATISTICAL_LANGUAGE", "weight": 0.6 }
  ],
  "age_calibration": { "applied": true, "age_group": "13-15", "multiplier": 1.3 },
  "recommended_action": "flag_for_review",
  "rationale": "Text shows uniform style and formulaic structure typical of LLM output.",
  "processing_time_ms": 2225,
  "language": "en",
  "language_status": "stable",
  "credits_used": 5
}
Credits: 5
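A minimal request-building sketch in Python. The host name and Authorization scheme below are placeholders (assumptions), not documented values; the body fields mirror the request shown above.

```python
# Build the JSON body for POST /api/v1/safety/synthetic-content.
def build_text_request(text, age_group=None, language=None, platform=None,
                       external_id=None, customer_id=None):
    body = {"text": text}
    # Only include context keys the caller actually supplied.
    context = {k: v for k, v in [("language", language),
                                 ("age_group", age_group),
                                 ("platform", platform)] if v is not None}
    if context:
        body["context"] = context
    if external_id is not None:
        body["external_id"] = external_id
    if customer_id is not None:
        body["customer_id"] = customer_id
    return body

body = build_text_request(
    "In conclusion, it is important to note that...",
    age_group="13-15", language="en", platform="discord",
    external_id="msg_123", customer_id="user_456",
)
# To send (third-party requests library; placeholder host and auth header):
# requests.post("https://api.tuteliq.example/api/v1/safety/synthetic-content",
#               json=body, headers={"Authorization": "Bearer <api-key>"})
```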

Image Detection

Analyzes uploaded images for AI-generation artifacts using a 6-signal forensic pipeline — running vision analysis, EXIF metadata extraction, pixel statistics, C2PA Content Credentials, watermark detection, and perceptual hashing in parallel.
POST /api/v1/safety/synthetic-content/image
Content-Type: multipart/form-data

How It Works

1. Upload

Send an image as multipart/form-data. Supported formats: JPEG, PNG, WebP, GIF. Max file size: 10MB.

2. Multi-Signal Forensic Analysis

Six independent analysis engines run in parallel, each fault-isolated so failures in one don’t affect others:
  1. Vision AI — Forensic prompt inspects pixel-level artifacts (face consistency, skin texture, hand anomalies, background coherence, lighting mismatches)
  2. EXIF Metadata — Checks for AI generator signatures in EXIF tags, XMP data, and PNG text chunks. Flags suspicious absence of camera metadata.
  3. Pixel Statistics — Shannon entropy, edge density (Laplacian convolution), and channel uniformity analysis
  4. C2PA Content Credentials — Detects and validates C2PA manifests from DALL-E, Adobe Firefly, Google Imagen, and other tools
  5. Watermark Detection — Frequency-domain analysis for invisible watermarks (SynthID, Stable Diffusion DWT/DCT)
  6. Perceptual Hashing — DCT-based pHash compared against a database of known synthetic content via Hamming distance

3. Signal Aggregation

All signals are combined into a weighted ensemble: vision (30%), metadata (15%), pixel statistics (15%), C2PA provenance (15%), watermarks (10%), perceptual hash (15%). The aggregated forensic summary is fed to the classifier.
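The aggregation step can be sketched as a renormalized weighted mean. The weights are the documented ones; dropping failed engines and renormalizing the remaining weights is an assumption about how fault isolation feeds the ensemble.

```python
# Documented ensemble weights for the six image-forensics engines.
WEIGHTS = {
    "vision": 0.30, "metadata": 0.15, "pixel_stats": 0.15,
    "c2pa": 0.15, "watermark": 0.10, "phash": 0.15,
}

def aggregate(scores):
    """Weighted mean over the engines that returned a score in [0, 1].

    Engines that failed (None) are dropped and the remaining weights
    renormalized, so one engine failing never zeroes the result.
    """
    present = {k: v for k, v in scores.items() if v is not None}
    total_weight = sum(WEIGHTS[k] for k in present)
    return sum(WEIGHTS[k] * v for k, v in present.items()) / total_weight

# All six engines report:
full = aggregate({"vision": 0.9, "metadata": 0.8, "pixel_stats": 0.6,
                  "c2pa": 1.0, "watermark": 0.5, "phash": 0.7})
# Watermark engine failed -- the result is still well-defined:
partial = aggregate({"vision": 0.9, "metadata": 0.8, "pixel_stats": 0.6,
                     "c2pa": 1.0, "watermark": None, "phash": 0.7})
```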

4. Classification with Overrides

The classifier produces a final verdict. Two signals can override the classifier with definitive results:
  • C2PA declares AI generation → forced to confirmed_synthetic with confidence ≥ 0.95
  • Perceptual hash matches known synthetic → forced to confirmed_synthetic with confidence ≥ 0.90
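The two documented overrides can be sketched as a small post-classifier step. The function name and the confidence-flooring via max are illustrative, not the service's actual implementation.

```python
def apply_overrides(verdict, confidence, c2pa_declares_ai, phash_known_match):
    """Force the documented definitive verdicts ahead of the classifier."""
    if c2pa_declares_ai:
        # C2PA manifest declares AI generation -> confidence >= 0.95.
        return "confirmed_synthetic", max(confidence, 0.95)
    if phash_known_match:
        # Perceptual hash matched the known-synthetic database -> >= 0.90.
        return "confirmed_synthetic", max(confidence, 0.90)
    return verdict, confidence
```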

Request Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| file | File | Yes | Image file (JPEG, PNG, WebP, GIF). Max 10MB. |
| age_group | string | No | Age group context (e.g., "13-15") |
| language | string | No | ISO language code |
| platform | string | No | Platform context |
| external_id | string | No | Your internal reference ID |
| customer_id | string | No | Your customer/user ID |

Response

{
  "endpoint": "synthetic-content",
  "detected": true,
  "severity": 0.9,
  "level": "high",
  "classification": "confirmed_synthetic",
  "confidence": 0.95,
  "risk_score": 0.85,
  "categories": [
    { "tag": "AI_GENERATED_IMAGE", "label": "AI-generated image", "confidence": 0.95 }
  ],
  "evidence": [
    { "text": "Uniform skin texture and symmetrical facial features", "tactic": "AI_GENERATION_ARTIFACT_DETECTION", "weight": 0.9 }
  ],
  "recommended_action": "flag_for_review",
  "rationale": "Multi-signal forensic analysis: vision AI detected artifacts, EXIF metadata lacks camera model, C2PA manifest confirms AI generation by DALL-E 3.",
  "input_type": "image",
  "vision": {
    "is_likely_synthetic": true,
    "synthetic_confidence": 0.95,
    "artifacts": [
      "unnaturally uniform skin texture on cheeks with complete absence of pores",
      "perfectly symmetrical facial features",
      "blurred background with smooth transitions characteristic of AI-generated depth of field"
    ],
    "face_analysis": "Unnaturally smooth skin with no visible pores, perfectly symmetrical features.",
    "overall_assessment": "Image exhibits several indicators of AI generation."
  },
  "metadata_analysis": {
    "format": "png",
    "dimensions": { "width": 1024, "height": 1024 },
    "has_exif": false,
    "has_camera": false,
    "has_gps": false,
    "ai_generator_detected": true,
    "ai_generator": "DALL-E",
    "suspicious_absence": true
  },
  "provenance": {
    "has_c2pa": true,
    "claim_generator": "DALL-E 3",
    "is_ai_generated": true,
    "ai_tool": "DALL-E 3"
  },
  "forensic_signals": {
    "signal_count": 8,
    "sources": [
      { "name": "vision", "signal_count": 3, "confidence_boost": 0.285 },
      { "name": "metadata", "signal_count": 2, "confidence_boost": 0.2 },
      { "name": "provenance", "signal_count": 1, "confidence_boost": 0.5 }
    ],
    "combined_confidence_boost": 0.35
  },
  "perceptual_hash": "a3b2c1d4e5f60718",
  "processing_time_ms": 3100,
  "credits_used": 8
}
Credits: 8 (5 base + 3 vision)
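A client-side pre-flight check against the documented format and size constraints, plus a sketch of the multipart call. The host and auth header are placeholders, and requests is a third-party library.

```python
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".gif"}
MAX_IMAGE_BYTES = 10 * 1024 * 1024  # documented 10MB cap

def check_image(filename, size_bytes):
    """Reject files the endpoint would refuse, before spending an upload.

    Returns None when the file looks acceptable, else a reason string.
    """
    ext = filename[filename.rfind("."):].lower() if "." in filename else ""
    if ext not in ALLOWED_EXTENSIONS:
        return f"unsupported format: {ext or 'none'}"
    if size_bytes > MAX_IMAGE_BYTES:
        return f"file too large: {size_bytes} bytes (max {MAX_IMAGE_BYTES})"
    return None

# with open("photo.png", "rb") as f:
#     resp = requests.post(
#         "https://api.tuteliq.example/api/v1/safety/synthetic-content/image",
#         headers={"Authorization": "Bearer <api-key>"},
#         files={"file": ("photo.png", f, "image/png")},
#         data={"age_group": "13-15", "customer_id": "user_456"},
#     )
```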

Forensic Signal Sources

| Signal | Weight | What It Checks |
| --- | --- | --- |
| Vision AI | 30% | Face consistency, skin texture, hand anomalies, background coherence, lighting, hair rendering, text/symbols, resolution, AI model signatures |
| EXIF Metadata | 15% | Camera model, GPS, AI generator names in EXIF/XMP, PNG tEXt chunks (Stable Diffusion parameters), suspicious absence of camera data |
| Pixel Statistics | 15% | Shannon entropy, Laplacian edge density, channel uniformity — GAN images have distinctive statistical signatures |
| C2PA Provenance | 15% | Content Credentials manifests from DALL-E, Firefly, Imagen. Definitive when present — overrides classifier. |
| Watermark | 10% | High-frequency energy analysis, periodic pattern detection at known watermark frequencies, LSB distribution, corner entropy |
| Perceptual Hash | 15% | DCT-based 64-bit pHash compared against known-synthetic database. Hamming distance ≤ 10 = match. Definitive when matched. |
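The perceptual-hash match can be sketched as a bitwise Hamming distance over the 64-bit hex hashes, using the documented ≤ 10 cutoff. The function names are illustrative.

```python
def hamming(hash_a, hash_b):
    """Bit-level Hamming distance between two 64-bit hex pHash strings."""
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")

def is_known_synthetic(phash, known_hashes, threshold=10):
    """Match against a known-synthetic set using the documented <= 10 cutoff."""
    return any(hamming(phash, known) <= threshold for known in known_hashes)
```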

New Response Fields

| Field | Type | Description |
| --- | --- | --- |
| metadata_analysis | object | EXIF/metadata extraction results — format, dimensions, camera presence, GPS, AI generator detection |
| provenance | object | C2PA Content Credentials — present only when a C2PA manifest is found |
| forensic_signals | object | Multi-signal ensemble summary — signal counts per source and combined confidence boost |
| perceptual_hash | string | 64-bit DCT-based perceptual hash of the image |
| known_synthetic_match | object | Present only when the perceptual hash matches a known synthetic image — includes distance and category |

Audio Detection

Analyzes uploaded audio using dual-signal forensics: transcript-based text analysis plus spectral analysis — mel spectrogram via vision AI and quantitative audio statistics for synthetic speech indicators.
POST /api/v1/safety/synthetic-content/audio
Content-Type: multipart/form-data

How It Works

1. Upload

Send an audio file as multipart/form-data. Supported formats: MP3, WAV, M4A, OGG, FLAC, WebM. Max file size: 25MB.

2. Parallel Analysis

Two analysis tracks run simultaneously:
  1. Transcription — Audio is transcribed using Whisper (EU-hosted, GDPR-compliant)
  2. Spectral Analysis — FFmpeg generates a mel spectrogram image and extracts audio statistics (RMS, dynamic range, silence ratio, flat factor, DC offset)

3. Spectrogram Vision Analysis

The mel spectrogram is analyzed by a dedicated forensic prompt checking for:
  • Frequency band uniformity (TTS hallmark)
  • Harmonic structure anomalies
  • Missing breath noise and background ambience
  • Onset/offset patterns, formant transitions
  • Pitch contour regularity, aliasing artifacts

4. Synthetic Classification

Transcript, spectral signals, and spectrogram analysis are combined for final classification. Even audio with no speech can be flagged if spectral analysis detects synthetic patterns.
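The spectral heuristics can be sketched as threshold checks over the quantitative audio statistics. The 15 dB, 0.1, and 0.01 cutoffs below are illustrative assumptions; the service's actual thresholds are not documented.

```python
def spectral_signals(stats):
    """Turn quantitative audio stats into human-readable indicators.

    All cutoffs below are assumed for illustration only.
    """
    signals = []
    if stats["dynamic_range"] < 15:  # assumed cutoff
        signals.append(f"low_dynamic_range: {stats['dynamic_range']} dB "
                       "suggests compressed/synthetic audio")
    if stats["silence_ratio"] < 0.1:  # assumed cutoff
        signals.append(f"low_silence_ratio: {stats['silence_ratio']} "
                       "-- natural speech typically has more pauses")
    if abs(stats.get("dc_offset", 0)) > 0.01:  # assumed cutoff
        signals.append("nonzero_dc_offset: possible synthesis artifact")
    return signals
```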

Request Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| file | File | Yes | Audio file (MP3, WAV, M4A, OGG, FLAC, WebM). Max 25MB. |
| age_group | string | No | Age group context |
| language | string | No | ISO language code |
| platform | string | No | Platform context |
| external_id | string | No | Your internal reference ID |
| customer_id | string | No | Your customer/user ID |

Response

{
  "endpoint": "synthetic-content",
  "detected": true,
  "severity": 0.6,
  "level": "medium",
  "classification": "suspected_synthetic",
  "confidence": 0.7,
  "risk_score": 0.5,
  "categories": [
    { "tag": "AI_GENERATED_AUDIO", "label": "AI-Generated Audio", "confidence": 0.7 }
  ],
  "recommended_action": "flag_for_review",
  "rationale": "Spectral analysis shows unnaturally uniform frequency bands and missing breath noise. Transcript structure is consistent with TTS output.",
  "input_type": "audio",
  "transcription": {
    "text": "Hello, I am calling about your account.",
    "language": "en",
    "duration": 3.5,
    "segments": [
      { "start": 0, "end": 3.5, "text": "Hello, I am calling about your account." }
    ]
  },
  "audio_stats": {
    "rms_mean": -18.5,
    "rms_peak": -6.2,
    "dynamic_range": 12.3,
    "silence_ratio": 0.05,
    "flat_factor": 0.002,
    "dc_offset": 0.0001
  },
  "spectral_signals": [
    "low_dynamic_range: 12.3 dB suggests compressed/synthetic audio",
    "low_silence_ratio: 0.05 — natural speech typically has more pauses"
  ],
  "processing_time_ms": 4200,
  "language": "en",
  "language_status": "stable",
  "credits_used": 10
}
Credits: 7 (5 base + 2 transcription) or 10 (+ 3 for spectrogram vision analysis)

New Response Fields

| Field | Type | Description |
| --- | --- | --- |
| audio_stats | object | Quantitative audio statistics — RMS mean/peak, dynamic range, silence ratio, flat factor, DC offset |
| spectral_signals | array | Human-readable spectral analysis indicators (e.g., low dynamic range, frequency uniformity) |
If the audio contains no intelligible speech but spectral analysis detects synthetic patterns, the endpoint returns classification: "suspected_synthetic" with spectral signals. If neither speech nor spectral anomalies are found, it returns classification: "unknown".

Video Detection

Analyzes uploaded video with a multi-layer forensic pipeline — per-frame vision analysis, temporal face consistency tracking, audio-visual lip-sync correlation, spectral audio forensics, and transcription.
POST /api/v1/safety/synthetic-content/video
Content-Type: multipart/form-data

How It Works

1. Upload

Send a video file as multipart/form-data. Supported formats: MP4, WebM, QuickTime, AVI. Max file size: 100MB.

2. Frame + Audio Extraction

Frames are extracted at even intervals using FFmpeg (default: 6 frames, max: 20). Audio is extracted as a separate track in parallel.

3. Parallel Multi-Signal Analysis

Five analysis tracks run simultaneously via fault-isolated Promise.allSettled:
  1. Per-Frame Vision — Each frame analyzed for AI artifacts (face consistency, skin texture, background coherence) in concurrent batches of 3
  2. Temporal Consistency — face-api.js detects faces across all frames, computes Euclidean distance between face descriptors (real < 0.4, deepfake > 0.6), and measures landmark stability via eye-to-nose ratio variance
  3. Lip-Sync Correlation — Mouth openness (from 68-point face landmarks) correlated against frame-aligned audio energy (via FFmpeg). Pearson correlation > 0.5 = real speech, < 0.3 = lip-sync deepfake
  4. Spectral Audio Analysis — Mel spectrogram + audio statistics (same as audio endpoint)
  5. Transcription — Whisper transcription of the audio track
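The lip-sync track can be sketched with a plain Pearson correlation between per-frame mouth openness and frame-aligned audio energy. The 0.5 and 0.3 thresholds are the documented ones; the verdict labels are illustrative.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def lip_sync_verdict(mouth_openness, audio_energy):
    """> 0.5 -> real speech, < 0.3 -> lip-sync deepfake (documented cutoffs)."""
    r = pearson(mouth_openness, audio_energy)
    if r > 0.5:
        return "real_speech"
    if r < 0.3:
        return "possible_lip_sync_deepfake"
    return "inconclusive"
```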

4. Signal Aggregation + Classification

All signals are aggregated via weighted ensemble and fed to the classifier. Temporal anomalies and lip-sync mismatches provide strong deepfake indicators that single-frame analysis cannot detect.

Request Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| file | File | Yes | Video file (MP4, WebM, QuickTime, AVI). Max 100MB. |
| max_frames | number | No | Frames to extract (default: 6, max: 20) |
| age_group | string | No | Age group context |
| language | string | No | ISO language code |
| platform | string | No | Platform context |
| external_id | string | No | Your internal reference ID |
| customer_id | string | No | Your customer/user ID |

Response

{
  "endpoint": "synthetic-content",
  "detected": true,
  "severity": 0.7,
  "level": "high",
  "classification": "suspected_synthetic",
  "confidence": 0.8,
  "risk_score": 0.65,
  "categories": [
    { "tag": "AI_MANIPULATED_MEDIA", "label": "AI-Manipulated Media", "confidence": 0.8 }
  ],
  "recommended_action": "escalate",
  "rationale": "Temporal analysis detected face identity drift across 3 frame pairs. Lip-sync correlation (0.21) is well below the 0.3 threshold for authentic speech.",
  "input_type": "video",
  "video": {
    "duration_seconds": 12.5,
    "frames_analyzed": 6,
    "has_audio": true
  },
  "temporal_consistency": {
    "frames_with_faces": 6,
    "total_frames": 6,
    "identity_consistency_score": 0.42,
    "landmark_stability_score": 0.65,
    "temporal_consistency_score": 0.51,
    "anomalous_frame_pairs": [
      { "frame_a": 1, "frame_b": 2, "distance": 0.72 },
      { "frame_a": 3, "frame_b": 4, "distance": 0.68 }
    ],
    "signals": [
      "identity_drift: 2 frame pairs show face identity changes (max distance: 0.720)",
      "low_identity_consistency: face identity varies significantly across frames (score: 0.42)"
    ]
  },
  "lip_sync": {
    "correlation": 0.21,
    "has_silent_mouth_movement": false,
    "has_voice_without_movement": true,
    "signals": [
      "poor_lip_sync: mouth-audio correlation is 0.21 (threshold: 0.3)",
      "voice_without_movement: audio present in 4/6 frames without mouth movement"
    ]
  },
  "transcription": {
    "text": "Speech content from the video.",
    "language": "en",
    "duration": 12.5
  },
  "audio_stats": {
    "rms_mean": -20.1,
    "dynamic_range": 15.8,
    "silence_ratio": 0.12,
    "flat_factor": 0.001
  },
  "processing_time_ms": 15000,
  "language": "en",
  "credits_used": 25
}
Credits: 5 base + 3 per frame + 2 if audio is present. For example, 6 frames with audio = 5 + 18 + 2 = 25 credits.
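The credit formula above is simple enough to precompute before uploading; the function name is illustrative.

```python
def video_credits(frames_analyzed, has_audio):
    """Documented formula: 5 base + 3 per analyzed frame + 2 when audio is present."""
    return 5 + 3 * frames_analyzed + (2 if has_audio else 0)
```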

New Response Fields

| Field | Type | Description |
| --- | --- | --- |
| temporal_consistency | object | Face identity tracking across frames — consistency scores, landmark stability, anomalous frame pairs |
| lip_sync | object | Audio-visual lip-sync correlation — Pearson coefficient, silent mouth movement detection, voice-without-movement detection |
| audio_stats | object | Quantitative audio statistics from spectral analysis |
| spectral_signals | array | Spectral analysis indicators (when detected) |

Temporal Consistency Signals

| Signal | Meaning |
| --- | --- |
| identity_drift | Face descriptors differ significantly between consecutive frames — classic deepfake artifact |
| low_identity_consistency | Average face identity distance is high across all frames |
| unstable_landmarks | Facial geometry (eye-to-nose ratio) varies abnormally between frames |
| consistent_face | Face identity and geometry are stable — reduces synthetic confidence |
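The temporal signals above can be sketched from the consecutive-frame descriptor distances. The 0.4 (real) and 0.6 (deepfake) cutoffs are documented; the 0.5 average-distance threshold for the low-consistency wording is an assumption.

```python
def temporal_signals(pair_distances):
    """pair_distances[i] = Euclidean distance between the face descriptors of
    frames i and i+1 (real < 0.4, deepfake > 0.6 per the documented cutoffs).
    """
    drift = [(i, i + 1, d) for i, d in enumerate(pair_distances) if d > 0.6]
    avg = sum(pair_distances) / len(pair_distances)
    signals = []
    if drift:
        worst = max(d for _, _, d in drift)
        signals.append(f"identity_drift: {len(drift)} frame pairs show "
                       f"face identity changes (max distance: {worst:.3f})")
    if avg > 0.5:  # assumed cutoff for the low-consistency signal
        signals.append(f"low_identity_consistency: average distance {avg:.2f}")
    elif avg < 0.4 and not drift:
        signals.append("consistent_face: identity stable across frames")
    return signals
```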

Lip-Sync Signals

| Signal | Meaning |
| --- | --- |
| poor_lip_sync | Mouth movement and audio energy have low correlation (< 0.3) |
| good_lip_sync | Strong correlation (> 0.5) between mouth and audio — authentic indicator |
| silent_mouth_movement | Mouth opens in > 30% of frames without corresponding audio |
| voice_without_movement | Audio present in > 30% of frames without mouth movement |
If the video has no audio track, the transcription, lip_sync, audio_stats, and spectral_signals fields are omitted and no transcription credits are charged.

Account-Level Synthetic Profiling

Track synthetic content patterns at the account level with a 30-day rolling window. When customer_id is provided on any detection request, Tuteliq automatically builds a synthetic profile for that account.
GET /api/v1/safety/synthetic-content/profile/:customer_id

Response

{
  "customer_id": "user_456",
  "total_items": 50,
  "synthetic_count": 12,
  "authentic_count": 35,
  "unknown_count": 3,
  "avg_confidence": 0.82,
  "category_distribution": {
    "AI_GENERATED_IMAGE": 8,
    "AI_GENERATED_TEXT": 3,
    "AI_MANIPULATED_MEDIA": 1
  },
  "account_synthetic_score": 0.34,
  "trend": "increasing",
  "last_updated": "2026-04-09T14:30:00.000Z",
  "window_days": 30
}

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| total_items | integer | Total items analyzed in the 30-day window |
| synthetic_count | integer | Items classified as confirmed_synthetic or suspected_synthetic |
| authentic_count | integer | Items classified as confirmed_authentic |
| unknown_count | integer | Items classified as unknown |
| avg_confidence | float | Average confidence across all classifications |
| category_distribution | object | Count of items per synthetic category |
| account_synthetic_score | float | 0.0-1.0 composite score — weighted by ratio and confidence |
| trend | string | increasing, stable, decreasing, or unknown — based on first-half vs. second-half synthetic rate |
Account profiling is automatic and zero-cost. Pass customer_id on any detection request to start building the profile. The profile endpoint itself does not consume credits.
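The trend field's first-half vs. second-half basis can be sketched as below. The comparison itself is documented; the 0.05 margin and the minimum-item floor are illustrative assumptions.

```python
def synthetic_trend(classifications, min_items=4):
    """classifications: chronological list over the 30-day window.

    Compares the first-half vs. second-half synthetic rate (documented basis
    for the trend field). The 0.05 margin and min_items floor are assumptions.
    """
    synthetic = {"confirmed_synthetic", "suspected_synthetic"}
    if len(classifications) < min_items:
        return "unknown"
    half = len(classifications) // 2
    def rate(items):
        return sum(c in synthetic for c in items) / len(items)
    first, second = rate(classifications[:half]), rate(classifications[half:])
    if second > first + 0.05:
        return "increasing"
    if second < first - 0.05:
        return "decreasing"
    return "stable"
```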

Credit Summary

| Endpoint | Path | Credits |
| --- | --- | --- |
| Text | /safety/synthetic-content | 5 |
| Image | /safety/synthetic-content/image | 8 (5 base + 3 vision) |
| Audio | /safety/synthetic-content/audio | 7-10 |
| Video | /safety/synthetic-content/video | 5 + 3/frame + 2 (audio) |
| Profile | /safety/synthetic-content/profile/:id | 0 |

Multi-Endpoint Support

The text-based synthetic content detector is available in the Multi-Endpoint fan-out. Include "synthetic-content" in your endpoint list:
{
  "text": "content to analyze",
  "endpoints": ["bullying", "grooming", "synthetic-content"]
}
Image, audio, and video synthetic detection are multipart-only and not available through the multi-endpoint batch.