Quick start
How it works
Extract
Text is extracted from each page using the PDF text layer. Pages with fewer than 20 characters of extractable text are skipped. A SHA-256 hash of the raw file is computed for chain-of-custody verification.
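The skip rule can be pictured with a small sketch. It assumes the page text has already been extracted into a list of strings. Whether the service counts surrounding whitespace toward the 20-character threshold is not documented here, so stripping it is an assumption, and `pages_to_analyze` is a hypothetical helper name.

```python
MIN_CHARS = 20  # threshold stated in the text above


def pages_to_analyze(page_texts):
    """Return (page_number, text) pairs for pages with enough text.

    Assumption: leading and trailing whitespace is ignored when counting.
    Page numbers are 1-based.
    """
    kept = []
    for number, text in enumerate(page_texts, start=1):
        if len(text.strip()) >= MIN_CHARS:
            kept.append((number, text))
    return kept
```

Pages that fall below the threshold simply do not appear in the returned list, which mirrors how total_pages and pages_analyzed can differ in the response.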
Detect
Each page is analyzed against your chosen detection endpoints in parallel (bounded concurrency of 3 pages at a time). Long pages are chunked before analysis.
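Bounded concurrency of this kind is commonly built with a semaphore. The following is a minimal sketch of the pattern, not the service's actual implementation; `analyze_page` is a hypothetical coroutine function supplied by the caller.

```python
import asyncio

MAX_CONCURRENT_PAGES = 3  # limit stated in the text above


async def analyze_pages(pages, analyze_page):
    """Run analyze_page(page) for every page, at most 3 at a time.

    Results are returned in the same order as the input pages.
    """
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_PAGES)

    async def bounded(page):
        # Only MAX_CONCURRENT_PAGES coroutines can hold the semaphore at once.
        async with semaphore:
            return await analyze_page(page)

    return await asyncio.gather(*(bounded(page) for page in pages))
```

The semaphore keeps throughput predictable on long documents: a 100-page file is still processed in parallel, but never with more than three pages in flight.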
Available endpoints
You can run any combination of these 8 detection endpoints against each page:

| Endpoint name | Detection type |
|---|---|
| unsafe | Harmful content across all KOSA categories |
| bullying | Cyberbullying and harassment |
| grooming | Grooming patterns |
| social-engineering | Social engineering tactics |
| coercive-control | Coercive control patterns |
| radicalisation | Radicalisation indicators |
| romance-scam | Romance scam patterns |
| mule-recruitment | Money mule recruitment |

Default endpoints (used when endpoints is omitted): unsafe, coercive-control, radicalisation.
Request parameters
Upload your PDF as a multipart/form-data request. The file field must be named file.

| Field | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | PDF file (max 50 MB) |
| endpoints | string | No | JSON array of endpoint names, or comma-separated list. Defaults to ["unsafe","coercive-control","radicalisation"]. |
| file_id | string | No | Your identifier for the file (echoed back in the response) |
| external_id | string | No | External reference ID (echoed back) |
| customer_id | string | No | Customer reference ID (echoed back) |
| age_group | string | No | "under 10", "10-12", "13-15", "16-17", or "under 18" |
| language | string | No | ISO 639-1 code. Auto-detected if omitted. |
| platform | string | No | Platform name for context-aware scoring |
| support_threshold | string | No | Minimum severity to include crisis helplines. Default: "high". |
| metadata | string | No | JSON object with custom metadata (echoed back) |
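Because every optional field is sent as a string, it can help to serialise them in one place. The sketch below builds the form fields from the table above and uploads the file with the third-party `requests` library. The helper names are hypothetical, and `url` and `headers` are placeholders: the endpoint URL and authentication scheme are not given in this section, so take them from your account documentation.

```python
import json
import os


def build_form_fields(endpoints=None, metadata=None, **optional):
    """Build the non-file form fields as strings.

    endpoints is sent as a JSON array and metadata as a JSON object;
    any other field (file_id, age_group, language, ...) is passed through.
    Fields left as None are omitted so the server defaults apply.
    """
    fields = {}
    if endpoints is not None:
        fields["endpoints"] = json.dumps(list(endpoints))
    if metadata is not None:
        fields["metadata"] = json.dumps(metadata)
    for name, value in optional.items():
        if value is not None:
            fields[name] = str(value)
    return fields


def submit_pdf(url, headers, pdf_path, fields):
    """Upload the PDF as multipart/form-data under the field name 'file'."""
    import requests  # third-party; imported lazily so the builder works without it

    with open(pdf_path, "rb") as handle:
        files = {"file": (os.path.basename(pdf_path), handle, "application/pdf")}
        response = requests.post(url, headers=headers, data=fields, files=files)
    response.raise_for_status()
    return response.json()
```

A call such as `build_form_fields(endpoints=["unsafe", "grooming"], age_group="13-15")` yields a dictionary that can be passed straight to `submit_pdf`.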
Response
Key response fields
| Field | Description |
|---|---|
| document_hash | SHA-256 hash of the uploaded PDF for chain-of-custody verification |
| total_pages | Total pages in the document |
| pages_analyzed | Pages with sufficient text that were analyzed |
| extraction_summary | Breakdown of text extraction results per page |
| page_results | Per-page detection results from each endpoint |
| overall_risk_score | Highest risk score across all pages (0.0–1.0) |
| overall_severity | none, low, medium, high, or critical |
| detected_endpoints | Unique list of endpoints that detected threats |
| flagged_pages | Pages with risk score >= 0.3, with their detected endpoints |
| credits_used | Dynamic credit cost based on pages analyzed and endpoints used |
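Since overall_severity is an ordered scale, a caller will often compare it against a threshold of its own. The sketch below assumes the parsed JSON response is a Python dictionary with the fields listed above; the "medium" default and the `needs_review` name are illustrative choices, not part of the API.

```python
# Severity values from the table above, lowest to highest.
SEVERITY_ORDER = ["none", "low", "medium", "high", "critical"]


def needs_review(response, min_severity="medium"):
    """Return True when overall_severity is at or above min_severity.

    Raises ValueError if either value is not one of the five known levels.
    """
    rank = SEVERITY_ORDER.index
    return rank(response["overall_severity"]) >= rank(min_severity)
```

For example, a response with overall_severity of "high" passes the default threshold, while one with "low" does not.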
Credit pricing
Document analysis uses dynamic pricing based on the actual work performed:

| Document | Endpoints | Credits |
|---|---|---|
| 1 page, 3 default endpoints | 3 | 3 (minimum) |
| 5 pages, 3 default endpoints | 3 | 15 |
| 10 pages, 1 endpoint | 1 | 10 |
| 20 pages, 8 endpoints | 8 | 160 |
| 100 pages, 8 endpoints | 8 | 800 |
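Every row in the table is consistent with one credit per analyzed page per endpoint. The sketch below encodes that reading so a cost can be estimated before uploading. The formula is inferred from the table rather than stated by the source, and the floor of 3 credits is an assumption based on the "3 (minimum)" entry; the credits_used field in the response remains the authoritative figure.

```python
MIN_CREDITS = 3  # assumption, read off the "3 (minimum)" table entry


def estimate_credits(pages_analyzed, endpoint_count):
    """Estimate credits as pages x endpoints, with an assumed floor of 3."""
    return max(MIN_CREDITS, pages_analyzed * endpoint_count)
```

Pages skipped for having too little text are not analyzed, so an estimate based on total page count may come out higher than the actual charge.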

Choose your endpoints carefully. Running 8 endpoints on a 100-page document costs 800 credits. For most use cases, the 3 default endpoints (unsafe, coercive-control, radicalisation) provide comprehensive coverage.
Chain-of-custody
Every response includes a document_hash, a SHA-256 hash of the exact bytes uploaded. Use this to:
- Prove which file was analyzed in compliance audits
- Verify document integrity if the same file is analyzed again
- Include in incident reports for regulatory submissions
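To check a stored file against a recorded document_hash, hash the same bytes locally and compare. This is a minimal sketch using Python's standard library; it assumes the hash is returned as a hex string, which this section does not state explicitly, and both function names are hypothetical.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_document_hash(path, document_hash):
    """Compare a local file against a recorded document_hash.

    Assumption: document_hash is hex-encoded; case and surrounding
    whitespace are normalised before comparing.
    """
    recorded = document_hash.strip().lower()
    return hmac.compare_digest(sha256_of_file(path), recorded)
```

Any change to the file, even a single byte, produces a different digest, which is what makes the hash usable as audit evidence.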
Zero retention
No document data is stored. The PDF is processed entirely in memory, analyzed, and discarded. The response is the only output. This is the same privacy-by-design approach used across all Tuteliq endpoints.
Limits
| Limit | Value |
|---|---|
| Max file size | 50 MB |
| Max pages | 100 |
| Supported formats | PDF only (application/pdf) |
| Min text per page | 20 characters (pages below this are skipped) |
| Concurrency | 3 pages analyzed simultaneously |
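Some of these limits can be checked locally before spending an upload. The sketch below tests file size and the `%PDF-` signature that PDF files begin with. It treats 50 MB as 50 × 1024 × 1024 bytes, which is an assumption, and it does not count pages, since that requires a PDF parser. `preflight_problems` is a hypothetical helper name.

```python
import os

MAX_FILE_BYTES = 50 * 1024 * 1024  # assumption: "50 MB" read as 50 MiB


def preflight_problems(path):
    """Return a list of human-readable problems; empty means no issue found."""
    problems = []
    if os.path.getsize(path) > MAX_FILE_BYTES:
        problems.append("file exceeds 50 MB")
    with open(path, "rb") as handle:
        header = handle.read(5)
    # Cheap heuristic only: the server remains the final judge of validity.
    if header != b"%PDF-":
        problems.append("file does not start with the %PDF- signature")
    return problems
```

An empty result does not guarantee acceptance: a file can pass both checks and still be rejected for exceeding 100 pages or for being password-protected.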
Tier access
Document analysis is available on Indie tier and above. Starter tier does not have access to this endpoint.
Error codes
| Code | Description |
|---|---|
| ANALYSIS_6010 | PDF extraction failed (corrupt or password-protected file) |
| ANALYSIS_6011 | Document exceeds 100-page limit |
| FILE_MISSING | No file uploaded |
| FILE_INVALID_TYPE | Non-PDF file uploaded |
| FILE_TOO_LARGE | File exceeds 50 MB |
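
A client can map these codes to a suggested next step for the user. The sketch below takes the code as a plain string, because the shape of the error response body is not shown in this section. The suggested actions are illustrative wording derived from the descriptions above, and `describe_error` is a hypothetical helper name.

```python
# Codes from the table above, paired with an illustrative next step.
ERROR_HINTS = {
    "ANALYSIS_6010": "Extraction failed: check for corruption or remove password protection.",
    "ANALYSIS_6011": "Too many pages: split the document into parts of 100 pages or fewer.",
    "FILE_MISSING": "No file received: send the PDF in a form field named 'file'.",
    "FILE_INVALID_TYPE": "Wrong format: only PDF (application/pdf) is supported.",
    "FILE_TOO_LARGE": "File too large: keep uploads at or below 50 MB.",
}


def describe_error(code):
    """Return a hint for a known error code, or a generic fallback."""
    return ERROR_HINTS.get(code, f"Unrecognised error code: {code}")
```

All five codes describe problems with the submitted file, so retrying the same request unchanged is unlikely to succeed.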