Production multimodal scam-intelligence for ScamCheck: screenshot/image upload, lightweight OCR (Cloud Vision + Gemini fallback), deterministic fraud-signal detection, gated deep Gemini-vision analysis, and semantic comparison against known scam clusters via BigQuery VECTOR_SEARCH. Cost-gated, serverless, scale-to-zero.
ScamCheck now analyzes screenshots (WhatsApp/Telegram/Instagram DMs, fake UPI/payment confirmations, phishing UIs, banking SMS) in addition to text. The pipeline is cost-gated: cheap OCR + deterministic signals run first; expensive Gemini-vision inference fires only when the verdict is ambiguous.
    screenshot ─► [1] OCR (Cloud Vision TEXT_DETECTION → Gemini fallback)   ── cheap
                      → text + word bounding boxes (highlight regions)
               ─► [2] enrich(text): scam category/severity/tactics + trust signals   ── 0 Vertex
                      + visual-heuristic detectors (fake payment, OTP, KYC phish, urgency, impersonation…)
               ─► [3] embedQuery(text) → VECTOR_SEARCH over scam corpus   ── cheap (cached)
                      → similar known scam patterns
               ─► [4] GATED deep Gemini-vision verdict   ── expensive, only if riskScore ∈ [25,70] or forceDeep
                      → blended risk + rationale
               ─► verdict + riskScore + confidence + signals + regions + similar (JSON)
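The gate in step [4] can be sketched as follows. This is a minimal illustration, not the code in `multimodal.ts`: the names `DEEP_BAND`, `shouldRunDeepVision`, and `blendRisk`, and the 0.5 blend weight, are assumptions. Only the `[25,70]` band and the `forceDeep` override come from the pipeline description.

```typescript
// Hypothetical constant: the ambiguous band in which the deep vision pass fires.
const DEEP_BAND = { min: 25, max: 70 } as const;

// Returns true when the expensive Gemini-vision pass should run.
// forceDeep bypasses the band check entirely.
function shouldRunDeepVision(riskScore: number, forceDeep = false): boolean {
  if (forceDeep) return true;
  return riskScore >= DEEP_BAND.min && riskScore <= DEEP_BAND.max;
}

// Assumed blending rule: a weighted average of the deterministic score and
// the deep-vision score, clamped to 0..100. The real weighting may differ.
function blendRisk(deterministic: number, deep: number, deepWeight = 0.5): number {
  const blended = deterministic * (1 - deepWeight) + deep * deepWeight;
  return Math.max(0, Math.min(100, Math.round(blended)));
}
```

Clear-cut screenshots (score below 25 or above 70) skip the deep pass, so its cost is paid only where the cheap layers disagree or lack evidence.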
The `gemini-2.5-flash` vision pass runs only for mid-risk/ambiguous screenshots (or `forceDeep`), preserving scale-to-zero.

| File | Role |
|---|---|
| `lib/scam-intel/ocr.ts` | `ocrImage()`: Cloud Vision TEXT_DETECTION (+ word boxes) → Gemini fallback. Multilingual (en/hi hints). |
| `lib/scam-intel/multimodal.ts` | `analyzeScreenshot()`: orchestrates OCR → enrichment + visual detectors → semantic similarity → gated deep vision → verdict. |
| `lib/intelligence/enrichment.ts` | Reused for scam category/severity/tactics + trust signals on the OCR text. |
| `lib/store/bigquery.ts` | `vectorSearch(..., { sourceTypes })` for "similar known scams". |
Fake payment/UPI confirmation · OTP/PIN/CVV request · KYC/verification phishing · urgency/pressure · brand/authority impersonation · lottery/reward/job bait · suspicious links/shorteners · move-to-WhatsApp/call-this-number. Each contributes to a 0–100 risk score; danger signals weigh more than warnings.
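A minimal sketch of how detector hits could roll up into the 0–100 score. The weights (30 for danger, 12 for warning), the `"warning"` severity string, and the function name `scoreSignals` are assumptions for illustration. The source only states that danger signals weigh more than warnings.

```typescript
type Severity = "danger" | "warning";

// Shape mirrors the visualSignals entries in the API response.
interface VisualSignal {
  id: string;
  label: string;
  severity: Severity;
  evidence: string;
}

// Hypothetical weights: danger contributes more than warning.
const SEVERITY_WEIGHT: Record<Severity, number> = { danger: 30, warning: 12 };

// Sums one weight per distinct signal id and caps the total at 100.
function scoreSignals(signals: VisualSignal[]): number {
  const seen = new Set<string>();
  let total = 0;
  for (const s of signals) {
    if (seen.has(s.id)) continue; // count each detector at most once
    seen.add(s.id);
    total += SEVERITY_WEIGHT[s.severity];
  }
  return Math.min(100, total);
}
```

Deduplicating by `id` keeps a message that repeats the same phrase several times from inflating the score.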
`POST /api/scam-intel/screenshot` (public, rate-limited 12/min)

JSON body:

    { "imageBase64": "<...>", "mime": "image/png", "forceDeep": false }

Multipart form: `image` (file) [+ `forceDeep`]

Response:

    { "verdict":"likely_scam","riskScore":84,"confidence":0.82,
      "ocr":{"text":"…","engine":"cloud-vision","lang":"en","wordCount":42},
      "regions":[{"text":"OTP","x":120,"y":340,"w":60,"h":28}],
      "classification":{"category":"otp_fraud","severity":"high","tactics":["urgency","impersonation"]},
      "trust":{"score":10,"band":"standard"},
      "visualSignals":[{"id":"otp_request","label":"OTP / PIN / CVV request","severity":"danger","evidence":"do not share otp"}],
      "similar":[{"id":"…","title":"…","url":"…","confidence":0.79,"confidenceBand":"high"}],
      "deepAnalysisUsed":true,"deepAnalysis":"Spoofed bank UI requesting OTP; classic account-takeover." }
Always returns structured JSON (wrapped by lib/api/json.ts — no HTML error pages).
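Because the endpoint always answers with JSON, a client can build the request and validate the reply without an HTML fallback path. The sketch below is illustrative only: `buildScreenshotRequest` and `parseScreenshotResponse` are hypothetical helper names, and the response type covers just a subset of the fields shown in the sample response.

```typescript
interface ScreenshotRequest {
  imageBase64: string;
  mime: string;
  forceDeep?: boolean;
}

// Subset of the response fields, taken from the sample payload.
interface ScreenshotVerdict {
  verdict: string;
  riskScore: number;
  confidence: number;
  deepAnalysisUsed: boolean;
}

// Builds fetch-style options for the JSON variant of the endpoint.
function buildScreenshotRequest(req: ScreenshotRequest) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      imageBase64: req.imageBase64,
      mime: req.mime,
      forceDeep: req.forceDeep ?? false,
    }),
  };
}

// Parses the response text and checks the fields this client relies on.
function parseScreenshotResponse(raw: string): ScreenshotVerdict {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("expected a JSON object");
  }
  const { verdict, riskScore, confidence, deepAnalysisUsed } =
    data as Record<string, unknown>;
  if (
    typeof verdict !== "string" ||
    typeof riskScore !== "number" ||
    typeof confidence !== "number" ||
    typeof deepAnalysisUsed !== "boolean"
  ) {
    throw new Error("response is missing expected verdict fields");
  }
  return { verdict, riskScore, confidence, deepAnalysisUsed };
}
```

The options object can be passed as the second argument to `fetch` against `/api/scam-intel/screenshot`, with the response text handed to `parseScreenshotResponse`.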
app/scamcheck/screenshot/page.tsx — drag/drop or tap-to-upload, screenshot preview with highlighted suspicious regions (scaled Vision boxes), editable OCR text, verdict + risk + confidence, fraud-signal list, and similar known scams. Mobile-friendly, dark theme. Images are processed in-request and not stored.
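Scaling the Vision boxes onto the preview amounts to multiplying each region by the ratio of displayed size to original size. The sketch below assumes region coordinates are pixels in the uploaded image's natural dimensions. `scaleRegion` and the `Size` type are hypothetical names, not taken from `page.tsx`.

```typescript
// Matches the shape of entries in the response's "regions" array.
interface Region {
  text: string;
  x: number;
  y: number;
  w: number;
  h: number;
}

interface Size {
  width: number;
  height: number;
}

// Maps a region from natural-image pixels to preview pixels.
function scaleRegion(region: Region, natural: Size, displayed: Size): Region {
  const sx = displayed.width / natural.width;
  const sy = displayed.height / natural.height;
  return {
    text: region.text,
    x: region.x * sx,
    y: region.y * sy,
    w: region.w * sx,
    h: region.h * sy,
  };
}
```

For example, a 1080×1920 screenshot shown at 540×960 halves every coordinate, so the sample `OTP` region at `(120, 340)` with size `60×28` is drawn at `(60, 170)` with size `30×14`.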
- Entity extraction (`lib/scam-intel/extract-entities.ts`, deterministic): phone numbers, URLs, link shorteners/risky TLDs, UPI VPAs, ₹ amounts, QR/payment-collect references, urgency + impersonation markers. Entity risk feeds the score.
- Reuses existing components: trust scoring (`computeTrustScore` → `trustScore`, `scamProbability`, AI explanation) and the semantic-search + scam-intel similarity pipelines (`VECTOR_SEARCH` over scam sources).
- Response fields: `verdict`, `riskScore`, `scamProbability`, `trustScore`, `explanation`, `safetyAdvice[]`, `entities`, `visualSignals`, `regions`, `similar`, `classification`, `deepAnalysisUsed`.
- Cached results are marked `cached: true`. Pixel resize/compression is delegated to Gemini's server-side downsampling since the runtime has no image lib; uploads are capped at 6 MB.
- Logging: `logImageAnalysis()` writes one best-effort row per scan to `scam_image_analysis` (verdict, risk, scam probability, trust, category, entity counts, deep-used). The table is self-ensuring and the write never blocks the response.
- Fixtures: `lib/scam-intel/__fixtures__/screenshot-scams.ts` (fake SBI SMS, fake courier customs, fake UPI refund, fake KYC) plus the runnable `scripts/test-screenshot-fixtures.mjs`, which asserts expected signals/entities/risk (4/4 passing, offline).
- Calibration (`lib/scam-intel/calibration.ts`), to counter over-confidence: uncertainty penalty (high raw risk + thin evidence → pulled toward neutral, confidence cut), evidence-weighted boost, source-reliability weighting, and a low-confidence `needs_review` fallback when OCR text is too sparse.
- URL intelligence (`lib/scam-intel/url-intel.ts`): punycode, non-ASCII homoglyphs, brand look-alikes (edit-distance vs SBI/HDFC/ICICI/Paytm/PhonePe/Amazon/India Post…), shorteners, suspicious TLDs, raw-IP URLs, digit-substitution, excessive subdomains.
- Explainability: `explainability { whyFlagged, evidence[], matchingPatterns[], confidenceReasoning[] }`.
- Timings and cost: `timings { ocrMs, embedMs, vectorMs, deepMs, totalMs }` + `estCostUsd` returned and logged (`event=multimodal.analyzed`).
- Benchmark: `/datasets/{scam,legit}-samples/` + `scripts/benchmark-scamcheck.mjs` (precision/recall/F1/FP/FN/entity accuracy; offline deterministic layer = P/R/F1 1.0 on the 16-sample set, live mode for OCR/retrieval).
- Dashboard: `GET /api/scam-intel/dashboard?days=30` (ADMIN): totals, verdict + category distribution, OCR failures, deep-vision usage, avg risk/scam-probability, entity totals, daily trend (from `scam_image_analysis`).
- Setup: enable the Cloud Vision API (`gcloud services enable vision.googleapis.com`) for cheap OCR + region boxes; without it, the Gemini OCR fallback still works (slightly higher cost, no boxes).
- Models: `VERTEX_VISION_MODEL` (deep, default `gemini-2.5-flash`), `VERTEX_OCR_VISION_MODEL` (OCR fallback). ADC on Cloud Run needs no keys.
- Tuning: adjust the `[25,70]` band in `multimodal.ts` to trade cost vs sensitivity. Query embeddings are cached.
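The brand look-alike check mentioned for `url-intel.ts` can be illustrated with a plain Levenshtein distance. This is a sketch under stated assumptions, not the module's code: the `BRANDS` list is a subset of the brands named in this document, and `editDistance`, `findLookalikeBrand`, and the distance thresholds are hypothetical.

```typescript
// Subset of the brands listed in this document, lowercased for comparison.
const BRANDS = ["sbi", "hdfc", "icici", "paytm", "phonepe", "amazon"];

// Classic Levenshtein distance using a single rolling row.
function editDistance(a: string, b: string): number {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0]; // value of D[i-1][j-1]
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // value of D[i-1][j]
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diag + cost);
      diag = above;
    }
  }
  return row[b.length];
}

// Returns the brand a host label imitates, or null.
// Exact matches (distance 0) are not look-alikes. Assumed thresholds:
// 1 edit for short brands, 2 edits for brands of 6+ characters.
function findLookalikeBrand(hostLabel: string): string | null {
  const label = hostLabel.toLowerCase();
  for (const brand of BRANDS) {
    const maxDistance = brand.length >= 6 ? 2 : 1;
    const d = editDistance(label, brand);
    if (d >= 1 && d <= maxDistance) return brand;
  }
  return null;
}
```

With these thresholds, `findLookalikeBrand("amaz0n")` returns `"amazon"` and `findLookalikeBrand("amazon")` returns `null`. Very short brands such as `sbi` produce many one-edit neighbours, so a real detector would likely combine this check with the other URL signals before raising risk.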