How AI Execution Lab runs autonomously on free/hobby plans: model-tier routing, content-addressed caching, semantic deduplication, publish throttling, empty-queue early-exit crons, Firestore read/write minimization via increment counters, and batched embeddings. Includes expensive-operation analysis, scaling bottlenecks, the cheapest viable architecture, and estimated monthly cost ranges.
The system is designed to run autonomously and cheaply — ideally inside free tiers. Every expensive operation is gated, cached, deduplicated, or throttled.
| Operation | Cost driver | Mitigation |
|---|---|---|
| Article generation (Pro) | output tokens (~$10/1M) | DEEP_TIER=flash switch; cached by content hash; 1 bundle per scam pattern |
| Structured outputs (Flash) | many small calls | Flash tier; strict-JSON (fewer tokens); cached |
| Embeddings | per-token | multilingual model; batched 25/request; hash fallback offline |
| Vercel function time | invocations × duration | empty-queue early-exit; daily crons on hobby |
| Firestore | reads/writes/scans | increment counters instead of scans; dedup avoids re-writes |
_ai_cache, 7-day TTL) — identical inputs are free; cache hits are audited so the hit-rate shows on the dashboard.cluster.bundleId so it never regenerates.PUBLISH_PER_HOUR (12) + BUNDLES_PER_DAY (20) caps; throttled jobs requeue without consuming a retry.increment() (1 write), never collection scans, on the hot path.vector-search.ts, dashboard aggregations) read up to a few thousand docs — fine to ~10k. Fix at scale: Firestore Vector Search (indexes already defined) replaces the scan with native KNN; dashboards move to pre-aggregated counters.AUTOPILOT_PER_RUN (3). Raise gradually as budget allows.DEEP_TIER=flash, small AUTOPILOT_PER_RUN, aggressive cache TTL.maxDuration generous only where needed (autopilot 300s; others 60s).Assumes Vertex pay-as-you-go at the estimated prices in usage.ts (confirm against Google's pricing). Token estimate per full bilingual bundle ≈ 6 generations × (~600 in / ~900 out) plus one Pro article (~700 in / ~1.2k out).
| Scenario | Bundles/mo | AI tier | Est. AI cost | Vercel | Firestore | Total/mo |
|---|---|---|---|---|---|---|
| Idle / pilot | ~30 | Flash-only (DEEP_TIER=flash) | ~$0.30–$1 | Hobby $0 | Spark $0 | ~$0.30–$1 |
| Light autonomous | ~90 (3/day) | Flash + Pro article | ~$3–$8 | Hobby $0 | Spark $0 | ~$3–$8 |
| Active | ~300 (10/day) | Flash + Pro article | ~$12–$30 | Hobby/Pro $0–$20 | Spark→Blaze ~$0–$5 | ~$15–$55 |
| Heavy | ~1000 | Flash + Pro article | ~$45–$110 | Pro $20 | Blaze ~$5–$20 | ~$70–$150 |
Flash-only mode cuts the AI line ~3–4×. Caching + dedup typically remove 30–60% of generations in steady state (re-runs, near-duplicates), so real cost trends toward the low end of each range.
These are planning estimates, not quotes. Set budget alerts in Google Cloud Billing and watch
/ops/analytics→ AI cost today.