A founder showed us his 'AI strategy' deck last quarter. Slide three said, in 60-point type: RAG. We asked what the agent should do when a customer asks about a refund that policy doesn't cover. Silence. That silence is where most AI projects actually fail — not in the retrieval pipeline.
The common belief
The industry has settled on a comfortable equation: hallucinations are the problem, retrieval is the cure, therefore RAG is the strategy. Bolt a vector database onto a model, point it at your docs, ship. It's not wrong — grounding answers in your own content beats raw model output every time. It's just radically incomplete.
The most valuable line in your agent's behaviour spec is the list of questions it must refuse.
Why it's incomplete
RAG answers one question: where does knowledge come from? A production agent has to answer four more. What actions can it take? When must it hand off to a human? What should it refuse outright? And how do you know it's still behaving next month? None of those are retrieval problems.
We learned this the hard way on an early support-agent build. Retrieval was excellent — the agent could cite policy chapter and verse. Then a customer asked it to *interpret* the policy for an edge case, and it obliged, confidently, wrongly. The fix wasn't a better embedding model. It was a refusal rule: interpretation questions route to a human, full stop.
What an actual strategy contains
Boundaries first: the explicit list of topics, actions, and promises the agent must decline — priced refunds, medical claims, legal interpretations, whatever your domain's third rails are. We write this before any code.
Escalation design: handoff is a feature, not a failure state. Our agents escalate within two messages of uncertainty, carrying the full conversation so the human doesn't start cold. Customers consistently rate honest escalation above confident improvisation.
Evaluation as a habit: a test suite of real scenarios — including adversarial ones — run before launch and on every change. We use structured-output schemas (Claude is particularly reliable here) so behaviour is testable, not vibes-based.
Only then, retrieval: pgvector or a managed store, kept current by pipeline rather than promise. The fourth priority, not the first slide.
When RAG-first is fine
Internal tools with expert users and low stakes — a sales team querying its own playbook — can ship retrieval-first and tune later. The cost of a wrong answer is an eye-roll, not a refund. Customer-facing agents don't get that luxury.
- Write the refusal list before the retrieval pipeline.
- Design handoff as a feature with full context transfer.
- Build an eval suite from real scenarios; run it on every change.
- Treat retrieval as plumbing — necessary, not strategic.


