Every CV in 2026 claims RAG experience. Most of those candidates have written a demo that bolts Pinecone onto GPT-4 in 40 lines. A production RAG system has a chunking strategy, a re-ranker, an eval harness, a hallucination guardrail, a cost-per-query budget, and a monitoring story. The two are not the same hire, and getting this screen wrong is expensive because RAG bugs surface in production long after launch.
TL;DR: Screen for eval-harness experience first, retrieval-quality intuition second, and framework fluency a distant third. Anyone whose answer to "how do you know your RAG is accurate?" is "we tested it manually" is not the hire for production.
The five capability bands
RAG engineers fall into roughly five bands. You almost certainly want a Band 3 or 4, not a Band 5 and not a Band 1–2.
- Band 1 — Demo-builder. Has built a 1-hour RAG demo. Cannot reason about production.
- Band 2 — Framework user. Fluent in LangChain or LlamaIndex. Has shipped an internal tool. Cannot debug quality regressions.
- Band 3 — Production generalist. Has shipped a RAG system to external users with an eval harness, monitoring, and cost controls. This is the sweet spot.
- Band 4 — RAG specialist. Has built custom retrievers, fine-tuned re-rankers, and solved non-trivial chunking problems. Worth a premium.
- Band 5 — Research-lean. Publishes papers. Often expensive, sometimes over-engineered for product work.
Trap interview questions
These five questions reliably separate Band 3+ from Band 1–2:
- "Describe your chunking strategy for a 1,000-page legal PDF corpus. What went wrong in your first attempt?" — Every real RAG engineer has a chunking war story. Absence of one is a signal.
- "Walk me through how you evaluate retrieval quality. What's your ground-truth set?" — Vague answers are disqualifying; RAG without eval is faith-based engineering.
- "You're retrieving 10 chunks. How many make it to the LLM context, and why?" — Tests understanding of re-ranking, context window economics, and prompt budget.
- "A user reports the system hallucinated. Walk me through how you'd root-cause in production." — Tests logging, tracing (e.g., LangSmith), and post-hoc debugging discipline.
- "What's your cost-per-query target, and how do you enforce it?" — Production RAG costs are real. Engineers who haven't watched OpenAI invoices are not production-ready.
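A good answer to the third question usually boils down to "whatever fits a fixed token budget after re-ranking". One minimal way to sketch that selection step — the `select_for_context` helper and the rough chars-to-tokens heuristic are illustrative assumptions, not any particular framework's API:

```python
def select_for_context(chunks, token_budget=3000):
    """Pick re-ranked chunks for the LLM prompt, greedily by score,
    until a fixed token budget is exhausted.

    chunks: list of (text, rerank_score) tuples.
    Returns the subset to send, best-scored first.
    """
    selected, used = [], 0
    for text, score in sorted(chunks, key=lambda c: c[1], reverse=True):
        # Rough heuristic: ~4 characters per token (assumption; a real
        # system would use the model's actual tokenizer).
        cost = len(text) // 4
        if used + cost <= token_budget:
            selected.append((text, score))
            used += cost
    return selected
```

A candidate who can articulate why the budget exists (context-window limits, per-token cost, and the lost-in-the-middle effect of over-stuffed prompts) is reasoning like a production engineer.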
Must-have experience on a CV
- At least one RAG system in production with external users (not internal demo).
- Worked with at least two vector databases (Pinecone, Weaviate, Qdrant, pgvector, Supabase Vector — any mix).
- Built or maintained an eval harness — an automated retrieval-quality metric, recall@k at minimum.
- Experience with at least one re-ranker (Cohere Rerank, cross-encoder, custom).
- Familiarity with tracing — LangSmith, Langfuse, Arize, or a homegrown equivalent.
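The eval-harness requirement is less exotic than it sounds: at its core it is recall@k computed over a labelled query set. A minimal sketch in plain Python — the function names and dict layout are illustrative assumptions, not a specific library's interface:

```python
def recall_at_k(retrieved, relevant, k=5):
    """Fraction of ground-truth chunk ids that appear in the top-k retrieved."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)


def evaluate(queries, k=5):
    """Mean recall@k over a labelled eval set.

    queries: list of dicts, each with 'retrieved' (ranked chunk ids from
    the retriever) and 'relevant' (hand-labelled ground-truth ids).
    """
    scores = [recall_at_k(q["retrieved"], q["relevant"], k) for q in queries]
    return sum(scores) / len(scores)
```

The hard part is not this arithmetic — it is building and maintaining the labelled ground-truth set, which is exactly what the second trap question probes.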
Nice-to-have (Band 4 territory)
- Fine-tuned an embedding model for domain.
- Custom chunking for a non-trivial format — tables, code, diagrams.
- Experience with hybrid search (BM25 + vector) and a clear sense of when each wins.
- Prompt caching, semantic caching, or structured-output (JSON mode) in production.
- Worked on a multi-tenant RAG system with per-tenant retrieval isolation.
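On hybrid search, a useful follow-up is how the candidate merges the BM25 and vector result lists. One common framework-free answer is reciprocal rank fusion (RRF); a minimal sketch, using the conventional k=60 smoothing constant (the function name is a hypothetical helper):

```python
def rrf_fuse(bm25_ranking, vector_ranking, k=60):
    """Reciprocal rank fusion: merge two ranked lists of doc ids.

    Each document scores sum(1 / (k + rank)) across the lists it appears
    in, so documents ranked well by BOTH retrievers float to the top.
    """
    scores = {}
    for ranking in (bm25_ranking, vector_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A Band 4 candidate can also say when fusion is the wrong tool — e.g. exact-match queries (part numbers, case citations) where BM25 alone should win outright.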
Red flags
- "We use GPT-4 for everything, so it's fine." — Cost naivety.
- No answer to "how do you test retrieval quality?" other than "manually".
- Has only used one framework (LangChain) and cannot describe when they'd go without it.
- No mention of chunking strategy — chunking is often the single biggest lever on real-world RAG quality.
- "Agents will solve it." — Occasionally true; often avoidance of fundamentals.
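On the chunking red flag: the baseline nearly every team starts from is fixed-size chunking with overlap, and a credible candidate can explain exactly where it breaks (mid-table splits, severed cross-references, headings divorced from their sections). A sketch of that baseline for reference — sizes are illustrative, not recommendations:

```python
def chunk_text(text, size=800, overlap=200):
    """Naive fixed-size sliding-window chunking with overlap.

    The usual first attempt: split into `size`-char windows, each
    overlapping the previous by `overlap` chars so sentences spanning a
    boundary survive in at least one chunk.
    """
    step = size - overlap
    chunks = []
    for start in range(0, max(len(text) - overlap, 1), step):
        chunks.append(text[start:start + size])
    return chunks
```

The chunking war story you want to hear in question one is the moment this baseline failed on real documents and what structure-aware strategy replaced it.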
Engagement model that fits AI work
RAG projects are notoriously hard to fixed-price because retrieval quality is discovered experimentally, not specified up front. We recommend hourly time-and-materials through the discovery-to-production phase (the first 8–12 weeks), then switching to a dedicated monthly engagement once the system stabilises. This matches how the problem actually behaves.
FAQ
Should I hire a full-time AI engineer or a dedicated team?
For a single RAG system in production, a dedicated half-time senior RAG engineer plus a shared DevOps helper is cheaper and faster than recruiting a full-time hire in 2026. For an AI-native product, you will eventually need an in-house team.
Is LangChain still the right framework in 2026?
For production RAG, teams in 2026 are increasingly choosing between LangChain (maturity), LlamaIndex (retrieval-first design), and minimal-framework custom code. Hire engineers who can reason about that trade-off, not zealots.
How long to get a RAG system to production?
8–14 weeks end-to-end for a single-domain system if the data is accessible; double it if you need to clean, chunk, and label a messy corpus first.
Next step
Every Krapton AI engineer has shipped at least one production RAG system and been vetted against the five trap questions above. Explore our AI development services, hire LangChain engineers directly, or hire OpenAI integration engineers to start.
