AI & ML

What We Learned Building 15 AI Chatbots for Production

The hallucination handling strategies, latency management approaches, context window compression, and UX failures that nobody else is writing about.

admin · March 31, 2026 · 2 min read

The Gap Between Demo and Production

Every AI chatbot demo looks impressive. Most AI projects fail in the gap between “this demo is impressive” and “our customers actually trust this in their daily workflow”, and that gap is not a model capability problem; it is a product and engineering problem.

Problem 1: Hallucination in High-Stakes Contexts

We settled on a three-layer approach: constrained retrieval (the model answers only from retrieved documents, never from training knowledge); citation enforcement (every factual claim requires a source reference from the retrieval context); and confidence thresholding (low-confidence responses return “here is what I found, please verify with an expert” instead of a confident hallucination).
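The three layers compose naturally into one guard function. The sketch below is illustrative, not our production code: `Passage`, `call_model`, and the thresholds (`MIN_RETRIEVAL_SCORE`, `MIN_CITATIONS`) are all assumed names, and any real deployment would tune them per corpus.

```python
from dataclasses import dataclass

@dataclass
class Passage:
    doc_id: str
    text: str
    score: float  # retriever similarity in [0, 1]

# Hypothetical thresholds; tune per deployment.
MIN_RETRIEVAL_SCORE = 0.35
MIN_CITATIONS = 1

HEDGE = "Here is what I found, please verify with an expert: "

def build_prompt(question: str, passages: list[Passage]) -> str:
    """Layer 1, constrained retrieval: the model may only use these passages."""
    sources = "\n".join(f"[{p.doc_id}] {p.text}" for p in passages)
    return (
        "Answer ONLY from the sources below. Cite every claim as [doc_id]. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )

def guarded_answer(question: str, passages: list[Passage], call_model) -> str:
    """Apply all three layers around a single model call."""
    # Layer 3, confidence thresholding: weak retrieval never reaches the model
    # with a confident framing.
    confident = [p for p in passages if p.score >= MIN_RETRIEVAL_SCORE]
    if not confident:
        return HEDGE + "(no confident matches in the knowledge base)"

    answer = call_model(build_prompt(question, confident))

    # Layer 2, citation enforcement: the response must reference at least
    # one retrieved document or it is downgraded to a hedged reply.
    cited = [p.doc_id for p in confident if f"[{p.doc_id}]" in answer]
    if len(cited) < MIN_CITATIONS:
        return HEDGE + answer
    return answer
```

Citation checking here is a simple substring match on `[doc_id]`; a production system would want a stricter parse, but the shape of the control flow is the point.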

Problem 2: Context Window Management

After eight conversation turns, a rolling summary call compresses the conversation history into a compact representation. New turns are then appended on top of this summary rather than the full raw history. Users do not notice, token costs drop significantly, and response quality is maintained.
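A minimal sketch of that rolling compression, assuming OpenAI-style message dicts and a `summarize` callable (one extra model call per compression); the number of recent turns kept verbatim is an assumption, not something the post specifies:

```python
SUMMARY_AFTER_TURNS = 8   # threshold from the post
KEEP_RECENT_TURNS = 2     # assumed: most recent turns stay verbatim

def compress_history(history: list[dict], summarize) -> list[dict]:
    """Replace older turns with a rolling summary once the history
    grows past the threshold; recent turns are kept word-for-word."""
    if len(history) <= SUMMARY_AFTER_TURNS:
        return history  # short conversations pass through untouched
    older = history[:-KEEP_RECENT_TURNS]
    recent = history[-KEEP_RECENT_TURNS:]
    summary = summarize(older)  # the extra "rolling summary" model call
    return [
        {"role": "system", "content": f"Conversation so far: {summary}"}
    ] + recent
```

Each subsequent request sends the compressed list instead of the raw transcript, so prompt size stays roughly constant no matter how long the conversation runs.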

Problem 3: Perceived Latency

Streaming over Server-Sent Events shows the first tokens immediately, which solves most of the perceived-latency problem. For RAG systems, though, the retrieval step adds 200–500 ms before the first token can appear. A “thinking” animation that starts the instant the user submits masks this retrieval latency completely.
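The ordering is the whole trick: emit the “thinking” event before doing any retrieval. A sketch of the server-side generator, with `retrieve` and `generate_tokens` as assumed callables standing in for the RAG pipeline; the SSE wire format (`event:`/`data:` lines terminated by a blank line) is standard:

```python
import json

def sse_frame(event: str, data: dict) -> str:
    """Format one Server-Sent Events frame."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

def stream_answer(question: str, retrieve, generate_tokens):
    """Yield SSE frames: a 'thinking' frame fires before the 200-500 ms
    retrieval step, so the client animation starts instantly on submit."""
    yield sse_frame("thinking", {"status": "searching"})
    passages = retrieve(question)          # retrieval latency lands here
    for token in generate_tokens(question, passages):
        yield sse_frame("token", {"text": token})
    yield sse_frame("done", {})
```

On the client, an `EventSource` listener swaps the animation for text on the first `token` event; the user perceives activity from the moment they hit enter.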

Problem 4: Knowing When Not to Build a Chatbot

Chatbots are good at open-ended questions with variable answers and synthesis tasks across multiple sources. They are poor at precise structured data retrieval (use a search box), transactional operations (use a form), and any task where consistency matters more than flexibility. Knowing when not to build one saves more money than optimising one.