Don't Trust an "AI Agent" Consultant Until They Can Answer These 3 Questions

01.

Why most AI agent demos fail in production

The gap between a working demo and a working production system is not a technical gap. It is a design gap, and specifically a gap in designing for failure. Demos are built to succeed; production systems must be built to fail well. An AI agent in production will encounter network timeouts, malformed tool responses, LLM hallucinations on edge cases, and ambiguous user inputs that no training data anticipated. A system not designed around these failure modes will crash, silently produce incorrect outputs, or escalate every exception to a human, which defeats the purpose of having an agent at all.

02.

Question 1 — How do you classify agent failures?

A production AI agent encounters four broad categories of failure: retriable errors (temporary network issues, rate limits), fatal errors (corrupted state, invalid tool schemas), reasoning failures (the agent chose the wrong path), and human escalation triggers (the agent is uncertain and should not proceed autonomously). A consultant who cannot explain how their system distinguishes between these categories, and what the recovery protocol is for each, has not shipped a production agent. They have shipped a demo that works until it doesn't.
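As an illustration of what such a classification might look like, here is a minimal Python sketch. The exception classes (`RateLimitError`, `SchemaError`, `WrongPathError`, `LowConfidenceError`), the `run_step` helper, and the specific recovery choices (exponential backoff for retriable errors, replanning for reasoning failures, aborting on fatal errors, handing off to a human when retries run out) are hypothetical assumptions, not a prescribed design. Unrecognized exceptions default to fatal so that nothing is retried blindly.

```python
import time
from dataclasses import dataclass
from enum import Enum, auto
from typing import Any, Callable, Optional


# Hypothetical exception types an agent's tool layer might raise.
class RateLimitError(Exception): ...
class SchemaError(Exception): ...
class WrongPathError(Exception): ...
class LowConfidenceError(Exception): ...


class FailureKind(Enum):
    RETRIABLE = auto()  # transient: timeouts, dropped connections, rate limits
    FATAL = auto()      # corrupted state, invalid tool schema
    REASONING = auto()  # the agent chose the wrong path
    ESCALATE = auto()   # the agent is uncertain and must not proceed alone


def classify(exc: Exception) -> FailureKind:
    if isinstance(exc, (TimeoutError, ConnectionError, RateLimitError)):
        return FailureKind.RETRIABLE
    if isinstance(exc, WrongPathError):
        return FailureKind.REASONING
    if isinstance(exc, LowConfidenceError):
        return FailureKind.ESCALATE
    # SchemaError and anything unrecognized are fatal: never retry blindly.
    return FailureKind.FATAL


@dataclass
class Outcome:
    status: str  # "ok", "replan", "escalate", or "abort"
    value: Any = None
    error: Optional[Exception] = None
    attempts: int = 1


# Recovery protocol applied once a failure can no longer be retried.
RECOVERY = {
    FailureKind.RETRIABLE: "escalate",  # retries exhausted: hand off to a human
    FailureKind.REASONING: "replan",    # ask the planner for a different path
    FailureKind.ESCALATE: "escalate",
    FailureKind.FATAL: "abort",         # stop; do not act on corrupted state
}


def run_step(step: Callable[[], Any], max_retries: int = 3,
             base_delay: float = 0.5) -> Outcome:
    """Run one agent step, retrying only failures classified as RETRIABLE."""
    attempt = 0
    while True:
        attempt += 1
        try:
            return Outcome("ok", value=step(), attempts=attempt)
        except Exception as exc:
            kind = classify(exc)
            if kind is FailureKind.RETRIABLE and attempt <= max_retries:
                time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
                continue
            return Outcome(RECOVERY[kind], error=exc, attempts=attempt)
```

For example, a step that times out twice and then succeeds yields an `Outcome` with `status == "ok"` and `attempts == 3`, while a `SchemaError` aborts on the first attempt. The sketch assumes something upstream (such as a validator on the agent's plan) raises `WrongPathError`; detecting reasoning failures is usually the hardest part of the classification.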

03.

Question 2 — What is your LLM provider strategy?

Single-provider dependency is another common production failure mode. LLM providers change their APIs, ship model updates that alter behavior, experience outages, and revise their pricing. A production AI agent system tightly coupled to a single provider will break on a schedule you cannot predict. The correct answer involves provider abstraction: a layer that lets the system switch between providers without changing application logic. Consultants who skip this layer are optimizing for demo delivery speed at the cost of your long-term operational stability.
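One way to sketch such an abstraction layer in Python, assuming every provider can be wrapped behind a single text-in, text-out method: `ChatProvider`, `ProviderRouter`, `ProviderUnavailable`, and `FakeProvider` are hypothetical names, and the fake adapters stand in for real vendor SDKs, whose APIs are not shown here. The router tries providers in a fixed priority order and falls back when one reports itself unavailable.

```python
from typing import List, Protocol, Sequence, Tuple


class ProviderUnavailable(Exception):
    """Raised by an adapter when its provider is down, rate-limited, or misbehaving."""


class ChatProvider(Protocol):
    """The only interface application code depends on (hypothetical)."""
    name: str

    def complete(self, prompt: str) -> str: ...


class ProviderRouter:
    """Tries providers in priority order and falls back on failure."""

    def __init__(self, providers: Sequence[ChatProvider]) -> None:
        if not providers:
            raise ValueError("at least one provider is required")
        self.providers: List[ChatProvider] = list(providers)

    def complete(self, prompt: str) -> Tuple[str, str]:
        """Return (provider name, completion) from the first provider that succeeds."""
        errors: List[str] = []
        for provider in self.providers:
            try:
                return provider.name, provider.complete(prompt)
            except ProviderUnavailable as exc:
                errors.append(f"{provider.name}: {exc}")
        raise ProviderUnavailable("all providers failed: " + "; ".join(errors))


class FakeProvider:
    """Stand-in adapter. A real adapter would wrap one vendor's SDK behind complete()."""

    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy

    def complete(self, prompt: str) -> str:
        if not self.healthy:
            raise ProviderUnavailable("outage")
        return f"[{self.name}] {prompt}"


# Usage: application code calls router.complete() and never imports a vendor SDK.
router = ProviderRouter([FakeProvider("primary", healthy=False), FakeProvider("secondary")])
print(router.complete("Summarize the open ticket"))
# prints ('secondary', '[secondary] Summarize the open ticket')
```

Keeping the fallback order in one place means switching providers becomes a configuration change rather than an application change. A production layer would likely also need to normalize differences in request formats, tool-calling conventions, and model behavior, which this sketch ignores.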

04.

Question 3 — How does the agent communicate uncertainty?

An AI agent operating autonomously will regularly encounter situations where it does not know the right answer, does not have enough information to proceed, or is about to take an action with irreversible consequences. The system needs an explicit protocol for these moments: a calibrated message that gives the human exactly the information needed to make a decision quickly. Getting this right requires careful thought about the user experience of human oversight, which is a design problem rather than a technical one. At ARCHECO, our AI agent practice is built on this principle. Get in touch to discuss your AI agent requirements.
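The escalation protocol described in this section can be sketched as a small data structure plus a rule for when to use it. In this minimal Python sketch, `Reason`, `needs_human`, and `EscalationRequest` are hypothetical names, and the 0.8 confidence threshold is an arbitrary placeholder: how an agent estimates its own confidence, and whether that estimate is actually calibrated, is an assumption the code does not address. Irreversible actions always pause, whatever the confidence.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Reason(Enum):
    # The three situations named in the text; values complete "Agent paused: it ..."
    UNKNOWN_ANSWER = "does not know the right answer"
    MISSING_INFO = "does not have enough information to proceed"
    IRREVERSIBLE = "is about to take an irreversible action"


def needs_human(confidence: float, irreversible: bool, missing_info: bool,
                threshold: float = 0.8) -> Optional[Reason]:
    """Return why the agent must pause, or None if it may proceed on its own.

    `confidence` and `threshold` are assumptions: how confidence is estimated
    (and calibrated) is system-specific.
    """
    if irreversible:
        return Reason.IRREVERSIBLE  # always confirm, regardless of confidence
    if missing_info:
        return Reason.MISSING_INFO
    if confidence < threshold:
        return Reason.UNKNOWN_ANSWER
    return None


@dataclass
class EscalationRequest:
    reason: Reason
    task: str                       # one line: what the agent is trying to do
    proposed_action: Optional[str]  # what it would do if approved, if anything
    confidence: float               # 0.0 to 1.0
    options: List[str] = field(default_factory=list)  # choices offered to the human

    def render(self) -> str:
        """Format a short message a human can act on quickly."""
        lines = [
            f"Agent paused: it {self.reason.value}.",
            f"Task: {self.task}",
            f"Confidence: {self.confidence:.0%}",
        ]
        if self.proposed_action:
            lines.append(f"Proposed action: {self.proposed_action}")
        lines += [f"  [{i}] {opt}" for i, opt in enumerate(self.options, start=1)]
        return "\n".join(lines)


# Hypothetical example: the agent is about to delete a customer's data.
reason = needs_human(confidence=0.55, irreversible=True, missing_info=False)
if reason is not None:
    request = EscalationRequest(
        reason=reason,
        task="Close the customer's account",
        proposed_action="Delete account and associated data",
        confidence=0.55,
        options=["Approve", "Reject", "Ask the customer to confirm"],
    )
    print(request.render())
```

The message leads with why the agent stopped and ends with numbered options, so the human can answer with a single choice instead of reconstructing the context from logs.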