An LLM API fallback is useful only when the application knows which failures are safe to route around. A smoke test should prove that the fallback path works without hiding bad inputs, quota problems, or model-quality regressions.

Define the failure classes

Separate network timeouts, rate limits, server errors, malformed requests, and low-quality model output. Each failure class should have a clear action: retry, fallback, stop, or alert.

Keep the first test narrow

Send one production-shaped request through the primary path, then force a controlled fallback. Confirm the second path receives the same prompt shape, model intent, timeout budget, and logging context.

Verify the response contract

The fallback response should satisfy the fields your application consumes. Do not stop at HTTP 200; check message content, finish reason, usage fields when available, latency, and any safety or policy metadata.

Keep rollback simple

Before increasing traffic, make sure the feature flag can route back to the primary path quickly. The first reliability goal is reversibility, not clever automation.