Last reviewed: 2026-07-04
Direct answer
A fallback decision ledger is a compact operating record that explains why a team kept normal routing, retried a failed LLM API call, switched to a fallback path, or escalated to support. It is not a transcript archive and it is not a proof of provider health. Its job is narrower: connect each routing decision to the endpoint contract that was checked, the retry plan that was used, the observable result, and the owner for the next action.
For CometAPI-backed systems, the ledger should begin with the endpoint family being exercised. The current CometAPI documentation includes separate references for Chat Completions and Responses, and those references describe different request and response areas that an operator should verify before promoting fallback behavior. A team using both paths should not treat them as interchangeable during incident review. The ledger should say which family was tested, which request shape was expected, what status class was observed, whether retry behavior stayed inside the intended attempt budget, and whether the caller received a usable response or a controlled failure.
Use the ledger before changing production fallback behavior. Start with a small smoke test against the documented endpoint family, then run one controlled error-path check. The happy path shows that the normal request can still produce a syntactically usable result. The error path shows whether the client handles transient failure classes with bounded backoff instead of creating a retry loop that makes overload worse. For deeper response-contract checks, pair this article with “Review Response Shapes Before LLM API Failover” and “Retry Budget Evidence for Safer LLM API Calls”.
A useful ledger entry can be short. It should preserve the evidence needed for another engineer to understand the decision without exposing credentials, customer data, full prompts, or full generated responses. The safest entries use placeholders and classifications: endpoint family, request identifier, timestamp, status class, retry count, final routing decision, evidence links, and follow-up owner.
Concrete operator workflow:
- Setup assumptions: the operator has an approved API account, a valid credential stored outside the note, a non-production prompt, a selected endpoint family from the current documentation, and a local request ID format that can be traced across gateway logs.
- Happy-path request plan: send one minimal request to the documented Chat Completions or Responses path using a harmless test prompt. Confirm only that the client receives a syntactically usable response for the selected path.
- Error-path check: simulate or capture a retryable failure class such as throttling, timeout, or temporary unavailability. Verify that retry behavior uses backoff, includes jitter where the client supports it, and stops after the configured attempt budget.
- Minimum assertions: record endpoint family, request timestamp, status class, retry count, final decision, and whether the caller received a usable response or a controlled failure.
- Pass/fail logging fields: ledger_id, endpoint_family, request_id, status_class, retry_attempts, fallback_decision, operator_initials, and follow_up_owner.
- What not to assert: do not claim a model is available, a price is current, a quota exists, latency is guaranteed, or a provider is healthy unless that exact fact was verified in the current documentation or account console at test time.
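The error-path check in the workflow above can be rehearsed locally before any real request is sent. The following is a minimal sketch, assuming a hypothetical `send` callable that returns one of the ledger's status classes; `call_with_budget`, its parameters, and the default delays are illustrative names and values, not part of any CometAPI client. It uses capped exponential backoff with full jitter and stops once the attempt budget is spent.

```python
import random
import time
from typing import Callable, Tuple

# Assumption: these labels mirror the ledger's status_class values;
# they are illustrative and not taken from provider documentation.
OK = "2xx"
RETRYABLE = "retryable"
TERMINAL = "terminal"


def call_with_budget(
    send: Callable[[], str],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[float, float], float] = random.uniform,
) -> Tuple[str, int]:
    """Call send() until it returns a non-retryable class or the budget is spent.

    Returns (final_status_class, retries_used), where retries_used counts
    only the attempts made after the first one.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = RETRYABLE
    for attempt in range(max_attempts):
        status = send()
        if status != RETRYABLE:
            return status, attempt
        if attempt < max_attempts - 1:
            # Full jitter: wait a random time up to the capped exponential delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng(0.0, cap))
    # Budget exhausted: report the last status instead of looping again.
    return status, max_attempts - 1
```

The `sleep` and `rng` parameters are injectable so the check can run in a test without real waiting. The returned retry count maps directly to the ledger's retry_attempts field.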
Example sanitized request note:
credential: <API_KEY_PLACEHOLDER>
endpoint_family: chat_completions_or_responses
request_id: req_placeholder_001
prompt_class: harmless_smoke_test
status_class: 2xx_or_retryable_or_terminal
retry_attempts: integer_placeholder
fallback_decision: keep_primary | retry_primary | use_fallback | escalate
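One way to keep entries consistent is to give the ledger a small typed record. The sketch below assumes the field names from the workflow's logging list and the decision values from the sanitized note; the `LedgerEntry` class, the `to_csv` helper, and the allowed-value sets are hypothetical and should be adapted to the team's own conventions.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Iterable

# Assumption: allowed values are copied from the sanitized note above.
ALLOWED_FAMILIES = {"chat_completions", "responses"}
ALLOWED_DECISIONS = {"keep_primary", "retry_primary", "use_fallback", "escalate"}


@dataclass(frozen=True)
class LedgerEntry:
    ledger_id: str
    endpoint_family: str
    request_id: str
    status_class: str
    retry_attempts: int
    fallback_decision: str
    operator_initials: str
    follow_up_owner: str

    def __post_init__(self) -> None:
        # Reject values outside the agreed vocabulary so entries stay comparable.
        if self.endpoint_family not in ALLOWED_FAMILIES:
            raise ValueError(f"unknown endpoint_family: {self.endpoint_family}")
        if self.fallback_decision not in ALLOWED_DECISIONS:
            raise ValueError(f"unknown fallback_decision: {self.fallback_decision}")
        if self.retry_attempts < 0:
            raise ValueError("retry_attempts must not be negative")


def to_csv(entries: Iterable[LedgerEntry]) -> str:
    """Render entries as CSV text with a header row, one entry per line."""
    buf = io.StringIO()
    names = [f.name for f in fields(LedgerEntry)]
    writer = csv.DictWriter(buf, fieldnames=names, lineterminator="\n")
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buf.getvalue()
```

The record deliberately has no field for credentials, prompts, or responses, so there is nowhere for them to be stored by accident.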
Who this is for
This guide is for on-call engineers, platform owners, and application teams that need LLM API fallback decisions to be explainable after an incident. It is especially useful when the team has more than one endpoint family, model provider, or gateway route behind an application and wants a record that separates verified facts from assumptions.
It also helps reviewers who were not present during the incident. A clear ledger lets them see whether the team checked the right contract, stayed inside the retry budget, avoided leaking sensitive payloads, and made a routing decision that can be repeated or challenged later. Teams with existing incident templates can use this as a small section inside their broader handoff notes rather than creating another long document.
Key takeaways
- Keep the ledger focused on decisions, evidence, and follow-up owners rather than long transcripts.
- Verify endpoint paths, request fields, response fields, and provider-specific behavior against current documentation before changing routing.
- Treat retries as a bounded reliability tool, not as an unlimited recovery loop.
- Record support-ready evidence without storing credentials, full prompts, full responses, or customer data.
- Use the ledger to decide whether to keep primary routing, retry the primary route, move to fallback, or escalate.
- Link the ledger to related evidence, such as “Build an On-call Evidence Packet for LLM API Incidents”, when the incident needs a broader handoff.
Sources checked
- CometAPI Chat Completions reference - accessed 2026-07-04; purpose: verify Chat Completions contract areas.
- CometAPI help center - accessed 2026-07-04; purpose: verify support and escalation documentation areas.
- CometAPI documentation - accessed 2026-07-04; purpose: verify current CometAPI documentation navigation.
- CometAPI Responses reference - accessed 2026-07-04; purpose: verify Responses endpoint contract areas.
- AWS retry with backoff pattern - accessed 2026-07-04; purpose: verify retry and backoff guidance.
Contract details to verify
| Area | What to verify | Source URL | Accessed | Safe candidate wording |
|---|---|---|---|---|
| Chat request path | Confirm the current Chat Completions path and required request fields before running a smoke test. | https://apidoc.cometapi.com/api/text/chat | 2026-07-04 | “Use the current Chat Completions documentation to verify the endpoint family and request shape before testing.” |
| Responses request path | Confirm whether the Responses endpoint is the right path for the selected workload. | https://apidoc.cometapi.com/api/text/responses | 2026-07-04 | “Use the current Responses documentation when the workload depends on the features described for that endpoint family.” |
| Retry behavior | Confirm that retry handling is limited to transient failure classes and uses a bounded backoff plan. | https://docs.aws.amazon.com/prescriptive-guidance/latest/cloud-design-patterns/retry-backoff.html | 2026-07-04 | “Retry only when the failure looks transient, and stop after the configured attempt budget.” |
| Support packet | Confirm what operational evidence can help support investigate without exposing secrets or sensitive content. | https://apidoc.cometapi.com/support/help-center | 2026-07-04 | “Escalate with request metadata, status class, timing, and account-safe context, not credentials or private payloads.” |
| Documentation navigation | Confirm that the operator is using the current public documentation entry point. | https://apidoc.cometapi.com/ | 2026-07-04 | “Start from the current documentation home when a bookmarked endpoint reference looks stale.” |
Failure modes
Evidence gap: the team cannot inspect the failing log, source page, request trace, or local command output. The safe action is to stop and record the missing evidence instead of guessing. A ledger entry should be allowed to say that a decision was paused because the key fact was unavailable.
Scope drift: the repair expands beyond the observed failure. A routing issue should not become an unplanned model migration, permission change, prompt rewrite, or retry-policy redesign. Keep the action tied to the failing signal and leave broader cleanup for a separate review.
Environment mismatch: the local check uses different versions, credentials, feature flags, provider settings, or runtime configuration than the hosted path. Record the mismatch before treating the result as proof. A test that passes in a different environment may still be useful, but it should not be used as production evidence without qualification.
Retry amplification: every caller retries at the same time, or retries continue after the attempt budget is exhausted. Backoff is meant to reduce pressure during transient failures. If the ledger shows repeated retries without a stopping rule, the next decision should be to reduce retry pressure rather than add more fallback traffic.
Weak handoff: the final incident note says the issue is fixed but omits the request family, status class, retry result, changed route, and remaining uncertainty. That makes the next operator repeat the investigation. The ledger should leave enough structure for the next reviewer to reproduce the reasoning, not necessarily the whole incident.
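The weak-handoff failure mode is easy to check mechanically before a note is closed. The following is a minimal sketch, assuming the note is held as a dictionary; the key names (including `remaining_uncertainty`) and the `missing_handoff_fields` helper are hypothetical and chosen only to mirror the items listed above.

```python
from typing import Any, Dict, List

# Assumption: these keys stand in for request family, status class,
# retry result, changed route, and remaining uncertainty.
REQUIRED_HANDOFF_FIELDS = (
    "endpoint_family",
    "status_class",
    "retry_attempts",
    "fallback_decision",
    "remaining_uncertainty",
)


def missing_handoff_fields(note: Dict[str, Any]) -> List[str]:
    """Return the required fields that are absent or blank in the note."""
    missing = []
    for name in REQUIRED_HANDOFF_FIELDS:
        value = note.get(name)
        # A retry count of 0 is a real answer; only None and blank text count as missing.
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(name)
    return missing
```

An empty result does not mean the note is correct, only that each required slot was filled in.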
Reader next step
Create one ledger entry for the next planned fallback test before changing any routing rule. Use a harmless prompt, choose either the Chat Completions or Responses documentation as the contract for that test, and write down the expected request shape before sending the request. After the test, fill in the status class, retry count, final routing decision, and follow-up owner.
If the happy path passes but the error-path check is unclear, do not promote fallback traffic yet. Add the missing evidence to the follow-up field and run a smaller retry-budget review. If the response shape is the uncertain part, use “How to Use Response Contract Evidence to Harden LLM API Failover” as the next supporting check. If the uncertainty is about escalation context, use “Build a CometAPI Support Packet for Incident Handoffs” to prepare a safer support note.
Use “CometAPI chat reliability contract review” as the next comparison point. Keep “Build a CometAPI Fallback Evidence Checklist” nearby for setup and permission checks.
FAQ
What is the smallest useful decision ledger entry?
Record the endpoint family, request ID, status class, retry count, fallback decision, evidence links, and follow-up owner. That is enough to reconstruct why the operator kept normal routing, retried, switched to fallback, or escalated.
Should the ledger include full prompts and responses?
No. Use sanitized placeholders or short classifications. Full prompts, full responses, credentials, and customer data create unnecessary exposure and rarely help a routing decision.
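A simple guard can replace sensitive fields with placeholders before a record is written to the ledger. This is a sketch under stated assumptions: the key names in `SENSITIVE_KEYS` and the `sanitize` helper are hypothetical, and matching on key names alone will not catch a secret pasted into some other field.

```python
from typing import Any, Dict

# Assumption: illustrative key names; extend to match the team's own record shape.
SENSITIVE_KEYS = {"credential", "api_key", "authorization", "prompt", "response", "customer_data"}


def sanitize(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record with sensitive values replaced by placeholders."""
    clean: Dict[str, Any] = {}
    for key, value in record.items():
        if key.lower() in SENSITIVE_KEYS:
            # Keep the key so reviewers can see the field existed, but drop its content.
            clean[key] = f"<{key.upper()}_REDACTED>"
        else:
            clean[key] = value
    return clean
```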
When should a retry become an escalation?
Escalate when the error is not transient, when the retry budget is exhausted, when the response shape is not usable by the caller, or when account-specific support context is required.
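Those four conditions can be written as a small rule so the ledger records why an escalation happened, not just that it did. The sketch below is illustrative: `escalation_reasons`, its parameters, and the reason labels are hypothetical, and it assumes the response shape is only judged when the status class is 2xx.

```python
from typing import List


def escalation_reasons(
    status_class: str,
    retries_used: int,
    retry_budget: int,
    response_usable: bool,
    needs_account_context: bool = False,
) -> List[str]:
    """Return the reasons to escalate; an empty list means no escalation is indicated."""
    reasons: List[str] = []
    if status_class == "terminal":
        reasons.append("error_not_transient")
    if status_class == "retryable" and retries_used >= retry_budget:
        reasons.append("retry_budget_exhausted")
    # Assumption: a response only exists to inspect when the call succeeded.
    if status_class == "2xx" and not response_usable:
        reasons.append("response_not_usable")
    if needs_account_context:
        reasons.append("account_context_required")
    return reasons
```

Returning every matching reason, rather than the first one, gives the follow-up owner the full picture when more than one condition applies.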
Can this ledger prove uptime or model availability?
No. It can prove what the operator checked and decided for a specific test or incident. Uptime, model availability, prices, quotas, and account limits must be verified in the current official source or account console at the time of use.
Does this replace a full incident report?
No. Treat it as the decision record inside the incident report. It captures the routing choice and the evidence behind it, while the broader incident report can cover customer impact, timeline, ownership, and prevention work.