Last reviewed: 2026-05-08
CometAPI fallback runbook for chat completions: incident review checklist for 2026-05-08
This draft is for teams using CometAPI in a reliability-sensitive chat-completions path. It is written as an incident review and validation checklist, not as a guarantee that any specific fallback pattern will work for every workload.
Before applying this runbook, compare your production client behavior against the current CometAPI documentation landing page and API reference. The CometAPI docs are the source of truth for supported request fields, endpoint behavior, authentication, and response formats: https://apidoc.cometapi.com/ and https://apidoc.cometapi.com/api-13851472.
For more reliability notes on this site, see the internal index at /sites/llm-api-reliability/ and the post archive at /sites/llm-api-reliability/posts/.
Key takeaways
- Treat fallback as a controlled mitigation, not a blind retry loop.
- Validate fallback behavior with small, deterministic smoke requests before routing user traffic.
- Use example thresholds, such as error-rate or latency triggers, as starting points to tune for your service rather than universal values.
- Record the exact primary model, fallback model, request parameters, error classes, timestamps, and customer impact during every incident review.
- Re-check CometAPI’s current API reference before changing production request fields or assumptions.
Concise definition
A CometAPI chat-completions fallback runbook is an operational checklist that tells engineers when to move traffic away from a degraded primary chat-completions path, how to verify a fallback path, how to monitor the mitigation, and how to document the incident afterward.
In this context, “fallback” can mean one or more of the following:
- switching from a primary model identifier to a secondary model identifier configured by your application;
- reducing optional request complexity, such as tool use or long context, if your application supports that behavior;
- retrying only safe, idempotent requests with bounded retry limits;
- routing a subset of traffic to a tested backup path while preserving observability.
Scope and assumptions
This runbook applies to application-owned reliability behavior around CometAPI chat-completions requests. It does not replace CometAPI’s own documentation or support process. Use the public API documentation and help center to confirm current platform details: https://apidoc.cometapi.com/ and https://apidoc.cometapi.com/help-center.
Assumptions to verify in your own environment:
- Your application has a primary model and at least one configured fallback model or fallback policy.
- Your client records request metadata such as model, route, latency, status code, error type, token usage if available, and retry count.
- Your fallback path has been tested outside an active incident.
- You can disable fallback quickly if it creates worse user impact.
Incident review checklist
1. Confirm the incident window
Record:
- Start time of degradation.
- Detection source: alert, customer report, synthetic test, internal dashboard, or provider status review.
- End time or current status.
- Affected route, service, tenant, region, or feature.
- Primary model or model alias used by the application.
- Whether fallback was automatic, manual, or not triggered.
Example incident note:
- Incident date: 2026-05-08
- Affected path: chat-completions generation endpoint in production assistant flow
- Primary symptom: elevated timeout rate
- Fallback action: secondary model enabled for 25% of eligible requests
- Rollback condition: fallback error rate exceeds primary error rate for two consecutive observation windows
Tune observation windows to your traffic. For a low-volume product, a fixed five-minute window may be too noisy; for a high-volume product, a shorter window may be enough.
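The rollback condition in the example note can be sketched as a small check over recent observation windows. This is a minimal illustration, not a prescribed implementation: the `WindowStats` structure, the field names, and the default of two consecutive windows are assumptions taken from the example above and should be tuned to your traffic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowStats:
    """Error counts for one observation window (hypothetical structure)."""
    primary_errors: int
    primary_total: int
    fallback_errors: int
    fallback_total: int


def error_rate(errors: int, total: int) -> float:
    # An empty window counts as 0.0 so a quiet window never triggers rollback.
    return errors / total if total else 0.0


def should_roll_back(windows: list[WindowStats], consecutive: int = 2) -> bool:
    """True if fallback was worse than primary in each of the last N windows."""
    if len(windows) < consecutive:
        return False
    recent = windows[-consecutive:]
    return all(
        error_rate(w.fallback_errors, w.fallback_total)
        > error_rate(w.primary_errors, w.primary_total)
        for w in recent
    )
```

For low-volume routes, consider adding a minimum request count per window before trusting the comparison.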
2. Classify the failure mode
Separate symptoms before taking action. A fallback that helps timeout errors may not help malformed requests.
Common classes to review:
| Failure class | What to check | Likely action |
|---|---|---|
| Authentication or authorization errors | API key state, environment variables, secret rotation, account settings | Do not model-fallback first; fix credentials |
| 4xx request validation errors | Payload shape, unsupported parameters, model identifier, message format | Roll back client change or align with current API docs |
| 429 or rate-limit-style responses | Burst behavior, concurrency, queueing, tenant quotas | Throttle, queue, shed load, or fallback selectively |
| 5xx upstream-style errors | Provider-side or gateway-side instability | Consider bounded fallback after smoke test |
| Timeouts | Network, client timeout, long context, slow model path | Reduce timeout risk, fallback, or degrade optional features |
| Quality regressions | Prompt, model change, safety filters, output schema mismatch | Fallback only after task-specific validation |
Use the CometAPI API reference as the current source for request and response expectations when reviewing 4xx errors: https://apidoc.cometapi.com/api-13851472.
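One way to keep these classes separate in code is a small classifier over the HTTP status code and a client-side timeout flag. The sketch below assumes conventional HTTP semantics (401/403 for credentials, 429 for rate limiting); confirm the codes CometAPI actually returns against the API reference. The names `FailureClass`, `FALLBACK_CANDIDATES`, and `classify` are hypothetical. Quality regressions cannot be detected from status codes and are left out.

```python
from enum import Enum
from typing import Optional


class FailureClass(Enum):
    AUTH = "auth"
    VALIDATION = "validation"
    RATE_LIMIT = "rate_limit"
    UPSTREAM = "upstream"
    TIMEOUT = "timeout"
    UNKNOWN = "unknown"


# In this sketch only upstream errors and timeouts are fallback candidates.
# Rate limiting is excluded because the table suggests throttling first.
FALLBACK_CANDIDATES = {FailureClass.UPSTREAM, FailureClass.TIMEOUT}


def classify(status_code: Optional[int], timed_out: bool = False) -> FailureClass:
    """Map a client-observed outcome to one failure class from the table."""
    if timed_out:
        return FailureClass.TIMEOUT
    if status_code is None:
        return FailureClass.UNKNOWN
    if status_code in (401, 403):
        return FailureClass.AUTH
    if status_code == 429:
        return FailureClass.RATE_LIMIT
    if 400 <= status_code < 500:
        return FailureClass.VALIDATION
    if 500 <= status_code < 600:
        return FailureClass.UPSTREAM
    return FailureClass.UNKNOWN
```

A caller would check `classify(...) in FALLBACK_CANDIDATES` before considering model fallback, and route the other classes to credential fixes, client rollback, or throttling.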
3. Decide whether fallback is allowed
Fallback should be blocked when it can create hidden correctness or compliance risk.
Fallback is usually safer when:
- the request is stateless or can be retried safely;
- the user can tolerate a slightly different model response;
- the task does not require exact formatting beyond what your validator enforces;
- the fallback model has passed recent smoke tests;
- observability can distinguish primary and fallback traffic.
Fallback should be paused or limited when:
- the request performs a high-impact action without confirmation;
- the fallback model is not approved for the feature;
- output schema conformance is required but untested;
- the fallback path lacks logs or metrics;
- token budget, latency, or cost controls are unknown.
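The two lists above can be expressed as an explicit gate that returns the reasons fallback is blocked, which also gives the incident review something concrete to log. This is a sketch under assumptions: `FallbackGate` and its fields are hypothetical facts your application would have to supply, not anything provided by CometAPI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FallbackGate:
    """Hypothetical per-request and per-route facts used to allow fallback."""
    safe_to_retry: bool
    high_impact_without_confirmation: bool
    fallback_model_approved: bool
    smoke_test_passed_recently: bool
    schema_required: bool
    schema_tested_on_fallback: bool
    fallback_observable: bool
    budget_controls_known: bool


def block_reasons(gate: FallbackGate) -> list[str]:
    """Return every reason fallback should be paused; empty means allowed."""
    reasons = []
    if not gate.safe_to_retry:
        reasons.append("request is not safe to retry")
    if gate.high_impact_without_confirmation:
        reasons.append("high-impact action without confirmation")
    if not gate.fallback_model_approved:
        reasons.append("fallback model not approved for this feature")
    if not gate.smoke_test_passed_recently:
        reasons.append("no recent passing smoke test")
    if gate.schema_required and not gate.schema_tested_on_fallback:
        reasons.append("schema conformance required but untested")
    if not gate.fallback_observable:
        reasons.append("fallback path lacks logs or metrics")
    if not gate.budget_controls_known:
        reasons.append("token, latency, or cost controls unknown")
    return reasons
```

Returning all reasons rather than a single boolean keeps the decision auditable after the incident.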
Practical validation steps
Step 1: Verify configuration
Check that the following values are explicit and versioned:
- CometAPI base URL from current documentation or environment configuration.
- Primary model identifier.
- Fallback model identifier.
- Client timeout.
- Retry count and retry backoff.
- Maximum tokens or output budget.
- Feature flags controlling fallback.
- Request metadata fields included in logs.
Do not rely on a model alias if your incident review requires exact routing evidence. Log both the alias and the resolved model identifier when your architecture supports it.
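A versioned configuration object with a self-check is one way to make these values explicit. The sketch below is illustrative only: `ChatFallbackConfig`, its field names, and the specific checks are assumptions, and the base URL and model identifiers must come from your own configuration and the current CometAPI documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChatFallbackConfig:
    """Hypothetical application-side settings for the fallback path."""
    base_url: str
    primary_model: str
    fallback_model: str
    timeout_seconds: float
    max_retries: int
    backoff_base_seconds: float
    max_tokens: int
    fallback_enabled: bool
    config_version: str

    def problems(self) -> list[str]:
        """List configuration problems; an empty list means the checks passed."""
        found = []
        if not self.base_url.startswith("https://"):
            found.append("base_url should be an https URL")
        if not self.primary_model or not self.fallback_model:
            found.append("model identifiers must be set explicitly")
        elif self.primary_model == self.fallback_model:
            found.append("fallback_model equals primary_model")
        if self.timeout_seconds <= 0:
            found.append("timeout_seconds must be positive")
        if self.max_retries < 0:
            found.append("max_retries must not be negative")
        if self.backoff_base_seconds <= 0:
            found.append("backoff_base_seconds must be positive")
        if self.max_tokens <= 0:
            found.append("max_tokens must be positive")
        if not self.config_version:
            found.append("config_version is required for incident evidence")
        return found
```

Running `problems()` at startup and logging `config_version` with every request gives the review a direct link between behavior and configuration.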
Step 2: Run a minimal smoke request
Run a deterministic smoke test against the same client path used in production, but with sanitized content and a very small token budget. Confirm request format against the CometAPI docs before use.
Example curl-style smoke test, using placeholders:
curl -sS "${COMETAPI_BASE_URL}/v1/chat/completions" \
  -H "Authorization: Bearer ${COMETAPI_API_KEY}" \
  -H "Content-Type: application/json" \
  -H "X-Request-Id: incident-20260508-smoke-001" \
  -d '{
    "model": "FALLBACK_MODEL_ID_FROM_CONFIG",
    "messages": [
      {
        "role": "system",
        "content": "Return only a compact JSON object."
      },
      {
        "role": "user",
        "content": "Health check. Reply with {\"ok\":true,\"path\":\"fallback\"}."
      }
    ],
    "temperature": 0,
    "max_tokens": 32
  }'
Validation criteria:
- Response returns successfully according to your client’s success criteria.
- Output can be parsed or accepted by the application.
- Latency is within your incident-specific tolerance.
- Logs show the fallback model or route clearly.
- No production secrets, user data, or regulated data appear in the smoke prompt.
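The first three criteria can be checked mechanically. The sketch below assumes the response body follows the common chat-completions shape in which the text is at `choices[0].message.content`; that shape is an assumption here and should be confirmed against the CometAPI API reference. The function name and the latency parameters are hypothetical.

```python
import json


def check_smoke_response(body: str, latency_s: float, max_latency_s: float) -> list[str]:
    """Return a list of failed criteria; an empty list means the smoke test passed."""
    failures = []
    if latency_s > max_latency_s:
        failures.append("latency above incident tolerance")
    try:
        payload = json.loads(body)
        # Assumed response shape; verify against the current API reference.
        content = payload["choices"][0]["message"]["content"]
        result = json.loads(content)
    except (json.JSONDecodeError, KeyError, IndexError, TypeError) as exc:
        failures.append(f"unparseable response: {type(exc).__name__}")
        return failures
    if result != {"ok": True, "path": "fallback"}:
        failures.append("unexpected smoke payload")
    return failures
```

A model may wrap the JSON in extra text even at temperature 0, so treat an "unparseable response" result as a signal to inspect the raw output rather than as proof that the path is down.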
Step 3: Compare primary and fallback behavior
For the same sanitized prompt, compare:
- status code or client exception;
- latency;
- response body shape;
- output parseability;
- token usage if your client records it;
- retry count;
- trace ID or request ID propagation.
Do not compare only “did it respond.” A fallback response that breaks downstream parsing can be worse than a clean failure.
Step 4: Enable fallback gradually
A sample rollout sequence:
- Enable fallback for internal synthetic checks only.
- Enable fallback for a small percentage of low-risk requests.
- Watch error rate, timeout rate, latency, and output validation failures.
- Increase only if the fallback path is better than the degraded primary path for the relevant failure class.
- Keep a rollback switch ready.
Example thresholds to tune, not universal facts:
- Consider fallback if primary timeout rate is materially above baseline for two observation windows.
- Roll back fallback if fallback schema-validation failures exceed those on the primary path.
- Stop increasing traffic if fallback p95 latency exceeds your user-facing timeout budget.
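Percentage-based rollout is easier to reason about when cohort assignment is deterministic, so the same request ID always lands in the same bucket and raising the percentage only adds requests. The sketch below is one possible approach using a hash of the request ID; the function name and the choice of request ID as the key are assumptions, and a tenant or user identifier may suit your product better.

```python
import hashlib


def in_fallback_cohort(request_id: str, percent: int) -> bool:
    """Deterministically place a request in the fallback cohort.

    `percent` is the share of eligible traffic (0 to 100) routed to fallback.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer and reduce to a bucket in 0..99.
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent
```

Because the bucket is fixed per ID, every request that was in the cohort at 25% stays in it at 50%, which keeps before-and-after comparisons clean during ramp-up.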
Step 5: Confirm user-impact reduction
A mitigation is not complete until user impact drops.
Check:
- customer-visible error rate;
- queue depth;
- timeout count;
- successful completion count;
- support-ticket volume;
- user retry behavior;
- feature-specific business metric, such as completed draft, answered ticket, or generated summary.
If customer-visible errors remain high, the fallback may not be addressing the real bottleneck.
Post-incident review questions
Use these prompts during the incident review.
Detection
- Which alert fired first?
- Was the alert tied to user impact or only infrastructure symptoms?
- Did synthetic traffic detect the issue before users did?
- Did dashboards separate primary and fallback traffic?
Diagnosis
- What was the first known bad request?
- Which error classes increased?
- Were request validation errors mixed with upstream failures?
- Did any deployment, prompt change, model change, secret rotation, or network change happen near the start time?
Mitigation
- Who approved fallback activation?
- Was fallback automatic or manual?
- What percentage of traffic moved?
- What exact configuration changed?
- Was there a rollback plan before activation?
- Did fallback reduce user impact?
Recovery
- What signal showed that primary service had recovered?
- Was traffic restored gradually?
- Were caches, queues, or retries drained safely?
- Did any users receive duplicate responses or delayed responses?
Prevention
- What test would have caught this earlier?
- Which dashboard panel was missing?
- Should fallback rules become more specific?
- Should a prompt, schema, or timeout setting be changed?
- Should the escalation path reference the CometAPI help center (https://apidoc.cometapi.com/help-center) for support workflows?
Logging fields to preserve
At minimum, preserve these fields for incident analysis:
- request timestamp;
- request ID or trace ID;
- user or tenant identifier, anonymized where required;
- route or feature name;
- primary model configured;
- actual model requested;
- fallback model requested;
- fallback reason;
- retry count;
- client timeout setting;
- status code or exception class;
- latency;
- response parse result;
- output validation result;
- token counts if available from your integration;
- feature flag state;
- deployment version.
Avoid storing raw user prompts unless your privacy, retention, and security policies explicitly allow it. Prefer prompt hashes, redacted excerpts, or structured classifications for incident review.
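A structured log line that stores a prompt hash instead of the prompt text might look like the sketch below. It covers only a subset of the fields listed above, and the function name and key names are hypothetical. A plain SHA-256 hash lets you correlate identical prompts but can still be matched against guessed inputs, so check your own privacy policy before relying on it.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Optional


def build_log_record(
    *,
    request_id: str,
    route: str,
    primary_model: str,
    requested_model: str,
    fallback_reason: Optional[str],
    retry_count: int,
    status_code: Optional[int],
    latency_ms: float,
    prompt: str,
) -> str:
    """Serialize one request's incident-review fields as a JSON log line."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "request_id": request_id,
        "route": route,
        "primary_model": primary_model,
        "requested_model": requested_model,
        "fallback_reason": fallback_reason,
        "retry_count": retry_count,
        "status_code": status_code,
        "latency_ms": latency_ms,
        # Store a hash, never the raw prompt text.
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
    }
    return json.dumps(record, sort_keys=True)
```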
Safe fallback patterns
Bounded retry before fallback
Use when transient network errors are common and the operation is safe to retry.
Rules:
- retry only on selected error classes;
- cap retry count;
- use jittered backoff;
- do not retry requests that may cause duplicated external side effects;
- attach a stable request ID for observability.
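These rules can be sketched as a small retry helper with a hard attempt cap and jittered exponential backoff. This is a minimal illustration, not a recommended library: `RetryableError` is a hypothetical marker your client would raise only for the error classes you have chosen to retry, and the default delays are placeholders to tune.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker for error classes the caller has deemed safe to retry."""


def call_with_retries(
    operation: Callable[[], T],
    *,
    max_attempts: int = 3,
    base_delay_s: float = 0.2,
    max_delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying only on RetryableError, up to `max_attempts`."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RetryableError:
            if attempt == max_attempts:
                raise  # Budget exhausted: surface the error to the caller.
            # Jitter: sleep a random time up to the capped exponential delay.
            cap = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")
```

Any exception other than `RetryableError` propagates immediately, which keeps requests with possible side effects out of the retry loop unless the caller opts in.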
Fallback after validation failure
Use when the primary model responds but fails your application’s output validator.
Rules:
- retry or fallback only if the validator failure is recoverable;
- include a shorter corrective instruction if appropriate;
- cap total attempts across primary and fallback;
- log the validator error category.
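A shared attempt budget across the primary and fallback models can be sketched as follows. The callables `call_model` and `validate` are hypothetical stand-ins for your client and output validator; `validate` is assumed to return `None` on success or a short error category on failure.

```python
from typing import Callable, Optional


def first_valid_output(
    call_model: Callable[[str], str],
    validate: Callable[[str], Optional[str]],
    model_plan: list[str],
    max_total_attempts: int,
) -> tuple[Optional[str], list[dict]]:
    """Try models in order until one output validates or the budget runs out.

    Returns the accepted output (or None) and a per-attempt log.
    """
    attempt_log = []
    # The slice enforces one cap across primary and fallback together.
    for model in model_plan[:max_total_attempts]:
        output = call_model(model)
        error_category = validate(output)
        attempt_log.append({"model": model, "validator_error": error_category})
        if error_category is None:
            return output, attempt_log
    return None, attempt_log
```

Listing the same model twice in `model_plan` allows one corrective retry on the primary before moving to fallback, while the cap still bounds total work.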
Degraded prompt mode
Use when long context or optional tools increase timeout risk.
Rules:
- shorten context;
- disable optional retrieval or tools only if the user experience remains acceptable;
- tell the user when a reduced mode affects the result, if your product policy requires disclosure.
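Shortening context can be as simple as keeping system instructions and the most recent turns. The sketch below assumes messages are dictionaries with a `role` key, as in the smoke-test payload earlier in this runbook; the function name and the count-based policy are assumptions, and a token-based budget would be more precise.

```python
def degrade_messages(messages: list[dict], keep_last: int) -> list[dict]:
    """Keep all system messages plus the last `keep_last` non-system messages.

    System messages are moved to the front of the returned list.
    """
    if keep_last < 0:
        raise ValueError("keep_last must not be negative")
    system = [m for m in messages if m.get("role") == "system"]
    rest = [m for m in messages if m.get("role") != "system"]
    # Guard keep_last == 0 explicitly: rest[-0:] would return the whole list.
    recent = rest[-keep_last:] if keep_last > 0 else []
    return system + recent
```

Record in the request log that degraded mode was applied so output-quality comparisons during the review are not skewed.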
Manual operator-controlled fallback
Use when the blast radius is high.
Rules:
- require incident commander approval;
- record the change ticket;
- set an expiry time;
- monitor every traffic increase;
- roll back when the primary path is stable.
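An operator override that carries its own approval record and expiry makes these rules harder to skip. The sketch below is an assumption about how such a record might look; `ManualOverride` and its fields are hypothetical, and `expires_at` must be a timezone-aware datetime for the comparison to work.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ManualOverride:
    """Hypothetical operator-approved fallback override with a built-in expiry."""
    approved_by: str
    change_ticket: str
    traffic_percent: int
    expires_at: datetime  # Must be timezone-aware.

    def is_active(self, now: Optional[datetime] = None) -> bool:
        """Active only if approved, ticketed, and not yet expired."""
        current = now if now is not None else datetime.now(timezone.utc)
        has_approval = bool(self.approved_by and self.change_ticket)
        return has_approval and current < self.expires_at
```

Because an expired override evaluates as inactive, traffic returns to the primary path by default if nobody renews it.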
Rollback checklist
Roll back fallback when any of these conditions occur:
- fallback error rate is worse than the primary path for the active failure class;
- fallback responses fail schema or safety validation;
- fallback latency breaches the user-facing timeout budget;
- downstream systems reject fallback output;
- the incident commander cannot observe fallback separately;
- the primary path has recovered and passed a controlled ramp-up.
Rollback steps:
- Freeze additional fallback ramp-up.
- Restore primary routing for a small traffic slice.
- Compare primary and fallback metrics.
- Increase primary traffic gradually.
- Keep fallback warm until the post-recovery observation window completes.
- Turn off the feature flag or remove the temporary override after review.
Documentation artifacts to attach
Attach these items to the incident ticket:
- timeline of detection, mitigation, and recovery;
- graph screenshots or dashboard links;
- list of affected services and tenants;
- CometAPI request configuration at time of incident;
- primary and fallback model IDs from config;
- sample sanitized request and response;
- error-class histogram;
- support contacts or help-center references used;
- final customer-impact statement;
- follow-up owners and due dates.
FAQ
Is fallback always the right response to a chat-completions outage?
No. Fallback helps only when the secondary path addresses the actual failure mode and has acceptable behavior for the feature. Authentication errors, malformed requests, and bad client deployments usually require a fix or rollback rather than model fallback.
Should every failed request be retried?
No. Retries should be bounded and limited to safe error classes. Unbounded retries can increase load, delay recovery, and create duplicate side effects in workflows that call external systems.
What should we test before enabling fallback in production?
At minimum, test authentication, request shape, model configuration, latency, response parsing, output validation, logging, and rollback. Use a sanitized smoke prompt and confirm the request fields against the current CometAPI API reference.
How should we choose fallback thresholds?
Start with your own baseline metrics. Example thresholds, such as “two consecutive windows above normal timeout rate,” are only starting points. Tune them by traffic volume, user-impact tolerance, and the cost of false positives.
Should fallback use the same prompt?
Usually start with the same prompt for comparison, then adjust only if the fallback model requires a different format or your output validator shows systematic failures. Keep any fallback-specific prompt changes versioned and observable.
Where should this runbook live?
Keep the operational copy near your on-call documentation, and link it from your reliability index. For this site’s editorial and publishing context, see /sites/llm-api-reliability/editorial/.
Sources checked
- https://apidoc.cometapi.com/ — Accessed 2026-05-08. Purpose: confirm the public CometAPI documentation entry point to use as the source of truth for current platform and API guidance.
- https://apidoc.cometapi.com/api-13851472 — Accessed 2026-05-08. Purpose: confirm the relevant API reference page to check current chat-completions request and response details before implementing or validating fallback behavior.
- https://apidoc.cometapi.com/help-center — Accessed 2026-05-08. Purpose: identify the CometAPI help-center location for support and operational follow-up references during incident review.