Chat completions fallback runbook for CometAPI

Last reviewed: 2026-05-09

Who this is for: on-call engineers, platform owners, and application operators who run user-facing chat completion workloads through CometAPI and need a practical fallback decision process.

This runbook treats fallback as an operator-owned reliability control. It does not assume that every error should be routed away. Some failures should trigger a retry, some should page the owning team, and some should be stopped because fallback would hide a bad request, expired credential, or unsupported contract assumption.

For broader reliability patterns, see the site index at /sites/llm-api-reliability/ and the related operational drafts under /sites/llm-api-reliability/posts/.

Key takeaways

  • Use fallback only after classifying the failure signal: transport, provider response, rate-limit/quota, output contract, latency, or client configuration.
  • Do not fall back on deterministic client errors such as malformed requests, invalid model identifiers, missing authentication, or schema bugs.
  • Keep the CometAPI endpoint path, auth format, request fields, response fields, and error shape tied to the current public documentation, especially the CometAPI API docs and chat completions reference.
  • Tune thresholds to your own SLO, traffic shape, and cost controls. The numeric examples below are starting points for drills, not universal rules.
  • Validate fallback with injected failures before relying on it during an incident.

Concise definition

A CometAPI chat completions fallback is an application-side policy that decides what to do when a primary chat completion request path is unhealthy.

In this runbook, fallback can mean:

  • retrying the same request after a short backoff;
  • switching to a secondary configured route;
  • returning a degraded but explicit response;
  • queueing the request for later processing;
  • stopping the request and paging an operator.

It does not mean blindly replaying every failed request through another model or provider.
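
The first option above, retrying after a short backoff, is commonly implemented as capped exponential backoff with full jitter. The following is a minimal sketch of that delay calculation only. The function name and the default values for `base` and `cap` are illustrative assumptions, not values taken from CometAPI documentation.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full-jitter backoff: a uniform delay in [0, min(cap, base * 2**attempt)] seconds.

    `attempt` is zero-based. `base` and `cap` are placeholder values to tune.
    """
    rng = rng or random.Random()
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)
```

Randomizing the whole delay, rather than adding a small random offset, spreads retries from many clients so they do not arrive at the recovering route in synchronized waves.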

Evidence basis

The public CometAPI documentation landing page is the first source to check for current API navigation and contract references: https://apidoc.cometapi.com/. The chat completions endpoint reference should be treated as the source of truth for the request and response contract: https://apidoc.cometapi.com/api-13851472. For account, support, and operational questions that are not defined in the endpoint reference, check the CometAPI help center: https://apidoc.cometapi.com/help-center.

Contract details to verify

Before enabling automated fallback, record the exact contract your production client depends on. This avoids a common failure mode: the fallback layer works, but it is protecting an integration that was never aligned with the documented API.

| Contract area | What to verify before production | Operational note | Source to check |
| --- | --- | --- | --- |
| Endpoint paths | Confirm the base URL and exact chat completions path used by your client. If your code assumes an OpenAI-compatible path such as /v1/chat/completions, verify that assumption against the endpoint reference. | Log the resolved URL path without query secrets. Alert on unexpected path drift between environments. | CometAPI docs landing page and chat completions reference: docs, endpoint reference |
| Auth headers | Confirm the required authentication header format, token location, and whether any organization, project, or account-scoping header is required. | Do not fall back on 401 or 403 until credential rotation, account status, and environment variables are checked. | CometAPI docs and help center: docs, help center |
| Request fields | Confirm required and optional fields such as model identifier, messages array, streaming flag, sampling parameters, and token limits. | Treat request validation errors as client bugs unless the endpoint reference documents otherwise. | Chat completions reference: endpoint reference |
| Response fields | Confirm the fields your application parses, such as assistant message content, finish reason, usage information, request identifier, and streaming chunks if streaming is enabled. | A response can be HTTP-successful and still fail your application contract. Track parse failures separately from transport failures. | Chat completions reference: endpoint reference |
| Error behavior | Confirm documented status codes, error object shape, retryable conditions, and whether rate-limit responses include retry timing. | If error semantics are not explicit, validate with controlled tests and keep conservative retry rules. | Endpoint reference and help center: endpoint reference, help center |
| Rate-limit or billing assumptions | Confirm whether limits, quota, billing state, or usage exhaustion can affect chat completion calls, and where those events are surfaced. | Do not assume current pricing, availability, or quota behavior from memory. Use current account-facing documentation or dashboard evidence. | Help center and account documentation: help center |
| Fallback compatibility | Confirm that the secondary route accepts the same message format, safety policy, max-token behavior, streaming behavior, and response parser expectations. | A fallback target is not safe just because it accepts a similar JSON body. Run contract tests against each configured target. | Your internal contract tests plus CometAPI endpoint reference |
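
The response-field row can be turned into an automated check. The sketch below assumes an OpenAI-style response shape (`choices[0].message.content`, `finish_reason`, `usage`). Those field names are assumptions for illustration and are not confirmed by this runbook; replace them with whatever the CometAPI chat completions reference documents.

```python
from typing import Any


def check_response_contract(body: Any) -> list[str]:
    """Return a list of contract violations; an empty list means the body passed.

    Field names are ASSUMED (OpenAI-style), not taken from the CometAPI reference.
    """
    if not isinstance(body, dict):
        return ["body is not a JSON object"]
    choices = body.get("choices")
    if not isinstance(choices, list) or not choices:
        return ["missing or empty 'choices'"]

    problems: list[str] = []
    first = choices[0] if isinstance(choices[0], dict) else {}
    message = first.get("message")
    content = message.get("content") if isinstance(message, dict) else None
    if not isinstance(content, str) or not content.strip():
        problems.append("empty or missing assistant content")
    if first.get("finish_reason") is None:
        problems.append("missing finish_reason")
    if not isinstance(body.get("usage"), dict):
        problems.append("missing usage object")
    return problems
```

Running the same check against the primary route and every fallback target gives a single definition of "contract-compatible" instead of one per route.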

Monitoring signals checklist

Use this checklist to classify the signal before choosing a fallback action.

| Signal | What to measure | Example trigger to tune | Recommended action | Validation step |
| --- | --- | --- | --- | --- |
| Connection timeout | Client-side connect timeout, TLS timeout, DNS failure, socket timeout | More than 2 consecutive failures for the same route, or elevated failure rate over 5 minutes | Retry once with jitter. If still failing and user SLO is at risk, route to approved fallback. | Inject connection refusal or blackhole the upstream in staging. Confirm only retryable calls are replayed. |
| HTTP 5xx | Count and rate of server-side errors returned by the API path | 5xx rate above your normal baseline for 3 to 5 minutes | Retry requests that are safe to replay, with exponential backoff. Escalate to fallback if burn rate remains high. | Stub 500 and 503 responses. Confirm alert includes status, route, model identifier, and request class. |
| HTTP 429 or quota-like response | Rate-limit responses, quota exhaustion, retry-after headers if documented | Any sustained 429 on production traffic | Respect documented retry guidance if present. Fall back only if alternate capacity and billing controls are approved. | Run a low-volume controlled test. Confirm your client does not stampede the alternate route. |
| HTTP 400 class validation error | Bad request, invalid field, unsupported parameter, context-too-large, malformed messages | Any repeated 400 on the same release or request builder version | Do not fall back by default. Roll back or fix the client request. | Send a deliberately invalid request in staging. Confirm it opens an integration alert, not a fallback event. |
| HTTP 401 or 403 | Authentication or authorization failure | Any production occurrence unless expected during rotation | Do not fall back until credential and account status are verified. Page platform owner. | Rotate a staging secret to an invalid value. Confirm no alternate provider receives the request. |
| Latency SLO breach | p50, p95, p99, timeout ratio, queue wait, time to first token for streaming | p95 above your user-facing budget for a sustained window | Prefer graceful degradation before full fallback if responses are still correct. | Inject upstream delay and verify timeout boundaries, user messaging, and cancellation behavior. |
| Empty or malformed assistant output | Empty content, invalid JSON where JSON is required, missing expected fields | Parser failure rate above baseline | Retry once if transient. Fall back only if the secondary route is contract-compatible. | Force malformed output in a test harness. Confirm the fallback path revalidates the response. |
| Finish reason or token-budget exhaustion | Responses cut off due to length or equivalent finish state | Truncation rate above baseline for a request class | Adjust prompt or token budget. Do not treat as provider outage unless correlated with other failures. | Run long-context test prompts. Confirm the alert names the request class and token budget. |
| Streaming interruption | Missing final event, stalled stream, chunk parse error, client disconnect | Stall beyond your stream idle timeout | Cancel and retry non-side-effecting requests if safe. For interactive users, show a clear partial-response state. | Simulate dropped SSE or chunked transfer. Confirm partial content is labeled, not silently accepted. |
| Cost or usage anomaly | Sudden token usage increase, retry amplification, fallback volume spike | Retry/fallback volume above planned budget | Circuit-break fallback and page operator if amplification is detected. | Run a drill with forced failures. Confirm dashboards show primary calls, retries, and fallback calls separately. |
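
Classifying a signal can be sketched as a small pure function over the HTTP status and the parser result. The class names below are hypothetical labels chosen for this example, and the mapping is deliberately coarse: codes such as 408 or 409 may need their own handling once the documented error behavior is verified.

```python
from enum import Enum
from typing import Optional


class FailureClass(Enum):
    TRANSPORT = "transport"
    SERVER_ERROR = "server_error"
    RATE_LIMIT = "rate_limit"
    CLIENT_VALIDATION = "client_validation"
    AUTH = "auth"
    OUTPUT_CONTRACT = "output_contract"
    NONE = "none"


def classify(status: Optional[int], parse_ok: bool = True) -> FailureClass:
    """Map one attempt to a failure class. status=None means no HTTP response arrived."""
    if status is None:
        return FailureClass.TRANSPORT
    if status in (401, 403):
        return FailureClass.AUTH
    if status == 429:
        return FailureClass.RATE_LIMIT
    if 400 <= status < 500:
        return FailureClass.CLIENT_VALIDATION
    if status >= 500:
        return FailureClass.SERVER_ERROR
    # Anything below 400 is treated as a delivered response here.
    # A delivered response can still fail the application contract.
    return FailureClass.NONE if parse_ok else FailureClass.OUTPUT_CONTRACT
```

Keeping classification separate from the action taken makes it possible to alert on the class even when the action is "do nothing".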

Decision rules for on-call use

Use these rules during an incident:

  1. Classify the failure first. Determine whether the dominant signal is transport, 5xx, 429, 4xx validation, auth, latency, output contract, or cost amplification.
  2. Retry only when the request is safe to replay. If the request can trigger external side effects in your application, deduplicate before retrying.
  3. Fall back only when the alternate route has been pre-approved. It must have compatible request fields, response parsing, safety behavior, data handling, and budget controls.
  4. Stop on auth and malformed request failures. Falling back on 401, 403, or repeated deterministic 400 errors usually hides the real issue.
  5. Log the decision, not just the error. Each event should say whether the system retried, fell back, degraded, queued, or failed closed.
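
The five rules above can be sketched as one decision function plus a decision log line. The class labels, the function names, and the choice of which classes are retryable are assumptions for illustration. Degraded and queued responses are left out to keep the sketch short.

```python
import json

# Failure classes that must never be routed away automatically (rule 4).
STOP_CLASSES = {"auth", "client_validation"}
# Failure classes where a same-route retry is reasonable (assumed set; tune it).
RETRYABLE_CLASSES = {"transport", "server_error"}


def decide(failure_class: str, attempt: int, max_retries: int,
           replay_safe: bool, fallback_approved: bool) -> str:
    """Return one of 'retry', 'fallback', or 'fail_closed' for a classified failure."""
    if failure_class in STOP_CLASSES:
        return "fail_closed"
    if not replay_safe:
        return "fail_closed"  # rule 2: never blindly replay a request with side effects
    if failure_class in RETRYABLE_CLASSES and attempt < max_retries:
        return "retry"
    if fallback_approved:
        return "fallback"  # rule 3: only a pre-approved alternate route
    return "fail_closed"


def decision_log(failure_class: str, decision: str, attempt: int) -> str:
    """Rule 5: record the decision taken, not just the error observed."""
    return json.dumps(
        {"failure_class": failure_class, "decision": decision, "attempt": attempt},
        sort_keys=True,
    )
```

For example, `decide("server_error", 0, 2, True, True)` returns `"retry"`, while the same call with `attempt=2` returns `"fallback"`.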

Minimal sanitized request example

Use this as a shape check, not as a contract guarantee. Confirm the path, required fields, and supported parameters in the CometAPI chat completions reference before production.

curl -X POST "https://<COMETAPI_BASE_URL>/v1/chat/completions" \
  -H "Authorization: Bearer ${COMETAPI_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<model-id-approved-for-this-workload>",
    "messages": [
      {
        "role": "system",
        "content": "Answer with concise operational guidance."
      },
      {
        "role": "user",
        "content": "Summarize the current incident state from these sanitized symptoms: elevated 5xx and p95 latency."
      }
    ],
    "temperature": 0.2,
    "max_tokens": 300,
    "stream": false
  }'

For production, attach your own request correlation identifier in application logs. Do not assume an extra API header is accepted unless the endpoint documentation explicitly supports it.
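
As a sketch of that advice, the same request can be built in application code with a locally generated correlation identifier that goes to logs only. The helper name `build_request` and the logger name are hypothetical, and the `/v1/chat/completions` path carries the same unverified assumption as the curl example.

```python
import json
import logging
import urllib.request
import uuid

log = logging.getLogger("chat_fallback")  # hypothetical logger name


def build_request(base_url: str, api_key: str,
                  body: dict) -> tuple[urllib.request.Request, str]:
    """Build (but do not send) the POST request; return it with a local correlation ID."""
    correlation_id = str(uuid.uuid4())
    request = urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",  # assumed path; verify it
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    # The ID is written to application logs only; it is not sent as an API header.
    log.info("chat request prepared",
             extra={"correlation_id": correlation_id,
                    "endpoint_path": "/v1/chat/completions"})
    return request, correlation_id
```

The caller can pass the returned request to `urllib.request.urlopen` with an explicit timeout and reuse the correlation ID on every retry and fallback log line for that user request.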

Practical validation steps

1. Build a contract fixture

Create a fixture with one approved request per workload class:

  • normal short chat;
  • long-context request near your expected token budget;
  • structured-output request if your application requires JSON;
  • streaming request if you use streaming;
  • blocked or rejected request if your application has policy gates.

For each fixture, store:

  • request body shape;
  • expected required response fields;
  • parser expectations;
  • timeout budget;
  • retry eligibility;
  • fallback eligibility.
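
One possible way to store these items is a small immutable record per workload class. This is a sketch: the class name, field names, and the `choices` and `usage` response fields are assumptions, not a schema from the CometAPI reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContractFixture:
    workload_class: str
    request_body: dict
    required_response_fields: tuple[str, ...]
    timeout_seconds: float
    retry_eligible: bool
    fallback_eligible: bool
    parser_expectation: str = "plain_text"


def missing_fields(fixture: ContractFixture, response: dict) -> list[str]:
    """Return the required top-level fields that are absent from a response."""
    return [name for name in fixture.required_response_fields if name not in response]


SHORT_CHAT = ContractFixture(
    workload_class="normal_short_chat",
    request_body={"model": "<model-id>",
                  "messages": [{"role": "user", "content": "ping"}]},
    required_response_fields=("choices", "usage"),  # assumed field names
    timeout_seconds=10.0,
    retry_eligible=True,
    fallback_eligible=True,
)
```

Keeping retry and fallback eligibility on the fixture means the contract test and the production policy can read the same values.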

2. Run failure injection in staging

Inject one failure mode at a time:

  • DNS or connection failure;
  • upstream timeout;
  • HTTP 500;
  • HTTP 429;
  • HTTP 400 malformed request;
  • HTTP 401 invalid credential;
  • malformed successful response;
  • delayed streaming chunks.

Expected outcome:

  • retryable transport and 5xx failures may retry and then fall back;
  • 400, 401, and 403 failures should not fall back automatically;
  • malformed success responses should be counted as contract failures;
  • cost and retry amplification should be visible in dashboards.
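
The first two expected outcomes can be checked with stubbed routes before touching real traffic. The harness below is a simplified sketch: routes are plain callables that return a status code, a status of 0 stands in for a transport failure, and sleeping between retries is omitted. All names are hypothetical.

```python
from typing import Callable


def is_retryable(status: int) -> bool:
    """0 stands in for a transport failure (no HTTP response); 5xx is a server error."""
    return status == 0 or status >= 500


def call_with_fallback(primary: Callable[[], int], secondary: Callable[[], int],
                       max_retries: int = 1) -> tuple[int, list[str]]:
    """Try the primary route, retry retryable failures, then use the secondary route.

    Returns the final status and a trail of every call made, in order.
    """
    trail: list[str] = []
    for _ in range(max_retries + 1):
        status = primary()
        trail.append(f"primary:{status}")
        if not is_retryable(status):
            # Success, or a 4xx that must not be routed away automatically.
            return status, trail
    status = secondary()
    trail.append(f"fallback:{status}")
    return status, trail


def inject(status: int) -> Callable[[], int]:
    """Build a stub route that always answers with the injected status."""
    return lambda: status
```

With an injected 500, the trail shows two primary calls followed by one fallback call. With an injected 401, the trail contains a single primary call and the secondary route is never invoked.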

3. Validate observability fields

Every chat completion attempt should emit structured telemetry. At minimum, capture:

  • route name, not secret URL;
  • endpoint path;
  • workload class;
  • model identifier used by your configuration;
  • HTTP status code;
  • timeout phase if available;
  • retry attempt number;
  • fallback decision;
  • latency and time to first token if streaming;
  • response parser result;
  • token usage if provided by the response and documented for the endpoint;
  • sanitized error category.

Avoid logging prompts, secrets, full user data, or raw authorization headers.
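
One way to enforce both the field list and the logging restriction is an allowlist applied when the event is serialized. This is a sketch with hypothetical field names. Anything not on the allowlist, such as a prompt or an authorization header, is dropped rather than logged.

```python
import json
import time

# Hypothetical field names that mirror the list above; adjust to your schema.
ALLOWED_FIELDS = {
    "route", "endpoint_path", "workload_class", "model", "status",
    "timeout_phase", "attempt", "decision", "latency_ms", "ttft_ms",
    "parser_result", "total_tokens", "error_category",
}


def telemetry_event(**fields) -> str:
    """Serialize one attempt as a JSON line, dropping any field not on the allowlist."""
    event = {key: value for key, value in fields.items() if key in ALLOWED_FIELDS}
    event["ts"] = round(time.time(), 3)  # emission time in epoch seconds
    return json.dumps(event, sort_keys=True)
```

An allowlist fails safe: a new field added by a developer stays out of the logs until someone decides it is safe to record.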

4. Drill the on-call decision

Run a 30-minute drill with the application owner and on-call engineer:

  1. Force 5xx for a small staging traffic slice.
  2. Confirm retry volume stays inside the configured cap.
  3. Confirm fallback begins only after the configured trigger.
  4. Confirm fallback responses pass the same parser.
  5. Confirm dashboards separate primary, retry, and fallback traffic.
  6. Confirm the runbook tells the operator when to disable fallback.
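
Steps 2 and 5 of the drill are easier to confirm if the client keeps separate counters and enforces the retry cap itself. The following is a minimal sketch. The class name and the cap expressed as "retries per primary call" are assumptions, not a CometAPI feature.

```python
from collections import Counter


class TrafficLedger:
    """Count primary, retry, and fallback calls separately and enforce a retry cap."""

    KINDS = ("primary", "retry", "fallback")

    def __init__(self, max_retry_ratio: float):
        # Maximum number of retries allowed per primary call, e.g. 0.5.
        self.max_retry_ratio = max_retry_ratio
        self.counts = Counter()

    def record(self, kind: str) -> None:
        if kind not in self.KINDS:
            raise ValueError(f"unknown traffic kind: {kind}")
        self.counts[kind] += 1

    def retry_allowed(self) -> bool:
        """True while one more retry would keep total retries within the cap."""
        budget = self.max_retry_ratio * self.counts["primary"]
        return self.counts["retry"] + 1 <= budget
```

During the drill, the client would call `retry_allowed()` before each retry and the dashboard would plot the three counters as separate series.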

Record the result in your internal incident-readiness notes. If this site later publishes a validation template, link it from /sites/llm-api-reliability/editorial/.

Suggested alert routing

| Alert | Page immediately? | Owner | Notes |
| --- | --- | --- | --- |
| Sustained 5xx plus user-visible failures | Yes | On-call application/platform engineer | Fallback may be appropriate if pre-approved. |
| Sustained timeout rate | Yes | Platform engineer | Check network, DNS, upstream latency, and client timeout changes. |
| 429 or quota-like failures | Usually | Platform or account owner | Fallback requires budget and capacity approval. |
| 400 validation spike after deploy | Yes, if production-impacting | Releasing team | Roll back or patch request builder. |
| 401 or 403 | Yes | Secret/account owner | Do not fall back automatically until credentials and account status are verified. |
| Parser failure spike | Yes, if user-facing | Application owner | Revalidate response contract and structured-output assumptions. |
| Fallback volume spike | Yes | Reliability owner | Detect retry storms and cost amplification. |

FAQ

Should every failed chat completion request fall back automatically?

No. Automatic fallback is safest for transient transport failures, selected 5xx responses, and latency events where the alternate route is already validated. It is usually unsafe for malformed requests, auth failures, unsupported parameters, or schema bugs.

Is a retry the same as fallback?

No. A retry repeats the request against the same route, usually after a short delay. Fallback changes behavior: a secondary route, degraded response, queue, or fail-closed path. Track them separately.

What is the most important metric?

There is no single metric. Combine status code rate, timeout rate, latency, parser failure rate, fallback volume, and user-visible failure rate. A low HTTP error rate can still hide malformed successful responses.

Should 429 always trigger fallback?

Not always. A 429-style signal may indicate rate limiting, quota, or account-level constraints. Fallback can be appropriate only if the secondary route has approved capacity, compatible behavior, and cost controls.

Can fallback hide product bugs?

Yes. If the client sends an invalid request or parses the response incorrectly, fallback may make the incident harder to diagnose. That is why this runbook separates deterministic 4xx and parser failures from retryable transport failures.

How often should this runbook be reviewed?

Review it whenever the CometAPI endpoint documentation changes, when you add a new model route, after any production incident, and at least once per quarter for high-traffic workloads.

Sources checked

| Source | Access date | Purpose |
| --- | --- | --- |
| CometAPI API documentation | 2026-05-09 | Checked as the public documentation entry point for API references and integration navigation. |
| CometAPI chat completions endpoint reference | 2026-05-09 | Checked as the evidence source for chat completion request and response contract verification. |
| CometAPI help center | 2026-05-09 | Checked for account, support, billing, and operational guidance that may affect fallback decisions. |