Chat completions fallback runbook for CometAPI

Last reviewed: 2026-05-09

Who this is for: on-call engineers, platform owners, and application operators who run user-facing chat completion workloads through CometAPI and need a practical fallback decision process.

This runbook treats fallback as an operator-owned reliability control. It does not assume that every error should be routed away. Some failures should trigger a retry, some should page the owning team, and some should be stopped because fallback would hide a bad request, expired credential, or unsupported contract assumption.

For broader reliability patterns, see the site index at /sites/llm-api-reliability/ and the related operational drafts under /sites/llm-api-reliability/posts/.

Key takeaways

Use fallback only after classifying the failure signal: transport, provider response, rate-limit/quota, output contract, latency, or client configuration.
Do not fall back on deterministic client errors such as malformed requests, invalid model identifiers, missing authentication, or schema bugs.
Keep the CometAPI endpoint path, auth format, request fields, response fields, and error shape tied to the current public documentation, especially the CometAPI API docs and chat completions reference.
Tune thresholds to your own SLO, traffic shape, and cost controls. The numeric examples below are starting points for drills, not universal rules.
Validate fallback with injected failures before relying on it during an incident.

Concise definition

A CometAPI chat completions fallback is an application-side policy that decides what to do when a primary chat completion request path is unhealthy.

In this runbook, fallback can mean:

retrying the same request after a short backoff;
switching to a secondary configured route;
returning a degraded but explicit response;
queueing the request for later processing;
stopping the request and paging an operator.

It does not mean blindly replaying every failed request through another model or provider.

Evidence basis

The public CometAPI documentation landing page (https://apidoc.cometapi.com/) is the first source to check for current API navigation and contract references. The chat completions endpoint reference (https://apidoc.cometapi.com/api-13851472) should be treated as the source of truth for the request and response contract. For account, support, and operational questions that are not defined in the endpoint reference, check the CometAPI help center (https://apidoc.cometapi.com/help-center).

Contract details to verify

Before enabling automated fallback, record the exact contract your production client depends on. This avoids a common failure mode: the fallback layer works, but it is protecting an integration that was never aligned with the documented API.

Contract area	What to verify before production	Operational note	Source to check
Endpoint paths	Confirm the base URL and exact chat completions path used by your client. If your code assumes an OpenAI-compatible path such as `/v1/chat/completions`, verify that assumption against the endpoint reference.	Log the resolved URL path without query secrets. Alert on unexpected path drift between environments.	CometAPI docs landing page and chat completions reference
Auth headers	Confirm the required authentication header format, token location, and whether any organization, project, or account-scoping header is required.	Do not fall back on `401` or `403` until credential rotation, account status, and environment variables are checked.	CometAPI docs and help center
Request fields	Confirm required and optional fields such as model identifier, messages array, streaming flag, sampling parameters, and token limits.	Treat request validation errors as client bugs unless the endpoint reference documents otherwise.	Chat completions reference
Response fields	Confirm the fields your application parses, such as assistant message content, finish reason, usage information, request identifier, and streaming chunks if streaming is enabled.	A response can be HTTP-successful and still fail your application contract. Track parse failures separately from transport failures.	Chat completions reference
Error behavior	Confirm documented status codes, error object shape, retryable conditions, and whether rate-limit responses include retry timing.	If error semantics are not explicit, validate with controlled tests and keep conservative retry rules.	Endpoint reference and help center
Rate-limit or billing assumptions	Confirm whether limits, quota, billing state, or usage exhaustion can affect chat completion calls, and where those events are surfaced.	Do not assume current pricing, availability, or quota behavior from memory. Use current account-facing documentation or dashboard evidence.	Help center and account documentation
Fallback compatibility	Confirm that the secondary route accepts the same message format, safety policy, max-token behavior, streaming behavior, and response parser expectations.	A fallback target is not safe just because it accepts a similar JSON body. Run contract tests against each configured target.	Your internal contract tests plus CometAPI endpoint reference

Monitoring signals checklist

Use this checklist to classify the signal before choosing a fallback action.

Signal	What to measure	Example trigger to tune	Recommended action	Validation step
Connection timeout	Client-side connect timeout, TLS timeout, DNS failure, socket timeout	More than 2 consecutive failures for the same route, or elevated failure rate over 5 minutes	Retry once with jitter. If still failing and user SLO is at risk, route to approved fallback.	Inject connection refusal or blackhole the upstream in staging. Confirm only retryable calls are replayed.
HTTP 5xx	Count and rate of server-side errors returned by the API path	5xx rate above your normal baseline for 3 to 5 minutes	Retry requests that are safe to replay, using exponential backoff. Escalate to fallback if burn rate remains high.	Stub 500 and 503 responses. Confirm alert includes status, route, model identifier, and request class.
HTTP 429 or quota-like response	Rate-limit responses, quota exhaustion, retry-after headers if documented	Any sustained 429 on production traffic	Respect documented retry guidance if present. Fall back only if alternate capacity and billing controls are approved.	Run a low-volume controlled test. Confirm your client does not stampede the alternate route.
HTTP 400-class validation error	Bad request, invalid field, unsupported parameter, context-too-large, malformed messages	Any repeated 400 on the same release or request builder version	Do not fall back by default. Roll back or fix the client request.	Send a deliberately invalid request in staging. Confirm it opens an integration alert, not a fallback event.
HTTP 401 or 403	Authentication or authorization failure	Any production occurrence unless expected during rotation	Do not fall back until credential and account status are verified. Page platform owner.	Rotate a staging secret to an invalid value. Confirm no alternate provider receives the request.
Latency SLO breach	p50, p95, p99, timeout ratio, queue wait, time to first token for streaming	p95 above your user-facing budget for a sustained window	Prefer graceful degradation before full fallback if responses are still correct.	Inject upstream delay and verify timeout boundaries, user messaging, and cancellation behavior.
Empty or malformed assistant output	Empty content, invalid JSON where JSON is required, missing expected fields	Parser failure rate above baseline	Retry once if transient. Fall back only if the secondary route is contract-compatible.	Force malformed output in a test harness. Confirm the fallback path revalidates the response.
Finish reason or token-budget exhaustion	Responses cut off due to length or equivalent finish state	More than baseline truncation for a request class	Adjust prompt or token budget. Do not treat as provider outage unless correlated with other failures.	Run long-context test prompts. Confirm the alert names the request class and token budget.
Streaming interruption	Missing final event, stalled stream, chunk parse error, client disconnect	Stall beyond your stream idle timeout	Cancel and retry non-side-effecting requests if safe. For interactive users, show a clear partial-response state.	Simulate dropped SSE or chunked transfer. Confirm partial content is labeled, not silently accepted.
Cost or usage anomaly	Sudden token usage increase, retry amplification, fallback volume spike	Retry/fallback volume above planned budget	Circuit-break fallback and page operator if amplification is detected.	Run a drill with forced failures. Confirm dashboards show primary calls, retries, and fallback calls separately.
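
The classification step in the checklist above can be made mechanical with a small pure function. The following is a minimal sketch, assuming your client can report the HTTP status (or `None` when no response arrived), whether the attempt failed at the transport layer, and whether the body passed your parser. The category names and `classify_signal` are hypothetical and are not part of any CometAPI SDK. Latency, streaming, and cost signals are left out because they need windowed metrics rather than a single attempt.

```python
from typing import Optional

# Hypothetical category names mirroring rows of the checklist above.
TRANSPORT = "transport"
SERVER_ERROR = "http_5xx"
RATE_LIMIT = "http_429"
AUTH = "auth"
CLIENT_ERROR = "client_4xx"
OUTPUT_CONTRACT = "output_contract"
OK = "ok"


def classify_signal(status: Optional[int], transport_error: bool = False,
                    parsed_ok: bool = True) -> str:
    """Map the outcome of one attempt to a single signal category."""
    if transport_error or status is None:
        return TRANSPORT          # timeout, DNS, TLS, or socket failure
    if status in (401, 403):
        return AUTH               # never an automatic fallback trigger
    if status == 429:
        return RATE_LIMIT
    if 400 <= status < 500:
        return CLIENT_ERROR       # deterministic request problem
    if status >= 500:
        return SERVER_ERROR
    if not parsed_ok:
        return OUTPUT_CONTRACT    # HTTP success that failed the parser
    return OK
```

Checking auth and 429 before the generic 4xx branch keeps those two signals from being absorbed into the client-error bucket, which matters because each has a different owner and action.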

Decision rules for on-call use

Use these rules during an incident:

Classify the failure first. Determine whether the dominant signal is transport, 5xx, 429, 4xx validation, auth, latency, output contract, or cost amplification.
Retry only when the request is safe to replay. If the request can trigger external side effects in your application, deduplicate before retrying.
Fall back only when the alternate route has been pre-approved. It must have compatible request fields, response parsing, safety behavior, data handling, and budget controls.
Stop on auth and malformed request failures. Falling back on 401, 403, or repeated deterministic 400 errors usually hides the real issue.
Log the decision, not just the error. Each event should say whether the system retried, fell back, degraded, queued, or failed closed.
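
The rules above can be expressed as an ordered function that returns both an action and a reason, so the decision itself is what gets logged. This is a sketch under stated assumptions: the category strings, the `Decision` type, and the default cap of one retry are illustrative placeholders, and the degrade and queue actions are omitted for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    action: str   # "retry", "fallback", or "fail_closed"
    reason: str   # logged alongside the error


def decide(category: str, attempt: int, replay_safe: bool,
           fallback_approved: bool, max_retries: int = 1) -> Decision:
    """Apply the rules in order. `attempt` is the number of retries already made."""
    # Auth and deterministic request errors stop here.
    if category in ("auth", "client_4xx"):
        return Decision("fail_closed", f"{category}: fix the client or credential")
    # Retry only replay-safe requests, and only up to the cap.
    if replay_safe and attempt < max_retries:
        return Decision("retry", f"{category}: retry {attempt + 1} of {max_retries}")
    # Fall back only to a pre-approved route.
    if fallback_approved:
        return Decision("fallback", f"{category}: retries exhausted or unsafe")
    return Decision("fail_closed", f"{category}: no approved fallback route")
```

For example, `decide("http_5xx", 1, True, True)` returns a fallback decision because the single allowed retry has already been used.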

Minimal sanitized request example

Use this as a shape check, not as a contract guarantee. Confirm the path, required fields, and supported parameters in the CometAPI chat completions reference before production.

curl -X POST "https://<COMETAPI_BASE_URL>/v1/chat/completions" \
  -H "Authorization: Bearer ${COMETAPI_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<model-id-approved-for-this-workload>",
    "messages": [
      {
        "role": "system",
        "content": "Answer with concise operational guidance."
      },
      {
        "role": "user",
        "content": "Summarize the current incident state from these sanitized symptoms: elevated 5xx and p95 latency."
      }
    ],
    "temperature": 0.2,
    "max_tokens": 300,
    "stream": false
  }'

For production, attach your own request correlation identifier in application logs. Do not assume an extra API header is accepted unless the endpoint documentation explicitly supports it.
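
For application code, the same request shape can be built with the Python standard library. The sketch below only constructs the request and does not send it. It carries the same unverified assumptions as the curl example (the `/v1/chat/completions` path and the Bearer auth format), and the `build_request` helper and logger name are hypothetical. The correlation identifier is written to application logs only, consistent with the note above about not sending undocumented headers.

```python
import json
import logging
import urllib.request
import uuid

log = logging.getLogger("chat_client")  # hypothetical logger name


def build_request(base_url: str, api_key: str, model: str,
                  user_text: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "max_tokens": 300,
        "stream": False,
    }
    req = urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",  # assumed path
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth format
            "Content-Type": "application/json",
        },
        method="POST",
    )
    # The correlation ID stays in application logs; it is not sent as a header.
    correlation_id = str(uuid.uuid4())
    log.info("chat request built",
             extra={"correlation_id": correlation_id,
                    "endpoint_path": "/v1/chat/completions"})
    return req
```

Sending the request would then be a call such as `urllib.request.urlopen(req, timeout=30)`, with the timeout chosen from your own latency budget.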

Practical validation steps

1. Build a contract fixture

Create a fixture with one approved request per workload class:

normal short chat;
long-context request near your expected token budget;
structured-output request if your application requires JSON;
streaming request if you use streaming;
blocked or rejected request if your application has policy gates.

For each fixture, store:

request body shape;
expected required response fields;
parser expectations;
timeout budget;
retry eligibility;
fallback eligibility.
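
A fixture like this can live in code next to the client so that contract tests and the fallback layer read the same values. The following is a minimal sketch: `ContractFixture`, `missing_paths`, and the dotted-path convention are hypothetical, and the response paths assume an OpenAI-style response shape that should be confirmed against the endpoint reference.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class ContractFixture:
    workload_class: str
    request_body: Dict[str, Any]
    required_response_paths: Tuple[str, ...]  # dotted paths the parser needs
    timeout_seconds: float
    retry_eligible: bool
    fallback_eligible: bool


def missing_paths(response: Dict[str, Any], fixture: ContractFixture) -> List[str]:
    """Return the required dotted paths that are absent from a response."""
    missing = []
    for path in fixture.required_response_paths:
        node: Any = response
        for key in path.split("."):
            if isinstance(node, list) and key.isdigit() and int(key) < len(node):
                node = node[int(key)]
            elif isinstance(node, dict) and key in node:
                node = node[key]
            else:
                missing.append(path)
                break
    return missing


SHORT_CHAT = ContractFixture(
    workload_class="short_chat",
    request_body={"model": "<model-id>",
                  "messages": [{"role": "user", "content": "ping"}]},
    # Assumed response shape; confirm against the endpoint reference.
    required_response_paths=("choices.0.message.content",
                             "choices.0.finish_reason"),
    timeout_seconds=20.0,
    retry_eligible=True,
    fallback_eligible=True,
)
```

Running `missing_paths` against responses from both the primary and the secondary route gives a simple compatibility check: a non-empty result on either route means the parser expectations are not met.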

2. Run failure injection in staging

Inject one failure mode at a time:

DNS or connection failure;
upstream timeout;
HTTP 500;
HTTP 429;
HTTP 400 malformed request;
HTTP 401 invalid credential;
malformed successful response;
delayed streaming chunks.

Expected outcome:

retryable transport and 5xx failures may retry and then fall back;
400, 401, and 403 failures should not fall back automatically;
malformed success responses should be counted as contract failures;
cost and retry amplification should be visible in dashboards.
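
These expected outcomes can be asserted in a unit-level harness before any staging drill. The sketch below uses scripted fake routes instead of real network calls. `run_with_policy` and `scripted` are hypothetical helpers, a status of `None` stands for an injected transport failure, and 429 handling is deliberately simplified to "retryable" here even though production policy may need approval gates for it.

```python
from typing import Callable, Dict, List, Optional

NO_FALLBACK = {400, 401, 403}  # deterministic client and auth failures


def run_with_policy(primary: Callable[[], Optional[int]],
                    secondary: Callable[[], Optional[int]],
                    max_retries: int = 1) -> Dict[str, object]:
    """Call primary, retry retryable failures, then fall back at most once."""
    trail: List[str] = []
    for _ in range(max_retries + 1):
        status = primary()
        trail.append(f"primary:{status}")
        if status == 200:
            return {"outcome": "ok", "trail": trail}
        if status in NO_FALLBACK:
            # Stop immediately: the secondary route must never see this request.
            return {"outcome": "fail_closed", "trail": trail}
    status = secondary()
    trail.append(f"secondary:{status}")
    return {"outcome": "fallback_ok" if status == 200 else "failed",
            "trail": trail}


def scripted(statuses: List[Optional[int]],
             calls: List[int]) -> Callable[[], Optional[int]]:
    """Fake route that returns injected statuses in order and records each call."""
    it = iter(statuses)

    def route() -> Optional[int]:
        calls.append(1)
        return next(it)
    return route
```

The key assertion for the 401 case is that the secondary route's call list stays empty, which is the unit-test equivalent of confirming that no alternate provider receives the request.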

3. Validate observability fields

Every chat completion attempt should emit structured telemetry. At minimum, capture:

route name, not secret URL;
endpoint path;
workload class;
model identifier used by your configuration;
HTTP status code;
timeout phase if available;
retry attempt number;
fallback decision;
latency and time to first token if streaming;
response parser result;
token usage if provided by the response and documented for the endpoint;
sanitized error category.

Avoid logging prompts, secrets, full user data, or raw authorization headers.
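
One way to enforce both the field list and the logging restrictions is an allow-list at the point where events are serialized. This is a minimal sketch: the field names and `telemetry_event` are hypothetical and should be mapped onto whatever schema your logging pipeline already uses.

```python
import json
import time
from typing import Any, Dict

# Hypothetical field names corresponding to the list above.
ALLOWED_FIELDS = {
    "route", "endpoint_path", "workload_class", "model", "status",
    "timeout_phase", "attempt", "decision", "latency_ms", "ttft_ms",
    "parse_ok", "usage", "error_category",
}


def telemetry_event(**fields: Any) -> str:
    """Serialize one attempt as a JSON line, dropping anything not allow-listed."""
    event: Dict[str, Any] = {k: v for k, v in fields.items()
                             if k in ALLOWED_FIELDS}
    event["ts"] = time.time()
    return json.dumps(event, sort_keys=True)
```

Because unknown keys are dropped rather than rejected, a caller that accidentally passes `prompt=` or `authorization=` produces an event without those values instead of leaking them.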

4. Drill the on-call decision

Run a 30-minute drill with the application owner and on-call engineer:

Force 5xx for a small staging traffic slice.
Confirm retry volume stays inside the configured cap.
Confirm fallback begins only after the configured trigger.
Confirm fallback responses pass the same parser.
Confirm dashboards separate primary, retry, and fallback traffic.
Confirm the runbook tells the operator when to disable fallback.

Record the result in your internal incident-readiness notes. If this site later publishes a validation template, link it from /sites/llm-api-reliability/editorial/.
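
The retry-cap and traffic-separation checks in the drill presuppose that something counts each kind of call. The sketch below shows one possible shape for that: a budget that tracks primary, retry, and fallback calls separately and denies extra attempts once their ratio to primary calls exceeds a cap. `AttemptBudget` and the ratio-based caps are hypothetical, and a production version would normally use a sliding time window rather than lifetime counters.

```python
from collections import Counter


class AttemptBudget:
    """Counts primary, retry, and fallback calls separately and enforces caps."""

    def __init__(self, max_retry_ratio: float, max_fallback_ratio: float) -> None:
        self.counts: Counter = Counter()
        self.caps = {"retry": max_retry_ratio, "fallback": max_fallback_ratio}

    def record_primary(self) -> None:
        self.counts["primary"] += 1

    def allow(self, kind: str) -> bool:
        """Allow a retry or fallback only while its ratio to primary calls stays within the cap."""
        primary = max(self.counts["primary"], 1)
        if (self.counts[kind] + 1) / primary > self.caps[kind]:
            self.counts[f"{kind}_denied"] += 1  # visible signal of amplification
            return False
        self.counts[kind] += 1
        return True
```

The `*_denied` counters are what a dashboard or alert would watch: a rising denial count during a drill shows the cap is doing its job, and during an incident it is the cue to consider disabling fallback.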

Suggested alert routing

Alert	Page immediately?	Owner	Notes
Sustained 5xx plus user-visible failures	Yes	On-call application/platform engineer	Fallback may be appropriate if pre-approved.
Sustained timeout rate	Yes	Platform engineer	Check network, DNS, upstream latency, and client timeout changes.
429 or quota-like failures	Usually	Platform or account owner	Fallback requires budget and capacity approval.
400 validation spike after deploy	Yes, if production-impacting	Releasing team	Roll back or patch request builder.
401 or 403	Yes	Secret/account owner	Do not auto-fallback until credentials and account status are verified.
Parser failure spike	Yes, if user-facing	Application owner	Revalidate response contract and structured-output assumptions.
Fallback volume spike	Yes	Reliability owner	Detect retry storms and cost amplification.

FAQ

Should every failed chat completion request fall back automatically?

No. Automatic fallback is safest for transient transport failures, selected 5xx responses, and latency events where the alternate route is already validated. It is usually unsafe for malformed requests, auth failures, unsupported parameters, or schema bugs.

Is a retry the same as fallback?

No. A retry repeats the request against the same route, usually after a short delay. Fallback changes behavior: a secondary route, degraded response, queue, or fail-closed path. Track them separately.

What is the most important metric?

There is no single metric. Combine status code rate, timeout rate, latency, parser failure rate, fallback volume, and user-visible failure rate. A low HTTP error rate can still hide malformed successful responses.
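
A short worked example shows why the HTTP error rate alone can mislead. The sketch below assumes the three failure counts are mutually exclusive per request; `failure_rates` is a hypothetical helper, not a prescribed metric definition.

```python
from typing import Dict


def failure_rates(total: int, http_errors: int, timeouts: int,
                  parse_failures: int) -> Dict[str, float]:
    """Return per-signal rates plus a combined user-visible failure rate."""
    if total <= 0:
        return {"http_error": 0.0, "timeout": 0.0,
                "parse_failure": 0.0, "user_visible": 0.0}
    return {
        "http_error": http_errors / total,
        "timeout": timeouts / total,
        "parse_failure": parse_failures / total,
        # Assumes each failed request is counted in exactly one bucket.
        "user_visible": (http_errors + timeouts + parse_failures) / total,
    }
```

With 1,000 requests, 5 HTTP errors, 5 timeouts, and 90 parser failures, the HTTP error rate is 0.5% while the user-visible failure rate is 10%.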

Should 429 always trigger fallback?

Not always. A 429-style signal may indicate rate limiting, quota, or account-level constraints. Fallback can be appropriate only if the secondary route has approved capacity, compatible behavior, and cost controls.
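
When a rate-limited request is retried rather than rerouted, the wait time can come from the server's hint if one exists. `Retry-After` is a standard HTTP response header, but whether CometAPI sends it is an assumption to verify against the endpoint reference. The sketch below uses it when it is a plain number of seconds and otherwise falls back to exponential backoff with full jitter. The `retry_delay` helper and its defaults are hypothetical, the HTTP-date form of the header is ignored, and header names are assumed to be already normalized by the caller.

```python
import random
from typing import Mapping, Optional


def retry_delay(attempt: int, headers: Mapping[str, str],
                base: float = 0.5, cap: float = 30.0,
                rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before the next retry of a rate-limited request."""
    # Prefer the server's hint when it is a plain number of seconds.
    hint = headers.get("Retry-After", "").strip()
    if hint.isdigit():
        return float(hint)
    # Otherwise use exponential backoff with full jitter, bounded by `cap`.
    rng = rng or random.Random()
    return rng.uniform(0.0, min(cap, base * (2 ** attempt)))
```

The server hint is returned uncapped on purpose: if it exceeds the user-facing timeout budget, that is a signal to degrade, queue, or use an approved fallback rather than to retry sooner than the server asked.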

Can fallback hide product bugs?

Yes. If the client sends an invalid request or parses the response incorrectly, fallback may make the incident harder to diagnose. That is why this runbook separates deterministic 4xx and parser failures from retryable transport failures.

How often should this runbook be reviewed?

Review it whenever the CometAPI endpoint documentation changes, when you add a new model route, after any production incident, and at least once per quarter for high-traffic workloads.

Sources checked

Source	Access date	Purpose
CometAPI API documentation	2026-05-09	Checked as the public documentation entry point for API references and integration navigation.
CometAPI chat completions endpoint reference	2026-05-09	Checked as the evidence source for chat completion request and response contract verification.
CometAPI help center	2026-05-09	Checked for account, support, billing, and operational guidance that may affect fallback decisions.