A CometAPI fallback runbook for chat completions
Last reviewed: 2026-05-11
Who this is for: platform, SRE, and application operators who already route chat-completion traffic through an LLM gateway and need a controlled fallback procedure for CometAPI-backed requests.
For related reliability patterns, see the site index at /sites/llm-api-reliability/ and the post archive at /sites/llm-api-reliability/posts/.
Key takeaways
- Treat fallback as a contract-controlled production path, not as an ad hoc retry.
- Verify the CometAPI chat-completion endpoint, authentication format, request schema, response schema, and error behavior against the API reference before enabling failover.
- Use bounded retries only for retryable failures; do not replay requests blindly after ambiguous timeouts unless your application can tolerate duplicate side effects.
- Log every fallback decision with the original failure class, selected fallback target, model identifier, request hash, latency, and final status.
- Keep rate-limit, billing, and model-availability assumptions out of code unless verified in the current CometAPI documentation or your account contract.
Concise definition
A chat-completion fallback runbook is an operational procedure that decides when a failed or degraded primary chat-completion request should be retried, routed to a secondary model or provider path, returned as a controlled error, or paused for manual intervention.
In this article, “CometAPI fallback” means a fallback path that sends a chat-completion style request to CometAPI after your router has classified the primary path as unavailable, degraded, or unsuitable for the request. The public CometAPI API documentation is available at https://apidoc.cometapi.com/, and the referenced chat-completion endpoint page is https://apidoc.cometapi.com/api-13851472.
Operating assumptions
Use these as starting assumptions to verify, not as universal facts:
- Your application has a request router or gateway in front of chat-completion calls.
- Each request has a correlation ID that follows it through primary, retry, fallback, and client response paths.
- Fallback is allowed only for requests that are safe to reroute under your product policy.
- Your team can disable fallback with a feature flag or routing rule without redeploying the application.
- Your team has reviewed the CometAPI API reference and help center before production use: https://apidoc.cometapi.com/ and https://apidoc.cometapi.com/help-center.
When to use fallback
Use the fallback path only after classifying the original failure. A practical operator policy is:
| Primary outcome | Fallback action | Operator note |
|---|---|---|
| Connection failure before request body is accepted | Eligible for one fallback attempt | Preserve the same user-visible request ID. |
| HTTP 429 or documented rate-limit response | Eligible only if fallback capacity is confirmed | Do not turn one rate-limit event into provider-wide retry amplification. |
| HTTP 5xx from the primary path | Eligible for bounded fallback | Record upstream status and response body class, not sensitive content. |
| Request validation error | Not eligible | Fix the request; do not send malformed payloads to another route. |
| Auth failure | Not eligible | Rotate or repair credentials; fallback can hide a broken deployment. |
| Timeout with unknown upstream execution state | Conditional | Fall back only if duplicate generation is acceptable or the call is idempotency-protected. |
| Safety, policy, or content rejection | Usually not eligible | Follow product policy; do not use fallback to bypass controls. |
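The table above can be encoded as a small lookup so the router never improvises a fallback decision. This is a minimal sketch: the failure-class names, `POLICY_TABLE`, and `fallback_allowed` are hypothetical, and the "usually not eligible" row for policy rejections is treated as strictly not eligible here.

```python
from enum import Enum


class FallbackAction(Enum):
    ELIGIBLE = "eligible"
    CONDITIONAL = "conditional"
    NOT_ELIGIBLE = "not_eligible"


# Hypothetical failure-class names; align them with your router's classifier.
POLICY_TABLE = {
    "connect_failure": FallbackAction.ELIGIBLE,
    "http_429": FallbackAction.CONDITIONAL,               # needs confirmed fallback capacity
    "http_5xx": FallbackAction.ELIGIBLE,
    "validation_error": FallbackAction.NOT_ELIGIBLE,
    "auth_failure": FallbackAction.NOT_ELIGIBLE,
    "timeout_unknown_state": FallbackAction.CONDITIONAL,  # needs duplicate tolerance
    "policy_rejection": FallbackAction.NOT_ELIGIBLE,      # made strict in this sketch
}


def fallback_allowed(failure_class, *, capacity_confirmed=False, duplicate_safe=False):
    """Return True only when the policy table permits a fallback attempt."""
    action = POLICY_TABLE.get(failure_class, FallbackAction.NOT_ELIGIBLE)
    if action is FallbackAction.ELIGIBLE:
        return True
    if action is FallbackAction.CONDITIONAL:
        if failure_class == "http_429":
            return capacity_confirmed
        return duplicate_safe  # timeout with unknown upstream execution state
    return False
```

Unknown failure classes default to no fallback, which keeps new error types from silently widening the fallback path.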
Fallback decision record
For every fallback attempt, write a structured decision record. This is more useful during incidents than a generic “retry failed” log line.
Recommended fields:
- trace_id
- user_request_id
- primary_route
- fallback_route
- primary_failure_class
- primary_http_status
- primary_latency_ms
- fallback_started_at
- fallback_http_status
- fallback_latency_ms
- request_body_hash
- model_requested
- model_sent_to_fallback
- streaming_enabled
- final_client_status
- operator_policy_version
The request hash should be non-reversible and should not store prompt text. Store sensitive payloads only in systems approved for that data class.
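One way to build such a record is sketched below. It assumes a keyed hash (HMAC-SHA256) so that short prompts cannot be recovered by guessing against an unkeyed digest; the helper name `hash_request_body`, the field types, and the idea of a gateway-held hashing key are assumptions for illustration, not part of any CometAPI contract.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass
from typing import Optional


def hash_request_body(body: dict, key: bytes) -> str:
    """Keyed SHA-256 over a canonical JSON encoding; prompt text is never stored."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, canonical, hashlib.sha256).hexdigest()


@dataclass(frozen=True)
class FallbackDecisionRecord:
    trace_id: str
    user_request_id: str
    primary_route: str
    fallback_route: str
    primary_failure_class: str
    primary_http_status: Optional[int]   # None when no HTTP response was received
    primary_latency_ms: int
    fallback_started_at: str             # ISO 8601 timestamp
    fallback_http_status: Optional[int]
    fallback_latency_ms: Optional[int]
    request_body_hash: str
    model_requested: str
    model_sent_to_fallback: str
    streaming_enabled: bool
    final_client_status: int
    operator_policy_version: str

    def to_log_line(self) -> str:
        """Serialize to one JSON line for structured logging."""
        return json.dumps(asdict(self), sort_keys=True)
```

Because the JSON is canonicalized with sorted keys, two requests with the same content produce the same hash regardless of field order, which makes duplicates easy to spot during an incident.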
Contract details to verify
Before enabling production fallback, verify each contract item against the current CometAPI documentation and your account-specific terms.
| Contract area | What to verify | Runbook default before verification | Source to check |
|---|---|---|---|
| Endpoint paths | Exact chat-completion path, HTTP method, base URL, and whether the route is OpenAI-compatible or CometAPI-specific. | Do not hard-code endpoint paths until checked in the endpoint reference. | CometAPI API reference: https://apidoc.cometapi.com/api-13851472 |
| Auth headers | Required authorization header name, token format, and whether any organization/project headers are required. | Use secret-manager injection; never commit keys. Block fallback if auth config is missing. | API docs and help center: https://apidoc.cometapi.com/ and https://apidoc.cometapi.com/help-center |
| Request fields | Required and optional fields for model, messages, streaming, temperature, max tokens, tools, and metadata. | Send only fields verified as supported; strip provider-specific fields from the primary request unless documented. | Endpoint reference: https://apidoc.cometapi.com/api-13851472 |
| Response fields | Response object shape, message location, usage fields, finish reason, streaming chunk format, and error payload shape. | Parse defensively; treat missing expected fields as an integration error. | Endpoint reference: https://apidoc.cometapi.com/api-13851472 |
| Error behavior | HTTP status codes, retryable vs non-retryable errors, validation errors, auth errors, and timeout semantics. | Retry only network failures and documented transient classes; do not retry validation or auth errors. | API docs and help center: https://apidoc.cometapi.com/help-center |
| Rate-limit assumptions | Whether rate limits are per key, model, route, account, or time window; response headers, if any. | Assume rate limits exist and are finite; set local concurrency caps until verified. | API docs, help center, and account contract: https://apidoc.cometapi.com/ |
| Billing assumptions | Whether failed requests, streamed tokens, fallback duplicates, or partial generations can affect billing. | Do not publish cost guarantees; meter fallback separately in internal telemetry. | Account contract and help center: https://apidoc.cometapi.com/help-center |
Sanitized fallback policy example
This example is intentionally generic. Replace paths, model names, and headers only after verifying the current CometAPI endpoint contract.
    {
      "policy_name": "chat_completion_fallback_cometapi",
      "policy_version": "2026-05-11",
      "enabled": true,
      "primary_route": {
        "name": "primary_chat_completion",
        "timeout_ms": 12000,
        "max_attempts": 1
      },
      "fallback_route": {
        "name": "cometapi_chat_completion",
        "base_url": "https://YOUR_VERIFIED_COMETAPI_BASE_URL",
        "path": "YOUR_VERIFIED_CHAT_COMPLETIONS_PATH",
        "method": "POST",
        "auth_header": "YOUR_VERIFIED_AUTH_HEADER",
        "timeout_ms": 15000,
        "max_attempts": 1
      },
      "eligible_failure_classes": [
        "network_connect_failure",
        "network_read_timeout_before_response",
        "documented_transient_5xx",
        "documented_retryable_rate_limit"
      ],
      "ineligible_failure_classes": [
        "request_validation_error",
        "authentication_error",
        "authorization_error",
        "policy_rejection",
        "malformed_tool_schema"
      ],
      "safety_controls": {
        "require_request_hash": true,
        "log_prompt_text": false,
        "cap_total_attempts_across_routes": 2,
        "disable_on_error_ratio_over_example": 0.20,
        "disable_on_p95_latency_ms_over_example": 30000
      }
    }
The numerical thresholds above are examples to tune. They are not claims about CometAPI behavior.
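A policy like this is easy to ship with placeholders still in place. The sketch below shows one way to lint a policy document before it is loaded; `validate_policy` is a hypothetical helper, and the checks are examples chosen to match the fields in the policy above rather than a complete validator.

```python
def validate_policy(policy: dict) -> list:
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    fallback = policy["fallback_route"]

    # Placeholders must be replaced with values verified against the endpoint reference.
    for name in ("base_url", "path", "auth_header"):
        if "YOUR_VERIFIED" in str(fallback.get(name, "")):
            problems.append(f"fallback_route.{name} is still a placeholder")

    # Per-route attempts must fit inside the cross-route cap.
    total = policy["primary_route"]["max_attempts"] + fallback["max_attempts"]
    cap = policy["safety_controls"]["cap_total_attempts_across_routes"]
    if total > cap:
        problems.append(f"route attempts ({total}) exceed cross-route cap ({cap})")

    # A failure class cannot be both eligible and ineligible.
    overlap = set(policy["eligible_failure_classes"]) & set(policy["ineligible_failure_classes"])
    if overlap:
        problems.append("classes listed as both eligible and ineligible: " + ", ".join(sorted(overlap)))

    # Prompt text must stay out of logs.
    if policy["safety_controls"].get("log_prompt_text"):
        problems.append("log_prompt_text must be false")

    return problems
```

Run as written against the example policy, this would report the three placeholder fields and nothing else. A gateway could refuse to enable fallback while the list is non-empty.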
Pre-production validation steps
1. Verify the documented contract
Open the CometAPI API documentation at https://apidoc.cometapi.com/ and the chat-completion endpoint page at https://apidoc.cometapi.com/api-13851472.
Confirm:
- Base URL and endpoint path.
- Required authentication header.
- Required request body fields.
- Supported optional request body fields.
- Streaming vs non-streaming behavior.
- Error response format.
- Whether usage information is returned and where it appears.
- Whether the endpoint has documented retry or rate-limit guidance.
Record the exact documentation access date in your integration notes.
2. Build a request normalizer
Do not forward your primary provider payload blindly. Add a normalizer that:
- maps your internal message format to the verified CometAPI request schema;
- removes unsupported fields;
- validates required fields before network dispatch;
- enforces maximum request size according to your own product limits and documented provider constraints;
- attaches a correlation ID in a supported metadata field only if documented.
If metadata fields are not documented, keep correlation IDs in your gateway logs instead of adding unknown request fields.
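A normalizer along these lines might look like the sketch below. The allow-list, required fields, and size limit are placeholders: the field names `model`, `messages`, `stream`, `temperature`, and `max_tokens` are assumptions that must be checked against the verified CometAPI request schema before use.

```python
import json

# Hypothetical allow-list; replace with the fields verified in the endpoint reference.
ALLOWED_FIELDS = {"model", "messages", "stream", "temperature", "max_tokens"}
REQUIRED_FIELDS = {"model", "messages"}
MAX_BODY_BYTES = 256_000  # example product limit, not a documented provider limit


class NormalizationError(ValueError):
    """Raised when a request cannot be safely mapped to the fallback schema."""


def normalize_request(primary_body: dict, fallback_model: str) -> dict:
    # Keep only allow-listed fields; provider-specific extras are dropped.
    body = {k: v for k, v in primary_body.items() if k in ALLOWED_FIELDS}
    body["model"] = fallback_model

    missing = REQUIRED_FIELDS - body.keys()
    if missing:
        raise NormalizationError("missing required fields: " + ", ".join(sorted(missing)))
    if not isinstance(body["messages"], list) or not body["messages"]:
        raise NormalizationError("messages must be a non-empty list")

    # Enforce the size limit before any network dispatch.
    size = len(json.dumps(body).encode("utf-8"))
    if size > MAX_BODY_BYTES:
        raise NormalizationError(f"request body too large: {size} bytes")
    return body
```

The function builds a new dictionary instead of mutating the primary payload, so the original request remains available for logging and hashing.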
3. Classify errors before routing
Your router should classify failures before fallback. A minimal classification set:
- network_failure
- timeout_before_response
- timeout_after_partial_response
- http_429
- http_5xx
- http_4xx_validation
- http_401_403_auth
- policy_rejection
- parser_error
- unknown
Only the classes explicitly allowed by policy should trigger CometAPI fallback.
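As a rough illustration, a classifier for this set could take the signals the gateway already has (HTTP status, timeout state, parser outcome) and return exactly one class. The function names, the keyword flags, and the example allow-list are hypothetical; mapping every remaining 4xx status to `http_4xx_validation` is a simplification to refine against documented error behavior.

```python
from typing import Optional

# Example allow-list only; the real list comes from your fallback policy.
FALLBACK_CLASSES = frozenset({"network_failure", "timeout_before_response", "http_5xx"})


def classify_failure(
    status: Optional[int] = None,
    *,
    network_error: bool = False,
    timed_out: bool = False,
    partial_output: bool = False,
    policy_rejected: bool = False,
    parse_failed: bool = False,
) -> str:
    """Map gateway signals to exactly one failure class."""
    if network_error:
        return "network_failure"
    if timed_out:
        return "timeout_after_partial_response" if partial_output else "timeout_before_response"
    if policy_rejected:
        return "policy_rejection"
    if parse_failed:
        return "parser_error"
    if status == 429:
        return "http_429"
    if status in (401, 403):
        return "http_401_403_auth"
    if status is not None and 500 <= status <= 599:
        return "http_5xx"
    if status is not None and 400 <= status <= 499:
        return "http_4xx_validation"
    return "unknown"


def should_fall_back(failure_class: str, allowed=FALLBACK_CLASSES) -> bool:
    return failure_class in allowed
```

Anything that lands in `unknown` stays out of the allow-list by default, so unclassified failures surface to operators instead of being rerouted.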
4. Validate non-streaming first
Before validating streamed responses, test non-streaming chat completions with a harmless prompt and a short expected output. Confirm:
- HTTP status is successful.
- Response parser finds the assistant message in the documented location.
- Finish reason is handled.
- Usage fields are parsed only if present and documented.
- Latency is recorded.
- Request and response are redacted according to your logging policy.
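The parser checks in this list can be expressed as a defensive parsing function. The sketch below assumes an OpenAI-style response shape (`choices[0].message.content`, `finish_reason`, optional `usage`); that shape is an assumption for illustration only and must be confirmed against the endpoint reference. `parse_completion` and `ResponseContractError` are hypothetical names.

```python
class ResponseContractError(Exception):
    """The response did not match the shape the integration expects."""


def parse_completion(payload: dict) -> dict:
    """Extract the assistant text, finish reason, and usage from an assumed response shape."""
    choices = payload.get("choices") if isinstance(payload, dict) else None
    if not isinstance(choices, list) or not choices:
        raise ResponseContractError("missing or empty 'choices'")

    first = choices[0]
    message = first.get("message") if isinstance(first, dict) else None
    content = message.get("content") if isinstance(message, dict) else None
    if not isinstance(content, str):
        # Tool-call responses may legitimately carry no text; handle them separately.
        raise ResponseContractError("assistant message content not found")

    usage = payload.get("usage")
    return {
        "content": content,
        "finish_reason": first.get("finish_reason"),  # may be None; the caller must handle it
        "usage": usage if isinstance(usage, dict) else None,  # parsed only if present
    }
```

A missing field raises a dedicated error type, so the gateway can count it as a parser error and treat it as an integration failure instead of a transient one.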
5. Validate streaming separately
If your application streams tokens to clients, treat streaming as a separate integration. Verify:
- chunk format;
- end-of-stream marker;
- partial-output behavior on disconnect;
- client cancellation propagation;
- timeout behavior after partial output;
- whether fallback is disabled once partial output has reached the user.
A conservative production rule is: once the user has received partial streamed output, do not start a second fallback stream for the same user-visible request unless the product explicitly supports duplicated or stitched output.
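The conservative rule above reduces to a small guard the router can call before opening a fallback stream. This is a sketch; `may_start_fallback_stream` and its parameters are hypothetical, and it assumes the gateway counts chunks actually delivered to the client.

```python
def may_start_fallback_stream(chunks_delivered_to_client: int,
                              product_supports_stitching: bool = False) -> bool:
    """Allow a fallback stream only if the user has seen nothing yet,
    or the product explicitly supports duplicated or stitched output."""
    if chunks_delivered_to_client < 0:
        raise ValueError("chunks_delivered_to_client must be non-negative")
    if chunks_delivered_to_client == 0:
        return True
    return product_supports_stitching
```

Counting delivered chunks, not chunks received from upstream, matters here: output buffered in the gateway but never sent to the user does not block fallback.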
6. Run failure-injection tests
Inject failures at your gateway in a test environment first; do not begin by injecting them into production user traffic.
Recommended tests:
| Test | Expected behavior |
|---|---|
| Primary connect failure | One CometAPI fallback attempt if policy allows. |
| Primary 500 | One fallback attempt; original status recorded. |
| Primary 400 validation error | No fallback; return controlled client error. |
| Primary 401 auth error | No fallback; page operator or rotate secret. |
| Fallback validation error | Stop; mark integration contract failure. |
| Fallback timeout | Return controlled degraded response; do not attempt unbounded cascades. |
| Fallback parser error | Stop; store response shape sample in secure diagnostics. |
| Fallback disabled flag | No fallback even for eligible primary failures. |
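Several rows of this table can be exercised with stubbed routes before any real network call is involved. The sketch below uses a deliberately tiny router; `route`, `fail_with`, `succeed`, and the `ELIGIBLE` set are hypothetical test scaffolding, not a production gateway.

```python
# Example eligibility set for the test harness; mirror your real policy.
ELIGIBLE = {"network_failure", "http_5xx"}


def route(primary, fallback, *, fallback_enabled=True):
    """Call primary, then at most one fallback. Returns (final_status, attempts)."""
    attempts = ["primary"]
    ok, failure = primary()
    if ok:
        return "ok", attempts
    if not fallback_enabled or failure not in ELIGIBLE:
        return f"error:{failure}", attempts
    attempts.append("fallback")
    ok, fallback_failure = fallback()
    if ok:
        return "ok_via_fallback", attempts
    # No further cascade: return a controlled degraded result.
    return f"degraded:{fallback_failure}", attempts


def fail_with(failure_class):
    """Build a stub route that always fails with the given class."""
    return lambda: (False, failure_class)


def succeed():
    """Stub route that always succeeds."""
    return True, None


# Primary 500 with a healthy fallback: exactly one fallback attempt.
print(route(fail_with("http_5xx"), succeed))
# Primary 400 validation error: no fallback attempt at all.
print(route(fail_with("http_4xx_validation"), succeed))
```

Asserting on the `attempts` list, not only on the final status, is what catches accidental retry cascades.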
7. Add an operator kill switch
Fallback must be disableable quickly. At minimum, support:
- global disable for all CometAPI fallback traffic;
- route-level disable for one application or tenant;
- streaming-only disable;
- high-risk feature disable, such as tools or long-context calls;
- automatic disable on repeated parser errors.
Do not require code redeploys for these controls.
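One possible shape for these controls is a flag object that the router consults on every fallback decision and that returns a suppression reason for logging. The class, field names, and the parser-error threshold below are hypothetical; in practice the values would be loaded from a feature-flag or configuration service so they can change without a deploy.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class KillSwitches:
    global_disabled: bool = False
    disabled_routes: Set[str] = field(default_factory=set)    # application or tenant names
    streaming_disabled: bool = False
    disabled_features: Set[str] = field(default_factory=set)  # e.g. "tools", "long_context"
    parser_error_limit: int = 5                                # example threshold to tune
    parser_errors: int = 0

    def record_parser_error(self) -> None:
        """Trip the global switch after repeated parser errors."""
        self.parser_errors += 1
        if self.parser_errors >= self.parser_error_limit:
            self.global_disabled = True

    def suppression_reason(self, route: str, streaming: bool, features: Set[str]) -> Optional[str]:
        """Return why fallback is suppressed, or None if it may proceed."""
        if self.global_disabled:
            return "global_disable"
        if route in self.disabled_routes:
            return "route_disable"
        if streaming and self.streaming_disabled:
            return "streaming_disable"
        blocked = features & self.disabled_features
        if blocked:
            return "feature_disable:" + ",".join(sorted(blocked))
        return None
```

Returning a reason string instead of a bare boolean gives the decision record and dashboards a ready-made value for tracking why fallback was suppressed.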
Production rollout plan
A safe rollout sequence:
- Complete documentation verification.
- Pass contract tests in a non-production environment.
- Load secrets from the approved secret manager.
- Bring observability dashboards live.
- Keep fallback disabled by default in production.
- Enable fallback for internal traffic.
- Enable fallback for a small, reversible traffic slice.
- Review fallback decision records for malformed routing.
- Expand only if error classification and response parsing are stable.
- Keep a rollback owner assigned during the rollout window.
Metrics to watch
Track these separately for primary and fallback routes:
- request count;
- success ratio;
- HTTP status distribution;
- network failure count;
- validation error count;
- auth error count;
- p50, p95, and p99 latency;
- timeout count;
- parser error count;
- stream interruption count;
- token or usage fields when documented and available;
- fallback trigger reason;
- fallback suppression reason.
Also watch for hidden amplification: one user request should not become a chain of many upstream attempts. A practical cap is one primary attempt plus one fallback attempt unless you have a documented reason to do more.
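The attempt cap and the amplification signal can both come from one small counter keyed by user request ID. This is a sketch under the assumption that every upstream attempt, primary or fallback, asks the budget first; `AttemptBudget` is a hypothetical name, and a real gateway would also need to expire old entries.

```python
from collections import Counter


class AttemptBudget:
    """Caps upstream attempts per user request and reports amplification."""

    def __init__(self, cap: int = 2):
        self.cap = cap          # e.g. one primary attempt plus one fallback attempt
        self.used = Counter()   # user_request_id -> upstream attempts so far

    def try_acquire(self, user_request_id: str) -> bool:
        """Reserve one upstream attempt; return False once the cap is reached."""
        if self.used[user_request_id] >= self.cap:
            return False
        self.used[user_request_id] += 1
        return True

    def amplification(self) -> float:
        """Average upstream attempts per user request seen so far."""
        if not self.used:
            return 0.0
        return sum(self.used.values()) / len(self.used)
```

With a cap of two, the amplification value cannot exceed 2.0; a value drifting toward the cap during an incident indicates that most requests are consuming their fallback attempt.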
Incident response procedure
When the primary route degrades:
- Confirm the failure class from gateway telemetry.
- Check whether fallback is enabled for the affected application.
- Confirm CometAPI auth configuration is healthy.
- Confirm the fallback route is not producing validation or parser errors.
- Enable fallback only for eligible failure classes.
- Watch error ratio, latency, and request volume for the fallback route.
- If fallback errors rise, disable fallback and return a controlled degraded response.
- Record the event, policy version, and operator decisions in the incident timeline.
When CometAPI fallback degrades:
- Disable CometAPI fallback using the kill switch.
- Stop retry amplification.
- Preserve samples of sanitized error metadata.
- Compare observed errors with documented error behavior from https://apidoc.cometapi.com/help-center.
- Re-enable only after contract, auth, rate-limit, or upstream health issues are understood.
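The "disable fallback when errors rise" step can be partly automated with a rolling error-ratio check that feeds the kill switch. The sketch below is illustrative: `ErrorRatioBreaker`, the window size, the minimum sample count, and the 0.20 threshold are example values to tune, not claims about CometAPI behavior.

```python
from collections import deque


class ErrorRatioBreaker:
    """Tracks recent fallback outcomes and trips when the error ratio is too high."""

    def __init__(self, window: int = 50, max_error_ratio: float = 0.20, min_samples: int = 20):
        self.results = deque(maxlen=window)  # True = success, False = failure
        self.max_error_ratio = max_error_ratio
        self.min_samples = min_samples

    def record(self, ok: bool) -> None:
        self.results.append(ok)

    def error_ratio(self) -> float:
        if not self.results:
            return 0.0
        return self.results.count(False) / len(self.results)

    def tripped(self) -> bool:
        # Require a minimum sample size so a single early failure cannot trip it.
        if len(self.results) < self.min_samples:
            return False
        return self.error_ratio() > self.max_error_ratio
```

The breaker only reports state; re-enabling fallback stays a human decision, made after the contract, auth, rate-limit, or upstream health issue is understood.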
What not to do
Avoid these failure patterns:
- Do not fall back on malformed requests.
- Do not fall back on authentication failures.
- Do not use fallback to bypass content or safety policy.
- Do not assume response fields that are not documented.
- Do not log raw prompts or completions in generic infrastructure logs.
- Do not retry indefinitely across multiple providers.
- Do not treat fallback success as evidence that the primary incident is resolved.
- Do not make pricing, availability, or performance assumptions unless supported by current documentation or your contract.
Sources checked
| Source | Access date | Purpose |
|---|---|---|
| https://apidoc.cometapi.com/ | 2026-05-11 | Locate the public CometAPI API documentation entry point and confirm where operators should verify current API contracts. |
| https://apidoc.cometapi.com/api-13851472 | 2026-05-11 | Reference the chat-completion endpoint page for endpoint, request, response, and error contract verification. |
| https://apidoc.cometapi.com/help-center | 2026-05-11 | Check help and operational guidance areas for auth, errors, limits, account, or support details that may affect fallback operation. |
FAQ
Should every failed primary request fall back to CometAPI?
No. Only requests with eligible failure classes should fall back. Requests that fail with validation errors, authentication failures, authorization failures, policy rejections, or malformed tool schemas should return a controlled error immediately, with no fallback attempt.
Should fallback use the exact same request body as the primary provider?
Usually no. Normalize the request into the verified CometAPI schema. Strip unsupported fields and validate required fields before dispatch.
Can fallback be used for streaming responses?
Yes, if streaming is supported and verified for your chosen endpoint and application behavior. Validate streaming separately from non-streaming, and avoid starting a second stream after partial output has already reached the user unless your product explicitly supports that behavior.
How many fallback attempts should be allowed?
Use a small bounded number. A practical starting point is one primary attempt and one fallback attempt, then tune from production evidence. Avoid retry cascades.
What should happen if our gateway cannot parse a CometAPI response?
Treat it as an integration contract failure. Stop fallback for that route, capture a sanitized sample, compare it with the documented response schema, and fix the parser or request mapping before re-enabling.
Where should rate-limit and billing assumptions live?
Keep them in configuration and operational documentation, not hard-coded business logic. Verify them against current CometAPI documentation, the help center, and your account contract before using them for production decisions.
Is this runbook a guarantee of reliability?
No. It is an operational pattern for reducing uncontrolled failure behavior. Actual results depend on your application design, traffic, contracts, provider behavior, and incident response discipline.