A CometAPI fallback runbook for chat completions

Last reviewed: 2026-05-11

Who this is for: platform, SRE, and application operators who already route chat-completion traffic through an LLM gateway and need a controlled fallback procedure for CometAPI-backed requests.

For related reliability patterns, see the site index at /sites/llm-api-reliability/ and the post archive at /sites/llm-api-reliability/posts/ .

Key takeaways

Treat fallback as a contract-controlled production path, not as an ad hoc retry.
Verify the CometAPI chat-completion endpoint, authentication format, request schema, response schema, and error behavior against the API reference before enabling failover.
Use bounded retries only for retryable failures; do not replay requests blindly after ambiguous timeouts unless your application can tolerate duplicate side effects.
Log every fallback decision with the original failure class, selected fallback target, model identifier, request hash, latency, and final status.
Keep rate-limit, billing, and model-availability assumptions out of code unless verified in the current CometAPI documentation or your account contract.

Concise definition

A chat-completion fallback runbook is an operational procedure that decides when a failed or degraded primary chat-completion request should be retried, routed to a secondary model or provider path, returned as a controlled error, or paused for manual intervention.

In this article, “CometAPI fallback” means a fallback path that sends a chat-completion style request to CometAPI after your router has classified the primary path as unavailable, degraded, or unsuitable for the request. The public CometAPI API documentation is available at https://apidoc.cometapi.com/ , and the referenced chat-completion endpoint page is https://apidoc.cometapi.com/api-13851472 .

Operating assumptions

Use these as starting assumptions to verify, not as universal facts:

Your application has a request router or gateway in front of chat-completion calls.
Each request has a correlation ID that follows it through primary, retry, fallback, and client response paths.
Fallback is allowed only for requests that are safe to reroute under your product policy.
Your team can disable fallback with a feature flag or routing rule without redeploying the application.
Your team has reviewed the CometAPI API reference and help center before production use: https://apidoc.cometapi.com/ and https://apidoc.cometapi.com/help-center .

When to use fallback

Use the fallback path only after classifying the original failure. A practical operator policy is:

Primary outcome	Fallback action	Operator note
Connection failure before request body is accepted	Eligible for one fallback attempt	Preserve the same user-visible request ID.
HTTP 429 or documented rate-limit response	Eligible only if fallback capacity is confirmed	Do not turn one rate-limit event into provider-wide retry amplification.
HTTP 5xx from the primary path	Eligible for bounded fallback	Record upstream status and response body class, not sensitive content.
Request validation error	Not eligible	Fix the request; do not send malformed payloads to another route.
Auth failure	Not eligible	Rotate or repair credentials; fallback can hide a broken deployment.
Timeout with unknown upstream execution state	Conditional	Fall back only if duplicate generation is acceptable or the call is idempotency-protected.
Safety, policy, or content rejection	Usually not eligible	Follow product policy; do not use fallback to bypass controls.
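The table above can be sketched as a small default-deny decision function. This is a minimal illustration, not a reference implementation: the failure-class strings, the `Action` enum, and the two boolean inputs are hypothetical names chosen for this example, and the sketch treats policy rejections as never eligible, which is stricter than the "usually not eligible" wording in the table.

```python
from enum import Enum


class Action(Enum):
    FALLBACK = "fallback"
    NO_FALLBACK = "no_fallback"


# Hypothetical failure-class names; align them with your router's own taxonomy.
ALWAYS_ELIGIBLE = {"connect_failure", "http_5xx"}


def decide_fallback(
    failure_class: str,
    *,
    fallback_capacity_confirmed: bool = False,
    duplicate_safe: bool = False,
) -> Action:
    """Map a classified primary failure to a fallback action (default deny)."""
    if failure_class in ALWAYS_ELIGIBLE:
        return Action.FALLBACK
    if failure_class == "rate_limited":
        # Eligible only when fallback capacity has been confirmed.
        return Action.FALLBACK if fallback_capacity_confirmed else Action.NO_FALLBACK
    if failure_class == "timeout_unknown_state":
        # Eligible only when a duplicate generation is acceptable
        # or the call is idempotency-protected.
        return Action.FALLBACK if duplicate_safe else Action.NO_FALLBACK
    # Validation errors, auth failures, policy rejections, and any
    # unrecognized class are not rerouted.
    return Action.NO_FALLBACK
```

Defaulting to no fallback for unknown classes means a new or misclassified failure cannot silently generate fallback traffic.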

Fallback decision record

For every fallback attempt, write a structured decision record. This is more useful during incidents than a generic “retry failed” log line.

Recommended fields:

trace_id
user_request_id
primary_route
fallback_route
primary_failure_class
primary_http_status
primary_latency_ms
fallback_started_at
fallback_http_status
fallback_latency_ms
request_body_hash
model_requested
model_sent_to_fallback
streaming_enabled
final_client_status
operator_policy_version

The request hash should be non-reversible and should not store prompt text. Store sensitive payloads only in systems approved for that data class.
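One way to produce a non-reversible `request_body_hash` is a keyed hash over a canonical encoding of the request body. The sketch below assumes an HMAC-SHA-256 with a gateway-held secret; that choice, the function names, and the reduced record shape are illustrative assumptions, not something this runbook or CometAPI prescribes. A keyed hash is used because a plain hash of a short prompt can be matched by guessing candidate prompts.

```python
import hashlib
import hmac
import json


def request_body_hash(body: dict, secret: bytes) -> str:
    """Keyed SHA-256 over a canonical JSON encoding of the request body."""
    canonical = json.dumps(
        body, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hmac.new(secret, canonical.encode("utf-8"), hashlib.sha256).hexdigest()


def decision_record(
    trace_id: str,
    primary_failure_class: str,
    body: dict,
    secret: bytes,
    **extra_fields,
) -> dict:
    """Build a structured decision record that never contains prompt text."""
    record = {
        "trace_id": trace_id,
        "primary_failure_class": primary_failure_class,
        "request_body_hash": request_body_hash(body, secret),
    }
    # Remaining recommended fields (latencies, statuses, routes) are passed
    # by the caller; do not pass raw prompts or completions here.
    record.update(extra_fields)
    return record
```

Sorting keys before hashing makes the hash stable across serializers, so the same logical request matches across the primary and fallback log lines.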

Contract details to verify

Before enabling production fallback, verify each contract item against the current CometAPI documentation and your account-specific terms.

Contract area	What to verify	Runbook default before verification	Source to check
Endpoint paths	Exact chat-completion path, HTTP method, base URL, and whether the route is OpenAI-compatible or CometAPI-specific.	Do not hard-code endpoint paths until checked in the endpoint reference.	CometAPI API reference: https://apidoc.cometapi.com/api-13851472
Auth headers	Required authorization header name, token format, and whether any organization/project headers are required.	Use secret-manager injection; never commit keys. Block fallback if auth config is missing.	API docs and help center: https://apidoc.cometapi.com/ and https://apidoc.cometapi.com/help-center
Request fields	Required and optional fields for model, messages, streaming, temperature, max tokens, tools, and metadata.	Send only fields verified as supported; strip provider-specific fields from the primary request unless documented.	Endpoint reference: https://apidoc.cometapi.com/api-13851472
Response fields	Response object shape, message location, usage fields, finish reason, streaming chunk format, and error payload shape.	Parse defensively; treat missing expected fields as an integration error.	Endpoint reference: https://apidoc.cometapi.com/api-13851472
Error behavior	HTTP status codes, retryable vs non-retryable errors, validation errors, auth errors, and timeout semantics.	Retry only network failures and documented transient classes; do not retry validation or auth errors.	API docs and help center: https://apidoc.cometapi.com/help-center
Rate-limit assumptions	Whether rate limits are per key, model, route, account, or time window; response headers, if any.	Assume rate limits exist and are finite; set local concurrency caps until verified.	API docs, help center, and account contract: https://apidoc.cometapi.com/
Billing assumptions	Whether failed requests, streamed tokens, fallback duplicates, or partial generations can affect billing.	Do not publish cost guarantees; meter fallback separately in internal telemetry.	Account contract and help center: https://apidoc.cometapi.com/help-center

Sanitized fallback policy example

This example is intentionally generic. Replace paths, model names, and headers only after verifying the current CometAPI endpoint contract.

{
  "policy_name": "chat_completion_fallback_cometapi",
  "policy_version": "2026-05-11",
  "enabled": true,
  "primary_route": {
    "name": "primary_chat_completion",
    "timeout_ms": 12000,
    "max_attempts": 1
  },
  "fallback_route": {
    "name": "cometapi_chat_completion",
    "base_url": "https://YOUR_VERIFIED_COMETAPI_BASE_URL",
    "path": "YOUR_VERIFIED_CHAT_COMPLETIONS_PATH",
    "method": "POST",
    "auth_header": "YOUR_VERIFIED_AUTH_HEADER",
    "timeout_ms": 15000,
    "max_attempts": 1
  },
  "eligible_failure_classes": [
    "network_connect_failure",
    "network_read_timeout_before_response",
    "documented_transient_5xx",
    "documented_retryable_rate_limit"
  ],
  "ineligible_failure_classes": [
    "request_validation_error",
    "authentication_error",
    "authorization_error",
    "policy_rejection",
    "malformed_tool_schema"
  ],
  "safety_controls": {
    "require_request_hash": true,
    "log_prompt_text": false,
    "cap_total_attempts_across_routes": 2,
    "disable_on_error_ratio_over_example": 0.20,
    "disable_on_p95_latency_ms_over_example": 30000
  }
}

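The numerical thresholds above are examples to tune. They are not claims about CometAPI behavior. Likewise, keep network_read_timeout_before_response in the eligible list only if duplicate generation is acceptable or the call is idempotency-protected, as the timeout row in the table under "When to use fallback" requires.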
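A policy file like the one above is easier to trust if the gateway refuses to load it when it is internally inconsistent. The following sketch assumes the policy has already been parsed into a dictionary with the same keys as the example; `validate_policy` and its specific checks are hypothetical and should be extended to match your own policy schema.

```python
def validate_policy(policy: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means loadable."""
    problems = []
    fallback = policy["fallback_route"]
    safety = policy["safety_controls"]

    # Per-route attempts must fit inside the cross-route cap.
    total = policy["primary_route"]["max_attempts"] + fallback["max_attempts"]
    if total > safety["cap_total_attempts_across_routes"]:
        problems.append("route max_attempts exceed cap_total_attempts_across_routes")

    # A failure class cannot be both eligible and ineligible.
    overlap = set(policy["eligible_failure_classes"]) & set(
        policy["ineligible_failure_classes"]
    )
    if overlap:
        problems.append(f"classes listed as both eligible and ineligible: {sorted(overlap)}")

    # Placeholders must be replaced with verified values before enabling.
    for key in ("base_url", "path", "auth_header"):
        if "YOUR_VERIFIED" in str(fallback.get(key, "")):
            problems.append(f"fallback_route.{key} is still a placeholder")

    if safety.get("log_prompt_text"):
        problems.append("log_prompt_text must be false")
    return problems
```

A gateway could call this at startup and on every policy reload, and keep fallback disabled whenever the returned list is non-empty.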

Pre-production validation steps

1. Verify the documented contract

Open the CometAPI API documentation at https://apidoc.cometapi.com/ and the chat-completion endpoint page at https://apidoc.cometapi.com/api-13851472 .

Confirm:

Base URL and endpoint path.
Required authentication header.
Required request body fields.
Supported optional request body fields.
Streaming vs non-streaming behavior.
Error response format.
Whether usage information is returned and where it appears.
Whether the endpoint has documented retry or rate-limit guidance.

Record the exact documentation access date in your integration notes.

2. Build a request normalizer

Do not forward your primary provider payload blindly. Add a normalizer that:

maps your internal message format to the verified CometAPI request schema;
removes unsupported fields;
validates required fields before network dispatch;
enforces maximum request size according to your own product limits and documented provider constraints;
attaches a correlation ID in a supported metadata field only if documented.

If metadata fields are not documented, keep correlation IDs in your gateway logs instead of adding unknown request fields.
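The normalizer steps above can be sketched with an allow-list. The field names in `VERIFIED_FIELDS` and `REQUIRED_FIELDS` are placeholders assumed for illustration only; populate them from the endpoint reference after verification rather than from this example.

```python
# Hypothetical allow-list: replace with fields verified in the endpoint reference.
VERIFIED_FIELDS = {"model", "messages", "stream", "temperature", "max_tokens"}
REQUIRED_FIELDS = {"model", "messages"}


class NormalizationError(ValueError):
    """Raised before network dispatch when the request cannot be normalized."""


def normalize_request(primary_body: dict, fallback_model: str) -> dict:
    """Build a fallback request body from the primary body using an allow-list."""
    # Drop every field that has not been verified as supported.
    body = {k: v for k, v in primary_body.items() if k in VERIFIED_FIELDS}
    # The fallback route may use a different model identifier than the primary.
    body["model"] = fallback_model

    missing = REQUIRED_FIELDS - body.keys()
    if missing:
        raise NormalizationError(f"missing required fields: {sorted(missing)}")
    if not isinstance(body["messages"], list) or not body["messages"]:
        raise NormalizationError("messages must be a non-empty list")
    return body
```

An allow-list fails closed: a new provider-specific field added to the primary request is stripped by default instead of being forwarded to a route that may reject it.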

3. Classify errors before routing

Your router should classify failures before fallback. A minimal classification set:

network_failure
timeout_before_response
timeout_after_partial_response
http_429
http_5xx
http_4xx_validation
http_401_403_auth
policy_rejection
parser_error
unknown

Only the classes explicitly allowed by policy should trigger CometAPI fallback.
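A classifier for the set above might look like the sketch below. The function signature and its boolean inputs are hypothetical; how your gateway detects a network error or a policy rejection depends on your HTTP client and product. Treating only 400 and 422 as validation errors is also an assumption to check against the documented error behavior.

```python
from typing import Optional


def classify_failure(
    http_status: Optional[int] = None,
    *,
    network_error: bool = False,
    timed_out: bool = False,
    partial_response: bool = False,
    policy_rejected: bool = False,
    parse_failed: bool = False,
) -> str:
    """Return one label from the minimal classification set."""
    if policy_rejected:
        return "policy_rejection"
    if timed_out:
        return "timeout_after_partial_response" if partial_response else "timeout_before_response"
    if network_error:
        return "network_failure"
    if parse_failed:
        return "parser_error"
    if http_status is None:
        return "unknown"
    if http_status in (401, 403):
        return "http_401_403_auth"
    if http_status == 429:
        return "http_429"
    if 500 <= http_status <= 599:
        return "http_5xx"
    if http_status in (400, 422):  # assumed validation statuses; verify in the docs
        return "http_4xx_validation"
    # Anything unrecognized stays "unknown" so policy can refuse to route it.
    return "unknown"
```

Policy rejection is checked first so that a rejected request is never reclassified as a retryable transport failure.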

4. Validate non-streaming first

Before validating streamed responses, test non-streaming chat completions with a harmless prompt and a short expected output. Confirm:

HTTP status is successful.
Response parser finds the assistant message in the documented location.
Finish reason is handled.
Usage fields are parsed only if present and documented.
Latency is recorded.
Request and response are redacted according to your logging policy.
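The parser checks above can be exercised with a defensive parsing function. The response shape used here (`choices[0].message.content`, `finish_reason`, `usage`) is an assumption modeled on OpenAI-style chat-completion responses; it is not confirmed by this runbook, so verify the actual shape in the endpoint reference before relying on it.

```python
class ParserError(Exception):
    """The response did not match the expected, documented shape."""


def parse_completion(payload: dict) -> dict:
    """Extract the assistant message, treating missing fields as integration errors."""
    try:
        # ASSUMED shape; replace with the documented message location.
        choice = payload["choices"][0]
        content = choice["message"]["content"]
    except (KeyError, IndexError, TypeError) as exc:
        raise ParserError("assistant message not found in expected location") from exc
    if not isinstance(content, str):
        raise ParserError("assistant message content is not a string")
    return {
        "content": content,
        # Optional fields: read only if present, never required.
        "finish_reason": choice.get("finish_reason"),
        "usage": payload.get("usage"),
    }
```

Raising a dedicated error type lets the router count parser failures separately from transport failures, which the kill-switch and metrics sections depend on.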

5. Validate streaming separately

If your application streams tokens to clients, treat streaming as a separate integration. Verify:

chunk format;
end-of-stream marker;
partial-output behavior on disconnect;
client cancellation propagation;
timeout behavior after partial output;
whether fallback is disabled once partial output has reached the user.

A conservative production rule is: once the user has received partial streamed output, do not start a second fallback stream for the same user-visible request unless the product explicitly supports duplicated or stitched output.
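The conservative rule above reduces to a small piece of per-request state. The `StreamGuard` class below is a hypothetical sketch: it assumes your gateway can call a hook each time a chunk is actually delivered to the client.

```python
class StreamGuard:
    """Tracks whether any streamed output has reached the user for one request."""

    def __init__(self) -> None:
        self.chunks_sent = 0

    def record_chunk_sent(self) -> None:
        # Call this only after a chunk has been written to the client connection.
        self.chunks_sent += 1

    def fallback_allowed(self, product_supports_stitching: bool = False) -> bool:
        # Before any output, fallback is safe from the user's point of view.
        # After partial output, allow it only if the product explicitly
        # supports duplicated or stitched output.
        return self.chunks_sent == 0 or product_supports_stitching
```

For example, a gateway would create one guard per user-visible request and consult `fallback_allowed()` before opening a second stream.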

6. Run failure-injection tests

Inject failures at your gateway first, in a controlled environment, before exercising fallback against production user traffic.

Recommended tests:

Test	Expected behavior
Primary connect failure	One CometAPI fallback attempt if policy allows.
Primary 500	One fallback attempt; original status recorded.
Primary 400 validation error	No fallback; return controlled client error.
Primary 401 auth error	No fallback; page operator or rotate secret.
Fallback validation error	Stop; mark integration contract failure.
Fallback timeout	Return controlled degraded response; do not attempt unbounded cascades.
Fallback parser error	Stop; store response shape sample in secure diagnostics.
Fallback disabled flag	No fallback even for eligible primary failures.

7. Add an operator kill switch

Fallback must be disableable quickly. At minimum, support:

global disable for all CometAPI fallback traffic;
route-level disable for one application or tenant;
streaming-only disable;
high-risk feature disable, such as tools or long-context calls;
automatic disable on repeated parser errors.

Do not require code redeploys for these controls.
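The controls above can be evaluated in one place that returns a suppression reason, which also feeds the fallback suppression metric. This is a sketch under stated assumptions: the `KillSwitches` class, the reason strings, and the parser-error limit of 3 are hypothetical, and "repeated" is interpreted here as consecutive parser errors. In practice the field values would come from a flag service or dynamic config, not from code.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional, Set


@dataclass
class KillSwitches:
    global_disabled: bool = False
    disabled_routes: Set[str] = field(default_factory=set)
    streaming_disabled: bool = False
    disabled_features: Set[str] = field(default_factory=set)
    parser_error_limit: int = 3  # example threshold to tune
    consecutive_parser_errors: int = 0

    def record_parser_result(self, ok: bool) -> None:
        """Reset the counter on success; increment it on a parser error."""
        self.consecutive_parser_errors = 0 if ok else self.consecutive_parser_errors + 1

    def suppression_reason(
        self, route: str, streaming: bool = False, features: Iterable[str] = ()
    ) -> Optional[str]:
        """Return why fallback is suppressed, or None if it may proceed."""
        if self.global_disabled:
            return "global_disable"
        if route in self.disabled_routes:
            return "route_disable"
        if streaming and self.streaming_disabled:
            return "streaming_disable"
        if self.disabled_features & set(features):
            return "feature_disable"
        if self.consecutive_parser_errors >= self.parser_error_limit:
            return "auto_disable_parser_errors"
        return None
```

Returning a reason instead of a bare boolean makes it possible to tell, during an incident, whether fallback was off by operator choice or by automatic protection.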

Production rollout plan

A safe rollout sequence:

Documentation verification complete.
Contract tests pass in a non-production environment.
Secrets are loaded from the approved secret manager.
Observability dashboards are live.
Fallback disabled by default in production.
Enable for internal traffic.
Enable for a small, reversible traffic slice.
Review fallback decision records for malformed routing.
Expand only if error classification and response parsing are stable.
Keep a rollback owner assigned during the rollout window.

Metrics to watch

Track these separately for primary and fallback routes:

request count;
success ratio;
HTTP status distribution;
network failure count;
validation error count;
auth error count;
p50, p95, and p99 latency;
timeout count;
parser error count;
stream interruption count;
token or usage fields when documented and available;
fallback trigger reason;
fallback suppression reason.

Also watch for hidden amplification: one user request should not become a chain of many upstream attempts. A practical cap is one primary attempt plus one fallback attempt unless you have a documented reason to do more.
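The attempt cap can be enforced per user request with a simple budget object that every route must draw from before dispatching. `AttemptBudget` is a hypothetical name, and the default of 2 mirrors the one-primary-plus-one-fallback cap described above.

```python
class AttemptBudget:
    """Caps total upstream attempts for one user request across all routes."""

    def __init__(self, max_total: int = 2) -> None:
        self.max_total = max_total
        self.used = 0

    def try_acquire(self) -> bool:
        """Reserve one upstream attempt; return False once the cap is reached."""
        if self.used >= self.max_total:
            return False
        self.used += 1
        return True
```

Because the budget travels with the request rather than with a route, nested retry logic in client libraries or downstream routers cannot multiply attempts past the cap as long as each dispatch calls `try_acquire()` first.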

Incident response procedure

When the primary route degrades:

Confirm the failure class from gateway telemetry.
Check whether fallback is enabled for the affected application.
Confirm CometAPI auth configuration is healthy.
Confirm the fallback route is not producing validation or parser errors.
Enable fallback only for eligible failure classes.
Watch error ratio, latency, and request volume for the fallback route.
If fallback errors rise, disable fallback and return a controlled degraded response.
Record the event, policy version, and operator decisions in the incident timeline.

When CometAPI fallback degrades:

Disable CometAPI fallback using the kill switch.
Stop retry amplification.
Preserve samples of sanitized error metadata.
Compare observed errors with documented error behavior from https://apidoc.cometapi.com/help-center .
Re-enable only after contract, auth, rate-limit, or upstream health issues are understood.

What not to do

Avoid these failure patterns:

Do not fall back on malformed requests.
Do not fall back on authentication failures.
Do not fall back to bypass content or safety policy.
Do not assume response fields that are not documented.
Do not log raw prompts or completions in generic infrastructure logs.
Do not retry indefinitely across multiple providers.
Do not treat fallback success as evidence that the primary incident is resolved.
Do not make pricing, availability, or performance assumptions unless supported by current documentation or your contract.

Sources checked

Source	Access date	Purpose
https://apidoc.cometapi.com/	2026-05-11	Locate the public CometAPI API documentation entry point and confirm where operators should verify current API contracts.
https://apidoc.cometapi.com/api-13851472	2026-05-11	Reference the chat-completion endpoint page for endpoint, request, response, and error contract verification.
https://apidoc.cometapi.com/help-center	2026-05-11	Check help and operational guidance areas for auth, errors, limits, account, or support details that may affect fallback operation.

FAQ

Should every failed primary request fall back to CometAPI?

No. Only requests with eligible failure classes should fall back. Validation errors, authentication failures, authorization failures, policy rejections, and malformed tool schemas should stop immediately.

Should fallback use the exact same request body as the primary provider?

Usually no. Normalize the request into the verified CometAPI schema. Strip unsupported fields and validate required fields before dispatch.

Can fallback be used for streaming responses?

Yes, if streaming is supported and verified for your chosen endpoint and application behavior. Validate streaming separately from non-streaming, and avoid starting a second stream after partial output has already reached the user unless your product explicitly supports that behavior.

How many fallback attempts should be allowed?

Use a small bounded number. A practical starting point is one primary attempt and one fallback attempt, then tune from production evidence. Avoid retry cascades.

What should happen if our gateway hits a parser error on a CometAPI response?

Treat it as an integration contract failure. Stop fallback for that route, capture a sanitized sample, compare it with the documented response schema, and fix the parser or request mapping before re-enabling.

Where should rate-limit and billing assumptions live?

Keep them in configuration and operational documentation, not hard-coded business logic. Verify them against current CometAPI documentation, the help center, and your account contract before using them for production decisions.

Is this runbook a guarantee of reliability?

No. It is an operational pattern for reducing uncontrolled failure behavior. Actual results depend on your application design, traffic, contracts, provider behavior, and incident response discipline.