Last reviewed: 2026-06-08

Direct answer

Before you commit to a failover path for any LLM API, you need to know what a response contract looks like on each endpoint you intend to switch between. A response contract is the set of fields, shapes, and status codes your application code reads from an API response. If the contract differs between your primary and fallback endpoints, a silent failover can swallow errors, misread completion states, or drop streaming events your caller depends on.

For CometAPI’s two main text generation endpoints — Chat Completions (POST /v1/chat/completions) and Responses (POST /v1/responses) — the contracts diverge in structured-output syntax, streaming event sequences, tool-call shapes, and which parameters each provider family actually honors. Identifying and verifying those differences before wiring up failover logic is the core of response-contract evidence gathering.

Key operational actions:

  1. Fetch the live API reference for each endpoint you plan to route between and note every response field your application reads.
  2. Build a small smoke-test suite that asserts contract-stable fields (e.g., choices[0].finish_reason, choices[0].message.content) and logs any field that differs.
  3. For cross-provider routing, consult the parameter-support table in the Chat Completions reference to confirm which parameters you send are honored by the target provider family.
  4. When adding a Responses-API path alongside Chat Completions, note that the streaming event sequence is different and your SSE parser must handle both shapes.
  5. Verify the error shape (error.message, error.type, error.code) is the same on your primary and failback endpoints so your retry and circuit-breaker logic fires on the correct signal.

Start with CometAPI if you want a single base URL that routes Chat Completions and Responses requests across multiple provider families while preserving a consistent error shape.


For broader release checks, see CometAPI Chat Reliability Contract Review.

Who this is for

This article is for backend engineers, platform engineers, and reliability-focused teams who:

  • Are adding a secondary LLM provider as a failover for an existing Chat Completions or Responses API integration.
  • Need to audit which response fields their application code depends on before changing the routing layer.
  • Are building or reviewing retry, circuit-breaker, or fallback decision logic and want to know which contract signals are safe to read across endpoints.
  • Have encountered silent failures where a 200 response was returned but the response shape was unexpected because of a provider-family difference.

You do not need to be an SRE specialist, but you should be comfortable reading API reference documentation and writing a basic HTTP request test.


Key takeaways

  • The Chat Completions and Responses endpoints share some field names but differ in streaming event sequences, structured-output syntax, and tool-call shapes. Check the live reference for both before routing between them.
  • The finish_reason field in Chat Completions (stop, tool_calls, length, content_filter) is one of the most reliable contract signals for failover logic. Confirm it is present and correct in your smoke test.
  • Several request parameters — including logprobs, n > 1, and reasoning_effort — are not universally supported across provider families available through CometAPI. Sending an unsupported parameter does not always produce an explicit error; sometimes it is silently ignored. Verify in the cross-provider notes in the Chat Completions reference.
  • The CometAPI platform applies automatic multi-channel retry and automatic channel restart on failure. That means some transient errors you would otherwise see are absorbed upstream. Your application’s retry logic should still handle 429 Too Many Requests with exponential backoff.
  • The Help Center notes a scheduled maintenance window (approximately 1–5 AM) during which temporary instability is possible. Batch operators should implement a reconnection mechanism and save request state.
  • The error shape for CometAPI errors has three fields that are always present: error.message (string), error.type (string, e.g. comet_api_error), and error.code (string or null). A param field (string or null) may appear for request-shape errors. Verify this shape is what your retry logic reads.
  • For o-series reasoning models and GPT-5 series models, the Responses endpoint is recommended over Chat Completions. This means a failover path that crosses between model families may also need to cross between endpoint shapes.

Smoke-test workflow

Setup assumptions

  • You have a CometAPI API key set as COMETAPI_KEY in your environment.
  • You have curl or an HTTP client available.
  • You are testing against https://api.cometapi.com/v1.
  • The model identifier you use must be confirmed against the current model catalog at https://apidoc.cometapi.com/overview/models before running the test.
  • This smoke test does not assert uptime, latency targets, or billing costs. It only verifies that the response contract fields your failover logic reads are present and in the expected shape.

Happy-path request plan (Chat Completions)

POST https://api.cometapi.com/v1/chat/completions Authorization: Bearer <COMETAPI_KEY> Content-Type: application/json

{ “model”: “”, “messages”: [ {“role”: “user”, “content”: “Reply with exactly the word PASS.”} ] }

Minimum assertions:

  • HTTP status is 200.
  • Response body contains choices array with at least one element.
  • choices[0].finish_reason is present and is one of the documented values (stop, length, tool_calls, content_filter).
  • choices[0].message.content is a non-null string.
  • choices[0].message.role equals assistant.
  • Top-level object field equals chat.completion.

Error-path check

Send the same request with a deliberately invalid or missing model field and verify:

  • HTTP status is in the 4xx range.
  • Response body contains an error object.
  • error.message is a non-empty string.
  • error.type is a non-empty string (e.g. comet_api_error or invalid_request_error).
  • error.code is present (may be null).

This confirms that your error-shape parser will fire on the correct signal during a real failover event rather than interpreting an error response as an empty success.

What the smoke test must not assert

  • Specific latency targets or SLA thresholds.
  • Specific model availability or pricing.
  • Exact error.message string content (messages can change without a breaking API change).
  • Token counts, cost, or billing fields.

Pass/fail logging fields

Record the following fields after each smoke-test run (use placeholder values below as a template):

smoke_test_run_id: endpoint: POST /v1/chat/completions http_status: finish_reason_present: <true|false> content_non_null: <true|false> error_shape_valid_on_4xx: <true|false> test_timestamp_utc: pass: <true|false> notes:

Do not log API key values, full prompt text, or full generated responses to shared test records.


Contract areas and cross-provider verification

The cross-provider parameter-support table in the Chat Completions reference shows that several parameters behave differently depending on which provider family CometAPI routes a request to. Before treating any parameter as a stable failover signal, verify it in that table.

Specific areas to note:

  • temperature: Range is 0–2 for OpenAI GPT and Gemini, but 0–1 for Claude via the compatibility layer. Sending temperature: 1.5 to a Claude-backed model may produce unexpected behavior.
  • n (number of completions): OpenAI GPT supports 1–128; Claude via compatibility supports 1 only; Gemini supports 1–8. An application that reads choices[1] or beyond should not assume failover to Claude will produce the same number of choices.
  • logprobs: Supported for OpenAI GPT only. Not supported for Claude or Gemini via compatibility. Do not use this field as a failover signal.
  • reasoning_effort: Only applies to o-series and GPT-5.1+ models. Sending it to other model families has no effect.
  • max_tokens vs max_completion_tokens: max_completion_tokens is the recommended parameter for GPT-4.1, GPT-5 series, and o-series models. The Chat Completions docs note that CometAPI automatically handles mapping when routing to different providers, but verify this behavior for your target model.

For the Responses endpoint, previous_response_id enables stateful chaining. If your failover path switches from Chat Completions (which uses a messages array for context) to the Responses endpoint, you lose the ability to resume a stateful response chain mid-session. Design your failover logic to restart the context window if this switch occurs.

See also the related runbook for handling timeout budgets: Timeout-budget fallback checks for chat completions and Fallback Decision Logs for CometAPI Gateway Calls.


Failure modes

  • Evidence gap: the agent cannot inspect the failing log, source page, pull request, or local command output. The safe action is to stop and record the missing evidence instead of guessing.
  • Scope drift: the agent edits files that are not connected to the observed failure. Keep the repair tied to the failing signal and leave unrelated cleanup for a separate task.
  • Environment mismatch: the local check uses different versions, credentials, feature flags, or runtime settings than the hosted path. Record the mismatch before treating the result as proof.
  • Unreviewed fallback: the agent changes models, endpoints, permissions, or retry behavior to make a run pass without preserving the review boundary. Treat access and provider failures as operational blockers, not topic failures.
  • Weak handoff: the final note says the issue is fixed but omits the command, result, changed files, and remaining uncertainty. That makes the next operator repeat the investigation.

Sources checked

Contract details to verify

AreaWhat to verifySource URLAccessedSafe candidate wording
Chat Completions endpoint pathConfirm POST /v1/chat/completions is current and the base URL is https://api.cometapi.com/v1https://apidoc.cometapi.com/api/text/chat2026-06-08“The Chat Completions endpoint accepts POST requests at /v1/chat/completions.”
finish_reason valuesConfirm the complete set of documented finish_reason values and any provider-family differenceshttps://apidoc.cometapi.com/api/text/chat2026-06-08“finish_reason may be stop, length, tool_calls, or content_filter; verify current docs for provider-specific additions.”
Error shape fieldsConfirm error.message, error.type, error.code, and error.param are the current error object fieldshttps://apidoc.cometapi.com/api/text/chat2026-06-08“The error object includes message, type, and code fields; param may be present for request-shape errors.”
Cross-provider parameter supportVerify the parameter-support table (temperature range, n, logprobs, reasoning_effort) is currenthttps://apidoc.cometapi.com/api/text/chat2026-06-08“Parameter support varies by provider family; consult the cross-provider notes in the Chat Completions reference.”
Responses endpoint shape vs Chat CompletionsConfirm which fields differ between Responses API output and Chat Completions outputhttps://apidoc.cometapi.com/api/text/responses2026-06-08“The Responses API returns an output array and a status field; Chat Completions returns a choices array.”
Streaming event sequence (Responses)Confirm the SSE event order for the Responses endpoint (response.created, response.in_progress, etc.)https://apidoc.cometapi.com/api/text/responses2026-06-08“The Responses streaming sequence begins with response.created and ends before [DONE]; verify current docs for full event list.”
Maintenance windowConfirm current scheduled maintenance window hours and recommended reconnection behaviorhttps://apidoc.cometapi.com/support/help-center2026-06-08“A scheduled maintenance window exists; verify current hours and implement a reconnection mechanism for batch workloads.”
403 WAF behaviorConfirm whether 403 responses from the WAF are expected to be retried automatically by the platform or require manual escalationhttps://apidoc.cometapi.com/support/help-center2026-06-08“403 errors may be triggered by WAF defense; verify current guidance before including 403 in automatic retry logic.”
Automatic multi-channel retryConfirm the current behavior of platform-level automatic retry and channel restart timinghttps://apidoc.cometapi.com/support/help-center2026-06-08“The platform applies automatic multi-channel retry; verify whether this affects how your application-level retry counts should be set.”

Reader next step

Compare the workflow against Start with CometAPI.

Use CometAPI Chat Reliability Contract Review as the next comparison point. Keep Timeout-budget fallback checks for chat completions nearby for setup and permission checks.

FAQ

Q: Is the Chat Completions error shape the same as the Responses API error shape?

Based on the current documentation, both endpoints use an error object with message, type, and code fields. However, the Responses API also includes a response-level status field (completed, failed, cancelled, etc.) and an incomplete_details object. Your error-handling code should be aware that a 200 response from the Responses endpoint does not mean the generation succeeded — check the status field as well. Verify the current error shape in the linked references before building production retry logic.

Q: If CometAPI already retries automatically at the platform level, do I still need application-level retry?

Yes. The Help Center notes that the platform switches channels automatically and restarts failed channels within approximately 5 minutes. However, 429 Too Many Requests from rate limiting is a client-side signal that the platform does not absorb on your behalf. The Chat Completions reference includes a Python example showing exponential backoff with jitter for RateLimitError. Keep application-level retry for 429 responses, but be aware that aggressive retry loops during overload can amplify the problem — the Google SRE overload reference linked in Sources checked covers this risk.

Q: What happens if I send a parameter that a target provider family does not support?

The documentation warns that “request parameters and response fields can vary significantly between model providers” and that some unsupported parameters may be silently ignored rather than returning an error. For example, logprobs is not supported for Claude or Gemini via the compatibility layer. If your failover routing can switch between provider families, audit which parameters you send and whether silent omission would cause your application to malfunction. Use the cross-provider parameter table in the Chat Completions reference as your starting point.

Q: The Responses API supports previous_response_id for stateful chaining. What happens to that state during failover?

State chained via previous_response_id is specific to a response ID on the Responses endpoint. If your failover path switches to Chat Completions, there is no equivalent state-forwarding mechanism — you would need to reconstruct the message history in the messages array. Design your failover logic to handle this context-reset case explicitly rather than assuming the session continues transparently.

Q: How do I know if my smoke test is catching real contract differences rather than just testing one provider?

Run the smoke test against the specific model identifiers you will use in production on both the primary and fallback paths. Compare the finish_reason values, object type string, and the presence of fields your application reads. If you are routing between provider families, also test an error case to confirm the error shape matches what your code expects from both paths.

Q: Where should I escalate if I see unexplained 403 errors in production?

The Help Center identifies 403 errors as potentially triggered by the WAF defense mechanism and recommends contacting customer service. Do not add 403 to your automatic retry loop without first confirming with CometAPI support whether the specific 403 you are seeing is retriable. Verify current escalation guidance at https://apidoc.cometapi.com/support/help-center.