Last reviewed: 2026-06-21
Direct answer
A CometAPI fallback runbook should not promote alternate routing just because one test request succeeds. Treat the smoke test as an evidence collection step: verify the documented endpoint family, confirm the request can be made with a placeholder model value, record the response shape, capture the error-path result, and note whether retries could add pressure during overload.
CometAPI documents an OpenAI-compatible base URL and a Chat Completions endpoint for multi-message conversations. Its Chat Completions reference also points some model families toward the Responses endpoint, so the runbook should make the endpoint decision explicit instead of assuming every fallback request uses the same shape. For retry behavior, use reliability guidance cautiously: retries can help with transient failures, but broad automatic retrying can amplify load when the upstream or gateway is already overloaded.
Use this checklist alongside Retry and Backoff Evidence for CometAPI Gateway Calls and Check CometAPI Response Shape Before Promoting Fallback Traffic before changing production fallback behavior.
For teams evaluating CometAPI as part of a fallback path, start from the documented contract and then review account-specific settings in your own dashboard. Start with CometAPI.
Who this is for
This guide is for on-call engineers, platform owners, and reliability reviewers who need a compact evidence checklist before routing LLM traffic through CometAPI or changing an existing fallback path. It is useful when a team is adding a new provider route, checking whether a fallback route still matches current documentation, or preparing an incident review that needs more than a pass/fail smoke-test result.
It assumes you already have an approved test environment, a CometAPI key stored outside source control, and permission to send a minimal non-sensitive request. It does not assume that a single request proves production readiness. The checklist is designed to keep the evidence small, repeatable, and reviewable without storing sensitive prompts, full generated responses, credentials, prices, rate limits, or customer data.
Key takeaways
- Verify the documented Chat Completions or Responses contract before choosing the request shape.
- Use
<API_KEY_PLACEHOLDER>in examples and logs; never paste real credentials into incident notes. - Record the endpoint family, request shape, response identifier fields, status class, and retry decision.
- Keep retry checks conservative because retry traffic can worsen overload when many clients retry at once.
- Do not use a smoke test to assert uptime, latency targets, pricing, billing, quotas, or model availability.
- Keep support escalation evidence separate from production routing decisions so an incident note does not become a hidden deployment approval.
Smoke-test workflow
Setup assumptions:
- The test runs from a non-production client or a controlled canary path.
- The operator has an environment variable or secret manager entry for the CometAPI key.
- The model value is supplied by the operator from the current CometAPI model list, not hard-coded in the runbook.
- The request uses sanitized input that contains no customer data.
- The fallback route has a clear rollback owner and an agreed stop condition before the test begins.
Happy-path request plan:
- Choose the documented endpoint family: Chat Completions for chat-style requests, or Responses when the current documentation recommends it for the selected model family.
- Send one minimal request with
Authorization: Bearer <API_KEY_PLACEHOLDER>,model: "<MODEL_ID>", and a short non-sensitive user message. - Save only the status code, request timestamp, endpoint family, model placeholder, response object type, response id presence, finish state when present, and token-usage field presence when returned.
- Compare the observed response structure with the fields your application actually reads before allowing fallback traffic to depend on that route.
Error-path check:
- Send one intentionally invalid request in the same controlled environment, such as a request with a missing required body field, and record whether the response is clearly classed as a client error. Do not retry this check.
- If the failure is ambiguous, stop at classification. Do not change models, endpoint families, credentials, or retry settings just to turn the check green.
Minimum assertions:
- The documented page for the chosen endpoint was reachable on the review date.
- The request was sent with bearer-token authentication from a safe secret source.
- The response returned a parseable JSON body or a clearly classified error body.
- The fallback decision log records whether the request was a happy-path check or an error-path check.
- The log has enough context for another operator to repeat the check without needing the original prompt or full response text.
What not to assert:
- Do not assert provider uptime, latency objectives, throughput limits, commercial terms, billing results, model availability, or full provider behavior from a single smoke test.
- Do not infer that all models share the same parameter support or response behavior.
- Do not treat an invalid-request error as proof that the production request shape is correct.
Sanitized log-record template:
review_date: 2026-06-21
operator: <INITIALS_OR_ROLE>
environment: <CANARY_OR_STAGING>
endpoint_family: <CHAT_COMPLETIONS_OR_RESPONSES>
model_reference: <MODEL_ID>
request_type: <HAPPY_PATH_OR_ERROR_PATH>
status_class: <2XX_OR_4XX_OR_5XX_OR_NETWORK_ERROR>
response_id_present: <TRUE_OR_FALSE_OR_NOT_RETURNED>
response_object_type: <VALUE_OR_NOT_RETURNED>
usage_field_present: <TRUE_OR_FALSE_OR_NOT_RETURNED>
retry_decision: <NO_RETRY_OR_SINGLE_RETRY_OR_ESCALATE>
fallback_decision: <KEEP_PRIMARY_OR_ROUTE_CANARY_OR_ROLL_BACK_OR_ESCALATE>
evidence_links: <DOC_URLS_AND_INTERNAL_INCIDENT_LINKS>
notes: <SHORT_SANITIZED_SUMMARY>
Sources checked
- CometAPI documentation - accessed 2026-06-21; purpose: verify current CometAPI documentation navigation.
- CometAPI chat completions reference - accessed 2026-06-21; purpose: verify chat completion contract areas.
- CometAPI responses reference - accessed 2026-06-21; purpose: verify responses endpoint contract areas.
- Google SRE overload guidance - accessed 2026-06-21; purpose: verify overload and reliability risk context.
Contract details to verify
| Area | What to verify | Source URL | Accessed | Safe candidate wording |
|---|---|---|---|---|
| Endpoint family | Whether the test should use Chat Completions or Responses for the selected model and workload. | https://apidoc.cometapi.com/api/text/chat and https://apidoc.cometapi.com/api/text/responses | 2026-06-21 | “Choose the endpoint family from the current CometAPI reference before sending the smoke test.” |
| Authentication | Whether the current request requires bearer-token authentication and how the key is supplied by the client. | https://apidoc.cometapi.com/api/text/chat | 2026-06-21 | “Send the request with a bearer token loaded from a safe secret source.” |
| Request body | Which request fields are required for the chosen endpoint family. | https://apidoc.cometapi.com/api/text/chat and https://apidoc.cometapi.com/api/text/responses | 2026-06-21 | “Use a minimal request body that matches the current endpoint reference.” |
| Response evidence | Which response fields can be recorded without storing full output text. | https://apidoc.cometapi.com/api/text/chat | 2026-06-21 | “Record structural response evidence such as status class, response identifier presence, object type, and usage-field presence when returned.” |
| Retry safety | Whether retries might amplify overload and should be limited in the runbook. | https://sre.google/sre-book/handling-overload/ | 2026-06-21 | “Retries need explicit limits because retry traffic can increase pressure during overload.” |
Failure modes
- Evidence gap: the team cannot inspect the request log, source page, incident note, or command output that supports the fallback decision. The safe action is to stop and record the missing evidence instead of guessing.
- Scope drift: the repair changes files, settings, models, or routing rules that are not connected to the observed failure. Keep the change tied to the failing signal and leave unrelated cleanup for a separate review.
- Environment mismatch: the local check uses different versions, credentials, feature flags, or runtime settings than the hosted path. Record the mismatch before treating the result as proof.
- Unreviewed fallback: the team changes models, endpoints, permissions, or retry behavior to make a run pass without preserving the review boundary. Treat access and provider failures as operational blockers, not proof that the topic is invalid.
- Weak handoff: the final note says the issue is fixed but omits the command, result, changed files, and remaining uncertainty. That makes the next operator repeat the investigation.
- Retry expansion: a failed request triggers multiple clients to retry without a shared budget. That can increase load at the exact moment the service is least able to absorb it.
Reader next step
Copy the sanitized log template into your fallback runbook and run one controlled happy-path check plus one controlled error-path check in a non-production environment. Link the result to your incident or change ticket, then compare it with Fallback Decision Logs for CometAPI Gateway Calls before approving any broader routing change. If the evidence is incomplete, keep the fallback route unchanged and collect the missing contract detail first.
FAQ
Should this checklist include a real API key?
No. Store credentials in an approved secret source and use <API_KEY_PLACEHOLDER> in examples, tickets, and runbook notes.
Can one smoke test prove CometAPI is healthy for production fallback?
No. A smoke test only confirms that a controlled request produced evidence at one point in time. Production readiness also needs monitoring, alerting, rollback criteria, and account-specific checks.
Which CometAPI endpoint should the runbook use?
Choose the endpoint family from the current CometAPI documentation for the selected workload and model. Do not assume that a previous runbook still matches the current contract.
Should failed smoke tests be retried automatically?
Not by default. Record the failure, classify it, and use a bounded retry rule only when the failure type and overload risk justify it.
What evidence is enough before promoting fallback traffic?
At minimum, keep the endpoint choice, placeholder model reference, status class, response structure fields, retry decision, fallback decision, and source links. That evidence should be enough for another engineer to understand the decision without seeing credentials, customer prompts, or full responses.