Model Change Runbook for LLM Reliability Owners

Last reviewed: 2026-06-25

Direct answer

A model change runbook should prove three things before traffic shifts: the candidate model is discoverable in the current catalog, the request and response contract still matches the endpoint your gateway calls, and HTTP telemetry can separate healthy responses from error and retry paths. Keep the test narrow. Verify a small happy-path request, one controlled failure path, and the exact log fields your reliability review needs before any routing rule moves beyond a limited trial.

This runbook is intentionally conservative. It does not decide whether a model is better, cheaper, faster, or available for every account. It answers a narrower reliability question: can the team identify the candidate model, call the intended endpoint shape, observe the result, and explain what happened if the request fails? If the answer is no, traffic should stay held until the missing evidence is available.

Setup assumptions:

The operator has a test API key stored outside the runbook as <API_KEY_PLACEHOLDER>.
The candidate model is selected from the current model catalog, not copied from an old ticket.
The test client can point at the documented CometAPI base URL and endpoint family being exercised.
The service already records low-cardinality HTTP attributes for outbound API calls.
The operator has a rollback owner, a change ticket, and a place to store sanitized pass/fail notes.

Happy-path request plan:

Open the current model catalog and confirm the candidate model record that will be used by the smoke test.
Copy only the model identifier needed by the test client. Do not copy pricing, quota, or availability assumptions into the runbook unless those are verified separately.
Send one minimal request to the documented responses or chat endpoint with a sanitized prompt such as "Return the word ok."
Record whether the HTTP status, response object type, and expected text field are present.
Compare the response shape with the endpoint reference before promoting any fallback, router, or gateway rule.
Link the result to the change ticket so a later incident review can reconstruct which model, endpoint family, and client version were tested.
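
The happy-path steps above can be sketched as a small shape check. This is a minimal sketch under stated assumptions, not the documented contract: the request field names `model` and `input` and the helper names `build_smoke_request` and `check_response_shape` are hypothetical, and the list of required response fields must be copied from the endpoint reference for the family under test. The check records only status and field presence, so no response body reaches the run record.

```python
from typing import Any, Mapping, Sequence


def build_smoke_request(model_id: str, prompt: str = "Return the word ok.") -> dict[str, Any]:
    """Build a minimal request body.

    The keys "model" and "input" are assumptions for illustration;
    replace them with the names in the endpoint reference.
    """
    return {"model": model_id, "input": prompt}


def check_response_shape(
    status: int,
    body: Mapping[str, Any],
    required_fields: Sequence[str],
) -> dict[str, Any]:
    """Return a sanitized pass/fail summary without keeping the body."""
    missing = [name for name in required_fields if name not in body]
    passed = 200 <= status < 300 and not missing
    return {
        "http_status_observed": status,
        "response_shape_checked": "yes",
        "missing_fields": missing,
        "happy_path_status": "pass" if passed else "fail",
    }
```

A caller would pass the parsed JSON body from its own HTTP client together with the field list taken from the reference page, then copy the summary into the change ticket.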

Error-path check:

Run one controlled invalid request, such as a request with a deliberately invalid placeholder model value. The purpose is not to force a provider outage. The purpose is to confirm that the client records status, error category, retry decision, and final outcome without exposing credentials, customer prompts, or full response bodies. The failure-path check should finish quickly and should not trigger an unbounded retry loop.
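
One way to keep the failure-path check bounded is to wrap the call in an explicit attempt budget and record every decision. The sketch below is illustrative only: `run_failure_path`, `classify`, and the category names are hypothetical, the rule that 429 and 5xx are retryable is an assumption to replace with the team's own retry policy, and `send` stands in for whatever client call returns an HTTP status.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FailurePathRecord:
    statuses: list[int] = field(default_factory=list)
    error_category: str = "none"
    retry_decisions: list[str] = field(default_factory=list)
    final_outcome: str = "unknown"


def classify(status: int) -> str:
    """Map a status code to a coarse, low-cardinality error category."""
    if 200 <= status < 300:
        return "none"
    if status == 429 or status >= 500:
        return "retryable"  # assumption: adjust to the team's retry policy
    return "client_error"


def run_failure_path(send: Callable[[], int], max_attempts: int = 3) -> FailurePathRecord:
    """Call send() at most max_attempts times and record each decision."""
    record = FailurePathRecord()
    for attempt in range(1, max_attempts + 1):
        status = send()
        record.statuses.append(status)
        record.error_category = classify(status)
        if record.error_category == "none":
            record.final_outcome = "succeeded"
            return record
        if record.error_category == "client_error":
            # A deliberately invalid request should stop here, not retry.
            record.retry_decisions.append("no_retry")
            record.final_outcome = "failed_fast"
            return record
        record.retry_decisions.append("retry" if attempt < max_attempts else "budget_exhausted")
    record.final_outcome = "failed_after_retries"
    return record
```

The record holds only status codes and decisions, so it can be attached to the change ticket without exposing credentials, prompts, or response bodies.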

Minimum assertions:

The catalog check returned the candidate model record used by the test.
The endpoint accepted the intended request shape.
The response contained the documented top-level fields needed by the caller.
HTTP telemetry captured method, route or target, status code, and error signal without high-cardinality payload data.
Retry or fallback code did not loop indefinitely after the controlled failure.
The run record is clear enough for another reliability owner to repeat the same check later.
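
The telemetry assertion can be checked mechanically against the attributes captured for one outbound call. The sketch below assumes attribute names in the style of the OpenTelemetry HTTP semantic conventions (`http.request.method`, `http.response.status_code`, `error.type`, and a route or target attribute); verify them against the conventions version your instrumentation emits. The forbidden keys are hypothetical examples of payload data, and `check_telemetry` is an illustrative helper, not part of any library.

```python
from typing import Any, Mapping

# Assumed attribute names; confirm against the semantic conventions in use.
REQUIRED_KEYS = ("http.request.method", "http.response.status_code")
TARGET_KEYS = ("http.route", "url.template", "server.address")
ERROR_KEY = "error.type"
# Hypothetical examples of high-cardinality or sensitive keys.
FORBIDDEN_KEYS = frozenset({"prompt", "response_body", "authorization"})


def check_telemetry(attrs: Mapping[str, Any], failed: bool) -> list[str]:
    """Return a list of problems; an empty list means the check passed."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if key not in attrs]
    if not any(key in attrs for key in TARGET_KEYS):
        problems.append("missing route or target")
    if failed and ERROR_KEY not in attrs:
        problems.append(f"missing {ERROR_KEY}")
    problems.extend(f"forbidden {key}" for key in attrs if key in FORBIDDEN_KEYS)
    return problems
```

Run it once against the happy-path span with `failed=False` and once against the controlled-failure span with `failed=True`; both results feed the telemetry_fields_present field.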

Pass/fail logging fields:

run_id: "model-change-smoke-YYYYMMDD-001"
operator: "on-call-placeholder"
model_candidate: "<MODEL_ID>"
endpoint_family: "responses|chat"
happy_path_status: "pass|fail"
error_path_status: "pass|fail"
http_status_observed: "<STATUS_CODE>"
response_shape_checked: "yes|no"
telemetry_fields_present: "yes|no"
fallback_decision: "promote|hold|rollback"
notes: "sanitized summary only"
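
A run record with these fields can be validated before it is pasted into a change ticket. The sketch below is one possible check, not a required format: `validate_run_record` is a hypothetical helper, and the rule that a "promote" decision requires every check to pass is an assumption drawn from this runbook's hold-by-default stance.

```python
from typing import Mapping

REQUIRED_FIELDS = (
    "run_id", "operator", "model_candidate", "endpoint_family",
    "happy_path_status", "error_path_status", "http_status_observed",
    "response_shape_checked", "telemetry_fields_present",
    "fallback_decision", "notes",
)
ALLOWED_VALUES = {
    "happy_path_status": {"pass", "fail"},
    "error_path_status": {"pass", "fail"},
    "response_shape_checked": {"yes", "no"},
    "telemetry_fields_present": {"yes", "no"},
    "fallback_decision": {"promote", "hold", "rollback"},
}
# Assumption: promotion is only consistent with a fully passing record.
PROMOTE_REQUIRES = {
    "happy_path_status": "pass",
    "error_path_status": "pass",
    "response_shape_checked": "yes",
    "telemetry_fields_present": "yes",
}


def validate_run_record(record: Mapping[str, str]) -> list[str]:
    """Return validation errors; an empty list means the record is usable."""
    errors = [f"missing {name}" for name in REQUIRED_FIELDS if not record.get(name)]
    for name, allowed in ALLOWED_VALUES.items():
        value = record.get(name)
        if value and value not in allowed:
            errors.append(f"{name} must be one of {sorted(allowed)}")
    if record.get("fallback_decision") == "promote":
        for name, needed in PROMOTE_REQUIRES.items():
            if record.get(name) != needed:
                errors.append(f"promote requires {name}={needed}")
    return errors
```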

What not to assert: do not claim long-term availability, stable pricing, account-specific quotas, rate behavior, latency targets, or future model behavior from a single smoke test. Treat those as separate operational reviews. A smoke test can show that a route is callable now and observable enough for a controlled change; it cannot prove that the model will behave the same way under production volume or future provider changes.

For neighboring checks, pair this runbook with "How to Use Model Change Evidence for LLM API Reliability Checks" and "Review HTTP Telemetry Before Trusting LLM API Failover". Those guides help separate model evidence from telemetry evidence so a routing decision is not based on a single passing request.

Who this is for

This guide is for reliability owners, platform engineers, and on-call leads who approve model routing changes for LLM API gateways. It fits teams that already have a gateway, fallback path, or model-selection layer and need a repeatable check before changing production traffic.

It is also useful when the application team and platform team share responsibility. The application team may own prompt quality and product behavior, while the platform team owns endpoint configuration, retry behavior, telemetry, and rollback. This runbook gives both groups a short checklist that avoids mixing product evaluation with transport-level reliability checks.

Use it before a planned model change, after a provider catalog update, when adding a new fallback candidate, or when moving a low-risk workload onto a different endpoint family. Do not use it as the only approval step for sensitive workloads, regulated data, customer-facing launch decisions, or broad cost changes.

Key takeaways

Treat the current model catalog as the source for candidate model identity and routing metadata.
Verify the endpoint contract with a minimal request before running broader experiments.
Keep telemetry assertions focused on HTTP method, status, error classification, and retry outcome.
Use a controlled failure path to confirm fallback behavior is observable.
Avoid claiming production readiness from a smoke test alone.
Record what was tested, what was held back, and who owns the next decision.

Failure modes

Missing catalog evidence: the candidate model name comes from an old ticket, dashboard screenshot, or copied configuration. The safe action is to re-check the current catalog before running the request.
Endpoint mismatch: the gateway calls one endpoint family while the test uses another. Keep responses-style and chat-style checks separate, and compare each one against the relevant reference page.
Telemetry gap: the request succeeds, but logs cannot show method, target, status, error signal, retry decision, or fallback outcome. A passing request without reviewable telemetry is not enough for a reliability owner.
Retry amplification: the failure-path test triggers repeated retries without a clear budget or terminal decision. Hold the change until retry behavior can be bounded and reviewed.
Scope drift: the check expands into unrelated model tuning, prompt rewriting, or cost review. Keep this runbook tied to catalog identity, endpoint contract, telemetry, and rollback readiness.
Environment mismatch: the local check uses different credentials, feature flags, client versions, or runtime settings than the hosted path. Record the mismatch before treating the result as proof.
Weak handoff: the final note says the check passed but omits the request family, model candidate, observed status, telemetry fields, and remaining uncertainty. That makes the next owner repeat the investigation.

Reader next step

Before approving a model change, create a short change record with four sections: catalog evidence, endpoint evidence, telemetry evidence, and rollback decision. Paste the sanitized pass/fail logging fields from this runbook into that record, then attach links to the relevant model catalog page, endpoint reference, and dashboard or log query used for HTTP telemetry.

If any section is blank, hold the routing change. The next action should be specific: verify the candidate model in the catalog, re-run the endpoint smoke test against the correct endpoint family, add the missing HTTP fields to outbound telemetry, or assign a rollback owner. If all four sections are complete, proceed only with the smallest traffic movement your team normally allows for model changes, and keep the rollback path ready until the first production review window closes.
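
The hold rule above can be expressed as a small gate over the four sections. This is a sketch under assumptions: the section keys, the `review_change_record` helper, and the outcome labels are hypothetical names chosen for illustration, while the next actions are taken from the paragraph above.

```python
from typing import Mapping, Optional

# One next action per section, in the order the change record lists them.
NEXT_ACTIONS = {
    "catalog_evidence": "verify the candidate model in the catalog",
    "endpoint_evidence": "re-run the endpoint smoke test against the correct endpoint family",
    "telemetry_evidence": "add the missing HTTP fields to outbound telemetry",
    "rollback_decision": "assign a rollback owner",
}


def review_change_record(record: Mapping[str, Optional[str]]) -> tuple[str, list[str]]:
    """Return ("hold", next_actions) if any section is blank, else proceed."""
    blanks = [
        section for section in NEXT_ACTIONS
        if not (record.get(section) or "").strip()
    ]
    if blanks:
        return "hold", [NEXT_ACTIONS[section] for section in blanks]
    return "proceed_with_smallest_traffic_step", []
```

Whitespace-only and missing sections are both treated as blank, so a placeholder left empty in the ticket template still holds the change.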

For a deeper fallback handoff, use "Build a CometAPI Fallback Evidence Checklist" after this preflight check. For request logging details, use "Log Fields That Make CometAPI Retries Reviewable".

Use "CometAPI chat reliability contract review" as the next comparison point.

Sources checked

CometAPI documentation - accessed 2026-06-25; purpose: verify current CometAPI documentation navigation.
CometAPI models overview - accessed 2026-06-25; purpose: verify model catalog discovery guidance.
CometAPI responses reference - accessed 2026-06-25; purpose: verify responses endpoint contract areas.
OpenTelemetry HTTP semantic conventions - accessed 2026-06-25; purpose: verify HTTP telemetry field context.

Contract details to verify

Area	What to verify	Source URL	Accessed	Safe candidate wording
Response endpoint	Confirm the request shape and top-level response fields for the endpoint family under test.	https://apidoc.cometapi.com/api/text/responses	2026-06-25	“Check the documented response fields before promoting the route.”
Chat endpoint	Confirm chat-style callers still match the documented chat completion contract.	https://apidoc.cometapi.com/api/text/chat	2026-06-25	“Use the chat reference when the gateway calls the chat-completions endpoint family.”
HTTP telemetry	Confirm outbound HTTP calls record method, status, and error information with stable attribute values.	https://opentelemetry.io/docs/specs/semconv/http/	2026-06-25	“Record low-cardinality HTTP telemetry for both happy-path and failure-path checks.”
Documentation discovery	Confirm the docs page used by the runbook is still part of the current public documentation map.	https://apidoc.cometapi.com/	2026-06-25	“Re-check the current docs before copying endpoint details into an operations ticket.”

FAQ

Should this runbook approve a model change by itself?

No. It is a preflight and smoke-test guide. Production approval should also consider application-specific quality checks, rollback ownership, customer impact, and business risk.

Can the smoke test use a real customer prompt?

No. Use a sanitized placeholder prompt and record only the fields needed for reliability review. Do not store customer prompts, full responses, credentials, or account-specific commercial details in the run record.

Should the runbook assert pricing, quotas, or long-term availability?

No. Those checks are account-specific or time-sensitive and should be verified through the appropriate commercial and operational sources. This runbook only covers catalog identity, endpoint contract, telemetry, and controlled fallback behavior.

What is the safest failure-path test?

Use a controlled invalid request that cannot leak credentials or customer data, then confirm the client records the status, error category, retry decision, and final fallback outcome. Stop if the test creates repeated retries or unclear side effects.

When should traffic stay held?

Hold traffic if the model cannot be confirmed in the current catalog, the endpoint contract does not match the caller, telemetry is missing, the rollback owner is unclear, or the failure path cannot be reviewed from logs.