Source pack
- CometAPI documentation home: CometAPI API documentation — use this to verify base URL, authentication conventions, and current API documentation structure before wiring a runbook into production.
- CometAPI chat API documentation: CometAPI text chat API documentation — use this as the primary source for chat request shape, response shape, and endpoint path verification.
- CometAPI support documentation: CometAPI help center — use this to verify escalation, account, billing, or support paths that are relevant during an incident.
- Reliability practice reference: Google SRE book chapter on handling overload — use this for overload-control principles: capacity protection, load shedding, client-side throttling, and avoiding retry amplification.
Intent brief
Operators building overload signals into an LLM API failover runbook are not looking for a generic “retry on error” checklist. They need a decision framework that separates transient single-call failures from provider overload, local client saturation, quota pressure, and downstream dependency problems.
This draft is written for engineers maintaining production LLM traffic routers, API gateways, worker queues, or incident runbooks. It focuses on when to fail over, when to throttle, and what to verify in CometAPI’s contract before automating those decisions.
Overload Signals for LLM API Failover Runbooks
Last reviewed: 2026-06-12
Who this is for: SREs, platform engineers, and application owners who route LLM chat requests and need practical overload signals for failover without turning every timeout into an unnecessary provider switch.
Key takeaways
- Treat failover as one overload-control action, not the first response to every error.
- Separate provider overload, client-side saturation, quota or billing limits, and bad-request failures before routing traffic elsewhere.
- Use the CometAPI text chat API documentation to verify endpoint paths, request fields, response fields, and error behavior before coding automated failover.
- Follow the overload principle described in Google’s SRE material: protect serving capacity, shed or throttle lower-priority work, and avoid retry behavior that amplifies the overload condition.
- Build runbook signals around request class, error type, latency budget, retry budget, and queue health rather than a single global “provider up/down” flag.
Concise definition
An overload failover signal is an observable condition showing that the current LLM serving path is unlikely to meet its service objective for a specific request class unless traffic is reduced, degraded, or routed to another validated path.
Good overload signals are:
- Classified: they distinguish overload from auth errors, invalid payloads, and local network failures.
- Bounded: they use retry budgets and cooldown windows so failover does not oscillate.
- Actionable: each signal maps to a specific runbook action: wait, retry, throttle, degrade, queue, or fail over.
- Verified against the API contract: endpoint, request, response, and error semantics are checked against the current CometAPI docs before automation.
The overload decision model
Google’s SRE guidance on overload emphasizes keeping systems within capacity, shedding excess work, and using throttling or load-shedding mechanisms rather than letting uncontrolled demand collapse the service path (Google SRE: handling overload). For LLM API routing, that translates into this decision order:
- Classify the request. Is it interactive, background, batch, retry, or user-blocking?
- Classify the failure. Is the evidence overload-like, contract-like, auth-like, quota-like, or local infrastructure-like?
- Protect the hot path. Stop retry storms before adding more load.
- Apply the smallest safe action. Prefer local backoff or degradation before full failover if the user experience permits.
- Fail over only to a validated target. The alternate model, endpoint, and response parser must already be tested.
This matters because overload failover can make incidents worse. If every worker retries immediately and then fails over to another constrained path, you can create a second overload event.
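The decision order above can be sketched as a single function. This is a minimal illustration under stated assumptions: the class names, evidence labels, and action names are hypothetical and do not come from the CometAPI docs or the SRE book.

```python
from dataclasses import dataclass

# Hypothetical labels chosen for this sketch only.
FAILOVER_ELIGIBLE = {"interactive", "user_blocking"}
NON_OVERLOAD_EVIDENCE = {"contract", "auth", "quota", "local_infrastructure"}


@dataclass
class Observation:
    request_class: str        # interactive, background, batch, retry, user_blocking
    evidence: str             # overload, contract, auth, quota, local_infrastructure
    retry_storm: bool         # retries are amplifying load on the primary path
    can_degrade: bool         # user experience tolerates backoff or degradation
    fallback_validated: bool  # alternate model, endpoint, and parser already tested


def decide(obs: Observation) -> str:
    """Walk the decision order and return the smallest safe action."""
    # Classify the failure: anything that is not overload never routes elsewhere.
    if obs.evidence in NON_OVERLOAD_EVIDENCE:
        return "investigate"
    # Protect the hot path: stop retry storms before adding load anywhere.
    if obs.retry_storm:
        return "stop_retries"
    # Smallest safe action: back off when degradation is acceptable
    # or the request class is not eligible for failover.
    if obs.can_degrade or obs.request_class not in FAILOVER_ELIGIBLE:
        return "backoff"
    # Fail over only to a validated target; otherwise hold the work.
    if obs.fallback_validated:
        return "fail_over"
    return "queue"
```

Returning a single action per observation keeps the router auditable: every failover can be traced back to the branch that allowed it.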
Runbook signal matrix
Use this table as a runbook starting point. Replace the placeholder thresholds with values tuned to your latency objective, traffic profile, and contractual API behavior.
| Signal | What it may indicate | Operator action | Validation step |
|---|---|---|---|
| Repeated overload-like status codes or documented throttling responses | The serving path may be rejecting excess load, rate-limited traffic, or unavailable requests | Enter backoff; reduce concurrency; fail over only for protected request classes | Verify exact status codes, response body fields, and retry guidance in the CometAPI chat API documentation and support docs |
| Latency exceeds <P95_LATENCY_BUDGET_MS> while local queue depth is normal | Provider-side or network-side latency pressure may be rising | Route only latency-sensitive classes to fallback; keep batch traffic queued | Compare client timing, gateway timing, and provider response timing if available |
| Local queue depth or worker saturation rises before API errors | Your own caller may be overloaded, not the upstream API | Throttle intake; pause non-critical jobs; do not immediately fail over | Check worker CPU, event-loop lag, connection pool exhaustion, and retry volume |
| Retry volume exceeds <RETRY_BUDGET> | Retries may be amplifying overload | Stop automatic retries for low-priority classes; use jittered backoff | Confirm retry attempts per original request and per request class |
| Documented quota, account, or billing-related errors appear | The issue may not be transient overload | Stop failover automation until contract and account status are verified | Check the CometAPI help center for support and account-verification paths |
| Fallback path success rate falls below <FALLBACK_HEALTH_THRESHOLD> | Failover target may also be impaired | Degrade response quality, queue work, or serve cached/non-LLM fallback | Run a small canary against fallback before shifting more traffic |
| Bad-request or schema errors increase after deployment | Caller release likely introduced invalid payloads | Roll back caller change; do not fail over | Diff request payloads and validate against the CometAPI text chat API documentation |
What should trigger failover?
A practical failover trigger should combine multiple signals. A single timeout is rarely enough.
Use a composite rule such as:
- the request belongs to a failover-eligible class;
- the primary path has overload-like evidence for <WINDOW>;
- the caller has not exhausted its retry budget;
- the fallback path passed a recent health check;
- the error is not clearly caused by bad request payload, authentication, account state, or local saturation.
In other words: fail over when the current path is probably capacity-impaired and the alternate path is already known to be safer for that request class.
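A composite rule of this kind can be expressed as one boolean check. The sketch below is an assumption about how the inputs might be represented; the field names and cause labels are hypothetical, and `window_seconds` stands in for your tuned `<WINDOW>`.

```python
from dataclasses import dataclass

# Causes that rule out failover regardless of other signals (hypothetical labels).
EXCLUDED_CAUSES = {"bad_request", "auth", "account_state", "local_saturation"}


@dataclass
class TriggerInputs:
    failover_eligible: bool      # request class is allowed to fail over
    overload_seconds: float      # how long overload-like evidence has persisted
    window_seconds: float        # your tuned <WINDOW>
    retries_used: int            # attempts already spent on this request
    retry_budget: int            # maximum attempts allowed for this request
    fallback_check_passed: bool  # result of a recent fallback health check
    suspected_cause: str         # best current classification of the failure


def should_fail_over(t: TriggerInputs) -> bool:
    """True only when every condition of the composite rule holds."""
    return (
        t.failover_eligible
        and t.overload_seconds >= t.window_seconds
        and t.retries_used < t.retry_budget
        and t.fallback_check_passed
        and t.suspected_cause not in EXCLUDED_CAUSES
    )
```

Because the conditions are joined with `and`, a single timeout cannot trigger failover on its own: it has to persist for the window and coincide with a healthy fallback and an unspent retry budget.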
What should not trigger failover?
Avoid automatic failover for these cases unless your contract and tests explicitly support it:
- malformed request payloads;
- authentication header mistakes;
- unsupported or unvalidated model identifiers;
- client-side connection pool exhaustion;
- local worker queue saturation;
- billing, account, or quota conditions that require operator review;
- parser failures caused by a response-shape assumption you have not verified.
The CometAPI docs should be treated as the contract source for endpoint and payload behavior. Verify the current chat API contract in the CometAPI text chat API documentation before deciding that a response is safe to classify as overload.
Contract details to verify
Do not hard-code these values from memory. Verify them against the linked source before implementing automation.
| Contract item | Value to verify before production use | Why it matters in overload runbooks | Primary source |
|---|---|---|---|
| Endpoint paths | Verify the current base URL and exact chat endpoint path from the docs; use <COMETAPI_BASE_URL_FROM_DOCS> and <COMETAPI_CHAT_PATH_FROM_DOCS> until confirmed | A wrong path can look like an upstream outage but is actually a client integration defect | CometAPI API documentation and CometAPI text chat API documentation |
| Auth headers | Verify the required auth header name, value format, and key-management guidance; keep <AUTH_HEADER_FROM_DOCS> as a placeholder until confirmed | Auth failures should not trigger failover; they should trigger credential or deployment rollback investigation | CometAPI API documentation |
| Request fields | Verify required chat request fields, optional fields, model identifier format, and token-control fields from the chat docs | Invalid request fields can create false “provider failure” signals | CometAPI text chat API documentation |
| Response fields | Verify the response body shape, success fields, usage fields if documented, and any finish or termination indicators | Routers need stable parsing to distinguish successful degraded responses from failed calls | CometAPI text chat API documentation |
| Error behavior | Verify documented status codes, error body fields, throttling behavior, retry guidance, and support escalation path | Failover logic depends on separating overload-like errors from auth, bad request, and account issues | CometAPI text chat API documentation and CometAPI help center |
| Rate-limit or billing assumptions | Verify whether rate limits, quota behavior, billing counters, and account restrictions are documented for your plan or account | Do not assume a throttling response means provider overload; it may be account-specific policy | CometAPI help center |
| Retry and load-shedding policy | Verify your own retry budget, backoff, and request-class priority rules; tune thresholds rather than copying fixed numbers | Retry storms can amplify overload and spread it to fallback paths | Google SRE: handling overload |
Example: sanitized overload probe
This example is intentionally contract-neutral. Replace placeholders only after checking the current CometAPI docs. Do not use this as a production health check until the endpoint, auth header, fields, and validated model ID are confirmed.
    curl --silent --show-error --fail-with-body \
      --request POST "<COMETAPI_BASE_URL_FROM_DOCS><COMETAPI_CHAT_PATH_FROM_DOCS>" \
      --header "<AUTH_HEADER_FROM_DOCS>: <COMETAPI_API_KEY>" \
      --header "<CONTENT_TYPE_HEADER_FROM_DOCS>: <JSON_CONTENT_TYPE_FROM_DOCS>" \
      --data '{
        "<MODEL_FIELD_FROM_DOCS>": "<VALIDATED_MODEL_ID>",
        "<MESSAGES_FIELD_FROM_DOCS>": [
          {
            "<ROLE_FIELD_FROM_DOCS>": "<USER_ROLE_VALUE_FROM_DOCS>",
            "<CONTENT_FIELD_FROM_DOCS>": "Return the single word ok."
          }
        ],
        "<TOKEN_LIMIT_FIELD_FROM_DOCS>": 4
      }'
Use this type of probe for classification, not for load testing. The goal is to confirm that the route, credentials, parser, and fallback target are healthy enough to receive a small amount of failover traffic.
Practical validation steps
1. Build an error taxonomy
Create explicit labels before writing router logic:
- success
- timeout
- local_saturation
- bad_request
- auth_or_permission
- quota_or_billing_review
- overload_like
- unknown_upstream_error
- fallback_unhealthy
Map each label to one action. If a label can mean several things, keep it out of the automatic failover path until you have better evidence.
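One way to keep the label-to-action mapping explicit is an enum plus a lookup table. The labels below are the ones listed above; the action names are illustrative assumptions, not documented CometAPI behavior.

```python
from enum import Enum


class Label(Enum):
    SUCCESS = "success"
    TIMEOUT = "timeout"
    LOCAL_SATURATION = "local_saturation"
    BAD_REQUEST = "bad_request"
    AUTH_OR_PERMISSION = "auth_or_permission"
    QUOTA_OR_BILLING_REVIEW = "quota_or_billing_review"
    OVERLOAD_LIKE = "overload_like"
    UNKNOWN_UPSTREAM_ERROR = "unknown_upstream_error"
    FALLBACK_UNHEALTHY = "fallback_unhealthy"


# Exactly one action per label. Action names are hypothetical.
ACTIONS = {
    Label.SUCCESS: "none",
    Label.TIMEOUT: "retry_within_budget",
    Label.LOCAL_SATURATION: "throttle_intake",
    Label.BAD_REQUEST: "roll_back_caller",
    Label.AUTH_OR_PERMISSION: "check_credentials",
    Label.QUOTA_OR_BILLING_REVIEW: "operator_review",
    Label.OVERLOAD_LIKE: "evaluate_failover",
    Label.UNKNOWN_UPSTREAM_ERROR: "hold_and_observe",
    Label.FALLBACK_UNHEALTHY: "degrade_or_queue",
}


def action_for(label: Label) -> str:
    """Look up the single runbook action assigned to a label."""
    return ACTIONS[label]


def may_auto_fail_over(label: Label) -> bool:
    """Only the unambiguous overload label enters the automatic failover path."""
    return label is Label.OVERLOAD_LIKE
```

Ambiguous labels such as `timeout` and `unknown_upstream_error` deliberately map to holding actions, which keeps them out of automatic failover until better evidence is available.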
2. Record request class on every call
At minimum, tag each request with:
- traffic class: interactive, background, batch, evaluation, retry;
- user-visible priority;
- original request ID;
- retry attempt number;
- primary route and fallback route;
- caller service;
- timeout budget.
Failover for an interactive user request may be appropriate while batch traffic should simply wait.
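The tags above could be carried as a small immutable record attached to every call. This is a sketch only: the field names and types are assumptions, and your logging or tracing library may already provide an equivalent structure.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class RequestTags:
    traffic_class: str          # interactive, background, batch, evaluation, retry
    user_visible_priority: int  # hypothetical scale: lower number = more urgent
    original_request_id: str    # stays constant across retries and failover
    retry_attempt: int          # 0 for the first attempt
    primary_route: str
    fallback_route: str
    caller_service: str
    timeout_budget_ms: int

    def as_log_fields(self) -> dict:
        """Flatten the tags for structured logging or metrics labels."""
        return asdict(self)


def next_attempt(tags: RequestTags) -> RequestTags:
    """Return tags for a retry, keeping the original request ID unchanged."""
    return replace(tags, retry_attempt=tags.retry_attempt + 1)
```

Keeping `original_request_id` fixed while `retry_attempt` increments makes it possible to count retries per original request, which the retry-budget checks depend on.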
3. Validate the primary path
For each overload-like event, collect:
- status code;
- response body error field, if documented;
- retry guidance, if documented;
- total latency;
- time to first byte, if available;
- client-side timeout reason;
- connection reuse or connection-pool metrics;
- request payload size;
- model identifier used;
- request class.
Compare the result against the current CometAPI text chat API documentation. If the evidence points to a malformed request or invalid model identifier, the runbook should stop failover and send the incident to the owning service team.
4. Validate the fallback path before shifting traffic
A fallback is not safe just because it is different. Before routing production traffic, verify:
- the fallback model or route accepts the same request class;
- the response parser can handle the fallback response;
- safety, formatting, and latency expectations are acceptable for the degraded mode;
- the fallback route has not recently failed its own health checks;
- the fallback route has a separate retry budget.
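The checklist above can be turned into a gate that reports every unmet precondition. This is a minimal sketch: the field names are hypothetical, and `quiet_period_seconds` is an assumed tuning parameter for what counts as a recent health check failure.

```python
from dataclasses import dataclass


@dataclass
class FallbackStatus:
    accepts_request_class: bool        # fallback handles this traffic class
    parser_verified: bool              # parser tested against fallback responses
    degraded_mode_acceptable: bool     # safety, formatting, latency signed off
    seconds_since_last_failure: float  # time since its own health check last failed
    has_separate_retry_budget: bool


def blocking_reasons(status: FallbackStatus, quiet_period_seconds: float) -> list[str]:
    """Return every unmet precondition; an empty list means traffic may shift."""
    reasons = []
    if not status.accepts_request_class:
        reasons.append("request class not supported")
    if not status.parser_verified:
        reasons.append("response parser not verified")
    if not status.degraded_mode_acceptable:
        reasons.append("degraded mode not accepted")
    if status.seconds_since_last_failure < quiet_period_seconds:
        reasons.append("recent health check failure")
    if not status.has_separate_retry_budget:
        reasons.append("no separate retry budget")
    return reasons
```

Returning the reasons rather than a bare boolean gives the on-call operator something concrete to record in the incident log when failover is refused.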
5. Use progressive failover
Avoid all-or-nothing switches. A safer sequence is:
- stop retries for low-priority traffic;
- throttle non-critical background jobs;
- fail over a small protected class;
- watch fallback error rate and latency;
- expand only if fallback remains healthy;
- roll back when the primary path remains stable for a defined cooldown window.
The exact percentages and cooldown windows should be tuned in your environment. They are not universal reliability constants.
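A staged ramp along these lines might look like the sketch below. The traffic fractions in `STAGES` are placeholders for illustration only, not recommendations, and the function name and inputs are hypothetical.

```python
# Fraction of the protected class routed to fallback at each stage.
# Placeholder values: tune these for your own environment.
STAGES = [0.0, 0.05, 0.25, 1.0]


def next_stage(current: int, fallback_healthy: bool, primary_stable_for_cooldown: bool) -> int:
    """Return the index into STAGES to use for the next evaluation interval."""
    # Roll back fully once the primary has stayed stable for the cooldown window.
    if primary_stable_for_cooldown:
        return 0
    # Step down if the fallback shows rising errors or latency.
    if not fallback_healthy:
        return max(current - 1, 0)
    # Otherwise expand one stage at a time, never skipping ahead.
    return min(current + 1, len(STAGES) - 1)
```

Moving one stage per evaluation interval bounds how quickly load can arrive at the fallback, which leaves time to observe its error rate and latency before expanding.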
Suggested runbook actions by severity
| Severity | Conditions | Recommended action |
|---|---|---|
| Watch | One or more isolated overload-like events, no queue growth, fallback healthy | Log classification; no broad failover |
| Contain | Error burst for one request class; latency budget at risk; retries increasing | Apply jittered backoff; reduce concurrency; stop low-priority retries |
| Degrade | Primary path impaired for user-visible traffic; fallback recently validated | Fail over protected interactive class; queue or defer batch traffic |
| Incident | Primary and fallback impaired, or account/auth/quota status unclear | Freeze automation, page owner, use support path, serve non-LLM fallback where possible |
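The severity table can be approximated by checking the most severe conditions first. This is a sketch under assumptions: the boolean inputs are hypothetical summaries that your own monitoring would have to compute, and the extra `normal` level is added here for the case with no overload-like events.

```python
def classify_severity(
    *,
    primary_impaired: bool,       # user-visible traffic affected on the primary path
    fallback_healthy: bool,       # fallback recently validated
    account_status_clear: bool,   # no open account, auth, or quota questions
    error_burst: bool,            # burst of errors for one request class
    retries_increasing: bool,
    overload_events: int,         # isolated overload-like events observed
) -> str:
    """Map current conditions to a severity level, most severe first."""
    if not account_status_clear or (primary_impaired and not fallback_healthy):
        return "incident"
    if primary_impaired:
        return "degrade"
    if error_burst or retries_increasing:
        return "contain"
    if overload_events > 0:
        return "watch"
    return "normal"
```

Evaluating in descending severity means an unclear account state always freezes automation, even when the fallback looks healthy.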
Operator checks during an incident
Ask these questions before pressing the failover button:
- Is the evidence overload-like or contract-like? Bad payloads, unsupported fields, or auth problems should not route to another provider automatically.
- Are retries making the situation worse? The SRE overload guidance warns against load behavior that worsens overload. Cap retries and use jittered backoff.
- Is the fallback truly healthy? If the fallback path is untested or currently failing, failover may spread the incident.
- Can lower-priority work wait? Queueing batch traffic is often safer than consuming fallback capacity needed for interactive users.
- Do you need support or account review? For account, billing, quota, or support-path questions, verify the current route through the CometAPI help center.
FAQ
Should every 429 or 503 trigger failover?
No. First verify what the API contract says about the status code and error body. A throttling or overload-like response may call for backoff, reduced concurrency, or request-class shedding before failover. Use the documented behavior from the API docs rather than assuming every status code has the same meaning across providers.
Should timeouts trigger immediate fallback?
Usually not by themselves. A timeout can be caused by local worker saturation, DNS, connection pools, network path issues, request size, or upstream latency. Combine timeout evidence with queue health, retry counts, fallback health, and request class.
How do we avoid retry storms?
Set a retry budget per original request, use jittered backoff, and stop retries for low-priority classes during overload. Make sure a failover attempt counts against the same budget or a stricter fallback budget.
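A per-request retry budget with jittered backoff can be sketched as follows. The delay uses the common "full jitter" variant of exponential backoff; the default base and cap values are placeholders, and the class and function names are hypothetical.

```python
import random


def backoff_delay(attempt: int, base_seconds: float = 0.5, cap_seconds: float = 30.0) -> float:
    """Full-jitter backoff: a uniform delay between 0 and the capped exponential ceiling."""
    ceiling = min(cap_seconds, base_seconds * (2 ** attempt))
    return random.uniform(0.0, ceiling)


class RetryBudget:
    """Counts attempts for one original request; retries and failover share it."""

    def __init__(self, max_attempts: int):
        self.max_attempts = max_attempts
        self.used = 0

    def try_spend(self) -> bool:
        """Spend one attempt if any remain; return False once the budget is exhausted."""
        if self.used >= self.max_attempts:
            return False
        self.used += 1
        return True
```

Calling `try_spend()` before both a retry on the primary path and a failover attempt makes the two draw from the same budget, so switching routes cannot silently multiply the load generated by one original request.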
What is the safest first action during overload?
For many systems, the safest first action is to reduce demand: throttle intake, pause background work, and disable unnecessary retries. Failover is safer after you know the fallback path is healthy and the current failure is not caused by your own request contract.
Can one synthetic probe decide provider health?
No. A probe can confirm basic route health, auth, and parser behavior. It should not be the only overload signal. Combine probe results with real request metrics, error classification, latency, and retry pressure.
Where should CometAPI-specific details live?
Keep endpoint paths, auth header names, request fields, response parsing, and error mappings in a versioned integration contract. Review that contract against the CometAPI documentation before changing automated failover behavior.
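A versioned contract record might look like the sketch below. The structure and field names are assumptions for illustration; the placeholder values are the ones used elsewhere in this article and must be replaced only after checking the current CometAPI documentation.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class IntegrationContract:
    version: str      # bump on every reviewed change
    reviewed_on: str  # date the contract was last checked against the docs
    base_url: str
    chat_path: str
    auth_header: str


def unresolved_placeholders(contract: IntegrationContract) -> list[str]:
    """List fields that still hold a <PLACEHOLDER> instead of a verified value."""
    return [
        f.name
        for f in fields(contract)
        if str(getattr(contract, f.name)).startswith("<")
    ]


# Draft contract: nothing here has been verified yet.
draft = IntegrationContract(
    version="0.1.0",
    reviewed_on="2026-06-12",
    base_url="<COMETAPI_BASE_URL_FROM_DOCS>",
    chat_path="<COMETAPI_CHAT_PATH_FROM_DOCS>",
    auth_header="<AUTH_HEADER_FROM_DOCS>",
)
```

A deployment check could refuse to enable automated failover while `unresolved_placeholders(draft)` is non-empty, which keeps unverified contract values out of production routing.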
Sources checked
Access date: 2026-06-12.
| Source | Purpose |
|---|---|
| CometAPI API documentation | Verify current API documentation entry point, base URL conventions, and auth-related contract details before implementation |
| CometAPI text chat API documentation | Verify chat endpoint path, request schema, response schema, and error behavior for chat completions |
| CometAPI help center | Verify support, account, billing, or escalation paths relevant to overload incidents |
| Google SRE book: handling overload | Ground overload runbook design in established SRE practices for load shedding, throttling, and capacity protection |
Next step
If you are evaluating CometAPI as part of an LLM routing or fallback design, start by validating the current API contract in the docs, then run a small staging drill with explicit overload classifications before enabling automatic production failover.