Source pack

  • CometAPI documentation home: CometAPI API documentation — use this to verify base URL, authentication conventions, and current API documentation structure before wiring a runbook into production.
  • CometAPI chat API documentation: CometAPI text chat API documentation — use this as the primary source for chat request shape, response shape, and endpoint path verification.
  • CometAPI support documentation: CometAPI help center — use this to verify escalation, account, billing, or support paths that are relevant during an incident.
  • Reliability practice reference: Google SRE book chapter on handling overload — use this for overload-control principles: capacity protection, load shedding, client-side throttling, and avoiding retry amplification.

Intent brief

Operators who need overload signals for an LLM API failover runbook are not looking for a generic “retry on error” checklist. They need a decision framework that separates transient single-call failures from provider overload, local client saturation, quota pressure, and downstream dependency problems.

This draft is written for engineers maintaining production LLM traffic routers, API gateways, worker queues, or incident runbooks. It focuses on when to fail over, when to throttle, and what to verify in CometAPI’s contract before automating those decisions.


Overload Signals for LLM API Failover Runbooks

Last reviewed: 2026-06-12

Who this is for: SREs, platform engineers, and application owners who route LLM chat requests and need practical overload signals for failover without turning every timeout into an unnecessary provider switch.

Key takeaways

  • Treat failover as one overload-control action, not the first response to every error.
  • Separate provider overload, client-side saturation, quota or billing limits, and bad-request failures before routing traffic elsewhere.
  • Use the CometAPI text chat API documentation to verify endpoint paths, request fields, response fields, and error behavior before coding automated failover.
  • Follow the overload principle described in Google’s SRE material: protect serving capacity, shed or throttle lower-priority work, and avoid retry behavior that amplifies the overload condition.
  • Build runbook signals around request class, error type, latency budget, retry budget, and queue health rather than a single global “provider up/down” flag.

Concise definition

An overload failover signal is an observable condition showing that the current LLM serving path is unlikely to meet its service objective for a specific request class unless traffic is reduced, degraded, or routed to another validated path.

Good overload signals are:

  • Classified: they distinguish overload from auth errors, invalid payloads, and local network failures.
  • Bounded: they use retry budgets and cooldown windows so failover does not oscillate.
  • Actionable: each signal maps to a specific runbook action: wait, retry, throttle, degrade, queue, or fail over.
  • Verified against the API contract: endpoint, request, response, and error semantics are checked against the current CometAPI docs before automation.

The overload decision model

Google’s SRE guidance on overload emphasizes keeping systems within capacity, shedding excess work, and using throttling or load-shedding mechanisms rather than letting uncontrolled demand collapse the service path (Google SRE: handling overload). For LLM API routing, that translates into this decision order:

  1. Classify the request. Is it interactive, background, batch, retry, or user-blocking?
  2. Classify the failure. Is the evidence overload-like, contract-like, auth-like, quota-like, or local infrastructure-like?
  3. Protect the hot path. Stop retry storms before adding more load.
  4. Apply the smallest safe action. Prefer local backoff or degradation before full failover if the user experience permits.
  5. Fail over only to a validated target. The alternate model, endpoint, and response parser must already be tested.

This matters because overload failover can make incidents worse. If every worker retries immediately and then fails over to another constrained path, you can create a second overload event.
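The decision order above can be sketched as a small pure function. This is a minimal illustration, not a prescribed implementation: the class names, failure-kind strings, and action names are assumptions introduced here, and none of them come from the CometAPI docs.

```python
from dataclasses import dataclass

# Hypothetical sets for illustration; adapt them to your own taxonomy.
FAILOVER_ELIGIBLE_CLASSES = {"interactive"}
NON_OVERLOAD_FAILURES = {"bad_request", "auth", "quota", "local_saturation"}


@dataclass(frozen=True)
class Evidence:
    request_class: str        # step 1: interactive, background, batch, retry
    failure_kind: str         # step 2: overload_like, bad_request, auth, ...
    retry_storm: bool         # step 3: retries are above budget
    fallback_validated: bool  # step 5: fallback target passed recent tests


def decide(e: Evidence) -> str:
    """Return the smallest safe action for one classified failure."""
    if e.failure_kind in NON_OVERLOAD_FAILURES:
        return "investigate"   # not overload: routing elsewhere will not help
    if e.retry_storm:
        return "stop_retries"  # protect the hot path before anything else
    if e.request_class not in FAILOVER_ELIGIBLE_CLASSES:
        return "backoff"       # lower-priority work can wait
    if not e.fallback_validated:
        return "backoff"       # never fail over to an untested target
    return "fail_over"
```

Keeping the function free of I/O makes it easy to replay recorded incidents through it and check that the runbook would have chosen the action you expect.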

Runbook signal matrix

Use this table as a runbook starting point. Replace the placeholder thresholds with values tuned to your latency objective, traffic profile, and contractual API behavior.

| Signal | What it may indicate | Operator action | Validation step |
|---|---|---|---|
| Repeated overload-like status codes or documented throttling responses | The serving path may be rejecting excess load, rate-limited traffic, or unavailable requests | Enter backoff; reduce concurrency; fail over only for protected request classes | Verify exact status codes, response body fields, and retry guidance in the CometAPI chat API documentation and support docs |
| Latency exceeds <P95_LATENCY_BUDGET_MS> while local queue depth is normal | Provider-side or network-side latency pressure may be rising | Route only latency-sensitive classes to fallback; keep batch traffic queued | Compare client timing, gateway timing, and provider response timing if available |
| Local queue depth or worker saturation rises before API errors | Your own caller may be overloaded, not the upstream API | Throttle intake; pause non-critical jobs; do not immediately fail over | Check worker CPU, event-loop lag, connection pool exhaustion, and retry volume |
| Retry volume exceeds <RETRY_BUDGET> | Retries may be amplifying overload | Stop automatic retries for low-priority classes; use jittered backoff | Confirm retry attempts per original request and per request class |
| Documented quota, account, or billing-related errors appear | The issue may not be transient overload | Stop failover automation until contract and account status are verified | Check the CometAPI help center for support and account-verification paths |
| Fallback path success rate falls below <FALLBACK_HEALTH_THRESHOLD> | Failover target may also be impaired | Degrade response quality, queue work, or serve cached/non-LLM fallback | Run a small canary against fallback before shifting more traffic |
| Bad-request or schema errors increase after deployment | Caller release likely introduced invalid payloads | Roll back caller change; do not fail over | Diff request payloads and validate against the CometAPI text chat API documentation |

What should trigger failover?

A practical failover trigger should combine multiple signals. A single timeout is rarely enough.

Use a composite rule such as:

  • the request belongs to a failover-eligible class;
  • the primary path has overload-like evidence for <WINDOW>;
  • the caller has not exhausted its retry budget;
  • the fallback path passed a recent health check;
  • the error is not clearly caused by bad request payload, authentication, account state, or local saturation.

In other words: fail over when the current path is probably capacity-impaired and the alternate path is already known to be safer for that request class.
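A minimal sketch of the composite rule, assuming the caller has already computed each input. The field names are hypothetical, and `window_seconds` stands in for the `<WINDOW>` placeholder that you tune yourself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriggerInputs:
    failover_eligible: bool   # request class is allowed to fail over
    overload_seconds: float   # how long overload-like evidence has persisted
    window_seconds: float     # your tuned <WINDOW>
    retries_used: int         # attempts already spent on this request
    retry_budget: int         # maximum attempts allowed for this request
    fallback_healthy: bool    # fallback passed a recent health check
    excluded_cause: bool      # bad payload, auth, account state, or local saturation


def should_fail_over(t: TriggerInputs) -> bool:
    """Every condition of the composite rule must hold at the same time."""
    return (
        t.failover_eligible
        and t.overload_seconds >= t.window_seconds
        and t.retries_used < t.retry_budget
        and t.fallback_healthy
        and not t.excluded_cause
    )
```

Because the rule is a plain conjunction, any single missing condition keeps traffic on the primary path, which is the conservative default this article argues for.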

What should not trigger failover?

Avoid automatic failover for these cases unless your contract and tests explicitly support it:

  • malformed request payloads;
  • authentication header mistakes;
  • unsupported or unvalidated model identifiers;
  • client-side connection pool exhaustion;
  • local worker queue saturation;
  • billing, account, or quota conditions that require operator review;
  • parser failures caused by a response-shape assumption you have not verified.

The CometAPI docs should be treated as the contract source for endpoint and payload behavior. Verify the current chat API contract in the CometAPI text chat API documentation before deciding that a response is safe to classify as overload.

Contract details to verify

Do not hard-code these values from memory. Verify them against the linked source before implementing automation.

| Contract item | Value to verify before production use | Why it matters in overload runbooks | Primary source |
|---|---|---|---|
| Endpoint paths | Verify the current base URL and exact chat endpoint path from the docs; use <COMETAPI_BASE_URL_FROM_DOCS> and <COMETAPI_CHAT_PATH_FROM_DOCS> until confirmed | A wrong path can look like an upstream outage but is actually a client integration defect | CometAPI API documentation and CometAPI text chat API documentation |
| Auth headers | Verify the required auth header name, value format, and key-management guidance; keep <AUTH_HEADER_FROM_DOCS> as a placeholder until confirmed | Auth failures should not trigger failover; they should trigger credential or deployment rollback investigation | CometAPI API documentation |
| Request fields | Verify required chat request fields, optional fields, model identifier format, and token-control fields from the chat docs | Invalid request fields can create false “provider failure” signals | CometAPI text chat API documentation |
| Response fields | Verify the response body shape, success fields, usage fields if documented, and any finish or termination indicators | Routers need stable parsing to distinguish successful degraded responses from failed calls | CometAPI text chat API documentation |
| Error behavior | Verify documented status codes, error body fields, throttling behavior, retry guidance, and support escalation path | Failover logic depends on separating overload-like errors from auth, bad request, and account issues | CometAPI text chat API documentation and CometAPI help center |
| Rate-limit or billing assumptions | Verify whether rate limits, quota behavior, billing counters, and account restrictions are documented for your plan or account | Do not assume a throttling response means provider overload; it may be account-specific policy | CometAPI help center |
| Retry and load-shedding policy | Verify your own retry budget, backoff, and request-class priority rules; tune thresholds rather than copying fixed numbers | Retry storms can amplify overload and spread it to fallback paths | Google SRE: handling overload |

Example: sanitized overload probe

This example is intentionally contract-neutral. Replace placeholders only after checking the current CometAPI docs. Do not use this as a production health check until the endpoint, auth header, fields, and validated model ID are confirmed.

curl --silent --show-error --fail-with-body \
  --request POST "<COMETAPI_BASE_URL_FROM_DOCS><COMETAPI_CHAT_PATH_FROM_DOCS>" \
  --header "<AUTH_HEADER_FROM_DOCS>: <COMETAPI_API_KEY>" \
  --header "<CONTENT_TYPE_HEADER_FROM_DOCS>: <JSON_CONTENT_TYPE_FROM_DOCS>" \
  --data '{
    "<MODEL_FIELD_FROM_DOCS>": "<VALIDATED_MODEL_ID>",
    "<MESSAGES_FIELD_FROM_DOCS>": [
      {
        "<ROLE_FIELD_FROM_DOCS>": "<USER_ROLE_VALUE_FROM_DOCS>",
        "<CONTENT_FIELD_FROM_DOCS>": "Return the single word ok."
      }
    ],
    "<TOKEN_LIMIT_FIELD_FROM_DOCS>": 4
  }'

Use this type of probe for classification, not for load testing. The goal is to confirm that the route, credentials, parser, and fallback target are healthy enough to receive a small amount of failover traffic.

Practical validation steps

1. Build an error taxonomy

Create explicit labels before writing router logic:

  • success
  • timeout
  • local_saturation
  • bad_request
  • auth_or_permission
  • quota_or_billing_review
  • overload_like
  • unknown_upstream_error
  • fallback_unhealthy

Map each label to one action. If a label can mean several things, keep it out of the automatic failover path until you have better evidence.
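One way to keep the taxonomy explicit is an enum plus a single label-to-action table. The label values are the ones listed above; the action names are illustrative assumptions that you would replace with your own runbook steps.

```python
from enum import Enum


class Label(str, Enum):
    SUCCESS = "success"
    TIMEOUT = "timeout"
    LOCAL_SATURATION = "local_saturation"
    BAD_REQUEST = "bad_request"
    AUTH_OR_PERMISSION = "auth_or_permission"
    QUOTA_OR_BILLING_REVIEW = "quota_or_billing_review"
    OVERLOAD_LIKE = "overload_like"
    UNKNOWN_UPSTREAM_ERROR = "unknown_upstream_error"
    FALLBACK_UNHEALTHY = "fallback_unhealthy"


# Exactly one action per label. Action names are hypothetical.
ACTION_BY_LABEL = {
    Label.SUCCESS: "none",
    Label.TIMEOUT: "backoff",
    Label.LOCAL_SATURATION: "throttle_intake",
    Label.BAD_REQUEST: "route_to_owning_team",
    Label.AUTH_OR_PERMISSION: "check_credentials",
    Label.QUOTA_OR_BILLING_REVIEW: "operator_review",
    Label.OVERLOAD_LIKE: "backoff_then_evaluate_failover",
    Label.UNKNOWN_UPSTREAM_ERROR: "hold_and_investigate",
    Label.FALLBACK_UNHEALTHY: "degrade_or_queue",
}

# Only unambiguous labels may enter the automatic failover path.
AUTO_FAILOVER_LABELS = frozenset({Label.OVERLOAD_LIKE})


def action_for(label: Label) -> str:
    """Look up the single runbook action mapped to a label."""
    return ACTION_BY_LABEL[label]
```

Ambiguous labels such as `timeout` and `unknown_upstream_error` are deliberately left out of `AUTO_FAILOVER_LABELS` in this sketch.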

2. Record request class on every call

At minimum, tag each request with:

  • traffic class: interactive, background, batch, evaluation, retry;
  • user-visible priority;
  • original request ID;
  • retry attempt number;
  • primary route and fallback route;
  • caller service;
  • timeout budget.

Failover for an interactive user request may be appropriate while batch traffic should simply wait.
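The tags listed above could be carried as one immutable record attached to every call. The field names and types below are assumptions chosen for illustration, for example an integer priority and a millisecond timeout budget.

```python
from dataclasses import asdict, dataclass

TRAFFIC_CLASSES = {"interactive", "background", "batch", "evaluation", "retry"}


@dataclass(frozen=True)
class RequestTags:
    traffic_class: str
    user_visible_priority: int
    original_request_id: str
    retry_attempt: int
    primary_route: str
    fallback_route: str
    caller_service: str
    timeout_budget_ms: int

    def __post_init__(self) -> None:
        # Reject unknown classes early so metrics never contain stray labels.
        if self.traffic_class not in TRAFFIC_CLASSES:
            raise ValueError(f"unknown traffic class: {self.traffic_class!r}")
        if self.retry_attempt < 0:
            raise ValueError("retry_attempt must be non-negative")


tags = RequestTags(
    traffic_class="batch",
    user_visible_priority=3,
    original_request_id="req-123",      # hypothetical ID
    retry_attempt=0,
    primary_route="primary",            # hypothetical route names
    fallback_route="fallback",
    caller_service="report-generator",  # hypothetical service name
    timeout_budget_ms=30_000,
)
log_fields = asdict(tags)  # plain dict, ready to attach to a log line or span
```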

3. Validate the primary path

For each overload-like event, collect:

  • status code;
  • response body error field, if documented;
  • retry guidance, if documented;
  • total latency;
  • time to first byte, if available;
  • client-side timeout reason;
  • connection reuse or connection-pool metrics;
  • request payload size;
  • model identifier used;
  • request class.

Compare the result against the current CometAPI text chat API documentation. If the evidence points to a malformed request or invalid model identifier, the runbook should stop failover and send the incident to the owning service team.
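The collected evidence can then be mapped to a label. In this sketch the status-code sets are passed in rather than hard-coded, on the assumption that you fill them from your verified integration contract; the function and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class StatusMap:
    """Status-code sets copied from your verified integration contract."""
    overload: FrozenSet[int]
    bad_request: FrozenSet[int]
    auth: FrozenSet[int]
    quota: FrozenSet[int]


def classify(
    status_code: Optional[int],
    timed_out: bool,
    local_saturated: bool,
    status_map: StatusMap,
) -> str:
    """Map one observed event to a taxonomy label."""
    if local_saturated:
        return "local_saturation"  # our own caller is the likely bottleneck
    if timed_out:
        return "timeout"
    if status_code is None:
        return "unknown_upstream_error"  # no response and no timeout recorded
    if status_code in status_map.bad_request:
        return "bad_request"
    if status_code in status_map.auth:
        return "auth_or_permission"
    if status_code in status_map.quota:
        return "quota_or_billing_review"  # checked before overload on purpose
    if status_code in status_map.overload:
        return "overload_like"
    if 200 <= status_code < 300:
        return "success"
    return "unknown_upstream_error"
```

Local saturation is checked first so that a saturated caller never blames the upstream API, and quota is checked before overload so that an account-specific limit is not mistaken for provider overload.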

4. Validate the fallback path before shifting traffic

A fallback is not safe just because it is different. Before routing production traffic, verify:

  • the fallback model or route accepts the same request class;
  • the response parser can handle the fallback response;
  • safety, formatting, and latency expectations are acceptable for the degraded mode;
  • the fallback route has not recently failed its own health checks;
  • the fallback route has a separate retry budget.
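A small canary gate is one way to turn these checks into a yes/no answer. This sketch assumes you already have canary results in hand; the thresholds and the minimum sample count are placeholders to tune, not recommended values.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CanaryResult:
    ok: bool           # call succeeded and the parser accepted the response
    latency_ms: float  # total latency observed by the caller


def fallback_ready(
    results: Sequence[CanaryResult],
    min_success_rate: float,
    max_latency_ms: float,
    min_samples: int = 5,
) -> bool:
    """Return True only if enough recent canaries succeeded fast enough."""
    if not results or len(results) < min_samples:
        return False  # too little evidence counts as not ready
    successes = [r for r in results if r.ok]
    if len(successes) / len(results) < min_success_rate:
        return False
    slowest_ok = max((r.latency_ms for r in successes), default=float("inf"))
    return slowest_ok <= max_latency_ms
```

Treating missing evidence as "not ready" keeps an untested fallback out of rotation by default.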

5. Use progressive failover

Avoid all-or-nothing switches. A safer sequence is:

  1. stop retries for low-priority traffic;
  2. throttle non-critical background jobs;
  3. fail over a small protected class;
  4. watch fallback error rate and latency;
  5. expand only if fallback remains healthy;
  6. roll back when the primary path remains stable for a defined cooldown window.

The exact percentages and cooldown windows should be tuned in your environment. They are not universal reliability constants.
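The expand, hold, and roll-back steps of that sequence can be sketched as a stepwise ramp. The `RAMP_STEPS` fractions are purely illustrative assumptions, as are the function and parameter names.

```python
# Hypothetical ramp of fallback traffic shares; tune for your environment.
RAMP_STEPS = (0.0, 0.05, 0.25, 0.50, 1.0)


def next_share(
    current: float,
    fallback_healthy: bool,
    primary_stable_seconds: float,
    cooldown_seconds: float,
) -> float:
    """Return the next fraction of protected-class traffic sent to fallback."""
    if primary_stable_seconds >= cooldown_seconds:
        return 0.0  # primary has been stable for the cooldown: roll back
    if not fallback_healthy:
        # Step down one level instead of expanding onto an impaired fallback.
        lower = [s for s in RAMP_STEPS if s < current]
        return lower[-1] if lower else 0.0
    # Fallback is healthy: expand by one step, or hold at the top step.
    higher = [s for s in RAMP_STEPS if s > current]
    return higher[0] if higher else current
```

Calling this once per evaluation interval moves traffic one step at a time, so a single bad reading cannot swing the whole protected class in either direction.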

Suggested runbook actions by severity

| Severity | Conditions | Recommended action |
|---|---|---|
| Watch | One or more isolated overload-like events, no queue growth, fallback healthy | Log classification; no broad failover |
| Contain | Error burst for one request class; latency budget at risk; retries increasing | Apply jittered backoff; reduce concurrency; stop low-priority retries |
| Degrade | Primary path impaired for user-visible traffic; fallback recently validated | Fail over protected interactive class; queue or defer batch traffic |
| Incident | Primary and fallback impaired, or account/auth/quota status unclear | Freeze automation, page owner, use support path, serve non-LLM fallback where possible |
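The severity levels above can be expressed as an ordered check, most severe first. This is a rough sketch with hypothetical boolean inputs; how each flag is derived from metrics is left to your own tooling.

```python
def severity(
    primary_impaired: bool,
    fallback_healthy: bool,
    account_state_clear: bool,
    error_burst: bool,
    retries_rising: bool,
) -> str:
    """Return the highest severity level whose conditions are met."""
    if not account_state_clear or (primary_impaired and not fallback_healthy):
        return "Incident"  # freeze automation and page the owner
    if primary_impaired:
        return "Degrade"   # fail over the protected interactive class
    if error_burst or retries_rising:
        return "Contain"   # back off and reduce concurrency
    return "Watch"         # log the classification only
```

Checking the most severe conditions first means an unclear account state always wins over a tempting failover.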

Operator checks during an incident

Ask these questions before pressing the failover button:

  1. Is the evidence overload-like or contract-like?
    Bad payloads, unsupported fields, or auth problems should not route to another provider automatically.

  2. Are retries making the situation worse?
    The SRE overload guidance warns against retry behavior that amplifies overload. Cap retries per original request and use jittered backoff.

  3. Is the fallback truly healthy?
    If the fallback path is untested or currently failing, failover may spread the incident.

  4. Can lower-priority work wait?
    Queueing batch traffic is often safer than consuming fallback capacity needed for interactive users.

  5. Do you need support or account review?
    For account, billing, quota, or support-path questions, verify the current route through the CometAPI help center.

FAQ

Should every 429 or 503 trigger failover?

No. First verify what the API contract says about the status code and error body. A throttling or overload-like response may call for backoff, reduced concurrency, or request-class shedding before failover. Use the documented behavior from the API docs rather than assuming every status code has the same meaning across providers.

Should timeouts trigger immediate fallback?

Usually not by themselves. A timeout can be caused by local worker saturation, DNS, connection pools, network path issues, request size, or upstream latency. Combine timeout evidence with queue health, retry counts, fallback health, and request class.

How do we avoid retry storms?

Set a retry budget per original request, use jittered backoff, and stop retries for low-priority classes during overload. Make sure a failover attempt counts against the same budget or a stricter fallback budget.
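A minimal sketch of both ideas: a per-request budget shared by primary and fallback attempts, and exponential backoff with full jitter. The default base and cap values are illustrative assumptions, not recommendations.

```python
import random
from typing import Optional


class RetryBudget:
    """Attempt budget for one original request, shared across routes."""

    def __init__(self, max_attempts: int) -> None:
        self.max_attempts = max_attempts
        self.used = 0

    def try_spend(self) -> bool:
        """Consume one attempt; return False once the budget is exhausted."""
        if self.used >= self.max_attempts:
            return False
        self.used += 1
        return True


def backoff_delay(
    attempt: int,
    base_s: float = 0.5,
    cap_s: float = 30.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Full jitter: uniform delay in [0, min(cap_s, base_s * 2**attempt)]."""
    rng = rng or random.Random()
    ceiling = min(cap_s, base_s * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


budget = RetryBudget(max_attempts=2)
planned_delays = []
attempt = 0
while budget.try_spend():  # a failover attempt would spend from this same budget
    planned_delays.append(backoff_delay(attempt))
    attempt += 1
```

Randomizing the whole delay rather than adding a small offset spreads retries from many workers across the window, which reduces the chance that they all hit the recovering path at once.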

What is the safest first action during overload?

For many systems, the safest first action is to reduce demand: throttle intake, pause background work, and disable unnecessary retries. Failover is safer after you know the fallback path is healthy and the current failure is not caused by your own request contract.

Can one synthetic probe decide provider health?

No. A probe can confirm basic route health, auth, and parser behavior. It should not be the only overload signal. Combine probe results with real request metrics, error classification, latency, and retry pressure.

Where should CometAPI-specific details live?

Keep endpoint paths, auth header names, request fields, response parsing, and error mappings in a versioned integration contract. Review that contract against the CometAPI documentation before changing automated failover behavior.

Sources checked

Access date: 2026-06-12.

| Source | Purpose |
|---|---|
| CometAPI API documentation | Verify current API documentation entry point, base URL conventions, and auth-related contract details before implementation |
| CometAPI text chat API documentation | Verify chat endpoint path, request schema, response schema, and error behavior for chat completions |
| CometAPI help center | Verify support, account, billing, or escalation paths relevant to overload incidents |
| Google SRE book: handling overload | Ground overload runbook design in established SRE practices for load shedding, throttling, and capacity protection |

Next step

If you are evaluating CometAPI as part of an LLM routing or fallback design, start by validating the current API contract in the docs, then run a small staging drill with explicit overload classifications before enabling automatic production failover.
