Topic archive

On-Call

CometAPI Timeout Evidence Pack for On-call Reviews

A structured guide to assembling and using timeout evidence when reviewing CometAPI incidents on call — covering which HTTP telemetry fields to capture, how to apply retry-with-backoff safely, and what to hand off to support.

cometapi, timeout, on-call, incident-review

Overload Signal Triage for LLM API On-Call Engineers

A practical guide for on-call engineers who need to distinguish real LLM API overload from transient noise, decide when to retry, and know when to escalate or shed load.

llm-api-reliability, overload-triage, on-call, observability