lastplayed.io · Incident report
Status / June 21, 2026
SEV-1 · Resolved
lastplayed.io 503 outage — image registry unreachable during a node fault
A fault on a single cluster node simultaneously killed lastplayed.io's app pods and the private image registry they needed to restart; the site returned 503 for ~43 minutes until it was served from a cached image. The user-facing outage is over; a durable node fix is still pending.
Affected components
Timeline
started · 15:44 UTC
Both app pods were killed by a health-check timeout and could not re-pull their image. The outage began at this point, and no automated alert fired.
investigating · 16:06 UTC
503s confirmed at the edge with no healthy backends behind them; investigation began.
identified · 16:18 UTC
Root cause: an overlay-network fault on a single cluster node simultaneously killed the app pods and downed the private registry they needed to restart.
resolved · 16:27 UTC
lastplayed.io restored, served from a cached image; both replicas healthy, HTTP 200.
monitoring · 16:45 UTC
Sign-in (SSO) restored with the same cached-image stopgap; deploy tooling unblocked. The underlying node is still fragile — watching.
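For readers who want to check the durations quoted in this report, the timeline can be turned into elapsed minutes with a few lines of Python. This is only an illustrative sketch: the timestamps are copied from the timeline above, while the names TIMELINE, minutes_between, and intervals are hypothetical and not part of any lastplayed.io tooling.

```python
from datetime import datetime

# Timestamps copied from the timeline above (all UTC, same day).
TIMELINE = {
    "started": "15:44",
    "investigating": "16:06",
    "identified": "16:18",
    "resolved": "16:27",
    "monitoring": "16:45",
}


def minutes_between(start: str, end: str) -> int:
    """Whole minutes from start to end, both 'HH:MM' on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def intervals(timeline: dict) -> dict:
    """Minutes elapsed from 'started' to every later stage."""
    start = timeline["started"]
    return {
        stage: minutes_between(start, stamp)
        for stage, stamp in timeline.items()
        if stage != "started"
    }


if __name__ == "__main__":
    for stage, mins in intervals(TIMELINE).items():
        print(f"{stage}: +{mins} min")
```

Under these inputs the resolved stage falls 43 minutes after the start, matching the outage length in the summary, and the investigating stage falls 22 minutes after the start, which is the window during which no automated alert had fired.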