Service Disruption in Event Processing (false positive)
Incident Report for SAP LeanIX
Postmortem

Summary

On Wednesday, June 5th, 2024, from 07:30 UTC to 08:20 UTC, our health check reported a downtime. However, all systems were operational and no customer data was affected.

What happened?

  • [2024-06-05 07:25 UTC]: We released a faulty background job. It triggered our overly sensitive alerting system, which then showed a general downtime on monitoring.leanix.net.
  • [2024-06-05 08:25 UTC]: The fix was rolled out, and the alert was resolved.

There was no outage of our systems; everything was working as expected. Only the five workspaces that had the new background job activated may have experienced delayed event processing for Fact Sheet changes. No data was lost.

Mitigation: What did we do about it?

We released a new version of the affected service without the broken background job.

Follow-ups: How will we improve?

We adjusted the monitoring system so that it no longer shows a general downtime in such cases and instead relies on the dedicated monitoring and alerting for our background jobs.
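
To illustrate the idea behind this adjustment, here is a minimal sketch in Python. It is not our actual monitoring configuration: the names Check, overall_status, and job_alerts, as well as the example check names, are hypothetical. It only shows how a general status can be derived from core checks alone while failing background jobs are reported through a separate alert list.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    OPERATIONAL = "operational"
    DOWN = "down"


@dataclass(frozen=True)
class Check:
    """A single health check result (hypothetical structure)."""
    name: str
    healthy: bool
    core: bool  # True for core system checks, False for background jobs


def overall_status(checks):
    """Derive the general status from core checks only.

    Failing background jobs are deliberately ignored here so that
    they cannot flip the general status to DOWN.
    """
    if any(c.core and not c.healthy for c in checks):
        return Status.DOWN
    return Status.OPERATIONAL


def job_alerts(checks):
    """Return the names of failing background jobs for dedicated alerting."""
    return [c.name for c in checks if not c.core and not c.healthy]
```

With a healthy core check and one failing background job, overall_status stays OPERATIONAL, while job_alerts still lists the failing job so that it can be handled by the background job alerting.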

We continuously invest in improving our mean time to recovery so that we can resolve such situations even faster in the future.

Posted Jun 06, 2024 - 12:27 UTC

Resolved
This incident has been resolved.
Posted Jun 05, 2024 - 07:30 UTC