[EAM] Service disruption in automations

Incident Report for SAP LeanIX

Postmortem

Summary

Between August 16, 11:47 UTC and September 28, 12:55 UTC, customers noticed that automations configured to run when the lifecycle of a fact sheet changes failed to run.

What happened?

As part of our ongoing efforts to improve the stability of our services, we made a code change that did not handle all edge cases and thereby introduced a bug. As a consequence, all automations configured to run when the lifecycle phase on a fact sheet changes ran only once for the whole workspace. All workspaces that had such automations configured were affected. Customers may not have been notified at the correct time upon a fact sheet lifecycle change.

Mitigation: What did we do about it?

The problem was fixed as soon as it was detected. All automations that needed to run were restarted.

Follow-ups: How will we improve?

After mitigation, we thoroughly analyzed why our tests did not detect the bug. We have already improved our investigation process and plan to implement additional, specific test cases to cover these previously unforeseen scenarios.

Posted Sep 29, 2023 - 16:18 UTC

Resolved

We experienced a service disruption in automations. Our team implemented a solution and restarted the automations.
Posted Sep 18, 2023 - 07:00 UTC