[EAM] Service Disruption in Integrations and Inventory Export

Incident Report for SAP LeanIX

Postmortem

Summary

On September 19, 2023, between 01:46 PM and 02:41 PM UTC, customers experienced failing requests from out-of-the-box Integrations.

What happened?

As part of our ongoing work to improve the stability mechanisms and standards of our services, we updated our GraphQL API rate-limiting mechanism.

During the timeframe stated above, Integration runs failed because a bug in the changed code caused our GraphQL API to reject their requests.

Mitigation: What did we do about it?

Our monitoring systems alerted the team immediately after the release went live on production and calls from our Integration services started being blocked. The team created and released a fix, which resolved the issue.

Follow-ups: How will we improve?

After the mitigation, we analyzed in depth why our CI/CD pipeline did not fail when these changes were introduced, and we added test cases to prevent this breakage from recurring.

Although our monitoring system detected the issue immediately, we will further improve our alerting and checks so that we can recover faster from such a scenario.

Posted Sep 22, 2023 - 06:51 UTC

Resolved

This incident has been resolved.
Posted Sep 19, 2023 - 15:35 UTC

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Sep 19, 2023 - 14:39 UTC

Update

We identified that only our integration services are affected. The fix is still in progress.
Posted Sep 19, 2023 - 14:01 UTC

Identified

We are currently experiencing a service disruption in EAM. Our team is working on a fix.

We will send an additional update in 60 minutes.
Posted Sep 19, 2023 - 13:56 UTC
This incident affected: EU Instances (EAM), US Instances (EAM), CA Instances (EAM), AU Instances (EAM), DE Instances (EAM), CH Instances (EAM), AE Instances (EAM), and UK Instances (EAM).