Survey shows empty results

Incident Report for SAP LeanIX

Postmortem

Incident description

Multiple customers in the EU, US, DE, UK, CA, AU, and CH regions noticed duplicated survey runs caused by a bug in the fact sheet scope change logic. This logic keeps all relevant fact sheets and subscribers attached to a dynamic survey run. The bug was live for several days, from Feb 7, 2025 07:38 UTC to Feb 18, 2025 08:18 UTC, when we reverted to the previous version. There was no data loss, but email notifications about a changed survey scope were sent out as a result.

Incident resolution

Service

We reverted the changes that caused the issue. The issue was resolved on Feb 18, 2025 08:18 UTC.

Data Preservation

Since duplicated database records were introduced, we started addressing them. The mitigation plan was rolled out in several steps from Feb 18, 2025 08:18 UTC to Mar 7, 2025 10:59 UTC.

Root Cause Analysis

There were two issues at play:

  • Moving to the new fact sheet scope change logic
  • Prematurely applying performance improvements

New fact sheet scope change logic

We switched from the original implementation of the fact sheet scope change to a new implementation to solve performance issues. After the switch, we failed to notice irregular behavior in one edge case of the business logic. As a result, duplicate fact sheets were added to the survey scope for numerous survey runs.
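
One way to make a scope change robust against this kind of edge case is to compute it as a set difference, so that re-applying the same scope never adds a fact sheet twice. The following is a minimal sketch under assumptions, not the actual LeanIX implementation: `SurveyRun` and `apply_scope_change` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SurveyRun:
    # Hypothetical model: a survey run tracks the fact sheets in its scope by ID.
    run_id: str
    fact_sheet_ids: list[str] = field(default_factory=list)


def apply_scope_change(run: SurveyRun, new_scope: list[str]) -> tuple[list[str], list[str]]:
    """Update the run to match new_scope and return (added, removed) IDs."""
    current = set(run.fact_sheet_ids)
    target = set(new_scope)
    added = sorted(target - current)
    removed = sorted(current - target)
    # Rebuild the scope from the incoming list, dropping repeated IDs
    # while preserving first-seen order.
    run.fact_sheet_ids = list(dict.fromkeys(new_scope))
    return added, removed
```

Because the result depends only on the target scope, calling the function a second time with the same input reports no additions or removals, which also means no further scope change notification would be warranted.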

Performance improvements

After the survey scope change bug was removed, our service experienced high load, which could have impacted other services. We decided to quickly apply small adjustments to the same logic to improve performance. We did not consider the potential side effects of the change, and as a result, a duplicate fact sheet was added to a survey run.

Preventive measures

Four things we will take away from this:

  • We will invest more in monitoring and alerting to catch anomalies that violate business logic, such as duplicated fact sheets within a survey run
  • We will improve the mitigation process to ensure a quicker response time
  • We will invest more in tests to cover edge cases
  • We will improve the assessment of rollouts to identify how impactful a change is and where potential issues can occur
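
The monitoring measure above can be illustrated with a small invariant check. This is a sketch under assumptions: the input shape (pairs of survey run ID and fact sheet ID) and the function name `find_duplicate_fact_sheets` are hypothetical and do not describe the actual service or its data model.

```python
from collections import Counter


def find_duplicate_fact_sheets(scope_rows):
    """Return {survey_run_id: [fact_sheet_id, ...]} for IDs that occur more than once.

    scope_rows is assumed to be an iterable of (survey_run_id, fact_sheet_id)
    pairs, for example rows read from a scope table.
    """
    counts = Counter(scope_rows)
    duplicates = {}
    for (run_id, fact_sheet_id), occurrences in counts.items():
        if occurrences > 1:
            duplicates.setdefault(run_id, []).append(fact_sheet_id)
    # Sort the IDs so that alerts are stable between checks.
    return {run_id: sorted(ids) for run_id, ids in duplicates.items()}
```

A periodic job could run such a check and raise an alert whenever the result is non-empty, since a fact sheet should appear at most once within a survey run.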
Posted Mar 18, 2025 - 10:18 UTC

Resolved

This incident has been resolved. We appreciate your patience and understanding.
Posted Mar 05, 2025 - 10:12 UTC

Update

Most regions are fully operational again now. We're continuously working on restoring functionality in the remaining regions.
Posted Feb 28, 2025 - 08:51 UTC

Update

At the moment, changes in survey scope are not detected automatically, and notification emails about such changes are not sent out. While we're working on restoring this functionality, please use the manual "Check for Changes" function as described in the documentation: https://docs-eam.leanix.net/docs/managing-surveys-and-viewing-results#viewing-survey-results.
Posted Feb 27, 2025 - 11:48 UTC

Update

Duplicate survey runs have been cleaned up in most workspaces. The team is working on finalizing the cleanup in the remaining workspaces and monitoring the overall situation.
Posted Feb 27, 2025 - 08:30 UTC

Update

The root cause of the issue has been identified and fixed. The team is still working on cleaning up the remaining duplicate survey results that have been created erroneously.
Posted Feb 26, 2025 - 09:01 UTC

Identified

Customers may see empty survey runs shown as the current survey result, with zero completion and no progress. These empty runs are duplicates, and the actual survey results are still accessible through the survey history. No data was lost.

The team has identified the root cause of the problem and is working to address the duplicate survey runs.
Posted Feb 24, 2025 - 18:48 UTC
This incident affected: EU Instances (EAM), US Instances (EAM), CA Instances (EAM), AU Instances (EAM), DE Instances (EAM), CH Instances (EAM), AE Instances (EAM), UK Instances (EAM), BR Instances (EAM), SG Instances (EAM), and JP Instances (EAM).