Multiple customers in the EU, US, DE, UK, CA, AU, and CH regions noticed duplicated survey runs due to a bug in the fact sheet scope change logic. This logic keeps all relevant fact sheets and subscribers attached to a dynamic survey run. The bug was live for about 11 days, from Feb 7, 2025 07:38 UTC to Feb 18, 2025 08:18 UTC, when we reverted to the previous version. There was no data loss, but email notifications about a changed survey scope were sent out as a result.
We reverted the changes that caused the issue. The issue was resolved on Feb 18, 2025 08:18 UTC.
Because the bug introduced duplicated database records, we began addressing them after the revert. The mitigation plan was rolled out in several steps from Feb 18, 2025 08:18 UTC to Mar 7, 2025 10:59 UTC.
There were two issues at play:
We switched from the original implementation of the fact sheet scope change to a new implementation to solve performance issues. After the switch, we failed to notice that one edge case in the business logic behaved incorrectly. As a result, duplicate fact sheets were added to the survey scope for numerous survey runs.
After we removed the survey scope change bug, our service experienced high load that could have impacted other services. We decided to quickly apply small adjustments to the same logic to improve performance. We did not consider the potential side effects of this change, and as a result, a duplicate fact sheet was added to a survey run.
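To illustrate the class of problem behind both issues, here is a minimal sketch of an idempotent scope change, written in Python. It is not our production code: the SurveyRun model, the apply_scope_change function, and the fact sheet IDs are hypothetical names chosen for this example. The idea is that applying the same scope twice, or receiving the same fact sheet more than once, neither adds a duplicate nor reports a change.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SurveyRun:
    """Hypothetical model of a dynamic survey run (not the real schema)."""
    run_id: str
    fact_sheet_ids: list[str] = field(default_factory=list)


def apply_scope_change(run: SurveyRun, desired_ids: list[str]) -> tuple[list[str], list[str]]:
    """Sync the run's scope to desired_ids and return (added, removed).

    The update is idempotent: duplicates in the input or in the stored
    scope are collapsed, and re-applying the same scope changes nothing.
    """
    # dict.fromkeys de-duplicates while preserving first-seen order.
    desired = list(dict.fromkeys(desired_ids))
    current = set(run.fact_sheet_ids)
    wanted = set(desired)

    added = [fs for fs in desired if fs not in current]
    removed = [fs for fs in dict.fromkeys(run.fact_sheet_ids) if fs not in wanted]

    run.fact_sheet_ids = desired
    return added, removed


# Example: the same fact sheet arrives twice in one scope change.
run = SurveyRun("run-1", ["fs-a", "fs-b"])
first = apply_scope_change(run, ["fs-a", "fs-b", "fs-c", "fs-c"])
second = apply_scope_change(run, ["fs-a", "fs-b", "fs-c", "fs-c"])
```

In this sketch, a scope change notification would be sent only when added or removed is non-empty, so the second, unchanged call would stay silent.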
Four things we will take away from this: