On 2024-12-11 at 08:00 UTC, multiple customers in the EU region began seeing incorrect KPI values. No customer data was affected. The disruption lasted approximately 10 hours and 30 minutes and ended on 2024-12-11 at 18:30 UTC.
Our team identified the affected customers and implemented a fix that corrected their KPI values. By 18:30 UTC, we had verified that all KPI values were accurate and that the issue was fully resolved.
The root cause was a growing backlog of KPI calculation jobs that exceeded our processing capacity. As the number of jobs increased, the processing resources and the time allotted for processing stayed the same. Eventually, calculation jobs for multiple customers were dropped, so those customers were shown outdated KPI values.
We have identified the combination of factors that led to this issue and will enforce the following practices to prevent similar incidents in the future: