Incident Description
Starting March 24 at 11:29 UTC, a change to Pathfinder's English translation model introduced a new placeholder naming convention and modified the source language used for translations. These changes impacted:
- The OData API, where labels containing certain placeholders were not resolved and changed field and relation names disrupted customer integrations.
- Fact Sheet update notifications, where placeholders meant to render relation or field names were not properly resolved. While notifications were still sent, some contained unexpected values.
This led to issues for customers whose integrations relied on exact values. The issue was mitigated by implementing a transformation layer in our OData API to ensure compatibility with previous naming conventions and re-translating specific labels.
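The idea of a compatibility transformation layer can be illustrated with a minimal sketch. It is not the actual implementation: the mapping entries and the function name `to_legacy_record` are hypothetical, and the sketch assumes the layer only needs to rename keys in an API record back to the names integrations relied on before the change.

```python
from typing import Any, Dict

# Hypothetical mapping: name emitted after the translation model change -> previous name.
# The real field and relation names are not given in this report.
LEGACY_NAME_BY_NEW_NAME: Dict[str, str] = {
    "Owning Application": "Application Owner",
    "Successor Of": "Successors",
}


def to_legacy_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a record with renamed keys mapped back to legacy names.

    Keys without an entry in the mapping pass through unchanged.
    """
    return {
        LEGACY_NAME_BY_NEW_NAME.get(key, key): value
        for key, value in record.items()
    }
```

Because unknown keys pass through untouched, a layer of this kind can be applied to every response without affecting fields that the translation change did not rename.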
Incident Resolution
- A fix was deployed on March 25 to address placeholder issues.
- Affected customers were identified and contacted.
- The initial change was not reverted; instead, a solution was implemented to maintain compatibility without requiring customer action.
- On March 26, a fix was applied to the notification system at 08:47 UTC, followed at 13:40 UTC by a fix for the OData API to ensure proper resolution of placeholders.
Root Cause Analysis
The translation model change introduced non-backward-compatible placeholders, which were not accounted for in our OData API. Additionally, customers were unaware of the changes affecting their integrations. There was no monitoring in place for translation model changes impacting downstream services.
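A check of the following kind could surface non-backward-compatible placeholders before a translation model change ships. This is a sketch under stated assumptions: the `{{name}}` placeholder syntax, the flat key-to-template dictionaries, and the function names are all hypothetical, since the report does not describe the actual placeholder format.

```python
import re
from typing import Dict, Set

# Assumed placeholder syntax: {{name}}. The real syntax is not stated in this report.
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def placeholders(template: str) -> Set[str]:
    """Return the set of placeholder names used in a template string."""
    return set(PLACEHOLDER.findall(template))


def incompatible_keys(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each translation key present in both models to the placeholder
    names that appear in the new template but not in the old one."""
    report: Dict[str, Set[str]] = {}
    for key in old.keys() & new.keys():
        added = placeholders(new[key]) - placeholders(old[key])
        if added:
            report[key] = added
    return report
```

A non-empty report would indicate that downstream consumers resolving the old placeholder names need to be updated alongside the model change.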
Preventative Measures
To prevent similar incidents, the following improvements will be implemented:
- Testing & Monitoring: Ensure translation model changes are tested against all integrations and establish alerts for changes affecting OData and Notifications.
- Incident Management: Improve coordination between teams for faster response, treating functionality-breaking changes as incidents with clear internal and external communication.
- Automation & Prevention: Automate testing for translation model updates and strengthen the review process for transformation-related changes.
- Detection: Expand logging and alerting mechanisms to detect placeholder resolution issues earlier and proactively identify affected customers.
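As an illustration of the detection measure, the sketch below scans rendered text for leftover placeholders and logs a warning that includes a customer identifier. It assumes a `{{name}}` placeholder syntax and a `workspace_id` parameter, both hypothetical; the actual logging and alerting setup is not described in this report.

```python
import logging
import re
from typing import List

logger = logging.getLogger("placeholder_check")

# Assumed placeholder syntax: {{name}}. Anything still matching this pattern
# after rendering is treated as unresolved.
UNRESOLVED = re.compile(r"\{\{[^{}]*\}\}")


def find_unresolved(text: str) -> List[str]:
    """Return all placeholder tokens still present in rendered text."""
    return UNRESOLVED.findall(text)


def check_rendered(text: str, workspace_id: str) -> bool:
    """Return True if the text is fully resolved; otherwise log a warning
    tagged with the (hypothetical) workspace identifier and return False."""
    leftovers = find_unresolved(text)
    if leftovers:
        logger.warning(
            "unresolved placeholders %s for workspace %s", leftovers, workspace_id
        )
        return False
    return True
```

Tagging each warning with a customer identifier is what would allow affected customers to be listed directly from the logs rather than reconstructed after the fact.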