Platform Outage
Incident Report for Currencycloud
Postmortem

OVERVIEW

The platform experienced a significant outage on 3 September 2024, triggered by an update to some microservices. The incident began with intermittent errors on the static data service, which then escalated to a complete platform-wide outage lasting approximately 20 minutes.

CLIENT IMPACT

The root cause of the incident was an issue with a microservice upgrade. Due to an error, the previous version was deleted before the upgrade process had fully completed and before all services had successfully migrated to the newer version.

Some services were still running the old version because they had not been restarted to pick up the new version. Once the old version had been deleted, these services could no longer communicate with each other.

In summary, the root cause was a gap in the upgrade process, where the old version was removed before all services had fully migrated to the new version. This led to a cascading failure that impacted a wide range of services and caused a significant platform outage.
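
As an illustration of the kind of check that closes this gap, the sketch below refuses to remove an old version while any service still reports running it. It is a minimal example under stated assumptions, not a description of the actual tooling: the ServiceStatus record, the function names, and the version labels "v1" and "v2" are all hypothetical, and the service names are used only as sample data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceStatus:
    """Hypothetical record of the version a running service has loaded."""
    name: str
    running_version: str


def services_blocking_removal(statuses, old_version):
    """Return the names of services still running the version to be deleted."""
    return sorted(s.name for s in statuses if s.running_version == old_version)


def safe_to_remove(statuses, old_version):
    """The old version may be removed only when no service still runs it."""
    return not services_blocking_removal(statuses, old_version)


# Illustrative data only: version labels and the fleet layout are made up.
fleet = [
    ServiceStatus("static-data", "v2"),
    ServiceStatus("payments", "v1"),  # not yet restarted, still on the old version
    ServiceStatus("balances", "v2"),
]

print(services_blocking_removal(fleet, "v1"))  # ['payments']
print(safe_to_remove(fleet, "v1"))             # False
```

With a gate like this in front of the deletion step, the removal would be held back until every service had restarted onto the new version.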

REMEDIATION

A restart of all impacted services allowed clients to pick up the new version and resolved the incident.

A process review has been conducted, and improvements to the upgrade process are underway to move to a simpler process.

Posted Sep 09, 2024 - 15:59 UTC

Resolved
This incident has been resolved.
Posted Sep 03, 2024 - 20:19 UTC
Update
We are continuing to investigate this issue.
Posted Sep 03, 2024 - 20:18 UTC
Update
We are investigating an escalation of the incident that has caused a major outage impacting multiple services. Multiple teams are working to resolve the issue as soon as possible.
Posted Sep 03, 2024 - 19:53 UTC
Investigating
Major platform outage
Posted Sep 03, 2024 - 18:33 UTC
This incident affected: API, Payments, Conversions, Paydirect.io / Direct, Notifications, and Balances.