Incident Summary
On 17 June, during a routine system upgrade, we encountered issues that disrupted message processing and delivery across some of our services.
Root Cause
- A Debian OS upgrade introduced compatibility issues affecting message handling
- Our notifications system slowed down due to an inefficient database query
- Some services had trouble maintaining stable connections to our messaging system (RabbitMQ)
- Automatic retries overwhelmed the system, and messages were eventually redirected into a backup queue
Remediation
- Identified and fixed the database query causing the bottleneck
- Adjusted messaging configurations to stabilize service connections
- Increased system resources and improved retry logic to reduce congestion
- Cleared affected queues and ensured all valid messages were processed
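For readers curious how retry logic can reduce congestion, the following is a minimal sketch of one common approach: capped exponential backoff with random jitter, followed by a hand-off to a backup queue once attempts run out. It is an illustration only, not the code we deployed. The names `deliver_with_retry`, `send`, and `park`, along with the default limits, are hypothetical.

```python
import random
import time
from typing import Any, Callable


def deliver_with_retry(
    send: Callable[[Any], None],
    message: Any,
    park: Callable[[Any], None],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try to send a message, backing off between attempts.

    `send` and `park` are hypothetical callables: `send` raises
    ConnectionError on failure, and `park` moves the message to a
    backup queue once all attempts have been used.
    """
    for attempt in range(max_attempts):
        try:
            send(message)
            return True
        except ConnectionError:
            if attempt == max_attempts - 1:
                break  # no attempts left, so do not wait again
            # The wait ceiling doubles per attempt, up to max_delay.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            # Random jitter spreads retries out so that many clients
            # do not all retry at the same moment.
            sleep(random.uniform(0.0, ceiling))
    park(message)
    return False
```

Spacing retries out this way, and capping how many are made, keeps a struggling service from being flooded by repeat attempts while it recovers.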
Service was restored the same day, and impacted messages were reissued between 18 and 20 June.
Thank you for your patience while we resolved this incident. If you are still missing messages, please contact our support team directly.