Repeated emails being sent for slips and broadcasts
Incident Report for EdSmart Platform Status
Resolved
After implementing some changes on Saturday and monitoring over the last 2 days, we are no longer seeing repeated emails being sent for the same slip or broadcast.

Specific steps taken include:

* Re-organising the Slips sending module with better multi-threading, so that an individual thread is less likely to be overwhelmed when the module is under extreme load. In short, the load is spread more evenly across threads so more work can be done simultaneously.
* Adding better fail-safes to the send module that check a Slip has not already been sent before it goes out.
* Adding new monitoring so that, even if these changes have not been fully effective, we are alerted if something seems out of the ordinary.
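
As an illustration only, the first two steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the platform's actual code: `SendGuard`, `send_all`, and `deliver` are hypothetical names, and the in-memory set stands in for whatever persistent record (for example a database uniqueness check) a production system would use.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SendGuard:
    """Hypothetical fail-safe: records which (slip_id, recipient) pairs were claimed."""

    def __init__(self):
        self._lock = threading.Lock()
        self._claimed = set()

    def claim(self, slip_id, recipient):
        """Return True only for the first caller to claim this pair."""
        key = (slip_id, recipient)
        with self._lock:
            if key in self._claimed:
                return False
            self._claimed.add(key)
            return True


def send_all(jobs, deliver, guard, max_workers=8):
    """Spread jobs across worker threads and skip any job already claimed.

    Returns the number of messages actually handed to `deliver`.
    """

    def worker(job):
        slip_id, recipient = job
        if guard.claim(slip_id, recipient):
            deliver(slip_id, recipient)
            return True
        return False

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sum(pool.map(worker, jobs))
```

Because the check and the record happen under one lock, two threads that pick up the same job at the same time cannot both pass the guard.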

Since those changes were implemented late on Saturday, we've processed a large number of Slips, including ones we'd held back from last week. We have not seen a recurrence of the issue.
Posted 4 months ago. Feb 04, 2019 - 16:24 AEDT
Update
Amendments to messaging modules and routines, designed to mitigate the root cause, have been deployed. We will continue to monitor.
Posted 4 months ago. Feb 03, 2019 - 02:23 AEDT
Update
Slips and Broadcasts have resumed sending, after identifying and removing duplicates. We are continuing work on resolving the root cause.

We have also implemented alerts to notify us of duplicates, and will make best endeavours to remove any that occur while we are in the process of fixing the issue.
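
A duplicate alert of this kind could look roughly like the sketch below. It is an assumption-laden illustration rather than the deployed check: `find_duplicates` and the `slip_id` / `recipient` fields of the send-log entries are hypothetical.

```python
from collections import Counter


def find_duplicates(send_log):
    """Return {(slip_id, recipient): count} for every pair sent more than once.

    `send_log` is assumed to be an iterable of dicts with `slip_id` and
    `recipient` keys; an empty result means nothing needs alerting.
    """
    counts = Counter((entry["slip_id"], entry["recipient"]) for entry in send_log)
    return {key: n for key, n in counts.items() if n > 1}
```

A scheduled job could run this over recent send records and raise an alert whenever the result is non-empty.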
Posted 4 months ago. Feb 02, 2019 - 21:57 AEDT
Update
We are once again seeing repeated emails being sent for slips and broadcasts, and have made the call to pause them from sending while we do further investigation into the root cause. We will provide an update once we have resumed sending.
Posted 4 months ago. Feb 01, 2019 - 17:07 AEDT
Monitoring
Summary and update on this issue:

Over the past 24 hours we’ve experienced a fault in the platform which caused a limited number of forms to generate duplicate email notifications to parents. The issue appears to originate in the module which processes scheduled Slips/Broadcasts and generates the resulting email notifications.

The issue appears only when the platform is under extreme load and does not manifest every time, which makes tracing and diagnosis difficult.

The issue existed only for a short time, but did result in some recipients seeing multiple copies of notifications for the same Slip/Broadcast. When the issue first became apparent, we temporarily paused outbound notifications to allow analysis of the data.

Over the past few hours we have worked through queued messages to, where possible, identify and remove duplicates.
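
Removing duplicates from a queue of pending messages can be sketched as below. This is a hedged illustration only: `dedupe_queue` and the `slip_id` / `recipient` message fields are hypothetical, and it assumes two queued messages are duplicates when they share the same Slip/Broadcast and recipient.

```python
def dedupe_queue(queue):
    """Split queued messages into (kept, removed), keeping the first copy of each.

    Messages are assumed to be dicts with `slip_id` and `recipient` keys.
    Order of the kept messages is preserved.
    """
    seen = set()
    kept, removed = [], []
    for message in queue:
        key = (message["slip_id"], message["recipient"])
        if key in seen:
            removed.append(message)
        else:
            seen.add(key)
            kept.append(message)
    return kept, removed
```

Returning the removed messages separately, rather than discarding them, leaves a record that can be reviewed before anything is deleted.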

Remediation actions have included changes to our database scaling configuration; a minor adjustment to the scheduled process; and close examination of the module in question.

At this time the platform is operating normally and all outbound messages are being delivered.

The engineering team is continuing to monitor the platform’s performance and review the code.
Posted 4 months ago. Feb 01, 2019 - 10:16 AEDT
Identified
Email queues are clearing, and backlogs should be sent within the next few minutes. We continue to work on the root cause and are closely reviewing several areas within one particular module.
Posted 4 months ago. Feb 01, 2019 - 02:54 AEDT
Update
Engineering has temporarily suspended the sending of emails for slips and broadcasts while the root cause of the duplicate emails is investigated. Once a fix has been applied, we will resume sending emails.
Posted 4 months ago. Jan 31, 2019 - 13:04 AEDT
Update
Engineering currently aims to put a stop to the repeated emails for the affected slips. We will advise when that has been done.
Posted 4 months ago. Jan 31, 2019 - 12:32 AEDT
Investigating
We have received reports of parents receiving repeated emails for slips and broadcasts. Engineering is investigating and working to resolve the issue. We will provide updates here for this issue.
Posted 4 months ago. Jan 31, 2019 - 11:52 AEDT
This incident affected: Application.