The following is the incident report for the Periscope Data by Sisense incident that occurred on June 5th, 2019. This issue resulted in many queries not loading in the Periscope Data application. We understand the impact this had on our customers and sincerely apologize. We have taken a number of steps to prevent this issue from occurring in the future.
From approximately 5:45 a.m. to 6:46 a.m. PT on June 5th, 2019, the Periscope Data chart data storage service and its database were degraded, and many queries in Periscope Data did not load. The root cause was a lock on the chart data storage service and database, which prevented charts from loading in the application.
6:30 a.m.: On-call engineer paged because queries were not completing; investigation begins
6:33 a.m.: Chart data storage service memory usage identified as high with high failure rates
6:35 a.m.: Service pods restarted
6:40 a.m.: Storage service errors continue post-restarts
6:46 a.m.: Customers notified of the incident via the status page: "Periscope Data Charts Not Loading"
6:46 a.m.: Storage service updated and failed over to the primary database
6:53 a.m.: Charts begin loading as expected
7:04 a.m.: Status page updated to Investigating
8:07 a.m.: Status page updated to Monitoring to ensure full resolution
11:47 a.m.: Status page updated to Resolved
Our investigation showed that the issue was caused by a lock on the chart data storage service and database. Charts did not load normally for approximately one hour in the early morning (PT) of June 5th.
We are putting health checks in place that will alert our on-call team promptly, so that any similar issue can be mitigated quickly. We are also adding monitoring and alerts for dangerously high memory usage and high failure rates in the chart data storage service.