Some Periscope Data Queries Loading Slowly
Incident Report for Periscope
Postmortem

The following is the incident report for the Periscope Data incident that occurred on March 8th, 2019. This issue prevented many of our customers from running new queries and delayed background refreshes in the Periscope Data application. We understand the impact this had on our customers and sincerely apologize. We have taken a number of steps to prevent this issue from occurring in the future.

ISSUE SUMMARY

From approximately 7:00 am to 4:30 pm PT on March 8th, 2019, the Periscope Data query service was degraded. Many new queries to Periscope did not complete. The root cause was an overloaded database in our backend query service, which caused query performance to degrade significantly.

TIMELINE

7:00 am: On-call engineer was paged due to a slow rate of query completion and began investigation.
7:15 am: Engineers restarted backend query service. Engineers deleted non-critical rows from database.
7:43 am: Customers were notified of incident via Status Page: "Periscope Data App Queries Not Running"
8:30 am: Engineers paused background query jobs to reduce database load and allow foreground queries to run.
9:00 am: Engineers identified a problem database query that may have caused the slowdown.
9:30 am: Engineers deleted rows from the problem table. Database CPU utilization began to improve. Database I/O metrics were not yet healthy.
9:36 am: Query processing latency was below 1 second again. Most new queries were able to run.
10:00 am: Engineers continued to delete non-critical rows and ran vacuum manually on the problem database to improve internal database metrics.
10:47 am: Status Page was moved to Monitoring.
12:00 pm: Engineers performed more aggressive cleanup and vacuuming of the problem database.
3:30 pm: Database auto-vacuuming began to catch up. Manual cleanup was stopped.
4:30 pm: All queued query requests cleared. Database CPU and IOPS metrics were back to healthy levels. All queries were completing as normal, and background queries were resumed.

REMEDIATION

Our investigation showed that the underlying database's CPU utilization had been climbing over time due to increased query load, and it reached a dangerous level in the early hours of March 8th. Additionally, dead rows had been building up over time and hit a threshold beyond which the normal vacuum cleanup process could no longer keep up. Both of these factors severely degraded the performance of our backend query service. Query requests quickly built up, and the service was unable to keep up with new queries.
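
To illustrate the dead-row buildup described above, here is a minimal sketch of how tables with excessive dead rows could be flagged before vacuuming falls behind. The report does not name the database engine; the sketch assumes PostgreSQL-style statistics, where live and dead row counts per table are exposed (for example through `pg_stat_user_tables`). The table names, row counts, and the 20% threshold are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    """Per-table row counts, as a statistics view might report them."""
    name: str
    live_rows: int
    dead_rows: int


def dead_row_ratio(stats: TableStats) -> float:
    """Fraction of the table's rows that are dead (awaiting vacuum)."""
    total = stats.live_rows + stats.dead_rows
    return stats.dead_rows / total if total else 0.0


def tables_needing_vacuum(tables, max_ratio=0.2):
    """Return names of tables whose dead-row ratio exceeds max_ratio, worst first."""
    flagged = [t for t in tables if dead_row_ratio(t) > max_ratio]
    flagged.sort(key=dead_row_ratio, reverse=True)
    return [t.name for t in flagged]


# Hypothetical sample data, not taken from the incident.
sample = [
    TableStats("query_results", live_rows=1_000_000, dead_rows=4_000_000),
    TableStats("users", live_rows=50_000, dead_rows=500),
    TableStats("job_queue", live_rows=200_000, dead_rows=100_000),
]

print(tables_needing_vacuum(sample))
```

A check like this could run on a schedule so that a rising ratio is noticed while a manual vacuum is still cheap.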

We are taking steps to ensure proper cleanup of non-essential database rows to prevent this from happening in the future. In addition, we are adding more monitoring and alerts on our databases’ CPU and I/O utilization, and updating our on-call runbook with proper cleanup and vacuuming procedures to ensure quick mitigation of any potential issues in the future.
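
As a rough illustration of the kind of utilization alert mentioned above, the sketch below fires only when a metric stays above a threshold for several consecutive samples, which avoids paging on a single spike. The class name, metric name, 80% threshold, and three-sample window are all assumptions for the example, not details from the incident.

```python
from collections import deque


class SustainedThresholdAlert:
    """Fire when a metric stays above `threshold` for `window` consecutive samples."""

    def __init__(self, name, threshold, window):
        self.name = name
        self.threshold = threshold
        # Keep only the most recent `window` samples.
        self.recent = deque(maxlen=window)

    def observe(self, value):
        """Record one sample; return True if the alert should fire."""
        self.recent.append(value)
        window_full = len(self.recent) == self.recent.maxlen
        return window_full and all(v > self.threshold for v in self.recent)


# Hypothetical CPU utilization readings, in percent.
cpu_alert = SustainedThresholdAlert("db_cpu_percent", threshold=80.0, window=3)
readings = [70.0, 85.0, 90.0, 92.0, 60.0]
fired = [cpu_alert.observe(r) for r in readings]

print(fired)
```

The same class could be instantiated a second time for an I/O utilization metric with its own threshold and window.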

Posted Mar 11, 2019 - 16:04 PDT

Resolved
Periscope Data queries and background refreshes are currently running as expected for all customers.

For any further inquiries, the Periscope Data Support Team can be reached at support@periscopedata.com.
Posted Mar 08, 2019 - 16:45 PST
Update
Periscope Data queries are running as expected for the majority of customers. A small subset of customers continue to experience longer query backlogs. Background refreshes for dashboards have resumed and are working through the backlog. Engineers are continuing to monitor the query backlogs.

The Periscope Data Support Team can be reached at support@periscopedata.com or via live chat.
Posted Mar 08, 2019 - 15:39 PST
Update
Periscope Data queries are running as expected for the majority of customers. A small subset of customers continue to experience longer query backlogs that will temporarily cause longer loading bar times. Background refreshes for dashboards are temporarily paused. Engineers are continuing to monitor the query backlogs.

The Periscope Data Support Team can be reached at support@periscopedata.com or via live chat.
Posted Mar 08, 2019 - 14:40 PST
Update
Periscope Data queries are running as expected for the majority of customers. A small subset of customers continue to experience longer query times. Background refreshes for dashboards are temporarily paused. Engineers are monitoring the query backlogs to ensure a rapid return to normal query runtimes.

The Periscope Data Support Team can be reached at support@periscopedata.com or via live chat.
Posted Mar 08, 2019 - 14:10 PST
Update
Periscope Data queries are running as expected for the majority of customers. A small subset of customers continue to experience longer query times. Background refreshes for dashboards are temporarily paused to help alleviate query load. Engineers are continuing to monitor the query backlogs for those customers.

The Periscope Data Support Team can be reached at support@periscopedata.com or via live chat.
Posted Mar 08, 2019 - 13:31 PST
Update
Periscope Data queries are running as expected for the majority of customers. A small subset of customers continue to experience longer query times and increased backlogs that will temporarily cause longer loading bar times. Engineers are continuing to monitor the query backlogs for those customers to ensure a rapid return to normal query runtimes.

The Periscope Data Support Team can be reached at support@periscopedata.com or over live chat.
Posted Mar 08, 2019 - 12:33 PST
Monitoring
Periscope Data queries are working normally for the majority of customers. A small subset of customers are experiencing longer query backlogs that will temporarily cause longer loading bar times. Engineers are monitoring the query backlogs for those customers to ensure a rapid return to normal query runtimes.

The Periscope Data Support Team can be reached at support@periscopedata.com or via live chat.
Posted Mar 08, 2019 - 11:48 PST
This incident affected: Periscope.