Queries Not Running
Incident Report for Sisense for Cloud Data Teams
Postmortem

The following is the incident report for the Sisense incident that occurred on October 13th, 2020 between 12:40 PM PDT and 8:16 PM PDT. This issue affected all customers using the Sisense for Cloud Data Teams web application to varying degrees. During two brief periods, 12:40 PM - 12:48 PM PDT and 4:23 PM - 5:01 PM PDT, customers may have lost access to the application. More commonly, customers were unable to run new queries during the outage. We understand the effect this had on our customers and sincerely apologize. We have taken a number of steps to prevent this issue from recurring, as detailed below.

ISSUE SUMMARY

Our investigation found that a database used in running customer queries was unable to accept new writes, effectively becoming read-only. Further investigation revealed that the database was under heavy load caused by high contention for access to its shared memory. Resetting and failing over the database cleared the contention, but it quickly reappeared because bloated indices were consuming excessive shared memory.

Once the database became effectively read-only, customers weren’t able to run new queries because the new query requests couldn’t be written to the database.

To restore application availability, the database was restarted, the affected tables were re-indexed, and all associated services were restored. This restart and reindex cycle took place three times during the incident, as multiple tables were affected and could not all be resolved in a single pass. Once it was complete, all customers were able to access Sisense for Cloud Data Teams and run queries as normal.

REMEDIATION

We are confident that we have identified appropriate corrections and are equipped to handle any similar outages in the future. The team is committed to creating the most reliable data platform possible. In response to this issue, we’re making key improvements to our infrastructure:

  • Tuning our autovacuum settings to clear dead rows more frequently.
  • Adding monitoring around index bloat, which will alert on-call engineers before similar situations can occur in the future.
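
The two improvements above can be illustrated with a minimal sketch. It assumes the database is PostgreSQL, which this report does not state. The first function applies PostgreSQL's documented autovacuum rule (a table is vacuumed once its dead rows exceed a base threshold plus a scale factor times its row count, with defaults of 50 and 0.2). The second is a hypothetical bloat check: the `IndexStats` type, the `expected_bytes` estimate, and the `max_ratio` limit are illustrative assumptions, not Sisense's actual monitoring.

```python
from dataclasses import dataclass
from typing import Iterable, List


def autovacuum_trigger_point(
    row_count: int,
    threshold: int = 50,         # PostgreSQL default: autovacuum_vacuum_threshold
    scale_factor: float = 0.2,   # PostgreSQL default: autovacuum_vacuum_scale_factor
) -> float:
    """Dead-row count above which autovacuum vacuums a table (assuming PostgreSQL)."""
    return threshold + scale_factor * row_count


@dataclass
class IndexStats:
    """Hypothetical per-index measurement gathered by a monitoring job."""
    name: str
    actual_bytes: int    # current on-disk size of the index
    expected_bytes: int  # assumed estimate of the size of a freshly rebuilt index


def bloated_indexes(stats: Iterable[IndexStats], max_ratio: float = 2.0) -> List[str]:
    """Return names of indexes whose size exceeds max_ratio times the estimate."""
    return [
        s.name
        for s in stats
        if s.expected_bytes > 0 and s.actual_bytes / s.expected_bytes > max_ratio
    ]


if __name__ == "__main__":
    rows = 1_000_000
    # Lowering the scale factor makes autovacuum run after far fewer dead rows.
    print(autovacuum_trigger_point(rows))                     # default settings
    print(autovacuum_trigger_point(rows, scale_factor=0.01))  # more aggressive tuning

    sample = [
        IndexStats("queries_pkey", actual_bytes=900, expected_bytes=300),
        IndexStats("queries_created_at_idx", actual_bytes=350, expected_bytes=300),
    ]
    print(bloated_indexes(sample))  # names an on-call alert would report
```

With the defaults, a table of one million rows is not vacuumed until roughly 200,000 dead rows accumulate, so reducing the scale factor for large, frequently updated tables is one common way to clear dead rows sooner. The ratio limit in the bloat check trades alert noise against early warning and would need tuning for a real workload.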

If you have any questions, please reach out to our Solutions Team at supportdt@sisense.com or via live chat.

Posted Oct 15, 2020 - 12:15 PDT

Resolved
Queries are running for all users. The Sisense Cloud Support Team can be reached at supportdt@sisense.com.
Posted Oct 13, 2020 - 21:48 PDT
Monitoring
Queries in Sisense are running. Engineers are monitoring the applied fix and are working on further preventative actions. The Sisense Cloud Support Team can be reached at supportdt@sisense.com.
Posted Oct 13, 2020 - 21:00 PDT
Update
Queries in Sisense for Cloud Data Teams are running. Engineers are continuing to investigate and monitor the issue. The Sisense Cloud Support Team can be reached at supportdt@sisense.com.
Posted Oct 13, 2020 - 20:37 PDT
Update
Engineers are continuing to investigate the issue. The Sisense Cloud Support Team can be reached at supportdt@sisense.com.
Posted Oct 13, 2020 - 19:55 PDT
Investigating
Queries in the Sisense for Cloud Data Teams application are not running for some users. Engineers are actively investigating the issue. The Sisense Cloud Support Team can be reached at supportdt@sisense.com.
Posted Oct 13, 2020 - 19:25 PDT
This incident affected: Sisense for Cloud Data Teams.