Migration causing temporary table lock
Incident Report for Flagsmith
Postmortem

Root Cause

At 13:15 UTC on 03 Feb 2022, we began deploying a routine release of the Flagsmith application to our production SaaS environment. This release included a database migration that added a new unique index to the table holding multivariate values for features. When the migration ran in our other environments, we noticed no ill effects from adding the index. In production, where we have substantially more data, the index took longer to build than anticipated and required a full table lock for the duration of the build.

Downtime

Our monitoring shows that the application was unresponsive for just under 2 minutes while the migration was running.

Long Term Remediation

To prevent a recurrence, we are planning to upgrade our version of Django so that we can easily add indexes concurrently. We will also review future index additions more carefully and check whether they will require a table lock. Finally, we will look at making the data in our staging environment more representative of production so that we can catch issues like this before they reach production.
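
Checking whether an index addition will require a table lock could be partly automated. The following is a minimal sketch, not the tooling we actually use. It assumes the database is PostgreSQL, where CREATE INDEX without CONCURRENTLY and ADD CONSTRAINT ... UNIQUE without USING INDEX both build the index while holding a table lock. The function name, regular expressions, and the table and index names in the usage note are hypothetical. For reference, Django 3.0 and later provide AddIndexConcurrently in django.contrib.postgres.operations, which has to run in a migration with atomic = False.

```python
import re
from typing import List

# Matches CREATE [UNIQUE] INDEX and captures CONCURRENTLY when it is present.
_CREATE_INDEX = re.compile(
    r"\bCREATE\s+(?:UNIQUE\s+)?INDEX\s+(CONCURRENTLY\b)?",
    re.IGNORECASE,
)

# Matches ADD CONSTRAINT <name> UNIQUE unless it reuses an existing index
# via USING INDEX (assumption: PostgreSQL syntax).
_ADD_UNIQUE = re.compile(
    r"\bADD\s+CONSTRAINT\s+\S+\s+UNIQUE\b(?!\s+USING\s+INDEX)",
    re.IGNORECASE,
)


def blocking_index_statements(sql: str) -> List[str]:
    """Return the statements in `sql` that would build an index under a table lock.

    Hypothetical helper. Splitting on ';' is naive and would mis-handle
    semicolons inside string literals, so treat the result as a hint only.
    """
    flagged = []
    for statement in sql.split(";"):
        statement = " ".join(statement.split())  # collapse whitespace
        if not statement:
            continue
        match = _CREATE_INDEX.search(statement)
        if match and match.group(1) is None:
            # CREATE INDEX without CONCURRENTLY
            flagged.append(statement)
        elif _ADD_UNIQUE.search(statement):
            # Unique constraint that builds its own index
            flagged.append(statement)
    return flagged
```

One way to use this would be to pass it the output of `python manage.py sqlmigrate` for a pending migration during code review or CI, and to fail the check if the returned list is not empty. For example, `blocking_index_statements("CREATE UNIQUE INDEX idx_a ON t (a); CREATE INDEX CONCURRENTLY idx_b ON t (b);")` flags only the first statement.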

Posted Feb 03, 2022 - 13:59 UTC

Resolved
We've resolved an issue with a database migration that caused a temporary full table lock when modifying an index. Total outage was around 70 seconds. We're going to investigate the root cause and provide an update when it's ready.
Posted Feb 03, 2022 - 13:25 UTC
This incident affected: Core API and Admin Dashboard.