Identity integrations are not being triggered

Incident Report for Flagsmith

Postmortem

Summary

On September 5th at 09:45 UTC, we initiated a release that included a database migration adding a new constraint to the table that stores flag data. According to our pre-live tests, this migration should have taken no more than 50 milliseconds. During the release to production, however, the migration needed to acquire a temporary lock on a particular table with high throughput, and a backlog of blocked connections built up waiting for the migration to complete. This had a knock-on effect that exhausted the available connections on the database, and a full restart was necessary.

Once the restart completed at 10:20 UTC, connections were restored and service resumed.

Next Steps

We have researched the cause of the issue, although further research is still needed to understand certain aspects. In the meantime, our plan is to implement safeguards described in the following pages of the Postgres documentation, which should help reduce the impact of any similar issue in the future.

https://www.postgresql.org/docs/11/runtime-config-client.html

https://www.postgresql.org/docs/11/runtime-config-logging.html (log_lock_waits)
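
As an illustration of the kind of safeguard these pages describe, the sketch below wraps a schema change in a transaction that sets Postgres's `lock_timeout`, so the statement gives up quickly instead of queueing behind busy traffic, and then retries after a pause. This is a minimal sketch under stated assumptions, not a description of the actual fix: we assume `lock_timeout` is the relevant setting from the first page, and the `execute` callable, the `LockTimeout` exception, and the table and constraint names in the usage note are hypothetical stand-ins for a real database driver and schema.

```python
import time
from typing import Callable, List


class LockTimeout(Exception):
    """Hypothetical stand-in for the driver error raised when lock_timeout expires."""


def guarded_statements(ddl: str, lock_timeout_ms: int) -> List[str]:
    """Wrap one DDL statement in a transaction that limits how long it waits for a lock."""
    return [
        "BEGIN",
        # SET LOCAL only applies until the end of the current transaction.
        f"SET LOCAL lock_timeout = '{lock_timeout_ms}ms'",
        ddl,
        "COMMIT",
    ]


def run_with_retries(
    execute: Callable[[str], None],
    ddl: str,
    lock_timeout_ms: int = 50,
    attempts: int = 5,
    backoff_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run the guarded DDL, retrying on lock timeouts.

    Returns the number of the attempt that succeeded, and re-raises
    LockTimeout if every attempt times out.
    """
    for attempt in range(1, attempts + 1):
        try:
            for statement in guarded_statements(ddl, lock_timeout_ms):
                execute(statement)
            return attempt
        except LockTimeout:
            # Abandon the transaction so other connections are not left waiting.
            execute("ROLLBACK")
            if attempt == attempts:
                raise
            # Back off a little longer after each failed attempt.
            sleep(backoff_s * attempt)
    raise ValueError("attempts must be at least 1")
```

With a real driver, `execute` would send each statement to the database, for example `run_with_retries(cursor_execute, "ALTER TABLE example_flags ADD CONSTRAINT example_check CHECK (id > 0)")`, where `cursor_execute` and the table and constraint names are placeholders. Enabling `log_lock_waits`, from the second page, would additionally record sessions that wait on a lock for longer than `deadlock_timeout`, which could help with diagnosing similar contention.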

Posted Sep 13, 2023 - 08:43 UTC

Resolved

This incident has been resolved.

Posted Sep 12, 2023 - 12:15 UTC

Update

We are continuing to monitor for any further issues.

Posted Sep 12, 2023 - 11:35 UTC

Monitoring

A fix has been implemented and we are monitoring the results.

Posted Sep 12, 2023 - 11:26 UTC

Investigating

We are currently investigating this issue.

Posted Sep 12, 2023 - 11:14 UTC

This incident affected: Edge API.