Core API: Increased error rates

Incident Report for Flagsmith

Postmortem

As part of a new feature rollout, a large database migration needed to take place. We knew the migration would take some time, but it should not have affected production traffic.

Unfortunately, although our health check reports unhealthy until all migrations are complete, AWS ECS promoted the new version of the API application before the migrations had finished. As a result, the running code expected certain columns and data in the database that were not there yet.
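
For illustration only, a health check that stays unhealthy until all migrations are applied could look roughly like the sketch below. This is not our actual implementation: the function names, the migration identifiers, and the response shape are all hypothetical, and the sketch assumes the application can list both the migrations its code requires and those already applied to the database.

```python
from typing import Dict, Iterable, Set, Tuple


def pending_migrations(required: Iterable[str], applied: Iterable[str]) -> Set[str]:
    """Return migrations the running code needs that the database lacks."""
    return set(required) - set(applied)


def health_check(required: Iterable[str], applied: Iterable[str]) -> Tuple[int, Dict]:
    """Report unhealthy (HTTP 503) until every required migration is applied."""
    pending = pending_migrations(required, applied)
    if pending:
        # Hypothetical response body; a load balancer or orchestrator
        # would normally only look at the status code.
        return 503, {"status": "unhealthy", "pending_migrations": sorted(pending)}
    return 200, {"status": "ok"}
```

With a check of this shape, the new version should only receive traffic once the status code turns to 200, provided the orchestrator honours the health check result.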

We are still investigating what caused ECS to promote the new version before the migrations were complete.

Posted Jul 01, 2022 - 11:45 UTC

Resolved

This incident has been resolved.
Posted Jul 01, 2022 - 11:40 UTC

Investigating

We are seeing an increased number of 502 responses from our Core API. We are aware of the cause and are working on a fix.

The Edge API is unaffected.
Posted Jul 01, 2022 - 11:31 UTC
This incident affected: Core API.