As part of a new feature rollout, a large database migration needed to take place. We knew the migration would take some time, but it should not have affected production traffic.
Unfortunately, although our health check returns unhealthy until all migrations are complete, AWS ECS promoted the new version of the API application before the migrations had finished. As a result, the running code expected certain columns and data to be available in the database that were not there yet.
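For readers unfamiliar with the pattern, here is a minimal sketch of a migration-gated health check. It is not our actual implementation: `MigrationState`, `health_check`, and the migration names are hypothetical, and the sketch assumes the orchestrator treats any non-200 response as unhealthy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class MigrationState:
    """Hypothetical tracker for which migrations this release needs."""
    required: Set[str]
    applied: Set[str] = field(default_factory=set)

    def pending(self) -> List[str]:
        # Migrations the new code depends on that have not run yet.
        return sorted(self.required - self.applied)


def health_check(state: MigrationState) -> Tuple[int, Dict]:
    """Return an (HTTP status, body) pair for a health endpoint.

    Assumption: a 503 keeps the new version out of rotation until
    every required migration has been applied.
    """
    pending = state.pending()
    if pending:
        return 503, {"status": "unhealthy", "pending_migrations": pending}
    return 200, {"status": "healthy"}
```

With this shape, the endpoint reports 503 while any required migration is outstanding and switches to 200 only once the applied set covers the required set. The sketch illustrates the intended gate only; it does not explain why the new version was promoted in this incident.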
We are still investigating what caused ECS to promote the new version before the migrations were complete.