Slow response times for Edge API requests
Incident Report for Flagsmith
Postmortem

Timeline

At 12:15pm UTC, we were notified of increased response times on a number of our Edge API endpoints. Our initial investigation found no obvious cause, but we suspected Sentry, our APM tool. We removed the Sentry initialisation from our code and deployed the change as soon as we could.

At 12:48pm UTC, this change was deployed and we observed the response times decrease immediately.

At 12:52pm UTC, our monitoring confirmed that the average response time had returned to normal.

Next Steps

  • Investigate ways to reduce or remove the impact of Sentry issues on our Edge API:

    • Decrease the shutdown timeout of the Sentry SDK.
    • Evaluate Sentry Relay as a way to isolate core Edge API services from Sentry issues.

  • Add integration tests that simulate performance degradation and outages in all downstream services.
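
As an illustration of the last item, a test along the following lines could simulate a degraded downstream service and check that request latency is unaffected. This is a minimal Python sketch under stated assumptions, not the Edge API's actual code: FakeDownstream, handle_request and timed_request are hypothetical names, and the 0.5 second delay is an arbitrary value chosen for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class FakeDownstream:
    """Hypothetical stand-in for a downstream service such as an APM client."""

    def __init__(self, delay_seconds):
        self.delay_seconds = delay_seconds
        self.received = []

    def send(self, event):
        # Simulate a degraded downstream service by blocking before accepting the event.
        time.sleep(self.delay_seconds)
        self.received.append(event)


# Background workers so that reporting never runs on the request path.
_executor = ThreadPoolExecutor(max_workers=2)


def handle_request(downstream, payload):
    """Hypothetical handler: hands the event to a worker and responds immediately."""
    _executor.submit(downstream.send, {"payload": payload})
    return {"status": "ok", "payload": payload}


def timed_request(downstream, payload):
    """Return the handler's response together with the elapsed wall-clock seconds."""
    start = time.monotonic()
    response = handle_request(downstream, payload)
    return response, time.monotonic() - start
```

A test could then construct FakeDownstream(delay_seconds=0.5), call timed_request, and assert that the elapsed time stays well below the simulated delay. If the handler called downstream.send directly instead of submitting it to the executor, the same assertion would fail, which is the kind of regression such a test is meant to catch.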
Posted Jul 10, 2023 - 13:42 UTC

Resolved
This incident has been resolved.
Posted Jul 10, 2023 - 12:58 UTC
Monitoring
The downstream service has been successfully removed. Response times have returned to normal. We are continuing to monitor the situation.
Posted Jul 10, 2023 - 12:50 UTC
Identified
We have identified an issue with a downstream service that is having a knock-on effect on our performance. We are currently deploying a change to remove the downstream service.
Posted Jul 10, 2023 - 12:44 UTC
Investigating
We are currently investigating this issue.
Posted Jul 10, 2023 - 12:29 UTC
This incident affected: Edge API.