Broadcast, Dashboard, Call Widget, Agent Tool Desktop, Agent Tool Mobile App, Liveshopping Mobile App, Liveshopping Player, REST API
Global, Frontend US, Frontend EU, Backend US, Backend EU
December 4, 2024 2:32PM UTC
[Resolved] On December 3, 2024, starting at approximately 13:42 UTC, a configuration issue in our backend APIs caused temporary disruptions to services like the Agent Tool and the ability to start pre-recorded broadcasts. We promptly identified the root cause and swiftly restored full functionality across all impacted regions.
Root Cause:
The incident was traced to a misconfiguration of the API's egress settings. An update earlier in the day changed the egress configuration, which advertently blocked outgoing HTTP requests to critical services, disrupting functionality across several endpoints.
Key Timestamps:
- 13:42 UTC: First recorded timeout in API logs.
- 15:28 UTC: Egress settings were corrected in the staging environment, and tests confirmed restored functionality.
- 15:37 UTC: Changes deployed to production resolved the issue for all affected services.
- 15:41 UTC: Agent Tool and other impacted services confirmed operational.
Future Prevention Measures:
- Egress Configuration Monitoring: Implement monitoring for egress configuration changes to detect and alert on potentially detrimental modifications.
- Automated Testing: Expand automated tests in staging to include checks for egress functionality before deploying to production.
- Ping Monitoring Enhancements: Review and enhance existing production monitoring to include outgoing traffic verification.
The incident has been resolved, and all services are currently operating normally. Further investigation will be conducted to ensure robust prevention measures are in place.