On Tuesday, October 15th, 2024, from 10:45 to 14:54 UTC (4 hours and 9 minutes), users of the Helpline Case Management platform in the EU Production environment were unable to load the Issue Manager and Disclosure Manager screens. Users could still log in, but these screens stayed in a loading state indefinitely.
During the incident, affected users were unable to access or interact with the Issue Manager and Disclosure Manager screens, resulting in disruptions to their workflows and significant delays.
The incident was caused by multiple OData reports running simultaneously against the secondary database. The queries they generated were affected by parameter sniffing. At the same time, a database backup was running on the same system, which added further load.
The OData reports were scheduled to run at the same time, and no throttling mechanism limited the number of concurrent requests. The parameter sniffing issues arose because the SQL Server version in use caches only one execution plan per parameterized query. Newer versions (SQL Server 2022 and later, through Parameter Sensitive Plan optimization) can cache multiple plans for the same query.
The issue was resolved by terminating the blocking queries, which relieved the disk I/O saturation on the secondary database. After this, the Issue Manager, Disclosure Manager, and OData queries returned to normal operation.
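By implementing parameter sniffing workarounds, such as assigning parameters to local variables in queries, so that the optimizer produces one consistent execution plan instead of a plan tuned to whichever parameter values it compiled first.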
By analyzing the execution patterns of scheduled OData reports and leveraging our internal monitoring tools to proactively identify performance issues.
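One part of analyzing execution patterns is finding how many scheduled reports overlap at any moment. The sketch below is a minimal illustration, assuming each run can be described by a start time and an expected duration; the `ReportRun` record, the `peak_concurrency` function, and the sample schedule are hypothetical and not taken from our monitoring tools. It uses a standard sweep over start and end events.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ReportRun:
    """One execution of a scheduled OData report (hypothetical record shape)."""
    name: str
    start: datetime
    duration: timedelta


def peak_concurrency(runs: list[ReportRun]) -> tuple[int, datetime | None]:
    """Return the highest number of overlapping runs and the time that peak begins."""
    events: list[tuple[datetime, int]] = []
    for run in runs:
        events.append((run.start, +1))                 # run starts
        events.append((run.start + run.duration, -1))  # run finishes
    # Sort chronologically; at equal timestamps -1 sorts before +1, so a run
    # that ends exactly when another starts is not counted as overlapping.
    events.sort()
    active = peak = 0
    peak_at: datetime | None = None
    for timestamp, delta in events:
        active += delta
        if active > peak:
            peak, peak_at = active, timestamp
    return peak, peak_at


if __name__ == "__main__":
    # Illustrative schedule only; report names and times are made up.
    base = datetime(2024, 1, 1, 10, 0)
    runs = [
        ReportRun("report_a", base, timedelta(minutes=30)),
        ReportRun("report_b", base, timedelta(minutes=20)),
        ReportRun("report_c", base + timedelta(minutes=30), timedelta(minutes=10)),
    ]
    print(peak_concurrency(runs))  # two reports overlap starting at 10:00
```

A peak above an agreed limit is a signal to stagger the schedule before the reports run, rather than discovering the overlap during an incident.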
Not yet. Implementing monitoring alerts would enable proactive notifications of similar incidents in the future.
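A common shape for such an alert is to fire only when a metric stays above a threshold for several consecutive samples, which avoids paging on brief spikes. The sketch below assumes the monitored metric is something like disk I/O latency on the secondary database; the `SustainedThresholdAlert` class, the threshold, and the sample values are hypothetical illustrations, not our actual alerting configuration.

```python
from __future__ import annotations

from collections import deque


class SustainedThresholdAlert:
    """Signal an alert when a metric exceeds a threshold for N consecutive samples.

    Hypothetical helper: the metric (e.g. disk I/O latency on the secondary
    database), the threshold and the sample count are illustrative assumptions.
    """

    def __init__(self, threshold: float, consecutive: int) -> None:
        if consecutive < 1:
            raise ValueError("consecutive must be at least 1")
        self.threshold = threshold
        self.consecutive = consecutive
        # Keep only the most recent `consecutive` breach flags.
        self._breaches: deque[bool] = deque(maxlen=consecutive)

    def observe(self, value: float) -> bool:
        """Record one sample; return True while the breach has been sustained."""
        self._breaches.append(value > self.threshold)
        return len(self._breaches) == self.consecutive and all(self._breaches)


if __name__ == "__main__":
    # Illustrative latency samples in milliseconds, not data from the incident.
    alert = SustainedThresholdAlert(threshold=50.0, consecutive=3)
    for sample in [12.0, 80.0, 95.0, 110.0, 40.0]:
        print(sample, alert.observe(sample))
```

With these sample values the alert fires on the fourth sample (three breaches in a row) and clears on the fifth.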
Yes, parameter sniffing workarounds will be included in future releases, and additional monitoring measures will be implemented to mitigate similar risks.
Short-term
Long-term
Introduce throttling or staggering mechanisms for OData report execution to avoid simultaneous overloads.
Incorporate parameter sniffing solutions as a best practice in all future database queries and releases.
Implement proactive monitoring for the execution patterns of scheduled OData reports.
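The throttling and staggering action can be illustrated with a small client-side sketch, assuming reports are dispatched from a single Python process. In practice the limit might instead live in the report scheduler or the OData service layer. `run_throttled`, `ConcurrencyProbe`, and the simulated jobs are hypothetical names; the jobs only sleep in place of real report queries.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable

ReportJob = Callable[[], Awaitable[None]]


async def run_throttled(
    jobs: list[ReportJob],
    max_concurrent: int = 2,
    stagger_seconds: float = 0.0,
) -> None:
    """Run report jobs with a concurrency cap and staggered start times."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run_one(index: int, job: ReportJob) -> None:
        # Staggering: job i is released i * stagger_seconds after the batch starts.
        await asyncio.sleep(index * stagger_seconds)
        # Throttling: at most max_concurrent jobs hold the semaphore at once;
        # the rest wait here until a slot frees up.
        async with semaphore:
            await job()

    await asyncio.gather(*(run_one(i, job) for i, job in enumerate(jobs)))


class ConcurrencyProbe:
    """Simulated report queries that record how many ran at the same time."""

    def __init__(self) -> None:
        self.active = 0
        self.peak = 0

    def make_job(self, duration: float) -> ReportJob:
        async def job() -> None:
            self.active += 1
            self.peak = max(self.peak, self.active)
            try:
                await asyncio.sleep(duration)  # stand-in for query execution time
            finally:
                self.active -= 1
        return job


if __name__ == "__main__":
    probe = ConcurrencyProbe()
    jobs = [probe.make_job(0.05) for _ in range(6)]
    asyncio.run(run_throttled(jobs, max_concurrent=2, stagger_seconds=0.01))
    print("peak concurrent reports:", probe.peak)
```

Six jobs are submitted, but the semaphore keeps the peak at two. Staggering spreads start times, while the semaphore bounds concurrency even if a schedule change later lines reports up again.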