If a burst of traffic reaches the application while no free threads are available, a timeout exception is thrown as a consequence of the Redis driver design. The Redis driver blocks the request thread until the response from the Redis server has been received and fully parsed by a callback. If no free thread is available to invoke that callback within the operation timeout (one second by default), the blocked request fails with a timeout exception.
Technical background
A thread pool is allowed to create new worker threads to process incoming load under certain conditions. Adding more threads is beneficial only if free CPU resources are available. A thread pool injects new threads when the CPU usage is below 80%.
Because the CPU performance counter shows the system state for the previous second, the load produced by newly created threads becomes visible only after a one-second delay. To avoid overloading the CPU, thread creation is therefore limited to no more than 2 threads per second.
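To make the effect of this constraint concrete, the following minimal sketch estimates how long the pool needs to grow to a given size. It assumes a constant injection rate of 2 threads per second, as stated above; the class and method names are hypothetical, and the real CLR behavior may differ because it is an implementation detail.

```csharp
using System;

public static class ThreadPoolGrowthEstimate
{
    // Assumption taken from the text: at most 2 new threads are injected per second.
    public const double ThreadsPerSecond = 2.0;

    // Estimates the seconds needed to grow from the configured minimum
    // to the number of threads required by a burst of requests.
    public static double SecondsToReach(int minThreads, int requiredThreads)
    {
        if (minThreads < 0) throw new ArgumentOutOfRangeException("minThreads");
        if (requiredThreads < 0) throw new ArgumentOutOfRangeException("requiredThreads");

        int missingThreads = Math.Max(0, requiredThreads - minThreads);
        return missingThreads / ThreadsPerSecond;
    }
}
```

For example, with a minimum of 8 threads and a burst that needs 58 threads, SecondsToReach(8, 58) gives 25 seconds, which is far longer than the default one-second Redis operation timeout.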
Note: The CLR thread pool size management is an implementation detail that is subject to change at any time by the technology vendor.
The current implementation is described in the Redis FAQ: Important details about ThreadPool growth.
Scenario
If the thread pool has fewer free threads than there are incoming requests, ASP.NET takes all of the threads for incoming request processing, and a few more threads are created. The remaining requests wait in the work queue.
No free worker threads are left to parse the Redis response because all of them are blocked while waiting for the parsing results.
Due to the lack of logic to acknowledge that response parsing has a higher priority, a priority inversion takes place:
- Sharing the common-purpose CLR thread pool leads to the possibility of pool clogging with other work items (incoming ASP.NET request processing).
- The newly created thread pool threads are not guaranteed to pick the callback and can pick an incoming ASP.NET request instead.
- The ASP.NET request blocks the thread until the session state response has been parsed.
- The response is not parsed due to the lack of free threads.
The circular wait deadlock condition is resolved when the ASP.NET thread is unblocked by a timeout and throws an exception, leading to a thread being released.
The released thread might then pick up the pending callback, depending on the contents of the work queue at that moment.
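The blocking pattern behind this scenario can be modeled with a minimal sketch. This is not the Redis driver's actual code: the class, method, and parameter names are hypothetical, and the schedule delegate stands in for the shared thread pool work queue. The calling thread blocks until the parsing callback has run somewhere else, and gives up with a timeout exception if it has not.

```csharp
using System;
using System.Threading.Tasks;

public static class BlockingCallSketch
{
    // Blocks the calling (request) thread until the parsing callback has run,
    // or throws TimeoutException when the callback was not executed in time.
    // "schedule" models the work queue: it decides when the callback runs, if at all.
    public static string GetParsedResponse(Action<Action> schedule, TimeSpan timeout)
    {
        var completion = new TaskCompletionSource<string>();

        // The callback that parses the received data and publishes the result.
        schedule(() => completion.TrySetResult("parsed"));

        // The request thread is blocked here and cannot serve the callback itself.
        if (!completion.Task.Wait(timeout))
        {
            throw new TimeoutException("The response was not parsed within the timeout.");
        }

        return completion.Task.Result;
    }
}
```

Passing a scheduler that runs the callback immediately returns the parsed value, while a scheduler that never runs it (a clogged queue) ends in a TimeoutException after the given timeout, which mirrors how the blocked ASP.NET thread is eventually released.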
Further reading
To resolve the issue:
- Make sure that there is no CPU saturation or CPU spikes on the server where Sitecore is hosted.
- Tune Redis provider settings as described in the "Tuning Redis configuration settings" section.
To resolve the issue:
- Allow the thread pool to quickly scale up to a certain minimum number of threads by increasing the minimum number of worker threads in the thread pool.
The source code of the patch provided in the "Solution (XP 8.0.0 - 9.0.2)" section can be used as an example of how to increase the minimum number of worker threads in the thread pool.
- Download and install the hotfix compatible with the affected product version:
Be aware that the hotfix was built for a specific Sitecore XP version, and must not be installed on other Sitecore XP versions or in combination with other hotfixes. In case any other hotfixes have already been installed on a certain Sitecore XP instance, send a request for a compatibility check to Sitecore Support. Note that the ZIP file contents must be extracted to locate installation instructions and related files inside. The hotfixes must be installed on a CM instance and then synced with other instances using standard development practices.
- Make sure that there is no CPU saturation or CPU spikes on the server where Sitecore is hosted.
- Tune Redis provider settings as described in the "Tuning Redis configuration settings" section.
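Raising the minimum number of worker threads, as recommended above, can be sketched with the standard System.Threading.ThreadPool API. This is a minimal illustration and not the source code of the Sitecore patch; the class and method names are hypothetical, and the target value must come from load testing.

```csharp
using System;
using System.Threading;

public static class ThreadPoolConfigurator
{
    // Raises the minimum number of worker threads to at least minWorkerThreads
    // and keeps the current minimum for I/O completion port threads.
    // Returns true if the minimum was already high enough or the change was accepted.
    public static bool EnsureMinWorkerThreads(int minWorkerThreads)
    {
        int currentWorkerThreads;
        int currentCompletionPortThreads;
        ThreadPool.GetMinThreads(out currentWorkerThreads, out currentCompletionPortThreads);

        if (currentWorkerThreads >= minWorkerThreads)
        {
            return true; // Never lower an existing, higher minimum.
        }

        // SetMinThreads returns false when the requested values are rejected,
        // for example when they exceed the configured maximum.
        return ThreadPool.SetMinThreads(minWorkerThreads, currentCompletionPortThreads);
    }
}
```

A call such as ThreadPoolConfigurator.EnsureMinWorkerThreads(50) would typically run once during application startup. Note that ThreadPool.SetMinThreads takes the total number of threads for the process, not a per-core value.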
Solution (XP 8.0.0 - 9.0.2)
To resolve the issue:
- Configure an application to secure free worker threads and avoid CPU overload. The following options can help to achieve this:
The following patch provides an example of how to configure an application as explained earlier. To apply the patch:
- Put the Sitecore.Support.210408 support patch assembly (Sitecore.Support.210408.dll) into the \bin folder.
- Put the Sitecore.Support.210408.config file into the \App_Config\Include\zzz folder.
The source code of the patch: ConfigureThreadPool.cs.
Note: The configuration values in the patch are given only as a starting point. The final values must be tuned per-solution as a result of load testing.
- Make sure that there is no CPU saturation or CPU spikes on the server where Sitecore is hosted.
- Tune Redis provider settings as described in the "Tuning Redis configuration settings" section.
Tuning Redis configuration settings
To overcome possible Redis timeout issues, tune the Redis provider settings:
- operationTimeoutInMilliseconds to tolerate application CPU saturation.
- connectionTimeoutInMilliseconds and retryTimeoutInMilliseconds to tolerate network failures.
- pollingMaxExpiredSessionsPerSecond to handle the session expiration throttling and spikes of the load.
The following values can be added as a starting point to the Redis provider configuration section in the web.config file:
- operationTimeoutInMilliseconds="5000"
- retryTimeoutInMilliseconds="16000"
- connectionTimeoutInMilliseconds="3000"
- pollingMaxExpiredSessionsPerSecond="20"
The final values must be tuned per-solution as a result of load testing.
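As a rough illustration, the starting values listed above could appear on the Redis provider entry as shown in the following fragment. Only the four tuning attributes come from this article. The surrounding element structure and the name, type, connectionString, pollingInterval, and applicationName values are assumptions based on a typical Sitecore Redis session provider registration, so keep whatever the existing web.config already contains.

```xml
<sessionState mode="Custom" customProvider="redis" timeout="20">
  <providers>
    <!-- Assumed provider registration; only the last four attributes are the tuning values from this article. -->
    <add name="redis"
         type="Sitecore.SessionProvider.Redis.RedisSessionStateProvider, Sitecore.SessionProvider.Redis"
         connectionString="redis.sessions"
         pollingInterval="2"
         applicationName="private"
         operationTimeoutInMilliseconds="5000"
         retryTimeoutInMilliseconds="16000"
         connectionTimeoutInMilliseconds="3000"
         pollingMaxExpiredSessionsPerSecond="20" />
  </providers>
</sessionState>
```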
For more information about the Redis provider settings, see Redis provider settings reference.