Sitecore Managed Cloud – PaaS 2.0. Monitoring Metrics


Overview

This article lists the default metrics that Sitecore Cloud Operations Support monitors for all Managed Cloud customers. More metrics might be included in future releases.
For a description of Managed Cloud Standard monitoring, refer to the service aspects article.

Importance Of Monitoring And Customer Technical Contacts In Sitecore Managed Cloud Standard

Sitecore Managed Cloud Support communicates with the designated Customer Technical Contact on all matters related to alerts, availability, and maintenance. By default, this contact is defined in the Sitecore Customer Order section of the Managed Cloud contract:

Customer Technical Contact Name: {name of contact}
Customer Technical Contact Email: {email of contact}

The contact list can contain one or more recipients and is managed through Sitecore account support. Customers are discouraged from fully delegating the Technical Contact responsibility to a partner or third party because doing so can limit visibility of important system-wide notifications.

Monitoring In Managed Cloud

All new Sitecore Managed Cloud environments activated after May 2021 include a Monitoring package. The following tables list the monitored metrics.


Application Insights:

| Rules | Threshold | Breaches | Period | Frequency (Minutes) | Monitoring Plan |
|---|---|---|---|---|---|
| Daily cap Reached | - | > 0 | - | - | Basic / Advanced |

 

Azure Web Apps:

 

| Rules | Threshold | Breaches | Period | Frequency (Minutes) | Monitoring Plan |
|---|---|---|---|---|---|
| HTTP 5XX response* | > 10 counts | > 0 | Last 60 mins | 15 | Basic / Advanced |
| Platform Availability KeepAlive.aspx | > 3 failed regions | > 0 | 5 mins | 5 | Basic / Advanced |
| Average page response time | > 30 secs | > 0 | Last 30 mins | 5 | Advanced |
| CD and CM backup issue | - | > 0 | - | - | Advanced |
| Health Check health (for 9.3.0 and higher releases) | > 30 secs | > 0 | Last 30 mins | 5 | Advanced |
| Expiration date of SSL/TLS certificates bound to the webapp application | < 7 days before expiration date | - | - | - | Basic / Advanced |

*Triggers only once per met condition in an 8-hour period.
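The once-per-8-hours behavior in the footnote above can be illustrated with a small suppression helper. This is only a sketch: the article does not say how Sitecore anchors the 8-hour period, so the code assumes the period starts at the most recent notification for a rule. The names `AlertSuppressor` and `should_notify` are hypothetical and are not part of any Sitecore or Azure API.

```python
from datetime import datetime, timedelta

# Assumption: the 8-hour period starts when a notification is sent.
SUPPRESSION_WINDOW = timedelta(hours=8)


class AlertSuppressor:
    """Allow at most one notification per rule within the suppression window."""

    def __init__(self, window=SUPPRESSION_WINDOW):
        self.window = window
        self._last_notified = {}  # rule name -> time of the last notification

    def should_notify(self, rule_name, now):
        """Return True and record `now` if the rule may notify, else False."""
        last = self._last_notified.get(rule_name)
        if last is not None and now - last < self.window:
            return False  # still inside the window: suppress
        self._last_notified[rule_name] = now
        return True


suppressor = AlertSuppressor()
first_seen = datetime(2024, 1, 1, 9, 0)
print(suppressor.should_notify("HTTP 5XX response", first_seen))
print(suppressor.should_notify("HTTP 5XX response", first_seen + timedelta(hours=2)))
```

Each rule name is tracked separately, so suppressing one rule does not silence another.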

 

App Service Plan:

| Rules | Threshold | Breaches | Period | Frequency (Minutes) | Monitoring Plan |
|---|---|---|---|---|---|
| CPU average | > 95% | > 30 | Last 60 mins | 10 | Basic / Advanced |
| Memory average | > 95% | > 30 | Last 60 mins | 10 | Basic / Advanced |
| File storage usage | > 80% | > 0 | Last 1 day | 1440 | Advanced |
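One way to read the Threshold, Breaches, Period, and Frequency columns together is sketched below. The article does not define these columns precisely, so the sketch makes two assumptions: one metric sample is collected per minute, and "Breaches" is the number of samples within the period that exceed the threshold. `MetricRule` and `should_alert` are hypothetical names used for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricRule:
    name: str
    threshold: float        # a sample above this value counts as a breach
    min_breaches: int       # alert when the breach count is greater than this
    period_minutes: int     # look-back window
    frequency_minutes: int  # how often the rule is evaluated (not used below)


def should_alert(rule, samples):
    """Evaluate a rule against per-minute samples, ordered oldest first.

    Assumption: only the most recent `period_minutes` samples are considered.
    """
    window = samples[-rule.period_minutes:]
    breaches = sum(1 for value in window if value > rule.threshold)
    return breaches > rule.min_breaches


# The "CPU average" row: > 95%, more than 30 breaches, last 60 minutes.
cpu_rule = MetricRule("CPU average", 95.0, 30, 60, 10)

busy_hour = [50.0] * 29 + [99.0] * 31   # 31 of 60 samples above 95%
print(should_alert(cpu_rule, busy_hour))
```

Under this reading, a scheduler would call `should_alert` every `frequency_minutes` minutes with the latest samples.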

 

Azure SQL Database:

| Rules | Threshold | Breaches | Period | Frequency (Minutes) | Monitoring Plan |
|---|---|---|---|---|---|
| DTU average | > 95% | > 30 | Last 60 mins | 5 | Basic / Advanced |
| CPU average | > 95% | > 30 | Last 60 mins | 10 | Basic / Advanced |
| Storage utilization | > 75% | > 30 | Last 60 mins | 15 | Basic / Advanced |
| DATA IO average | > 95% | > 30 | Last 60 mins | 15 | Basic / Advanced |
| LOG IO average | > 95% | > 30 | Last 60 mins | 15 | Basic / Advanced |
| Concurrent Workers (requests) | > 95% | > 30 | Last 60 mins | 15 | Basic / Advanced |
| Concurrent sessions supported by the DB tier | > 95% | > 30 | Last 60 mins | 10 | Advanced |
| Number of the failed database connections | > 5 counts | > 14 | Last 60 mins | 10 | Advanced |
| Average In-Memory OLTP storage | > 95% | > 14 | Last 60 mins | 15 | Advanced |

 

Azure Elastic Pools (PaaS 2.0 Only): 

| Rules | Threshold | Breaches | Period | Frequency (Minutes) | Monitoring Plan |
|---|---|---|---|---|---|
| Storage percentage | > 75% | > 1 error | Last 30 mins | 1 | Basic / Advanced |
| CPU | > 90% | > 3 | Last 15 mins | 1 | Basic / Advanced |
| DATA IO | > 90% | > 3 | Last 15 mins | 1 | Advanced |
| Log IO | > 90% | > 2 | Last 15 mins | 1 | Advanced |
| Worker % | > 95% | > 3 | Last 15 mins | 1 | Advanced |
| Concurrent sessions supported by the DB tier | > 90% | > 3 | Last 15 mins | 1 | Advanced |
| Average In-Memory OLTP storage | > 95% | > 1 error | Last 5 mins | 1 | Advanced |

 

Azure Redis Cache:

| Rules | Threshold | Breaches | Period | Frequency (Minutes) | Monitoring Plan |
|---|---|---|---|---|---|
| Server load | > 95% | > 30 | Last 60 mins | 15 | Basic / Advanced |
| Average number of clients connected | > 80% | > 0 | Last 30 mins | 10 | Advanced |
| Average CPU Percent Processor Time | >= 95% | > 0 | Last 30 mins | 15 | Advanced |
| Average Used Memory | > 70% | > 30 | Last 60 mins | 10 | Advanced |

 

Azure Application Gateway:

| Rules | Threshold | Breaches | Period | Frequency (Minutes) | Monitoring Plan |
|---|---|---|---|---|---|
| Expiration date of SSL/TLS certificates bound to the listener (HTTPS) | < 7 days before expiration date | - | - | - | Basic / Advanced |
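For teams that want to check certificate expiry themselves, the 7-day threshold above can be approximated with the Python standard library. This is a sketch under stated assumptions, not the check that Sitecore runs: `is_expiring_soon` and `fetch_not_after` are hypothetical helper names, and the host passed to `fetch_not_after` must be replaced with a real endpoint.

```python
import socket
import ssl
from datetime import datetime, timedelta, timezone

# Mirrors the "< 7 days before expiration date" threshold in the tables.
EXPIRY_THRESHOLD = timedelta(days=7)


def is_expiring_soon(not_after, now, threshold=EXPIRY_THRESHOLD):
    """Return True when less than `threshold` remains before `not_after`."""
    return not_after - now < threshold


def fetch_not_after(host, port=443, timeout=5.0):
    """Return the expiry time (UTC) of the certificate served by host:port.

    Uses default verification, so an already expired or untrusted
    certificate raises ssl.SSLError instead of returning a date.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    seconds = ssl.cert_time_to_seconds(cert["notAfter"])
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Offline example with fixed dates (no network access needed).
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(is_expiring_soon(now + timedelta(days=6), now))
print(is_expiring_soon(now + timedelta(days=30), now))
```

Keeping the date comparison separate from the network call makes the threshold logic easy to test with fixed dates.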

 

SearchStax (SOLR) Server:

| Alert | Threshold |
|---|---|
| CPU Usage | > 80% |
| JVM Heap Memory | > 80% |
| Disk space | > 80% |
| Search metrics: | |
| Average time/request | > 1 min |
| Timeouts | > 10 |
| Errors | > 10 |
| Indexing metrics: | |
| Average time/request | > 1 min |
| Timeouts | > 10 |
| Errors | > 10 |

IMPORTANT NOTE: The monitoring package does not yet support the Sitecore single topologies: xP0, xM0, and xDB0.

Monitoring rules are set up for all Sitecore Managed Cloud Standard environments. For production environments, Sitecore Managed Cloud Support actively triages each alert and escalates incidents as appropriate. For non-production environments, customers can respond as they prefer, but Sitecore Managed Cloud Support does not actively triage alerts. Customers who want to receive non-production alerts for Solr hosted via SearchStax can request them in a Support Case. Otherwise, no alerts are sent for non-production Solr components.