Sitecore Managed Cloud Standard (MCS) – PaaS 1.0 Monitoring Metrics


Overview

This article provides a detailed list of default metrics monitored by the Sitecore Cloud Operations team for all Managed Cloud customers. More metrics might be included in future releases.

For a description of Managed Cloud Standard monitoring, refer to the service aspects article.

Description

Azure monitoring and alerts are based on a tier system. The tiers range from Tier 1, which has the highest alert frequency and the highest cost per alert, through to Tier 3, which is queried for alerts every 15 minutes and has the lowest cost per alert. A Tier 3 alert is triggered when the condition matches the alert threshold. Sitecore also offers free Tier 4 ping test alerts, which are metric alerts and activity log alerts. Currently, metric alerts are free if the subscription has 10 units or fewer, and activity log alerts are free.

Note: Further pricing details are available via your aligned Account Executive or Customer Success Manager.

Importance of Monitoring and Customer Technical Contacts in Sitecore Managed Cloud Standard

The Sitecore Managed Cloud team communicates with the designated Customer Technical Contact on all matters related to alerts, availability, and maintenance. By default, this contact is defined in the Sitecore Customer Order section of the Managed Cloud contract:

Customer Technical Contact Name: [name of contact]
Customer Technical Contact Email: [email of contact]

This contact list can contain one or more recipients and is managed through the Sitecore account team. Customers are discouraged from fully delegating the Technical Contact responsibility to a partner or third party, because doing so can limit their visibility of important system-wide notifications.

Monitoring in Managed Cloud

Sitecore Managed Cloud includes a monitoring package for all new environments activated after May 2021. The monitored metrics included in the package are listed in the following tables.

Log Analytics workspace:

Rule | Threshold | Breaches | Period | Frequency (minutes) | Monitoring plan
Daily cap reached | > 0 | | Last 5 mins | 5 | Basic / Advanced

Azure Web Apps:

Rule | Threshold | Breaches | Period | Frequency (minutes) | Monitoring plan
HTTP 5XX response* | > 50 count | > 0 | Last 60 mins | 15 | Basic / Advanced
Platform Availability KeepAlive.aspx | > 3 failed regions | > 0 | 5 mins | 5 | Basic / Advanced
Connections Zero | <= 0 connection | > 0 | Last 1 min | 1 | Advanced
Average page response time | > 30 secs | > 0 | Last 30 mins | 5 | Advanced
CD and CM backup issue | > 0 | | | | Advanced
Health Check health (for 9.3.0 and later releases) | > 30 secs | > 0 | Last 30 mins | 5 | Advanced
Expiration date of SSL/TLS certificates bound to the web app | < 7 days before expiration date | | | | Basic / Advanced

* Fires at most once in 8 hours for each condition that is met.
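
The footnote describes a suppression window. The following Python sketch shows one way such behaviour could work. It is an illustration only, not Sitecore's implementation: it assumes that the 8-hour window starts when the alert last fired (the article does not say), and the class name `SuppressedAlert` is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


class SuppressedAlert:
    """Hypothetical helper: fires at most once per suppression window."""

    def __init__(self, window: timedelta = timedelta(hours=8)) -> None:
        self.window = window
        self.last_fired: Optional[datetime] = None

    def notify(self, condition_met: bool, now: datetime) -> bool:
        """Return True only when a notification should actually be sent."""
        if not condition_met:
            return False
        if self.last_fired is not None and now - self.last_fired < self.window:
            return False  # still inside the suppression window
        self.last_fired = now
        return True


alert = SuppressedAlert()
start = datetime(2024, 1, 1, 9, 0)
print(alert.notify(True, start))                       # True
print(alert.notify(True, start + timedelta(hours=1)))  # False
print(alert.notify(True, start + timedelta(hours=8)))  # True
```

Under this assumption, a condition that stays breached for a whole day would produce at most three notifications.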

Azure SQL Database:

Rule | Threshold | Breaches | Period | Frequency (minutes) | Monitoring plan
DTU average (one alert per DB) | > 95% | > 30 | 60 mins | 5 | Basic / Advanced
Storage utilization (one alert per DB) | > 75% | > 30 | 60 mins | 15 | Basic / Advanced
Concurrent Workers (requests) (one alert per DB) | > 95% | > 30 | 60 mins | 15 | Basic / Advanced
Number of failed database connections (one alert per DB) | > 5 count | > 14 | 60 mins | 10 | Advanced
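
The Threshold, Breaches, Period, and Frequency columns can be read as the parameters of a windowed alert rule. The following Python sketch shows one possible reading and is not Sitecore's or Azure's implementation. It assumes that the metric is sampled once per minute, that a rule looks back over the last Period minutes each time it is evaluated, and that the alert fires when the number of samples above Threshold is greater than Breaches. The names `AlertRule` and `should_fire` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class AlertRule:
    """Hypothetical model of one table row; not a Sitecore or Azure API."""
    name: str
    threshold: float        # a sample breaches when its value is > threshold
    breaches: int           # the alert fires when the breach count is > breaches
    period_minutes: int     # look-back window
    frequency_minutes: int  # how often a scheduler would evaluate the rule


def should_fire(rule: AlertRule, samples: Sequence[float]) -> bool:
    """Evaluate a rule against per-minute samples, oldest first.

    Assumes one sample per minute, so only the last `period_minutes`
    samples are considered. `frequency_minutes` is not used here; it only
    determines how often this function would be called.
    """
    window = samples[-rule.period_minutes:]
    breach_count = sum(1 for value in window if value > rule.threshold)
    return breach_count > rule.breaches


# "DTU average" row from the Azure SQL Database table.
dtu_rule = AlertRule("DTU average", threshold=95.0, breaches=30,
                     period_minutes=60, frequency_minutes=5)

# 31 of the last 60 one-minute samples are above 95%: the rule fires.
busy_hour = [50.0] * 29 + [99.0] * 31
# Exactly 30 breaching samples is not "> 30", so the rule stays quiet.
borderline_hour = [50.0] * 30 + [99.0] * 30

print(should_fire(dtu_rule, busy_hour))        # True
print(should_fire(dtu_rule, borderline_hour))  # False
```

Read this way, a short spike does not raise a DTU alert; the database has to stay above 95% for more than half of the hour.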

App Service Plan:

Rule | Threshold | Breaches | Period | Frequency (minutes) | Monitoring plan
CPU average | > 95% | > 30 | 60 mins | 10 | Basic / Advanced
Memory average | > 95% | > 30 | 60 mins | 10 | Basic / Advanced
Combined App Storage | > 80% | > 0 | 1 day | 1440 | Advanced

Azure Search Service (Microsoft.Search/searchServices):

Rule | Threshold | Breaches | Period | Frequency (minutes) | Monitoring plan
Throttled search queries | > 30% | > 30 | 60 mins | 5 | Basic / Advanced

Azure Redis Cache:

Rule | Threshold | Breaches | Period | Frequency (minutes) | Monitoring plan
The server load (one alert per Redis Cache) | > 95% | > 30 | 60 mins | 15 | Basic / Advanced
The average number of connected clients (one alert per Redis Cache) | > 80% | > 0 | 30 mins | 10 | Advanced
The average CPU Percent Processor Time (one alert per Redis Cache) | >= 95% | > 0 | 30 mins | 15 | Advanced
The average used memory (one alert per Redis Cache) | > 70% | > 30 | 60 mins | 10 | Advanced

Azure Application Gateway:

Rule | Threshold | Breaches | Period | Frequency (minutes) | Monitoring plan
Expiration date of SSL/TLS certificates bound to the listener (HTTPS) | < 7 days before expiration date | | | | Basic / Advanced
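
The certificate rule comes down to a date comparison. The following Python sketch shows how a comparable check could be run independently, using only the standard library. It is not the check that Sitecore runs: the function names `is_expiring` and `fetch_not_after` are hypothetical, and the reading of "< 7 days before expiration date" as "less than 7 days remain" is an assumption.

```python
import socket
import ssl
from datetime import datetime, timedelta, timezone

EXPIRY_THRESHOLD = timedelta(days=7)  # "< 7 days before expiration date"


def is_expiring(not_after: datetime, now: datetime,
                threshold: timedelta = EXPIRY_THRESHOLD) -> bool:
    """True when the certificate expires in less than `threshold`."""
    return not_after - now < threshold


def fetch_not_after(host: str, port: int = 443) -> datetime:
    """Read the expiry date of the certificate presented by host:port.

    Certificate validation is enabled, so an already expired certificate
    raises an SSL error during the handshake instead of returning a date.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    seconds = ssl.cert_time_to_seconds(cert["notAfter"])
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(is_expiring(datetime(2024, 6, 5, tzinfo=timezone.utc), now))   # True
print(is_expiring(datetime(2024, 6, 30, tzinfo=timezone.utc), now))  # False
```

Against a live endpoint, `is_expiring(fetch_not_after("example.com"), datetime.now(timezone.utc))` would perform the same comparison; the host name is a placeholder.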

SearchStax (SOLR) Server:

Alert | Threshold
CPU Usage | > 90%
JVM Heap Memory | > 90%
Disk space | > 90%
Search metrics: Average time/request | > 1 min
Search metrics: Timeouts | > 10
Search metrics: Errors | > 40/hour
Indexing metrics: Average time/request | > 1 min
Indexing metrics: Timeouts | > 10
Indexing metrics: Errors | > 40/hour
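
The SearchStax thresholds are simple upper limits, so they can be checked against a set of readings with a lookup table. The following Python sketch is illustrative only: the dictionary keys and the `breached` function are hypothetical names, the values are copied from the table above, and "1 min" is expressed as 60 seconds.

```python
from typing import Dict, List

# Thresholds copied from the SearchStax table; key names are hypothetical.
THRESHOLDS: Dict[str, float] = {
    "cpu_usage_percent": 90.0,
    "jvm_heap_percent": 90.0,
    "disk_space_percent": 90.0,
    "search_avg_seconds_per_request": 60.0,
    "search_timeouts": 10.0,
    "search_errors_per_hour": 40.0,
    "indexing_avg_seconds_per_request": 60.0,
    "indexing_timeouts": 10.0,
    "indexing_errors_per_hour": 40.0,
}


def breached(sample: Dict[str, float]) -> List[str]:
    """Return the names of metrics in `sample` that exceed their threshold.

    Metrics missing from `sample` are treated as 0 and never breach.
    """
    return [name for name, limit in THRESHOLDS.items()
            if sample.get(name, 0.0) > limit]


sample = {"cpu_usage_percent": 93.5, "jvm_heap_percent": 71.0,
          "search_timeouts": 10, "indexing_errors_per_hour": 55}
print(breached(sample))  # ['cpu_usage_percent', 'indexing_errors_per_hour']
```

A value equal to the limit, such as 10 search timeouts, does not count as a breach because every threshold in the table is a strict "greater than".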

Important Note: The monitoring package does not yet support the Sitecore Single topologies: XP0, XM0, and XDB0.

Monitoring rules are set up for all Sitecore Managed Cloud Standard environments. For production environments, the Sitecore Managed Cloud team actively triages each alert and escalates incidents as appropriate. For non-production environments, customers can respond to alerts as they prefer, but the Sitecore Managed Cloud team does not actively triage them.