The Sitecore Managed Cloud Containers Disaster Recovery feature allows customers to maintain or quickly resume mission-critical functions following a disaster, thereby supporting the customer's business continuity plan. When a disaster occurs in a region containing the production environment (primary), the Disaster Recovery tool allows the environment to be recovered in another region (secondary) or a disaster recovery site.
Sitecore currently provides two disaster recovery options: DR Basic Cold Standby and DR Managed Hot Standby.
This article provides information on the disaster recovery configurations, workflows, and architectural aspects to be aware of.
The following prerequisites are common to all disaster recovery options:
Ensure that SQL Server Geo-Replication is NOT enabled in the primary environment.
When a disaster occurs, Sitecore must receive an alert within 15 minutes. Sitecore then validates the authenticity of the alert, creates a support ticket to investigate the issue, and informs the customer of the initial findings. If the issue turns out to be a disaster that leaves the primary resource group temporarily unrecoverable, Sitecore starts the failover process, either after the customer's approval or without it, depending on the service type. The customer can also request a failover through the Support Portal. The failover process provides the customer with a secondary environment in which they can continue business-critical activities until the primary environment becomes available again.
Notes:
GitOps and continuous deployment
Managed Cloud Containers support the use of GitOps. For more details, refer to Deploying in Managed Cloud.
Our disaster recovery solution uses GitOps to manage the state of the secondary environment.
The infrastructure and application repositories are extended to support disaster recovery when setting up DR Basic Cold Standby.
Geo-Replication for Basic Cold Standby
Geo-Replication creates a continuous copy of the database in the secondary region. In the event of a disaster, Sitecore can fail over to the secondary region and bring the databases online with minimal downtime.
Sitecore has two Disaster Recovery features:
The DR Basic Cold Standby recovery option has a longer recovery time, because the secondary Sitecore XP/XM environment is created as part of the failover process.
The DR Managed Hot Standby recovery option has a shorter recovery time than DR Basic Cold Standby, because the secondary Sitecore XP/XM environment is already up and running. During the failover, the endpoints are switched in Azure Front Door to bring the service back online.
The infrastructure and application pipelines also provision the secondary environment, so that changes made to the primary environment are applied to it as well.
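The endpoint switch described for DR Managed Hot Standby can be pictured with the priority-based routing that Azure Front Door offers: among the origins whose health probes succeed, traffic goes to the one with the lowest priority value. The following Python sketch is a simplified model under that assumption, not Sitecore's or Microsoft's implementation; the `Origin` class, the `select_origin` function, and the origin names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Origin:
    """Hypothetical model of a Front Door origin (backend)."""
    name: str
    priority: int  # lower value = preferred, as with Front Door origin priority
    healthy: bool  # outcome of the most recent health probes


def select_origin(origins: Sequence[Origin]) -> Optional[Origin]:
    """Return the healthy origin with the lowest priority value.

    Returns None when no origin is healthy.
    """
    healthy = [o for o in origins if o.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda o: o.priority)


if __name__ == "__main__":
    primary = Origin("primary", priority=1, healthy=False)  # primary region down
    secondary = Origin("secondary", priority=2, healthy=True)  # hot standby up
    chosen = select_origin([primary, secondary])
    print(chosen.name if chosen else "no healthy origin")  # prints "secondary"
```

Because the secondary environment is already running, the model only has to change which origin is considered healthy; no provisioning happens at switch time, which is why this option recovers faster than Basic Cold Standby.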
The key specifications are:
For detailed information on specific Disaster Recovery features, raise a support query on the Sitecore Support Portal.
Disaster recovery introduces some new considerations when you build a Sitecore XP/XM solution. This section addresses the most common ones.
Choosing your Azure Region
Azure organizes its data centers into regions, each defined by a latency perimeter and connected through a dedicated regional low-latency network. When choosing a region, we recommend choosing the paired region recommended by Microsoft (see Microsoft's documentation on Azure region pairs) to ensure fast backups and consistent delivery speeds for customers. Refer to the Region Support section to find compatible regions.
Third-party service APIs
If the Sitecore implementation uses any third-party service APIs that restrict access by IP address, you must register the IP addresses of the secondary data center with the service. Failing to register them might delay bringing the secondary Sitecore environment online.
Outage page
Managed Cloud uses Azure Functions to serve an outage page in case of an outage. The outage page returns HTTP status code 503 (Service Unavailable) to indicate that the service is temporarily unavailable. We recommend that the outage page contains only the information needed to assure customers that the site will be back online soon, for example:
By default, the outage page contains only basic text. You can customize it by changing the outage page content (index.html).
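The 503 status matters because it tells browsers, crawlers, and monitoring tools that the outage is temporary rather than that the page is gone. The sketch below is a minimal, framework-free Python function that builds such a response from an `index.html` file. It is not the Managed Cloud outage app: the function name `build_outage_response`, the fallback text, and the `Retry-After` and `Cache-Control` headers are assumptions for illustration.

```python
from pathlib import Path
from typing import Dict, Tuple

# Hypothetical fallback used when index.html cannot be read.
DEFAULT_BODY = "<html><body><p>We'll be back online soon.</p></body></html>"


def build_outage_response(
    index_path: Path, retry_after_seconds: int = 300
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for an outage page.

    503 Service Unavailable marks the condition as temporary; the optional
    Retry-After header hints when clients may try again.
    """
    try:
        html = index_path.read_text(encoding="utf-8")
    except OSError:
        html = DEFAULT_BODY  # missing or unreadable file: serve basic text
    body = html.encode("utf-8")
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        "Content-Length": str(len(body)),
        "Retry-After": str(retry_after_seconds),
        "Cache-Control": "no-store",  # assumption: don't cache the outage page
    }
    return 503, headers, body


if __name__ == "__main__":
    status, headers, body = build_outage_response(Path("index.html"))
    print(status, headers["Content-Type"])
```

Keeping the page content in a single static file mirrors the recommendation above: the response carries only a short reassurance message, and all signalling about the outage is done by the status code.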
Outage page customization
The Managed Cloud Containers environment uses the declarative Infrastructure as Code (IaC) approach to define the desired state of the system, and this also applies to the outage page. To update the existing outage page:
1. Edit DR outage HTML: change the outage page content (index.html) in the infrastructure repository.
2. Run Infrastructure pipeline: run the infrastructure pipeline with the "Update Outage App" input checked.
This section describes limitations to the Disaster Recovery options provided by Managed Cloud.
No removal of Control Resource Group
The Control Resource Group contains all resources used to restore the Sitecore XP/XM environment in a secondary data center. Deleting the Control Resource Group or any of its resources can make a successful recovery impossible.
xDB is excluded from the recovery time
The recovery time for the failover process does not include the xDB rebuild, because the rebuild can take a significant amount of time for a large content database. If the analytics indexes are not rebuilt, only functionality that depends on lists is affected; the frontend site is not.
Customizations
Because our Managed Cloud Containers solution is based on Infrastructure as Code (IaC), it is highly customizable. We use Terraform and Ansible for infrastructure and application provisioning, respectively.
These artifacts are available in the customer's Git repository, and we use GitOps practices to manage customer environments. Because of this level of customization, support for backing up, restoring, or replicating customizations in DR is evaluated case by case. Contact us for more details.
Failover situations not supported
In a small set of situations, it might not be possible to restore a production site into the secondary data center, for example, when a global Azure service such as authentication or Azure Front Door is down.
Disaster recovery false alarm
Sometimes a node reboot is necessary, for example to apply security updates. Rebooting a node takes the environment down briefly, which triggers an automatic DR failover and failback in the DR Managed Hot Standby scenario.
Disaster recovery testing is not supported
Sitecore does not support the testing of Disaster Recovery scenarios for customer implementations at this time.
Region Support
If neither ElasticSearch nor SearchStax is supported in the secondary region, we can recommend a suitable region.

| Geography | Primary region | Secondary region | ElasticSearch support | SearchStax support | Limitations and recommendations |
|---|---|---|---|---|---|
| North America | East US 2 | Central US | - | ✓ | ElasticSearch Elastic High IO Configuration mismatch between the regions. An alternative region can be recommended (with the same Elastic High IO Configuration Id and Elastic Deployment Template Id). |
| Europe | North Europe (Ireland) | West Europe (Netherlands) | ✓ | ✓ | |
| North America | East US | West US | - | ✓ | ElasticSearch does not support West US. An alternative region can be recommended (with the same Elastic High IO Configuration Id and Elastic Deployment Template Id). |
| North America | North Central US | South Central US | - | ✓ | ElasticSearch does not support North Central US. An alternative region can be recommended (with the same Elastic High IO Configuration Id and Elastic Deployment Template Id). |
| North America | West US 2 | West Central US | - | ✓ | ElasticSearch does not support West Central US. An alternative region can be recommended (with the same Elastic High IO Configuration Id and Elastic Deployment Template Id). |
| Brazil | Brazil South (one-directional only: Brazil South to South Central US) | South Central US | - | ✓ | ElasticSearch does not support Brazil South. An alternative region can be recommended (with the same Elastic High IO Configuration Id and Elastic Deployment Template Id). |
| Japan | Japan East | Japan West | - | ✓ | ElasticSearch does not support Japan West. An alternative region can be recommended (with the same Elastic High IO Configuration Id and Elastic Deployment Template Id). For Managed Cloud Containers (MCC) packages below versions 10.1.0 (r.0.1.201392), 10.1.1 (r.0.1.201503), 10.1.2 (r.0.1.201468), and 10.2.0 (r.0.1.201465), public IP with availability zones is not supported in Japan West; an alternative region can be recommended, or the environment can be upgraded to a newer MCC version. |
| Asia | East Asia | Southeast Asia | - | ✓ | ElasticSearch does not support East Asia. An alternative region can be recommended (with the same Elastic High IO Configuration Id and Elastic Deployment Template Id). |
| UK | UK West | UK South | - | ✓ | ElasticSearch does not support UK West. An alternative region can be recommended (with the same Elastic High IO Configuration Id and Elastic Deployment Template Id). For MCC packages below versions 10.1.0 (r.0.1.201392), 10.1.1 (r.0.1.201503), 10.1.2 (r.0.1.201468), and 10.2.0 (r.0.1.201465), public IP with availability zones is not supported in UK South; an alternative region can be recommended, or the environment can be upgraded to a newer MCC version. |
| Canada | Canada Central | Canada East | - | ✓ | ElasticSearch supports neither Canada Central nor Canada East. Alternative regions can be recommended (with the same Elastic High IO Configuration Id and Elastic Deployment Template Id). For MCC packages below versions 10.1.0 (r.0.1.201392), 10.1.1 (r.0.1.201503), 10.1.2 (r.0.1.201468), and 10.2.0 (r.0.1.201465), public IP with availability zones is not supported in Canada East; an alternative region can be recommended, or the environment can be upgraded to a newer MCC version. |
| Korea | Korea Central | Korea South | - | ✓ | ElasticSearch supports neither Korea Central nor Korea South. Alternative regions can be recommended (with the same Elastic High IO Configuration Id and Elastic Deployment Template Id). For MCC packages below versions 10.1.0 (r.0.1.201392), 10.1.1 (r.0.1.201503), 10.1.2 (r.0.1.201468), and 10.2.0 (r.0.1.201465), public IP with availability zones is not supported in Korea South; an alternative region can be recommended, or the environment can be upgraded to a newer MCC version. |
| Australia | Australia East | Australia Southeast | - | ✓ | ElasticSearch does not support Australia Southeast. Alternative regions can be recommended (with the same Elastic High IO Configuration Id and Elastic Deployment Template Id). For MCC packages below versions 10.1.0 (r.0.1.201392), 10.1.1 (r.0.1.201503), 10.1.2 (r.0.1.201468), and 10.2.0 (r.0.1.201465), public IP with availability zones is not supported in Australia Southeast; an alternative region can be recommended, or the environment can be upgraded to a newer MCC version. |
| United Arab Emirates | UAE North | UAE Central | - | - | ElasticSearch supports neither UAE North nor UAE Central. For the secondary ElasticSearch, choose a region where the Elastic High IO Configuration Id and Elastic Deployment Template Id are the same as for the primary ElasticSearch. SearchStax does not support UAE Central. Contact Sitecore to discuss recommended regions. |
| Australia | Australia Central | Australia Central 2 | - | - | ElasticSearch supports neither Australia Central nor Australia Central 2. For the secondary ElasticSearch, choose a region where the Elastic High IO Configuration Id and Elastic Deployment Template Id are the same as for the primary ElasticSearch. SearchStax does not support Australia Central 2. Contact Sitecore to discuss recommended regions. |
| France | France Central | France South | - | - | ElasticSearch does not support France South. For the secondary ElasticSearch, choose a region where the Elastic High IO Configuration Id and Elastic Deployment Template Id are the same as for the primary ElasticSearch. SearchStax does not support France South. Contact Sitecore to discuss recommended regions. |
| Germany | Germany West Central | Germany North | - | - | ElasticSearch supports neither Germany West Central nor Germany North. For the secondary ElasticSearch, choose a region where the Elastic High IO Configuration Id and Elastic Deployment Template Id are the same as for the primary ElasticSearch. SearchStax does not support Germany North. Contact Sitecore to discuss recommended regions. |
| Norway | Norway East | Norway West | - | - | ElasticSearch supports neither Norway East nor Norway West. For the secondary ElasticSearch, choose a region where the Elastic High IO Configuration Id and Elastic Deployment Template Id are the same as for the primary ElasticSearch. SearchStax does not support Norway West. Contact Sitecore to discuss recommended regions. |
| Switzerland | Switzerland North | Switzerland West | - | - | ElasticSearch supports neither Switzerland North nor Switzerland West. For the secondary ElasticSearch, choose a region where the Elastic High IO Configuration Id and Elastic Deployment Template Id are the same as for the primary ElasticSearch. SearchStax does not support Switzerland West. Contact Sitecore to discuss recommended regions. |
| India | Central India | South India | - | - | ElasticSearch supports neither Central India nor South India. For the secondary ElasticSearch, choose a region where the Elastic High IO Configuration Id and Elastic Deployment Template Id are the same as for the primary ElasticSearch. SearchStax does not support South India. Contact Sitecore to discuss recommended regions. |
| India | West India (one-directional only: West India to South India) | South India | - | - | ElasticSearch supports neither West India nor South India. For the secondary ElasticSearch, choose a region where the Elastic High IO Configuration Id and Elastic Deployment Template Id are the same as for the primary ElasticSearch. SearchStax supports neither West India nor South India. Contact Sitecore to discuss recommended regions. |
| Brazil | Brazil Southeast | Brazil South | - | - | ElasticSearch supports neither Brazil Southeast nor Brazil South. For the secondary ElasticSearch, choose a region where the Elastic High IO Configuration Id and Elastic Deployment Template Id are the same as for the primary ElasticSearch. SearchStax does not support Brazil Southeast. Contact Sitecore to discuss recommended regions. |
| South Africa | South Africa North | South Africa West | - | - | ElasticSearch supports neither South Africa North nor South Africa West. For the secondary ElasticSearch, choose a region where the Elastic High IO Configuration Id and Elastic Deployment Template Id are the same as for the primary ElasticSearch. SearchStax does not support South Africa West. Contact Sitecore to discuss recommended regions. |
| China | China North | China East | - | - | Not supported |
| China | China North 2 | China East 2 | - | - | Not supported |
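When planning a DR setup, the region table above can be read as a lookup from a primary region to its paired secondary region and search-provider support. The Python sketch below encodes a few rows of that table (not the full list) and summarizes what the table implies; the `RegionPair` type and the `dr_region_advice` function are hypothetical helpers, and the table itself remains the authoritative source.

```python
from typing import NamedTuple, Optional


class RegionPair(NamedTuple):
    """One row of the Region Support table (hypothetical representation)."""
    geography: str
    primary: str
    secondary: str
    elasticsearch: bool  # check mark in the ElasticSearch support column
    searchstax: bool  # check mark in the SearchStax support column
    supported: bool = True  # False for rows marked "Not supported"


# A subset of the table rows, for illustration only.
REGION_PAIRS = [
    RegionPair("Europe", "North Europe", "West Europe", True, True),
    RegionPair("North America", "East US", "West US", False, True),
    # One-directional pair: Brazil South -> South Central US only.
    RegionPair("Brazil", "Brazil South", "South Central US", False, True),
    RegionPair("United Arab Emirates", "UAE North", "UAE Central", False, False),
    RegionPair("China", "China North", "China East", False, False, supported=False),
]


def find_pair(primary: str) -> Optional[RegionPair]:
    """Return the row for the given primary region, or None if it is not listed."""
    return next((p for p in REGION_PAIRS if p.primary == primary), None)


def dr_region_advice(primary: str, search_provider: str) -> str:
    """Summarize what the table implies for a primary region and search provider."""
    pair = find_pair(primary)
    if pair is None:
        return "not listed: contact Sitecore"
    if not pair.supported:
        return "not supported"
    support = {"elasticsearch": pair.elasticsearch, "searchstax": pair.searchstax}
    if search_provider not in support:
        raise ValueError(f"unknown search provider: {search_provider!r}")
    if support[search_provider]:
        return f"pair with {pair.secondary}"
    # No support in the paired region: Sitecore recommends a suitable region.
    return "ask Sitecore to recommend a suitable region"


if __name__ == "__main__":
    print(dr_region_advice("East US", "elasticsearch"))
```

Extending `REGION_PAIRS` with the remaining rows would let the same check run for every geography in the table.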
DR Basic Cold Standby setup workflow

| Step | Action | Description |
|---|---|---|
| 1 | Customer checks prerequisites, considerations, and limitations | Check whether the primary environment meets the prerequisites listed above. Understand the areas in the Considerations section that will be required when performing setup, failover, and failback. Understand the limitations listed in this document; if any limitation affects the environment, prepare an action plan. |
| 3 | Customer requests DR setup | Create a "DR Basic Cold Standby Setup" service request. |
| 4 | CloudOps performs prerequisite and limitation check | CloudOps verifies the primary environment against the prerequisites, considerations, and limitations before provisioning DR. |
| 5 | CloudOps performs DR setup | CloudOps provisions DR. |
| 6 | CloudOps notifies customer of DR setup status | CloudOps communicates and provides updates before, during, and after the DR setup. |
| 8 | Configure outage app content | As described in the FAQ, configure the outage page according to the customer's specifications. As described in the "Access" item in the Considerations section, CloudOps will assist by providing temporary access to the relevant resources. |
DR Basic Cold Standby failover workflow

| Step | Action | Description |
|---|---|---|
| 1 | CloudOps receives primary region unavailability/disaster alert | Our monitoring service in the Control Resource Group generates an alert for CloudOps to take action. |
| 2 | CloudOps notifies customer of the disaster | |
| 3 | Customer requests failover when the customer considers it necessary | Create a "DR Basic Cold Standby Failover" service request. |
| 4 | CloudOps performs failover | |
| 5 | CloudOps notifies customer of the failover status | |
| 6 | Optionally, the customer applies custom configurations or provisioning in the secondary environment | Understand the areas in the Considerations section that will be required after performing failover. Understand the limitations listed in this document; if any limitation affects the environment, execute the action plan prepared before the DR setup. Review the additional support and how it affects your customizations. Note that customizations made on the primary environment that are not listed in this document are not supported. |
DR Basic Cold Standby failback workflow

| Step | Action | Description |
|---|---|---|
| 1 | CloudOps receives primary region availability/recovery alert | Our monitoring service in the Control Resource Group generates an alert for CloudOps to take action. |
| 2 | CloudOps notifies customer of the recovery | |
| 3 | Customer requests failback | Create a "DR Basic Cold Standby Failback" service request. |
| 4 | CloudOps performs failback | |
| 5 | CloudOps notifies customer of the failback status | |
DR Managed Hot Standby failover workflow

| Step | Action | Description |
|---|---|---|
| 1 | CloudOps receives primary region unavailability/disaster alert | Our monitoring service in the Control Resource Group generates an alert so that CloudOps can monitor the failover and take action when required. |
| 2 | Failover is executed automatically | |
| 3 | | |
| 4 | Optionally, the customer applies custom configurations or provisioning in the secondary environment | Understand the areas in the Considerations section that will be required after performing failover. Understand the limitations listed in this document; if any limitation affects the environment, execute the action plan prepared before the DR setup. Review the additional support and how it affects your customizations. Note that customizations made on the primary environment that are not listed in this document are not supported. |
DR Managed Hot Standby failback workflow

| Step | Action | Description |
|---|---|---|
| 1 | CloudOps receives primary region availability/recovery alert | Our monitoring service in the Control Resource Group generates an alert so that CloudOps can monitor the failback and take action when required. |
| 2 | Failback is executed automatically | |
| 3 | CloudOps notifies customer of the failback status | |
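The failover and failback workflow tables differ mainly in what starts each operation: in DR Basic Cold Standby the customer submits a service request, while in DR Managed Hot Standby the operation runs automatically and CloudOps monitors it. The Python sketch below encodes only that distinction, as read from the tables; the `Trigger` enum, the `WORKFLOWS` mapping, and the `next_step` function are hypothetical and do not correspond to any Sitecore tooling.

```python
from enum import Enum
from typing import Dict, Tuple


class Trigger(Enum):
    """What starts a DR operation, according to the workflow tables."""
    CUSTOMER_REQUEST = "customer service request"
    AUTOMATIC = "automatic"


# (DR option, operation) -> trigger, summarized from the workflow tables.
WORKFLOWS: Dict[Tuple[str, str], Trigger] = {
    ("DR Basic Cold Standby", "failover"): Trigger.CUSTOMER_REQUEST,
    ("DR Basic Cold Standby", "failback"): Trigger.CUSTOMER_REQUEST,
    ("DR Managed Hot Standby", "failover"): Trigger.AUTOMATIC,
    ("DR Managed Hot Standby", "failback"): Trigger.AUTOMATIC,
}


def next_step(option: str, operation: str) -> str:
    """Describe what happens after CloudOps receives the monitoring alert.

    Raises KeyError for an option/operation pair not covered by the tables.
    """
    trigger = WORKFLOWS[(option, operation)]
    if trigger is Trigger.AUTOMATIC:
        return f"{operation} runs automatically; CloudOps monitors it"
    return f"CloudOps notifies the customer, who submits a {operation} service request"


if __name__ == "__main__":
    for option, operation in WORKFLOWS:
        print(f"{option} / {operation}: {next_step(option, operation)}")
```

This also explains the false-alarm limitation: with an automatic trigger, a brief outage such as a node reboot is enough to start a failover and failback.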
How do customers request the Managed Cloud Disaster Recovery (DR) feature for Sitecore Managed Cloud environments?
Customers can request Disaster Recovery for their XP/XM environment through their Sitecore regional office or the Sitecore sales team.
What actions do customers need to take after the DR setup has been done?
After the DR setup is complete, customers are asked to perform the following actions:
Sitecore engineers provide instructions for these actions after provisioning the DR setup. Alternatively, customers can raise a support query on the Sitecore Support Portal for detailed information.
What are the new resources that are introduced once the DR setup has been done?
After the DR setup is provisioned, the customer can see the following resource groups, depending on the chosen DR type:
Do customers have limited access rights to the DR resources?
Sitecore provides customers with limited access to the additional resource groups (Control and Secondary). This prevents changes to the configurations related to backup policies and automation.
How is the paired region chosen for the DR setup?
Sitecore chooses the most suitable paired region for each customer in accordance with Microsoft's standards. The Region Support section provides more details.
Can I update the default outage page?
Yes. You can update the outage app by changing the outage app content (index.html) in the infrastructure repository and running the infrastructure pipeline with the "Update Outage App" input checked. More detailed instructions are provided under Outage page in the Considerations section.
Will everything from my Primary resource group be available after the Failover?
No. We restore only standard Managed Cloud Containers resources. Review the Limitations section, especially Customizations, for the current limitations of HADR (high availability and disaster recovery).
Do the custom domains and SSL bindings replicate from the Primary environment during the failover?
They do not need to be replicated: domains and SSL bindings are configured in Azure Front Door, which is a global service.
What is a management certificate?
It is the certificate used when provisioning Sitecore.
Purpose of a Management Certificate – The Sitecore Azure module is based on the Microsoft Azure Service Management REST API. All API operations are performed over SSL and are mutually authenticated using an X.509 v3 certificate. Therefore, a management certificate must be uploaded to the module.
What is the procedure of enabling DR for SolrCloud?
Sitecore follows these procedures when enabling the Disaster Recovery setup for Managed Cloud customers who have purchased Managed Cloud instances with SolrCloud, to provide DR availability for both.
How do customers view DR environment data in Grafana?
After a failover, customers can view data related to the DR/secondary Sitecore environment in the DR Grafana. This is by design: primary Grafana data and DR/secondary Grafana data are stored separately.