Sitecore Managed Cloud Standard — Containers Disaster Recovery


Description

The Sitecore Managed Cloud Containers Disaster Recovery feature allows customers to maintain or quickly resume mission-critical functions following a disaster, thereby supporting the customer's business continuity plan. When a disaster occurs in a region containing the production environment (primary), the Disaster Recovery tool allows the environment to be recovered in another region (secondary) or a disaster recovery site.
Sitecore currently provides the following disaster recovery options:

  • DR Basic Cold Standby
  • DR Managed Hot Standby

This article provides information on the disaster recovery configurations, workflows, and architectural aspects to be aware of.

 

Prerequisites

The following prerequisites are common for all the disaster recovery options:

  1. The customer’s Sitecore Managed Cloud Containers deployment must be compliant. Refer to Deploying in Managed Cloud for more details.
  2. The customer's Sitecore Managed Cloud Containers solution must be compliant with Region support requirements described in the Region Support section.
  3. The customer is eligible to request the Disaster Recovery feature only if it is purchased within the Managed Cloud contract.
  4. The customer must have a valid Sitecore license file, Sitecore certificate, and a password when requesting disaster recovery setup from https://support.sitecore.net/
  5. Sitecore Containers solution running on Azure with:
    • Supported Versions:
      • 10.1.0:
        • r.0.1.187656
        • r.0.1.191601
        • r.0.1.199374
        • r.0.1.201392
        • r.0.1.204154
        • r.0.1.211803
        • r.0.1.217249
        • r.0.1.227714
        • r.0.1.228902
        • r.0.1.244207
        • r.0.1.258953
        • r.0.1.261853
        • r.0.1.263672
        • r.0.1.276989
        • r.0.1.278801
        • r.0.1.283040
        • r.0.1.287527
        • r.0.1.296377
        • r.0.1.309907
        • r.0.1.321601
        • r.0.1.329447
        • r.0.1.352169
        • r.0.1.383975
        • r.0.1.394606
        • r.0.1.426940
      • 10.1.1:
        • r.0.1.187668
        • r.0.1.190817
        • r.0.1.199375
        • r.0.1.201503
        • r.0.1.204171
        • r.0.1.211930
        • r.0.1.217250
        • r.0.1.227715
        • r.0.1.228903
        • r.0.1.244208
        • r.0.1.258958
        • r.0.1.261854
        • r.0.1.263671
        • r.0.1.276990
        • r.0.1.278804
        • r.0.1.283043
        • r.0.1.287525
        • r.0.1.296378
        • r.0.1.309915
        • r.0.1.321602
        • r.0.1.329448
        • r.0.1.352170
        • r.0.1.383976
        • r.0.1.394607
        • r.0.1.426941
      • 10.1.2:
        • r.0.1.191832
        • r.0.1.199376
        • r.0.1.201468
        • r.0.1.204209
        • r.0.1.211931
        • r.0.1.217251
        • r.0.1.227716
        • r.0.1.228904
        • r.0.1.244213
        • r.0.1.258959
        • r.0.1.261855
        • r.0.1.263669
        • r.0.1.276991
        • r.0.1.278805
        • r.0.1.283049
        • r.0.1.287524
        • r.0.1.296379
        • r.0.1.309909
        • r.0.1.321603
        • r.0.1.329449
        • r.0.1.352171
        • r.0.1.383977
        • r.0.1.394608
        • r.0.1.426953
      • 10.1.3:
        • r.0.1.352172
        • r.0.1.383978
        • r.0.1.394609
        • r.0.1.426954
      • 10.2.0:
        • r.0.1.195550
        • r.0.1.199377
        • r.0.1.201465
        • r.0.1.204153
        • r.0.1.211957
        • r.0.1.217252
        • r.0.1.227717
        • r.0.1.228905
        • r.0.1.244214
        • r.0.1.258960
        • r.0.1.261856
        • r.0.1.263668
        • r.0.1.276993
        • r.0.1.278806
        • r.0.1.283053
        • r.0.1.287520
        • r.0.1.296380
        • r.0.1.309919
        • r.0.1.321604
        • r.0.1.329450
        • r.0.1.352174
        • r.0.1.383980
        • r.0.1.394610
        • r.0.1.426961
      • 10.2.1:
        • r.0.1.352175
        • r.0.1.383981
        • r.0.1.394611
        • r.0.1.426962
      • 10.2.2:
        • r.0.1.426963
      • 10.3.0:
        • r.0.1.296381
        • r.0.1.309911
        • r.0.1.321605
        • r.0.1.329451
        • r.0.1.352176
        • r.0.1.383982
        • r.0.1.394612
        • r.0.1.426977
      • 10.3.1:
        • r.0.1.329452
        • r.0.1.352177
        • r.0.1.383983
        • r.0.1.394613
        • r.0.1.426978
      • 10.4.0:
        • r.0.1.394614
        • r.0.1.427029
    • Supported topologies: XM, XP.
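
The package identifiers above follow the pattern r.0.1.<build>. As a minimal sketch (the helper names are hypothetical and not part of any Sitecore tooling), the build number can be parsed and compared against the minimum packages that the Region Support section lists for the "Public IP with availability zones" limitation:

```python
import re

# Minimum MCC packages named in the Region Support section for the
# "Public IP with availability zones" limitation, keyed by Sitecore version.
ZONE_LIMITATION_MIN_BUILD = {
    "10.1.0": 201392,
    "10.1.1": 201503,
    "10.1.2": 201468,
    "10.2.0": 201465,
}

_PACKAGE_RE = re.compile(r"^r\.0\.1\.(\d+)$")


def build_number(package: str) -> int:
    """Return the numeric build from an identifier such as 'r.0.1.201392'."""
    match = _PACKAGE_RE.match(package.strip())
    if match is None:
        raise ValueError(f"unrecognized package identifier: {package!r}")
    return int(match.group(1))


def has_zone_limitation(version: str, package: str) -> bool:
    """True if the package is below the build listed for its version."""
    threshold = ZONE_LIMITATION_MIN_BUILD.get(version)
    if threshold is None:
        # Versions without a listed build are not named in the limitation.
        return False
    return build_number(package) < threshold
```

For example, has_zone_limitation("10.1.0", "r.0.1.187656") returns True, while the same call with "r.0.1.201392" returns False.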

 

Technical prerequisite

Ensure that SQL Server Geo-Replication is NOT enabled in the primary environment.

 

Overview

When a disaster happens, Sitecore must receive an alert within 15 minutes. Sitecore validates the authenticity of the alert, creates a support ticket to investigate the issue, and informs the customer about the initial investigation. If the issue turns out to be a disaster that leaves the primary resource group temporarily unrecoverable, Sitecore starts the failover process, either after customer approval or without it, depending on the service type. The customer can also request a failover through the Support Portal. The failover process provides the customer with a secondary environment in which they can continue business-critical activities until the primary environment becomes available again.

Notes:

 

Disaster Recovery Features

GitOps and continuous deployment

Managed Cloud Containers support the use of GitOps. For more details refer to Deploying in Managed Cloud.

Our disaster recovery solution uses GitOps to manage state in the secondary environment.

The infrastructure and application repositories are extended to support disaster recovery when setting up DR Basic Cold Standby.

Geo-Replication for Basic Cold Standby

Geo-Replication creates a continuous copy of the database in the secondary region. In the event of a disaster, we can simply fail over to the secondary region and bring up the databases with minimal downtime.

Sitecore has two Disaster Recovery features:

  • DR Basic Cold Standby
  • DR Managed Hot Standby

You can raise a support query for detailed information on specific Disaster Recovery features on the Sitecore Support Portal.

 

Considerations

Disaster recovery introduces some new considerations when you build a Sitecore XP/XM solution. This section tries to address some of the most common ones.

Choosing your Azure Region

Azure organizes its data centers into regions, each with a latency-defined perimeter and connected through a dedicated regional low-latency network. When choosing a region, we recommend the paired region that Microsoft recommends in its Azure paired regions documentation, to ensure fast backups and consistent customer delivery speeds. Refer to the Region Support section to find compatible regions.

Third-party service APIs

If the Sitecore implementation uses any third-party service APIs that limit access based on IP, it is essential to register the IPs of the secondary data center with the service. Failing to register the IPs might delay bringing the secondary Sitecore environment online.

Outage page

Managed Cloud uses Azure Functions to serve an outage page in case of an outage. The outage page returns a 503 status code to indicate that the service is unavailable. We recommend that the outage page contain only the information necessary to assure customers that the site will be back online soon.

By default, the outage page contains only basic text. You can customize it by changing the outage page content (index.html).
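
To illustrate the behavior described above, the following sketch serves a static index.html with a 503 status using only the Python standard library. It is not the Azure Functions implementation that Managed Cloud uses; the handler name, the local file path, and the Retry-After value are assumptions for illustration.

```python
from http.server import BaseHTTPRequestHandler
from pathlib import Path

RETRY_AFTER_SECONDS = 600  # assumed value; hints when clients may retry


def build_outage_response(html: str):
    """Return (status, headers, body) for an outage page."""
    body = html.encode("utf-8")
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        "Content-Length": str(len(body)),
        "Retry-After": str(RETRY_AFTER_SECONDS),
    }
    # 503 signals that the service is temporarily unavailable.
    return 503, headers, body


class OutageHandler(BaseHTTPRequestHandler):
    """Answers every GET request with the outage page and HTTP 503."""

    def do_GET(self):
        html = Path("index.html").read_text(encoding="utf-8")
        status, headers, body = build_outage_response(html)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)

# To try it locally (blocks until interrupted):
#   from http.server import HTTPServer
#   HTTPServer(("127.0.0.1", 8080), OutageHandler).serve_forever()
```

Returning 503 rather than 200 keeps monitoring tools and search engine crawlers from treating the outage page as normal site content.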

Outage page customization

The Managed Cloud Containers environment uses the Infrastructure as Code (IaC) declarative approach to define the desired state of the system. This is also applicable to the outage page. Below are the instructions to update the existing outage page.

Edit DR outage HTML

  1. Go to your Azure DevOps organization.
  2. Go to Repos > Files from the sidebar.
  3. Select the infrastructure repository from the repositories filter.
  4. In the infrastructure repository, navigate to hadr > outage > index.html.
  5. To change the HTML file, switch to the Editing view, where you can add and delete content. Alternatively, clone the repository locally, make the changes in any code editor, and push the commit to the infrastructure repository. Follow your pull request process so that the pipeline runs with the intended changes from the relevant branch for your environment.

Run Infrastructure pipeline

  1. In the Azure DevOps sidebar, click Pipelines > Pipelines.
  2. On the Pipelines page, at the far right of the infrastructure pipeline, click the ellipsis icon and choose Run pipeline.
  3. Check the Update outage app checkbox and click Run.
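
The same pipeline run can, in principle, be queued through the Azure DevOps REST API instead of the web UI. The sketch below only builds the request and does not send it. The organization, project, and pipeline ID are placeholders, and the parameter name updateOutageApp is a hypothetical stand-in for whatever parameter backs the "Update outage app" checkbox in your pipeline definition; check the pipeline YAML for the real name.

```python
import base64
import json
import urllib.request


def build_run_request(organization, project, pipeline_id, pat,
                      branch="main", outage_parameter="updateOutageApp"):
    """Build (but do not send) a 'run pipeline' request.

    `outage_parameter` is a hypothetical parameter name used for
    illustration only.
    """
    url = (f"https://dev.azure.com/{organization}/{project}"
           f"/_apis/pipelines/{pipeline_id}/runs?api-version=7.0")
    payload = {
        # Run against the chosen branch of the pipeline's own repository.
        "resources": {
            "repositories": {"self": {"refName": f"refs/heads/{branch}"}}
        },
        "templateParameters": {outage_parameter: "true"},
    }
    # Personal access tokens use Basic auth with an empty user name.
    token = base64.b64encode(f":{pat}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would submit the run; inspect the request first to confirm that it targets the intended pipeline and branch.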

Limitations

This section describes limitations to the Disaster Recovery options provided by Managed Cloud.

No removal of Control Resource Group

The Control Resource Group contains all resources used to restore the Sitecore XP/XM environment successfully in a secondary data center. Deleting the Control Resource Group or its resources can make a successful recovery impossible.

xDB is excluded while considering the recovery time

The recovery time for the failover process does not include the xDB rebuild, because the rebuild can take a significant amount of time for a large content database. If the analytics indexes are not rebuilt, only functionality that depends on lists is affected; the frontend site is not affected.

Customizations

Because our Managed Cloud Containers solution is based on Infrastructure as Code (IaC), it is highly customizable. We use Terraform and Ansible for infrastructure and application provisioning, respectively.

These artifacts are available in the customer's Git repository, and we use GitOps practices to manage customer environments. Because of this degree of customization, support for backup, restore, or replication in DR depends on the specific customization. Contact us for more details.

Failover situations not supported

There is a small set of situations where it might not be possible to restore a production site into the secondary data center, for example, when a global Azure service such as authentication or Azure Front Door is down.

Disaster recovery false alarm

Rebooting a node is sometimes necessary to apply security updates. The reboot brings the environment down for a short time, which triggers an automatic DR failover and failback in the DR Managed Hot Standby scenario.

Disaster recovery testing is not supported

Sitecore does not support the testing of Disaster Recovery scenarios for customer implementations at this time.

Region Support

If there is no support for either ElasticSearch or SearchStax in the secondary region, we can recommend a suitable region.

| Geography | Azure paired regions | ElasticSearch support | SearchStax support | Limitations & Recommendation |
|---|---|---|---|---|
| North America | East US 2 / Central US | - | | ElasticSearch Elastic High IO Configuration mismatch between regions. An optional region can be recommended (with the same Elastic High IO Configuration Id and Elastic Deployment Template Id). |
| Europe | North Europe (Ireland) / West Europe (Netherlands) | | | |
| North America | East US / West US | - | | ElasticSearch does not support West US. An optional region can be recommended (with the same Elastic High IO Configuration Id and Elastic Deployment Template Id). |
| North America | North Central US / South Central US | - | | ElasticSearch does not support North Central US. An optional region can be recommended (with the same Elastic High IO Configuration Id and Elastic Deployment Template Id). |
| North America | West US 2 / West Central US | - | | ElasticSearch does not support West Central US. An optional region can be recommended (with the same Elastic High IO Configuration Id and Elastic Deployment Template Id). |
| Brazil | Brazil South / South Central US (one-directional only: Brazil South to South Central US) | - | | ElasticSearch does not support Brazil South. An optional region can be recommended (with the same Elastic High IO Configuration Id and Elastic Deployment Template Id). |
| Japan | Japan East / Japan West | - | | ElasticSearch does not support Japan West. An optional region can be recommended (with the same Elastic High IO Configuration Id and Elastic Deployment Template Id). There is a limitation for Managed Cloud Containers (MCC) packages below versions 10.1.0 (r.0.1.201392), 10.1.1 (r.0.1.201503), 10.1.2 (r.0.1.201468) and 10.2.0 (r.0.1.201465): Public IP with availability zones does not support Japan West. Optional regions can be recommended, or you can upgrade to a newer MCC version. |
| Asia | East Asia / Southeast Asia | - | | ElasticSearch does not support East Asia. An optional region can be recommended (with the same Elastic High IO Configuration Id and Elastic Deployment Template Id). |
| UK | UK West / UK South | - | | ElasticSearch does not support UK West. An optional region can be recommended (with the same Elastic High IO Configuration Id and Elastic Deployment Template Id). There is a limitation for MCC packages below versions 10.1.0 (r.0.1.201392), 10.1.1 (r.0.1.201503), 10.1.2 (r.0.1.201468) and 10.2.0 (r.0.1.201465): Public IP with availability zones does not support UK South. Optional regions can be recommended, or you can upgrade to a newer MCC version. |
| Canada | Canada Central / Canada East | - | | ElasticSearch does not support Canada Central or Canada East. Optional regions can be recommended (with the same Elastic High IO Configuration Id and Elastic Deployment Template Id). There is a limitation for MCC packages below versions 10.1.0 (r.0.1.201392), 10.1.1 (r.0.1.201503), 10.1.2 (r.0.1.201468) and 10.2.0 (r.0.1.201465): Public IP with availability zones does not support Canada East. Optional regions can be recommended, or you can upgrade to a newer MCC version. |
| Korea | Korea Central / Korea South | - | | ElasticSearch does not support Korea Central or Korea South. Optional regions can be recommended (with the same Elastic High IO Configuration Id and Elastic Deployment Template Id). There is a limitation for MCC packages below versions 10.1.0 (r.0.1.201392), 10.1.1 (r.0.1.201503), 10.1.2 (r.0.1.201468) and 10.2.0 (r.0.1.201465): Public IP with availability zones does not support Korea South. Optional regions can be recommended, or you can upgrade to a newer MCC version. |
| Australia | Australia East / Australia Southeast | - | | ElasticSearch does not support Australia Southeast. Optional regions can be recommended (with the same Elastic High IO Configuration Id and Elastic Deployment Template Id). There is a limitation for MCC packages below versions 10.1.0 (r.0.1.201392), 10.1.1 (r.0.1.201503), 10.1.2 (r.0.1.201468) and 10.2.0 (r.0.1.201465): Public IP with availability zones does not support Australia Southeast. Optional regions can be recommended, or you can upgrade to a newer MCC version. |
| United Arab Emirates | UAE North / UAE Central | - | - | ElasticSearch does not support UAE North or UAE Central. For secondary ElasticSearch, choose a region where the Elastic High IO Configuration Id and Elastic Deployment Template Id are the same as for the primary ElasticSearch. SearchStax does not support UAE Central. Contact Sitecore to discuss recommended regions. |
| Australia | Australia Central / Australia Central 2 | - | - | ElasticSearch does not support Australia Central or Australia Central 2. For secondary ElasticSearch, choose a region where the Elastic High IO Configuration Id and Elastic Deployment Template Id are the same as for the primary ElasticSearch. SearchStax does not support Australia Central 2. Contact Sitecore to discuss recommended regions. |
| France | France Central / France South | - | - | ElasticSearch does not support France South. For secondary ElasticSearch, choose a region where the Elastic High IO Configuration Id and Elastic Deployment Template Id are the same as for the primary ElasticSearch. SearchStax does not support France South. Contact Sitecore to discuss recommended regions. |
| Germany | Germany West Central / Germany North | - | - | ElasticSearch does not support Germany West Central or Germany North. For secondary ElasticSearch, choose a region where the Elastic High IO Configuration Id and Elastic Deployment Template Id are the same as for the primary ElasticSearch. SearchStax does not support Germany North. Contact Sitecore to discuss recommended regions. |
| Norway | Norway East / Norway West | - | - | ElasticSearch does not support Norway East or Norway West. For secondary ElasticSearch, choose a region where the Elastic High IO Configuration Id and Elastic Deployment Template Id are the same as for the primary ElasticSearch. SearchStax does not support Norway West. Contact Sitecore to discuss recommended regions. |
| Switzerland | Switzerland North / Switzerland West | - | - | ElasticSearch does not support Switzerland North or Switzerland West. For secondary ElasticSearch, choose a region where the Elastic High IO Configuration Id and Elastic Deployment Template Id are the same as for the primary ElasticSearch. SearchStax does not support Switzerland West. Contact Sitecore to discuss recommended regions. |
| India | Central India / South India | - | - | ElasticSearch does not support Central India or South India. For secondary ElasticSearch, choose a region where the Elastic High IO Configuration Id and Elastic Deployment Template Id are the same as for the primary ElasticSearch. SearchStax does not support South India. Contact Sitecore to discuss recommended regions. |
| India | West India / South India (one direction only: West India to South India) | - | - | ElasticSearch does not support West India or South India. For secondary ElasticSearch, choose a region where the Elastic High IO Configuration Id and Elastic Deployment Template Id are the same as for the primary ElasticSearch. SearchStax does not support West India or South India. Contact Sitecore to discuss recommended regions. |
| Brazil | Brazil Southeast / Brazil South | - | - | ElasticSearch does not support Brazil Southeast or Brazil South. For secondary ElasticSearch, choose a region where the Elastic High IO Configuration Id and Elastic Deployment Template Id are the same as for the primary ElasticSearch. SearchStax does not support Brazil Southeast. Contact Sitecore to discuss recommended regions. |
| South Africa | South Africa North / South Africa West | - | - | ElasticSearch does not support South Africa North or South Africa West. For secondary ElasticSearch, choose a region where the Elastic High IO Configuration Id and Elastic Deployment Template Id are the same as for the primary ElasticSearch. SearchStax does not support South Africa West. Contact Sitecore to discuss recommended regions. |
| China | China North / China East | - | - | Not supported |
| China | China North 2 / China East 2 | - | - | Not supported |
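
As a minimal sketch of how the pairing information above could be used in tooling, the lookup below encodes a small excerpt of the table. The function name and data layout are hypothetical, and treating a pair as usable in both directions unless the table marks it as one-directional is an assumption. In the full table a region can appear in more than one row; this sketch returns only the first match.

```python
# Excerpt of the paired regions listed in the table above.
# Each entry is (region_a, region_b, one_directional).
PAIRS = [
    ("East US 2", "Central US", False),
    ("North Europe", "West Europe", False),
    ("North Central US", "South Central US", False),
    ("Brazil South", "South Central US", True),  # Brazil South -> South Central US only
    ("Japan East", "Japan West", False),
]


def paired_region(primary: str):
    """Return the paired region for `primary`, or None if not in the excerpt."""
    for region_a, region_b, one_directional in PAIRS:
        if primary == region_a:
            return region_b
        # The reverse direction applies only to bidirectional pairs.
        if primary == region_b and not one_directional:
            return region_a
    return None
```

For example, paired_region("Brazil South") returns "South Central US", while paired_region("South Central US") returns "North Central US" because the Brazil pairing is one-directional.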

 

Process Description:

DR Setup

 

| Step | Action | Description |
|---|---|---|
| 1 | Customer checks prerequisites, considerations, and limitations | Check if the primary environment meets the prerequisites mentioned. Understand the areas in the Considerations section that will be required when performing setup, failover, and failback. Understand the limitations listed in the document. If a limitation has an impact on the environment, prepare an action plan. |
| 3 | Customer requests DR setup | Create a "DR Basic Cold Standby Setup" service request. |
| 4 | CloudOps performs prerequisite and limitation check | CloudOps verifies the primary environment against the prerequisites, considerations, and limitations before provisioning DR. |
| 5 | CloudOps performs DR setup | CloudOps provisions DR. |
| 6 | CloudOps notifies customer of DR setup status | CloudOps communicates and provides updates before, during, and after DR setup. |
| 8 | Configures Outage App content | As mentioned in the FAQ, configure the outage page according to the customer's specifications. As mentioned in the "Access" item in the Considerations section, CloudOps will assist in providing temporary access to the relevant resources. |

 

DR Failover for DR Basic Cold Standby

 

| Step | Action | Description |
|---|---|---|
| 1 | CloudOps receives primary region unavailability/disaster alert | Our monitoring service in the control resource group generates an alert for CloudOps to take action. |
| 2 | CloudOps notifies customer of the disaster | |
| 3 | Customer requests failover when the customer considers a failover is required | Create a "DR Basic Cold Standby Failover" service request. |
| 4 | CloudOps performs failover | |
| 5 | CloudOps notifies customer of the failover status | |
| 6 | Optionally, the customer applies custom configurations or provisioning in the secondary environment | Understand the areas in the Considerations section that will be required after performing failover. Understand the limitations listed in the document. If a limitation has an impact on the environment, execute the action plan prepared prior to DR setup. Review the additional support and how it will impact your customization. Note that customizations made on the primary environment that are not listed in this document are not supported. |

 

DR Failback for DR Basic Cold Standby

 

| Step | Action | Description |
|---|---|---|
| 1 | CloudOps receives primary region availability/recovery alert | Our monitoring service in the control resource group generates an alert for CloudOps to take action. |
| 2 | CloudOps notifies customer of the recovery | |
| 3 | Customer requests failback | Create a "DR Basic Cold Standby Failback" service request. |
| 4 | CloudOps performs failback | |
| 5 | CloudOps notifies customer of the failback status | |

 

DR Failover for DR Managed Hot Standby

 

| Step | Action | Description |
|---|---|---|
| 1 | CloudOps receives primary region unavailability/disaster alert | Our monitoring service in the control resource group generates an alert for CloudOps to monitor the failover and take action when required. |
| 2 | Failover is executed automatically | |
| 3 | CloudOps notifies customer of the failover status | |
| 4 | Optionally, the customer applies custom configurations or provisioning in the secondary environment | Understand the areas in the Considerations section that will be required after performing failover. Understand the limitations listed in the document. If a limitation has an impact on the environment, execute the action plan prepared prior to DR setup. Review the additional support and how it will impact your customization. Note that customizations made on the primary environment that are not listed in this document are not supported. |

 

DR Failback for DR Managed Hot Standby

 

| Step | Action | Description |
|---|---|---|
| 1 | CloudOps receives primary region availability/recovery alert | Our monitoring service in the control resource group generates an alert for CloudOps to monitor the failback and take action when required. |
| 2 | Failback is executed automatically | |
| 3 | CloudOps notifies customer of the failback status | |
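
The failover tables above differ mainly in who triggers the failover. The sketch below restates them as data so the difference is easy to see; the enum and function names are hypothetical and exist only for illustration.

```python
from enum import Enum


class DrOption(Enum):
    BASIC_COLD_STANDBY = "DR Basic Cold Standby"
    MANAGED_HOT_STANDBY = "DR Managed Hot Standby"


def failover_workflow(option: DrOption) -> list:
    """Return the ordered failover steps for the given DR option."""
    steps = ["CloudOps receives primary region unavailability/disaster alert"]
    if option is DrOption.BASIC_COLD_STANDBY:
        # Cold standby waits for an explicit customer service request.
        steps += [
            "CloudOps notifies customer of the disaster",
            "Customer requests failover",
            "CloudOps performs failover",
        ]
    else:
        # Hot standby fails over without a customer request.
        steps.append("Failover is executed automatically")
    steps += [
        "CloudOps notifies customer of the failover status",
        "Customer optionally applies custom configurations in the secondary environment",
    ]
    return steps


def needs_customer_request(option: DrOption) -> bool:
    """True only for the workflow that includes a customer request step."""
    return "Customer requests failover" in failover_workflow(option)
```

Calling needs_customer_request(DrOption.MANAGED_HOT_STANDBY) returns False, which matches the automatic failover step in the Managed Hot Standby table.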

 

FAQ

How do customers request the Managed Cloud Disaster Recovery (DR) feature for Sitecore Managed Cloud environments?
The customer can ask to set up Disaster Recovery for their XP/XM environment through the Sitecore regional office or Sitecore sales team.

What actions do customers need to take after the DR setup has been done?
After the DR setup has been completed, customers are requested to perform the following actions:

The instructions for how to do so are provided by Sitecore engineers after the provision of the DR setup. Alternatively, the customer can raise a support query for detailed information on the Sitecore Support Portal.

What are the new resources that are introduced once the DR setup has been done?
After the DR setup has been provisioned, the customer can see the following resource groups, according to the chosen DR type:

Do customers have limited access rights to the DR resources?
Sitecore provides limited access to customers on the additional resource groups (Control and Secondary). This helps Sitecore to prevent any changes to the configurations related to backup policies and automation.

How is the paired region chosen for the DR setup?
Sitecore chooses the most suitable paired region for the customer, in line with Microsoft's region pairing. More detailed descriptions are provided in the Region Support section.

Can I update the default outage page?
Yes, you can update the outage app by changing the outage app content (index.html) in the infrastructure repository and running the infrastructure pipeline with the "Update outage app" checkbox selected. More detailed instructions are provided in the Outage page customization section under Considerations.

Will everything from my Primary resource group be available after the Failover?
No, we will restore only Standard Managed Cloud Containers resources. Review the Limitations section, especially the Customizations subsection, for the current limitations of HADR.

Do the custom domains and SSL bindings replicate from the Primary environment during the failover?
Since we are configuring domains and SSL bindings in Azure Front Door and Azure Front Door is globally available, there is no need to replicate these configurations.

What is a management certificate?
It is the certificate that is used while provisioning Sitecore.

Purpose of a management certificate: The Sitecore Azure module is based on the Microsoft Azure Service Management REST API. All API operations are performed over SSL and are mutually authenticated using X.509 v3 certificates. Therefore, a management certificate must be uploaded to the module.

What is the procedure of enabling DR for SolrCloud?
Sitecore follows these procedures when enabling Disaster Recovery for Managed Cloud customers who have purchased Managed Cloud instances with SolrCloud, to provide DR availability for both.

How do customers view DR environment data in Grafana?
After a failover, customers can view DR/secondary Sitecore-related data in the DR Grafana. This is by design: primary Grafana data and DR/secondary Grafana data are stored separately.