The Disaster Recovery (DR) types are being renamed.
Note that support for the new terminology is still being rolled out. The old DR types will be deprecated once we have migrated the existing customers' DR environments.
The table below shows the mapping between the old and new DR types.
Old | New | Notes |
---|---|---|
Basic | DR Basic Cold Standby | Upgrade |
Basic Geo-Replication | DR Basic Cold Standby | Exact match |
Hot-Warm | DR Managed Hot Standby | Upgrade |
Hot Manual | DR Managed Hot Standby | Upgrade |
Hot Auto | DR Managed Hot Standby | Exact match |
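For scripts that track environments by DR type, the mapping above can be captured as a small lookup. This is a minimal sketch only: `DR_TYPE_MAPPING` and `new_dr_type` are hypothetical names used for illustration and are not part of any Sitecore tooling.

```python
# Old-to-new DR type names, copied from the mapping table above.
# Each value is (new name, note from the table).
DR_TYPE_MAPPING = {
    "Basic": ("DR Basic Cold Standby", "Upgrade"),
    "Basic Geo-Replication": ("DR Basic Cold Standby", "Exact match"),
    "Hot-Warm": ("DR Managed Hot Standby", "Upgrade"),
    "Hot Manual": ("DR Managed Hot Standby", "Upgrade"),
    "Hot Auto": ("DR Managed Hot Standby", "Exact match"),
}


def new_dr_type(old_name: str) -> str:
    """Return the new DR type name for an old one (hypothetical helper)."""
    try:
        return DR_TYPE_MAPPING[old_name][0]
    except KeyError:
        raise ValueError(f"Unknown old DR type: {old_name!r}") from None
```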
The Sitecore Managed Cloud Disaster Recovery feature allows customers to maintain or quickly resume mission-critical functions following a disaster, therefore supporting the customer's business continuity plan. When a disaster occurs in a region containing the production environment (primary), the Disaster Recovery tool allows the environment to be recovered into another region (secondary) or a disaster recovery site.
Sitecore currently provides two disaster recovery options: DR Basic Cold Standby and DR Managed Hot Standby.
This article provides information on the disaster recovery configurations, workflows, and architectural aspects to be aware of.
When a disaster happens, Sitecore must receive an alert within 15 minutes. Sitecore validates the authenticity of the alert, creates a support ticket to investigate the issue, and informs the customer about the initial investigation. If the investigation shows that a disaster has made the primary resource group temporarily unrecoverable, Sitecore starts the failover process, either with or without customer approval depending on the service type. The failover process provides the customer with a secondary environment in which they can continue business-critical activities until the primary environment becomes available again.
Notes:
The following prerequisites are common for all the disaster recovery options:
Sitecore has two Disaster Recovery features:
Basic Cold Standby Disaster Recovery
The DR Basic Cold Standby recovery option has a longer recovery time. This is because the secondary Sitecore XP/XM environment is created as part of the failover process. Basic Cold Standby Disaster Recovery includes options such as Geo-Replication.
DR Basic Cold Standby Geo-Replication creates a continuous copy of the database in the secondary region. In the event of a disaster, we can simply fail over to the secondary region and bring up the database with minimal downtime.
Managed Hot Standby Disaster Recovery
The DR Managed Hot Standby recovery option has a quicker recovery time than the DR Basic Cold Standby configurations.
This is because the secondary Sitecore XP/XM environment is already provisioned during setup.
During the failover, the web apps are started and the endpoints in the Traffic Manager are switched to bring the service back online.
You can raise a support query for detailed information on specific Disaster Recovery features on the Sitecore Support Portal.
Disaster recovery introduces some new considerations when you build a Sitecore XP/XM solution. This section tries to address some of the most common ones.
SQL Server geo-replication and failover group
Disaster Recovery uses Azure geo-replication for SQL Server. A database cannot be added to more than one failover group, so any existing failover groups must be removed.
This applies to customers whose Sitecore environment already uses failover groups for its primary SQL Server databases before disaster recovery is set up.
Make sure you have completed and verified the following steps before proceeding with the Disaster Recovery setup:
Choosing your Azure Region
Azure organizes its data centers into regions. Each region has a latency-defined perimeter, and its data centers are connected through a dedicated regional low-latency network. When choosing a secondary data center, we recommend choosing one in the same region as the primary, to ensure fast backups and consistent customer delivery speeds. To find compatible regions, see the article here.
Azure Region for Control Resource Group
There is no strict rule for selecting the region for the Control Resource Group. Consider the following:
Third-party service APIs
If the Sitecore implementation uses any third-party service APIs that limit access based on IP address, it is essential to register the IP addresses of the secondary data center with the service. Failure to register them could delay bringing the secondary Sitecore environment online.
Outage page
Managed Cloud uses Azure Functions to serve an outage page in case of an outage. Using Azure Functions means the outage page will return a 503 code to indicate the service is unavailable. We recommend that the outage page only contains the necessary information to assure customers that the site will be back online soon, for example:
By default, the outage page contains only basic text. To customize it to your requirements, create a support case to request temporary access to the Outage app.
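To illustrate what "returns a 503 code" means for an outage page, here is a minimal sketch using only the Python standard library. It is not the Azure Functions implementation that Managed Cloud uses; `build_outage_response`, `OutageHandler`, and the page text are all hypothetical and chosen for illustration.

```python
import html
from http.server import BaseHTTPRequestHandler


def build_outage_response(message: str) -> tuple[int, dict[str, str], bytes]:
    """Return (status, headers, body) for a minimal outage page."""
    body = (
        "<html><body><h1>Service temporarily unavailable</h1>"
        f"<p>{html.escape(message)}</p></body></html>"
    ).encode("utf-8")
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        "Content-Length": str(len(body)),
    }
    # 503 tells browsers, crawlers, and monitors that the outage is temporary.
    return 503, headers, body


class OutageHandler(BaseHTTPRequestHandler):
    """Serves the same outage page for every GET request."""

    def do_GET(self) -> None:
        status, headers, body = build_outage_response(
            "We expect to be back online soon."
        )
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)
```

To try the handler locally, pass it to `http.server.HTTPServer(("127.0.0.1", 8080), OutageHandler)` and call `serve_forever()`.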
Access
We can grant temporary access to the Traffic Manager, Function App (serves the outage page), and Secondary Resource Group if you would like to update your resources with a custom configuration, custom domains, and so on.
This section describes limitations to the Disaster Recovery options provided by Managed Cloud.
No removal of Control Resource Group
The Control Resource Group contains all resources used to restore the Sitecore XP/XM environment successfully in a secondary data center. Deleting the Control Resource Group or its resources can lead to the inability to perform successful recovery.
List of files that are excluded while performing backup for Disaster recovery
DR setup configures a backup process to back up all the web apps to meet DR failover needs. In order to achieve this, there are certain files in the primary web apps that are excluded from the backup. The following table describes the exclusion (applicable for Sitecore XP/XM 9.1 Initial Release and above):
File | Topology | Roles | Details |
---|---|---|---|
\site\wwwroot\App_Data\logs, \site\wwwroot\App_Data\debug, \site\wwwroot\App_Data\diagnostics, \site\wwwroot\App_Data\MediaCache, \site\wwwroot\App_Data\packages, \site\wwwroot\App_Data\viewstate, \site\wwwroot\App_Data\DeviceDetection, \site\wwwroot\temp | * | * | Temp/log files. |
\site\wwwroot\bin\Feature.HADR_PublishAPI.dll, \site\wwwroot\bin\Foundation.HADR_WebApi.dll | * | CM | HADR related API files. |
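To check whether a given file would fall under the exclusions in the table, the rules can be sketched as a small predicate. This is an illustration under stated assumptions, not the logic DR itself runs: `is_excluded` is a hypothetical helper, and it assumes the folder entries exclude everything beneath them.

```python
from pathlib import PureWindowsPath

# Folders excluded for every topology and role (from the table above).
EXCLUDED_FOLDERS = [
    r"\site\wwwroot\App_Data\logs",
    r"\site\wwwroot\App_Data\debug",
    r"\site\wwwroot\App_Data\diagnostics",
    r"\site\wwwroot\App_Data\MediaCache",
    r"\site\wwwroot\App_Data\packages",
    r"\site\wwwroot\App_Data\viewstate",
    r"\site\wwwroot\App_Data\DeviceDetection",
    r"\site\wwwroot\temp",
]

# Files excluded on the CM role only (from the table above).
EXCLUDED_CM_FILES = [
    r"\site\wwwroot\bin\Feature.HADR_PublishAPI.dll",
    r"\site\wwwroot\bin\Foundation.HADR_WebApi.dll",
]


def is_excluded(path: str, role: str) -> bool:
    """Return True if `path` matches an exclusion for the given role."""
    candidate = PureWindowsPath(path)  # Windows paths compare case-insensitively
    for folder in EXCLUDED_FOLDERS:
        folder_path = PureWindowsPath(folder)
        if candidate == folder_path or folder_path in candidate.parents:
            return True
    if role.upper() == "CM":
        return any(candidate == PureWindowsPath(f) for f in EXCLUDED_CM_FILES)
    return False
```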
xDB is excluded while considering the recovery time
The recovery time for the failover process does not include the xDB rebuild, because a rebuild can take a significant amount of time for a large content database. If the analytics indexes are not rebuilt, this should only affect functionality that depends on lists (for example, EXM) and should not affect the frontend site.
No recovery of certain infrastructure customizations during failover
Additional Azure resources or services that are added to the environment and are not part of the standard Managed Cloud topology are not recovered during the failover process. These components must be added separately to the secondary environment after the failover process has been completed.
xConnect Search Indexer
Sitecore XP can only have one active xConnect Search Indexer WebJob across a solution. In case of any failover and restoration of service, the indexer must be shut down.
Azure requirements and cost considerations
All disaster recovery options are dependent on Azure WebApp Backup and Traffic Manager, which require a minimum of the Standard Tier for WebApps.
Unsupported failover situations
There is a small set of situations where it might not be possible to restore a production site into the secondary data center. For example, when a global Azure service such as authentication or Traffic Manager is down.
Azure Service Bus is not supported
HADR does not support Azure Service Bus Synchronization, Backup/Restore, or Replication. This is applicable for Sitecore XP/XM 9.2.0 and Sitecore XP/XM 9.3.0 only.
Customized resources are not supported
Custom resources and resource configurations other than the standard Sitecore Managed Cloud resources are not supported. This includes WAF, AFD, Traffic Manager, Storage Account, and SQL elastic pool in the primary resource group. Custom synchronization, backup, restore, or replication is not supported.
Custom domains or SSL binding are not updated
While performing failover, HADR does not update the custom domains or SSL binding on the Outage app or on the secondary web app.
Disaster recovery testing is not supported
Sitecore does not support the testing of Disaster Recovery scenarios for customer implementations at this time.
Sitecore in Managed Cloud provides options to include modules and configure additional services. However, the Sitecore Disaster Recovery solution does not provide full support to the modules out of the box. Below are the details related to the disaster recovery support nature for modules and configurations.
SXA is supported by default by Disaster Recovery. SXA-related configurations are synchronized to the secondary environment.
JSS is supported by default by Disaster Recovery. JSS-related binaries, configurations, and items are synchronized to the secondary environment.
The modules listed below are not supported by default and need to be configured manually for the secondary environment.
If these modules were installed during the primary Sitecore XP/XM installation (that is, in the customer's primary environment), they cannot be recovered during the failover process. They must be added separately after the failover process has been completed.
Ensuring the functionality of custom configuration as well as installed modules is not in the scope of the disaster recovery service. The custom solution should be designed so as to tolerate and handle the disaster recovery service steps that are described in the documentation.
Reverse proxy is partially supported as our disaster recovery solution synchronizes most of the configuration from the primary environment.
In order to enable reverse proxy in secondary CD role, create the \home\site\applicationHost.xdt in the secondary similar to what has been created in the primary CD Role.
Custom web app inclusion in backup/restore
Custom web apps are web apps that are not Sitecore roles and are not provisioned by default, that is, web apps created by customers or external vendors.
HADR manages auto-scale settings, the web app service plan SKU, Azure-level application settings and connection strings, and web content synchronization (similar to Sitecore roles).
The content synchronization does not modify environment-based settings such as connectionstring.config, appsetting.config, and so on.
The user must modify these after the failover. The user can also modify _backup.filter after setup to exclude files from the backup.
Prerequisites:
Custom Database Names
By default, DR supports the database names provisioned via Azure Marketplace or Sitecore Azure Toolkit with the predefined names.
In addition, DR also supports databases with custom names that follow the restriction provided by Azure at:
Resource naming restrictions - Azure Resource Manager | Microsoft Docs
Step | Action | Description |
---|---|---|
1 | Customer checks prerequisites, considerations, limitations, and additional support | Check whether the primary environment meets the prerequisites mentioned. Understand the areas in the considerations section that will be required when performing setup, failover, and failback. Understand the limitations listed in this document. If a limitation affects the environment, prepare an action plan. Review the additional support and how it will affect your customizations. Note that customizations on the primary environment that are not listed in this document are not supported by DR. |
2 | Customer requests DR setup | Create a "DR Basic ColdStandby Setup" service request or a "DR Managed HotStandby Setup" service request. |
3 | CloudOps performs prerequisite and limitation check | CloudOps verifies the primary environment against the prerequisites, considerations, and limitations before provisioning DR. |
4 | CloudOps performs DR setup | CloudOps provisions DR. |
5 | CloudOps notifies the customer of DR setup status | CloudOps communicates and provides updates before, during, and after DR setup. |
6 | Customer performs the relevant DNS configuration (DNS provider and Traffic Manager) | As mentioned in the FAQ, the customer configures the custom domain of the CD instance to point to the DNS name of the Traffic Manager using a DNS CNAME record. As mentioned in the "Access" item in the considerations section, CloudOps will assist in providing temporary access to the relevant resources. CloudOps may assist in the configuration and verification process. |
7 | Configures Outage App content | As mentioned in the FAQ, configure the outage page according to the customer's specifications. As mentioned in the "Access" item in the considerations section, CloudOps will assist in providing temporary access to the relevant resources. |

Step | Action | Description |
---|---|---|
1 | Sitecore Support receives primary region unavailability/disaster alert. | Our monitoring service in the control resource group generates an alert for Sitecore Support to take action. |
2 | Sitecore Support notifies the customer of the disaster. | The customer receives a notification with details of the disaster. |
3 | Customer requests failover when the customer considers that a failover is required. | The customer provides approval for the failover activity by creating a support case. |
4 | Sitecore Support performs failover and notifies the customer of the failover status. | Status updates are provided via the created support case. |
5 | Optionally, the customer applies custom configurations or provisioning in the secondary environment. | Understand the areas in the considerations section that will be required after performing the failover. Understand the limitations listed in this document. If a limitation affects the environment, execute the action plan prepared prior to DR setup. Review the additional support and how it will affect your customizations. Note that customizations on the primary environment that are not listed in this document are not supported. |

Step | Action | Description |
---|---|---|
1 | Sitecore Support receives primary region availability/recovery alert. | Our monitoring service in the control resource group generates an alert for Sitecore Support to take action. |
2 | Sitecore Support notifies the customer of the recovery and requests approval for the failback activity. | Notification is sent via the support case created during the failover activity. |
3 | Customer provides approval for failback. | Approval is provided via the support case created during the failover activity. |
4 | Sitecore Support performs failback and notifies the customer of the failback status. | Status updates are provided via the created support case. |

Step | Action | Description |
---|---|---|
1 | CloudOps receives primary region unavailability/disaster alert. | Our monitoring service in the control resource group generates an alert for CloudOps to monitor the failover and take action when required. |
2 | Failover is executed automatically. | |
3 | CloudOps notifies the customer of the failover status. | |
4 | Optionally, the customer applies custom configurations or provisioning in the secondary environment. | Understand the areas in the considerations section that will be required after performing the failover. Understand the limitations listed in this document. If a limitation affects the environment, execute the action plan prepared prior to DR setup. Review the additional support and how it will affect your customizations. Note that customizations on the primary environment that are not listed in this document are not supported. |

Step | Action | Description |
---|---|---|
1 | CloudOps receives primary region availability/recovery alert. | Our monitoring service in the control resource group generates an alert for CloudOps to monitor the failback and take action when required. |
2 | Failback is executed automatically. | |
3 | CloudOps notifies the customer of the failback status. | |
These are the changes that will be applied to the Primary Sitecore environment during DR Setup. They are required to enable smooth DR operations such as failover and failback.
Database connection strings in Sitecore web apps
The data source of the connection string will be changed from the primary SQL server to the Failover Group endpoint during DR setup. This is used to enable the capability to perform failover.
Sitecore roles related changes
Database connection strings are found in App_Config\ConnectionStrings.config. For example,
<add name="security" connectionString="Data Source=primary-sql.database.windows.net" />
will be changed to
<add name="security" connectionString="Data Source=primary-fg.database.windows.net" />
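The change above amounts to swapping the `Data Source` host in each connection string. The sketch below shows that substitution on a ConnectionStrings.config fragment. It is an illustration only, not the tool DR setup uses: `point_to_failover_group` is a hypothetical helper, and it assumes the `Data Source` value is exactly the primary host name (real connection strings may carry extra parts such as a protocol prefix or port, which this sketch does not handle).

```python
import xml.etree.ElementTree as ET


def point_to_failover_group(config_xml: str, primary_host: str, failover_host: str) -> str:
    """Replace the primary SQL host with the failover group endpoint."""
    root = ET.fromstring(config_xml)
    for entry in root.iter("add"):
        parts = []
        for part in entry.get("connectionString", "").split(";"):
            key, _, value = part.partition("=")
            is_data_source = key.strip().lower() == "data source"
            if is_data_source and value.strip().lower() == primary_host.lower():
                part = f"{key}={failover_host}"
            parts.append(part)
        entry.set("connectionString", ";".join(parts))
    return ET.tostring(root, encoding="unicode")
```

For example, passing the `security` entry shown above with `primary-sql.database.windows.net` and `primary-fg.database.windows.net` yields an entry whose data source is the failover group endpoint; entries that point elsewhere are left unchanged.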
Identity Server
The Identity Server role in both XP and XM topologies will get updated in the Config\production\Sitecore.IdentityServer.Host.xml file at \Settings\Sitecore\IdentityServer\SitecoreMembershipOptions\ConnectionString.
Additional update
Database connection strings in the cortex-processing, ma-ops, and xc-search roles in Sitecore version 9.1.0 and above will be updated. These changes are applied to App_Data\jobs\continuous\<specific-name>\App_Config\ConnectionStrings.config, where <specific-name> is:
ProcessingEngine for cortex-processing
AutomationEngine for ma-ops
IndexWorker for xc-search
Azure Search
The connection string will consist of the primary and secondary Azure Search URLs. The connection string for Azure Search (cloud.search) will get updated in App_Config\ConnectionStrings.config.
For example,
<add name="cloud.search" connectionString="serviceUrl=https://primary-as.search.windows.net;apiVersion=2017-11-11;apiKey=abc123" />
will be updated to
<add name="cloud.search" connectionString="serviceUrl=https://primary-as.search.windows.net;apiVersion=2017-11-11;apiKey=abc123|serviceUrl=https://secondary-as.search.windows.net;apiVersion=2017-11-11;apiKey=abc1234" />
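The updated value is two ordinary Azure Search connection strings joined by a pipe character, primary first. The sketch below splits such a value back into its endpoints, which can be handy when verifying a config after setup. The function name `parse_search_connection_string` is hypothetical, and the format is inferred only from the example above.

```python
def parse_search_connection_string(value: str) -> list[dict[str, str]]:
    """Split a pipe-separated cloud.search value into one dict per endpoint."""
    endpoints = []
    for segment in value.split("|"):
        settings = {}
        for pair in segment.split(";"):
            if not pair.strip():
                continue  # tolerate trailing semicolons
            key, _, val = pair.partition("=")
            settings[key.strip()] = val.strip()
        endpoints.append(settings)
    return endpoints
```

Applied to the updated example, this returns two dictionaries, each with `serviceUrl`, `apiVersion`, and `apiKey` keys, in the order they appear in the string.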
Hotfix Patching
The CD, CM, rep, and prc roles in Sitecore versions 9.1.0, 9.1.1, and 9.2.0 will have a hotfix patch (Sitecore.ContentSearch.Azure.dll) in site\wwwroot\bin.
IndexAPI config
The following files will be uploaded to the primary CM role:
WebApi.config will be uploaded in site\wwwroot\App_Config\Include\Sitecore.ContentIndexing.WebApi.
Sitecore.ContentIndexing.WebApi.dll will be uploaded in site\wwwroot\bin. This is for DR use only.
How do customers request the Managed Cloud Disaster Recovery (DR) feature for Sitecore Managed Cloud environments?
The customer can ask to set up Disaster Recovery for their XP/XM environment through the Sitecore regional office or Sitecore sales team.
What actions do customers need to take after the DR setup has been done?
After the DR setup has been completed, customers are requested to perform the following actions:
The instructions for how to do so are provided by Sitecore engineers after the provision of the DR setup. Alternatively, the customer can raise a support query for detailed information on the Sitecore Support Portal.
What are the new resources that are introduced once the DR setup has been done?
After the DR setup has been provisioned, the customer can see the following resource groups, according to the chosen DR type:
Do customers have limited access rights to the DR resources?
Sitecore provides limited access to customers on additional resource groups (Control and Secondary). This helps Sitecore to prevent any changes to the configurations related to backup policies and automation.
How is the paired region chosen for the DR setup?
Sitecore chooses the most suitable paired region for the customer, in line with Microsoft's standards. More detailed descriptions are provided here.
Can I update the default outage page?
Yes, you can request temporary access to the Outage function app (with a Sitecore support case) and update your default outage page.
Will everything from my Primary resource group be available after the Failover?
No, we will restore only Standard Managed Cloud resources (Webapps, SQL DB, Search service). Review the limitation section for the current limitations of HADR.
Do the custom domains and SSL bindings replicate from the Primary environment during the failover?
No, all the required custom domains and SSL binding must be added by the customer to the secondary resources by requesting temporary access (with a Sitecore support case).
What is the procedure of enabling DR for SolrCloud?
To provide DR availability for both, Sitecore follows these procedures when enabling Disaster Recovery for Managed Cloud customers who have purchased Managed Cloud instances with SolrCloud.
Why are secondary web apps stopped after DR is provisioned?
Sitecore has underlying services that would access and update information in the databases. Because primary Sitecore roles are alive and performing these actions, we want to avoid secondary roles attempting any processing and/or updating information in the databases.