xDB Cloud Service unavailability might cause Sitecore XP downtimes


Description

On 30 January 2016, Sitecore xDB Cloud Service experienced an intermittent infrastructure outage that affected a limited number of customers. The service has now been stabilized.

This article describes the current status of the issue. Sitecore recommends that customers review it for the latest information.

Impact

The outage affected a limited number of Sitecore sites immediately after a planned or unplanned web application restart.

If a Sitecore site was restarted while xDB Cloud Service was unavailable, the Sitecore instance attempting to connect to it restarted itself continuously, effectively causing site downtime.

Sitecore sites that were not restarted during the xDB Cloud outage were not affected.

Identification

When site downtime was caused by xDB Cloud Service unavailability, one or more of the following errors appeared in the Sitecore log files:

xDB Cloud - Exception during initializing occurred
System.AggregateException: One or more errors occurred. ---> Sitecore.Cloud.RestClient.RestRequestException: https://discovery-xdb-cloud.sitecore.net/xdb/set/LicenseID?DeploymentId=DeploymentID failed ---> System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Sitecore.Cloud.RestClient.HttpClientTransport.<SendRequestAsync>d__1.MoveNext() 
[Exception: Connection string 'analytics' could not be found.]
xDB Cloud - Get xDB-set with License Id: 'LicenseID' - Deployment Id: 'DeploymentID' Attempt 5 of 5
Sitecore shutting down
Shutdown message: Initialization Error
HostingEnvironment initiated shutdown
xDB Cloud - xDB Cloud initialization failed. Please contact cloud@sitecore.net and include this in the email:
**********************************************************************************
License Id: LicenseId
Deployment Id: DeploymentId
Issue id: IssueId
Discovery Service Status Code: 500 InternalServerError

Exception Details: Sitecore.Cloud.Xdb.Exceptions.DiscoveryServiceException: Exception of type 'Sitecore.Cloud.Xdb.Exceptions.DiscoveryServiceException' was thrown.
   at Sitecore.Cloud.Xdb.DiscoveryServiceClient.AssertStatusCodes(IRestResponse restResponse, String licenseId, String deploymentId)
   at Sitecore.Cloud.Xdb.DiscoveryServiceClient.GetXdbSet(String licenseId, String deploymentId, String sitecoreVersion, DeploymentType deploymentType)
   at Sitecore.Cloud.Xdb.DiscoveryServiceClient.GetXdbSet()
   at Sitecore.Cloud.Xdb.UpdateXdbConnectionStrings.Process(PipelineArgs args)
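
As a rough aid for checking whether a log shows the messages listed above, the following Python sketch scans log text for a few of them. The function name find_xdb_cloud_errors and the choice of substrings to match are assumptions made for illustration. They are not part of any Sitecore tooling, and a match only suggests that this issue may be involved.

```python
# Illustrative sketch only. The substrings are copied from the log excerpts
# above; the function name and structure are hypothetical.
XDB_CLOUD_SIGNATURES = (
    "xDB Cloud - Exception during initializing occurred",
    "Connection string 'analytics' could not be found.",
    "xDB Cloud - xDB Cloud initialization failed.",
    "Sitecore.Cloud.Xdb.Exceptions.DiscoveryServiceException",
    "Shutdown message: Initialization Error",
)


def find_xdb_cloud_errors(log_text):
    """Return (line_number, signature) pairs for each matching log line."""
    hits = []
    for line_number, line in enumerate(log_text.splitlines(), start=1):
        for signature in XDB_CLOUD_SIGNATURES:
            if signature in line:
                hits.append((line_number, signature))
                break  # report each log line at most once
    return hits
```

To use it, read the contents of a Sitecore log file into a string, pass the string to find_xdb_cloud_errors, and review the reported line numbers in the original file.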

Solution

The Sitecore xDB Cloud Service has been stabilized.

For Sitecore customers using xDB Cloud, Sitecore recommends installing the update to the Sitecore xDB Cloud Client component to reduce the impact of connectivity issues or xDB Cloud outages and prevent related website downtime.

Customers who applied the workaround previously provided in this article can safely revert associated changes.

To do this, remove the fake connection strings named analytics, tracking.live, tracking.history, and tracking.contact from the /App_Config/ConnectionStrings.config file. Leaving the fake connection strings in place does not affect the functionality of the Sitecore site.
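
The removal can also be scripted. The following Python sketch is one possible approach, assuming the file uses the standard ASP.NET layout in which each connection string is an add element directly under the connectionStrings root. The function name remove_fake_connection_strings is hypothetical. The sketch applies only where these four entries were added as the workaround, so back up the file before changing it.

```python
import xml.etree.ElementTree as ET

# Names of the fake connection strings listed in this article.
FAKE_NAMES = {"analytics", "tracking.live", "tracking.history", "tracking.contact"}


def remove_fake_connection_strings(xml_text):
    """Return the config XML without the <add> entries named in FAKE_NAMES."""
    root = ET.fromstring(xml_text)
    # Assumption: <add name="..."> entries are direct children of the root
    # <connectionStrings> element.
    for entry in list(root.findall("add")):
        if entry.get("name") in FAKE_NAMES:
            root.remove(entry)
    return ET.tostring(root, encoding="unicode")
```

Note that ElementTree discards XML comments and may change whitespace when it rewrites the document, so editing the file by hand may be preferable if the existing formatting matters.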

The problem has now been fixed on the server side.

Root Cause Analysis

The outage was caused by infrastructure problems resulting from the overall growth of Sitecore xDB Cloud usage. This specifically impacted a component of the xDB Cloud Service called xDB Cloud Discovery Service.

xDB Cloud Discovery Service is responsible for providing MongoDB connection information to Sitecore sites that use xDB Cloud.

When a Sitecore site configured to use xDB Cloud could not connect to it, the site experienced a fatal error and restarted the web application.

To stabilize the xDB Cloud Discovery Service, enhancements to both Sitecore's cloud infrastructure and the service itself have been implemented.

To prevent connectivity issues with the xDB Cloud Discovery Service from causing Sitecore site outages, an update to the Sitecore xDB Cloud Client component has been released. All xDB Cloud customers should install this update.