SPIF fails to synchronize documents due to invalid characters in metadata


Description

The Sitecore SharePoint Integration Framework (SPIF) may fail to synchronize a SharePoint document library when the library contains documents with invalid XML 1.0 characters in any of their metadata fields. An example of such a character is the control character 0x02, which is the one reported in the exception below.
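
To check whether a given metadata value is affected, it can help to test it against the XML 1.0 character rule, which allows only 0x9, 0xA, 0xD, 0x20-0xD7FF, 0xE000-0xFFFD and 0x10000-0x10FFFF. The following is a minimal Python sketch of that check. It is not part of SPIF or the patch; the function names and the sample value are hypothetical.

```python
def is_valid_xml10_char(ch):
    """Return True if ch is allowed by the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def find_invalid_xml_chars(text):
    """List (position, code point) for every character XML 1.0 forbids."""
    return [
        (i, "U+%04X" % ord(ch))
        for i, ch in enumerate(text)
        if not is_valid_xml10_char(ch)
    ]


# Hypothetical metadata value containing the 0x02 control character.
title = "Quarterly\x02Report"
print(find_invalid_xml_chars(title))  # [(9, 'U+0002')]
```

Running such a check over exported metadata values is one way to locate the documents that need to be corrected at the source.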

The following exception can be found in the log files when this happens:

ManagedPoolThread #10 00:00:00 ERROR Sharepoint Provider can't process tree.
Integration config item ID: {23C8604D-6761-4FD8-B3AD-A0E3BB880AD8}, Web:http://sharepoint.test.com/ List: {7C8A645E-A085-46DE-A059-168B8F480641}
Exception: System.InvalidOperationException
Message: There is an error in XML document (44, 2760).
Source: System.Xml
   at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader, String encodingStyle, XmlDeserializationEvents events)
   at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader, String encodingStyle)
   at System.Web.Services.Protocols.SoapHttpClientProtocol.ReadResponse(SoapClientMessage message, WebResponse response, Stream responseStream, Boolean asyncCall)
   at System.Web.Services.Protocols.SoapHttpClientProtocol.Invoke(String methodName, Object[] parameters)
   at Sitecore.Sharepoint.Data.WebServices.SharepointLists.Lists.GetListItems(String listName, String viewName, XmlNode query, XmlNode viewFields, String rowLimit, XmlNode queryOptions, String webID)
   at Sitecore.Sharepoint.ObjectModel.Connectors.ItemCollectionConnector.GetItems(BaseList list, ItemsRetrievingOptions options)
   at Sitecore.Sharepoint.ObjectModel.Entities.Collections.ItemCollection.GetEntities()
   at Sitecore.Sharepoint.Data.Providers.SharepointProvider.ProcessTree(ProcessIntegrationItemsOptions processIntegrationItemsOptions, SynchContext synchContext)
   at Sitecore.Sharepoint.Data.Providers.SharepointProvider.ProcessTree(ProcessIntegrationItemsOptions processIntegrationItemsOptions, Item integrationConfigDataSource)
 
Nested Exception
 
Exception: System.Xml.XmlException
Message: '[1]', hexadecimal value 0x02, is an invalid character. Line 44, position 2760.
Source: System.Xml
   at System.Xml.XmlTextReaderImpl.Throw(String res, String[] args)
   at System.Xml.XmlTextReaderImpl.ParseNumericCharRefInline(Int32 startPos, Boolean expand, StringBuilder internalSubsetBuilder, Int32& charCount, EntityType& entityType)
   at System.Xml.XmlTextReaderImpl.ParseNumericCharRef(Boolean expand, StringBuilder internalSubsetBuilder, EntityType& entityType)
   at System.Xml.XmlTextReaderImpl.HandleEntityReference(Boolean isInAttributeValue, EntityExpandType expandType, Int32& charRefEndPos)
   at System.Xml.XmlTextReaderImpl.ParseAttributeValueSlow(Int32 curPos, Char quoteChar, NodeData attr)
   at System.Xml.XmlTextReaderImpl.ParseAttributes()
   at System.Xml.XmlTextReaderImpl.ParseElement()
   at System.Xml.XmlTextReaderImpl.ParseElementContent()
   at System.Xml.XmlLoader.LoadNode(Boolean skipOverWhitespace)
   at System.Xml.XmlLoader.ReadCurrentNode(XmlDocument doc, XmlReader reader)
   at System.Xml.XmlDocument.ReadNode(XmlReader reader)
   at System.Xml.Serialization.XmlSerializationReader.ReadXmlNode(Boolean wrapped)
   at Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReaderLists.Read30_GetListItemsResponse()
   at Microsoft.Xml.Serialization.GeneratedAssembly.ArrayOfObjectSerializer53.Deserialize(XmlSerializationReader reader)
   at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader, String encodingStyle, XmlDeserializationEvents events)
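
The nested exception is thrown from ParseNumericCharRefInline while an attribute value is being parsed, which suggests that the character reached the client as a numeric character reference (for example &#x2;) rather than as a raw byte. The Python sketch below shows that a conforming XML parser rejects such a reference. The fragment and the ows_Title attribute name are illustrative assumptions and are not taken from the failing response.

```python
import xml.etree.ElementTree as ET

# Hypothetical response fragment; the attribute name is illustrative only.
fragment = '<row ows_Title="Quarterly&#x2;Report" />'


def parses(xml_text):
    """Return True if xml_text is accepted by the standard XML parser."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(parses(fragment))                                # reference to 0x02
print(parses('<row ows_Title="Quarterly Report" />'))  # same value, cleaned
```

The first fragment is rejected even though every byte in it is printable ASCII, so a cleanup step has to look at character references as well as raw characters.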

Solution

To remove invalid characters from SharePoint documents during synchronization to Sitecore, apply the patch as follows:
  1. Place the attached Sitecore.Support.411647.dll file in the /bin folder.
  2. Modify the web.config file by adding the following configuration to the <system.web> section:
<webServices>
  <soapExtensionTypes>
    <add type="Sitecore.Support.SharePoint.XmlCleanupSoapExtension, Sitecore.Support.411647" priority="1" group="0" />
  </soapExtensionTypes>
</webServices>
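
The internals of XmlCleanupSoapExtension are not described in this article. As a rough illustration of the kind of cleanup such an extension would need to perform on the response before it is deserialized, the Python sketch below removes both raw invalid characters and numeric character references that point to invalid characters. All names in it are hypothetical, and it is not the patch's implementation.

```python
import re

# Raw characters outside the XML 1.0 Char production.
_INVALID_RAW = re.compile(
    "[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)
# Numeric character references: hexadecimal (&#x2;) or decimal (&#2;).
_CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def _is_valid_code_point(cp):
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def _drop_invalid_ref(match):
    """Keep a reference to a valid character, remove any other."""
    hex_digits, dec_digits = match.group(1), match.group(2)
    cp = int(hex_digits, 16) if hex_digits else int(dec_digits)
    return match.group(0) if _is_valid_code_point(cp) else ""


def clean_xml_text(payload):
    """Strip invalid character references first, then invalid raw characters."""
    payload = _CHAR_REF.sub(_drop_invalid_ref, payload)
    return _INVALID_RAW.sub("", payload)


# Hypothetical fragment: &#x2; is removed, the valid &#10; is kept.
print(clean_xml_text('<row ows_Title="Quarterly&#x2;Report&#10;" />'))
```

This sketch deletes the offending characters outright. Replacing them with a space instead is an equally plausible choice when word boundaries in titles need to be preserved.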