Indexing PDF files in Azure Web Apps

Description

Prior to Sitecore XP 9.1 Initial Release, the media indexing feature was based on iFilters, which are not supported in Azure Web Apps. As a result, the following exception is thrown when indexing a media item that contains a blob with a PDF file:

ERROR Could not compute value for ComputedIndexField: _content for indexable: sitecore://master/{8EEE161B-F7D1-4339-AE77-1FA10B8CF8D2}?lang=en&ver=1
Exception: System.Runtime.InteropServices.COMException
Message: Exception from HRESULT: 0x80048605
Source: Sitecore.ContentSearch
   at Sitecore.ContentSearch.Extracters.IFilterTextExtraction.IPersistStream.Load(IStream stream)
   at Sitecore.ContentSearch.Extracters.IFilterTextExtraction.FilterLoader.InitializeFilterAsPersistStream(IFilter filter, String fileName)
   at Sitecore.ContentSearch.Extracters.IFilterTextExtraction.FilterLoader.LoadAndInitIFilter(String fileName, String extension)
   at Sitecore.ContentSearch.Extracters.IFilterTextExtraction.FilterReader..ctor(String fileName)
   at Sitecore.ContentSearch.ComputedFields.MediaItemIFilterTextExtractor.ComputeFieldValue(IIndexable indexable)
   at Sitecore.ContentSearch.ComputedFields.MediaItemContentExtractor.ComputeFieldValue(IIndexable indexable)
   at Sitecore.ContentSearch.Azure.CloudSearchDocumentBuilder.AddComputedIndexFields()
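
To confirm that an instance is affected and to see which media items fail, the log can be scanned for the error pattern shown above. The following is a minimal Python sketch, not an official Sitecore tool. It assumes the log entries follow the same layout as the excerpt. The function name `find_ifilter_failures` and the log file path passed on the command line are hypothetical.

```python
import re
import sys
from typing import Iterable, List

# Matches the ERROR line format shown in the log excerpt above.
ERROR_LINE = re.compile(
    r"Could not compute value for ComputedIndexField: (?P<field>\S+) "
    r"for indexable: (?P<uri>sitecore://\S+)"
)
# Namespace fragment that appears in the iFilter stack frames.
IFILTER_MARKER = "IFilterTextExtraction"
# Prefixes of lines that continue an exception entry (assumed layout).
CONTINUATION_PREFIXES = ("at ", "Exception:", "Message:", "Source:")


def find_ifilter_failures(lines: Iterable[str]) -> List[str]:
    """Return item URIs whose computed-field error has an iFilter stack frame."""
    failures: List[str] = []
    current = None  # URI of the error entry currently being read
    for line in lines:
        stripped = line.strip()
        match = ERROR_LINE.search(stripped)
        if match:
            current = match.group("uri")
        elif not stripped:
            continue  # blank lines do not end an entry
        elif not stripped.startswith(CONTINUATION_PREFIXES):
            current = None  # an unrelated log entry started
        elif current and IFILTER_MARKER in stripped:
            if current not in failures:
                failures.append(current)
            current = None  # one hit per entry is enough
    return failures


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical path): python scan_log.py path/to/sitecore.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for uri in find_ifilter_failures(handle):
            print(uri)
```

Each printed URI identifies an item, language, and version whose `_content` field could not be computed because of the iFilter dependency.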

Solution

To resolve the issue, consider one of the following options:

Upgrade the Sitecore XP instance to version 9.1 Initial Release or later, where media indexing no longer relies on iFilters.
Disable content extraction for media files by downloading and installing the patch compatible with the affected product version:
https://github.com/SitecoreSupport/Sitecore.Support.149909/releases

Note

Starting from Sitecore XP 9.1 Initial Release, the content of PDF files is extracted using the third-party PDFsharp library. The library has some limitations that may lead to the "Hexadecimal value is an invalid character" error during media indexing.
