Prior to Sitecore XP 9.1 Initial Release, the media indexing feature was based on iFilters, which are not supported in Azure Web Apps. As a result, the following exception is thrown when indexing a media item whose blob is a PDF file:
ERROR Could not compute value for ComputedIndexField: _content for indexable: sitecore://master/{8EEE161B-F7D1-4339-AE77-1FA10B8CF8D2}?lang=en&ver=1
Exception: System.Runtime.InteropServices.COMException
Message: Exception from HRESULT: 0x80048605
Source: Sitecore.ContentSearch
at Sitecore.ContentSearch.Extracters.IFilterTextExtraction.IPersistStream.Load(IStream stream)
at Sitecore.ContentSearch.Extracters.IFilterTextExtraction.FilterLoader.InitializeFilterAsPersistStream(IFilter filter, String fileName)
at Sitecore.ContentSearch.Extracters.IFilterTextExtraction.FilterLoader.LoadAndInitIFilter(String fileName, String extension)
at Sitecore.ContentSearch.Extracters.IFilterTextExtraction.FilterReader..ctor(String fileName)
at Sitecore.ContentSearch.ComputedFields.MediaItemIFilterTextExtractor.ComputeFieldValue(IIndexable indexable)
at Sitecore.ContentSearch.ComputedFields.MediaItemContentExtractor.ComputeFieldValue(IIndexable indexable)
at Sitecore.ContentSearch.Azure.CloudSearchDocumentBuilder.AddComputedIndexFields()
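To see which media items are affected, the log can be scanned for the ERROR line shown above. The following is a minimal sketch in Python, assuming the log keeps the exact message format from this article; the function name `find_failed_items` and the log path passed on the command line are hypothetical and not part of Sitecore.

```python
import re
import sys
from collections import Counter

# Matches the ERROR line format shown in this article and captures the
# computed field name plus the parts of the indexable URI.
PATTERN = re.compile(
    r"Could not compute value for ComputedIndexField: (?P<field>\S+) "
    r"for indexable: sitecore://(?P<db>[^/]+)/(?P<id>\{[0-9A-Fa-f-]+\})"
    r"\?lang=(?P<lang>[^&\s]+)&ver=(?P<ver>\d+)"
)


def find_failed_items(log_lines, field="_content"):
    """Count failures per (database, item ID) for the given computed field."""
    failures = Counter()
    for line in log_lines:
        match = PATTERN.search(line)
        if match and match.group("field") == field:
            failures[(match.group("db"), match.group("id"))] += 1
    return failures


if __name__ == "__main__":
    # Usage (hypothetical path): python find_failed.py path/to/log.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for (db, item_id), count in find_failed_items(handle).most_common():
            print(f"{db} {item_id} failed {count} time(s)")
```

The script only reports item IDs; whether each item is actually a PDF media item still has to be checked in Sitecore.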
To resolve the issue, consider one of the following options:
Starting from Sitecore XP 9.1 Initial Release, the content of PDF files is extracted using the PDFsharp third-party library. The library has some limitations that may lead to the "Hexadecimal value is an invalid character" error during media indexing.