Troubleshooting Apache Solr extraction module susceptible to XXE attacks via XFA content in PDFs


Description

This article describes how to troubleshoot the following issue: the Apache Solr extraction module is vulnerable to XXE attacks via XFA content in PDFs. The issue applies to the Apache Solr extraction module (Solr Cell) in versions 6.2 – 9.x and arises only if the Solr Cell extraction handler (/update/extract) is enabled and actively used for PDF content extraction.

Important: The issue affects Sitecore XP, XM, and XC (on-prem or Managed Cloud) only when Apache Solr is configured with the extraction module (Solr Cell) and processes untrusted PDFs. Standard Solr text-only indexing is not impacted.

Solution

To confirm that your solution is affected by this issue, verify that Solr Cell PDF extraction is enabled in your Solr configuration:

  1. Open the solrconfig.xml for each Sitecore Solr core on the Solr server, for example: [solr_root]\server\solr\sitecore_web_index\conf\solrconfig.xml
  2. In each solrconfig.xml, search for the Solr Cell extraction handler: requestHandler name="/update/extract"
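
When many cores are present, the manual search above can be approximated with a script. The following is a minimal sketch, not an official Sitecore or Solr tool: it assumes the solrconfig.xml files are readable on local disk under a directory you pass in, and the function names (has_extract_handler, find_affected_configs) are hypothetical. It parses each file as XML, so a handler that has been commented out is not reported.

```python
import sys
import xml.etree.ElementTree as ET
from pathlib import Path

# Handler name taken from the article; everything else here is illustrative.
EXTRACT_HANDLER = "/update/extract"


def has_extract_handler(config_path):
    """Return True if the given solrconfig.xml declares the extraction handler."""
    root = ET.parse(config_path).getroot()
    return any(
        handler.get("name") == EXTRACT_HANDLER
        for handler in root.iter("requestHandler")
    )


def find_affected_configs(solr_home):
    """Yield each solrconfig.xml under solr_home that declares the handler."""
    for config_path in sorted(Path(solr_home).rglob("solrconfig.xml")):
        try:
            if has_extract_handler(config_path):
                yield config_path
        except ET.ParseError as exc:
            # Report files that cannot be parsed so they can be checked by hand.
            print(f"Could not parse {config_path}: {exc}", file=sys.stderr)


if __name__ == "__main__":
    # Assumption: the first argument is the directory that contains the cores.
    solr_home = sys.argv[1] if len(sys.argv) > 1 else "."
    for path in find_affected_configs(solr_home):
        print(f"Extraction handler found: {path}")
```

Run it with the directory that contains the cores as the only argument. A file that fails to parse is reported on standard error and should be checked manually. In SolrCloud mode the configuration is typically stored in ZooKeeper rather than on local disk, so a file scan like this would not cover it.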

Note: Solr "collections" are also affected under the same conditions if the Solr server is running in SolrCloud mode.

If confirmed, to mitigate the issue, follow the instructions provided here.