The crawler root item is ignored on search index updates


Description

When performing incremental updates of search indexes, Sitecore may include content items that are not supposed to be present in the index.

Specifically, Sitecore may ignore the crawler root configuration value for a particular index.

For example, given the following configuration for sitecore_master_index, items that are outside of the /sitecore/content tree may appear in the index:

<index id="sitecore_master_index" type="Sitecore.ContentSearch.LuceneProvider.LuceneIndex, Sitecore.ContentSearch.LuceneProvider">
  ...
  <locations hint="list:AddCrawler">
    <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
      <Database>master</Database>
      <Root>/sitecore/content</Root>
    </crawler>
  </locations>
</index>

The cause of the issue is that the crawler root item specified for the index is ignored during index update operations.

Note that this issue does not occur during a full rebuild of the search index.

Solution

To resolve the issue, perform the following steps:

  1. Place the Sitecore.Support.406670.dll assembly in the \bin folder.
  2. In configuration files, replace all occurrences of the following string:
    Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch
    with this string:
    Sitecore.Support.ContentSearch.SitecoreItemCrawler, Sitecore.Support.406670
    For example, the sitecore_master_index index configuration would look like this:
    <index id="sitecore_master_index" type="Sitecore.ContentSearch.LuceneProvider.LuceneIndex, Sitecore.ContentSearch.LuceneProvider">
      ...
      <locations hint="list:AddCrawler">
        <crawler type="Sitecore.Support.ContentSearch.SitecoreItemCrawler, Sitecore.Support.406670">
          <Database>master</Database>
          <Root>/sitecore/content</Root>
        </crawler>
      </locations>
    </index>
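
As an alternative to editing each configuration file by hand, the same type replacement could be expressed as a Sitecore include (patch) file. The following is a minimal sketch, not a verified fix: it assumes the index is defined under the usual contentSearch/configuration/indexes path, that the crawler element carries exactly the type attribute shown above, and that the patch file (any name you choose) loads after the file that defines the index. Adjust the element path to match your installation.

```xml
<!-- Hypothetical include file; the element path below is an assumption
     and must mirror where sitecore_master_index is defined in your setup. -->
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <configuration type="Sitecore.ContentSearch.ContentSearchConfiguration, Sitecore.ContentSearch">
        <indexes hint="list:AddIndex">
          <index id="sitecore_master_index">
            <locations hint="list:AddCrawler">
              <!-- Match the existing crawler by its current type, then swap the type attribute. -->
              <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
                <patch:attribute name="type">Sitecore.Support.ContentSearch.SitecoreItemCrawler, Sitecore.Support.406670</patch:attribute>
              </crawler>
            </locations>
          </index>
        </indexes>
      </configuration>
    </contentSearch>
  </sitecore>
</configuration>
```

A patch like this covers only the index it names. Other indexes that use the same crawler type would each need an equivalent index element, and the merged configuration should be checked to confirm that the crawler type was actually replaced.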