How to order search results by a tokenized field


Description

Search results are ordered by relevance by default. This means that the better a document matches the search query, the higher it appears in the search results. In some cases, you might want to order search results by a specific field.

Example 1 - sorting by an item field:

We can use the Title field as an example. Title is a default Sitecore XP field. It is defined in Solr configuration as a text field:

<fieldNames hint="raw:AddFieldByFieldName">
  <field fieldName="title" returnType="text" />
</fieldNames>

Text fields are mapped to dynamic Solr fields:

<typeMatches hint="raw:AddTypeMatch">
    <typeMatch typeName="text" type="System.String" fieldNameFormat="{0}_t" cultureFormat="_{1}"
     settingType="Sitecore.ContentSearch.SolrProvider.SolrSearchFieldConfiguration, Sitecore.ContentSearch.SolrProvider" />
</typeMatches>

Dynamic fields are defined in the Solr schema file and are mapped to the text_general field type (or a culture-specific equivalent):

<dynamicField name="*_t" type="text_general" indexed="true" stored="true"/>
<dynamicField name="*_t_en" type="text_general" indexed="true" stored="true"/>
<dynamicField name="*_t_da" type="text_da" indexed="true" stored="true"/>

Example 2 - sorting by a special field:

Special index fields are based on Sitecore XP system fields. They usually have one or two underscores at the beginning of the field name. Such fields are defined directly in the Solr schema, for example:

<field name="_name" type="text_general" indexed="true" stored="true"/>

The challenge

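In both cases, the field type definition does not enable docValues and uses StandardTokenizerFactory. A tokenizer splits the input value into terms, and StandardTokenizerFactory usually produces several terms per field value. For example, a value such as "Home Page" is indexed as two separate terms, so there is no single value to sort by: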

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="false">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

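Solr cannot guarantee a consistent sort order when the field content is parsed into multiple tokens. Sorting is reliable only on fields that are not tokenized or whose analyzer produces a single term. For details, see the description of the sort parameter in the Solr documentation: https://lucene.apache.org/solr/guide/6_6/common-query-parameters.html#CommonQueryParameters-ThesortParameter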

Solution

Option 1 - computed index field:

  1. Create a custom computed index field that returns the value of the field that will be used for sorting.
  2. Update the search index configuration to include the computed index field. Make sure to use string as the return type.
    <fields hint="raw:AddComputedIndexField">
      <field fieldName="name" returnType="string">Custom.Assembly.ComputedFields.Name, Custom.Assembly</field>
    </fields>
  3. Rebuild the search index.
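
The string return type matters because, assuming the default Sitecore XP Solr index configuration, it is mapped to the *_s dynamic field, which is typically based on the non-tokenized Solr string type. The fragments below are a sketch of what that mapping commonly looks like. They are not taken from this article, so verify the exact definitions against your own configuration and schema:

<!-- Sitecore index configuration (assumed default): string maps to the _s suffix -->
<typeMatch typeName="string" type="System.String" fieldNameFormat="{0}_s"
 settingType="Sitecore.ContentSearch.SolrProvider.SolrSearchFieldConfiguration, Sitecore.ContentSearch.SolrProvider" />

<!-- Solr schema (assumed): *_s uses StrField, which stores the whole value as a single term -->
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

With a mapping like this, the computed field from step 2 would be indexed as name_s. Because StrField applies no analysis, the sort order is case-sensitive.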

Option 2 - Solr copy field:

  1. Add a field to the Solr schema that will be used for sorting, for example:
    <field name="name" type="lowercase" indexed="true" stored="false"/>
  2. Add a copy field to the Solr schema that copies the source value into the new field, for example:
    <copyField source="_name" dest="name" />
  3. Reload the Solr core for the modified schema to take effect.
  4. Restart Sitecore XP to reload the schema from Solr.
  5. Rebuild the search index.
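
Option 2 assumes that the schema already contains a field type named lowercase. In the sample Solr configsets, this type is commonly defined with KeywordTokenizerFactory, which keeps the whole value as a single term, followed by LowerCaseFilterFactory. The definition below is a sketch of that common form, not a copy of your schema, so check that an equivalent type exists before adding the field:

<fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Keeps the entire field value as one term, so the sort order stays consistent -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Lowercases the term, which makes sorting case-insensitive -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Because the name field is declared with stored="false", it can be used for sorting and filtering but is not returned in search results. The original _name field remains available for that purpose.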