
Query Correction Feature

The Coveo index includes an automatic query correction feature, also called Did You Mean, that detects misspelled keywords and automatically suggests or applies corrections (see How Are Misspelled Words Handled?). This topic describes in more detail how the feature works so that you can better understand what it can and cannot do.

Query correction feature facts:

  • The query correction is based on a word corrector lexicon (WCL) that contains frequent words and their number of occurrences, gathered when documents are indexed. The spelling suggestions/corrections are therefore based on the index content, not on predefined or custom dictionaries. You can, however, influence the lexicon algorithm (see Influencing the Word Corrector Lexicon Algorithm).

    Tip: You can manually update the WCL from the Administration Tool by clicking the Rebuild Word Corrector Lexicon link (see Modifying or Using Advanced Index Parameters).

  • The query correction suggestions/corrections improve as the size of the index increases.

  • The index must have a minimum size of 2000 documents to start providing query correction suggestions.

  • The query correction algorithm is triggered when the query returns a low number of results relative to the size of the index.

    Note: Query correction suggestions are provided when the index:

    • Contains between 2000 and 10000 documents and returns fewer than 1000 results following a user query.

    • Contains between 10000 and 50000 documents and returns fewer than 1250 results following a user query.

    • Contains more than 50000 documents and returns a number of results lower than 0.75% of the number of documents in the index following a user query.

      Example: For an index of 1 million documents, the query must return fewer than 7500 results for suggestions to be provided.
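
    Example: The thresholds in the note above can be sketched as a small function. This is only an illustration of the documented numbers, not Coveo code. The function name is hypothetical, and the handling of the exact boundary values (10000 and 50000 documents) is an assumption because the note does not say which range they belong to.

```python
def correction_may_trigger(index_size: int, result_count: int) -> bool:
    """Return True when the documented thresholds allow suggestions.

    Hypothetical helper: only the numeric thresholds come from the topic.
    """
    if index_size < 2000:
        # Below the documented minimum index size: no suggestions.
        return False
    if index_size <= 10_000:
        # Assumption: exactly 10000 documents falls in the first range.
        return result_count < 1000
    if index_size <= 50_000:
        return result_count < 1250
    # More than 50000 documents: fewer results than 0.75% of the index size.
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return result_count * 10_000 < 75 * index_size
```

    With these assumptions, correction_may_trigger(1_000_000, 7499) is True and correction_may_trigger(1_000_000, 7500) is False, which matches the 1 million documents example.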

  • Suggestions are not provided if the query has been expanded by the thesaurus.

    Note: The query correction and the thesaurus are completely independent features (see Thesaurus Best Practices).

  • The algorithm is not applied to search terms meeting one or more of the following rules:

    • Containing 3 characters or fewer

    • Containing a wildcard character (* or ?)

    • Beginning with a number
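
    Example: The three exclusion rules above can be expressed as a simple check. This is a minimal sketch, not the actual index implementation. The function name is hypothetical, and reading "beginning with a number" as "the first character is a digit" is an assumption.

```python
def term_is_eligible(term: str) -> bool:
    """Return True when none of the documented exclusion rules apply."""
    if len(term) <= 3:
        # Terms of 3 characters or fewer are never corrected.
        return False
    if "*" in term or "?" in term:
        # Terms containing a wildcard character are skipped.
        return False
    if term[0].isdigit():
        # Assumption: "beginning with a number" means a leading digit.
        return False
    return True
```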

  • An indexed word is not suggested by the word corrector lexicon if the word meets one or more of the following rejection rules:

    • Containing more than 4 numbers.

    • Containing 7 or more consecutive consonants

    • Containing 6 or more consecutive vowels

    • Containing an invalid number of consecutive vowels considering the document language.

      Note: The rule applies only to the following languages: English, French, Spanish, and German.

    Note: These word rejection rules are all active by default, but each can be turned off independently to fine-tune the query correction behavior. Contact Coveo Support for assistance if you want to do so.
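
    Example: The first three rejection rules can be approximated with a few regular expressions. This is a rough sketch under stated assumptions, not the lexicon's real code: the function name is hypothetical, "numbers" is read as digit characters, the vowels are taken to be a, e, i, o, u, and every other ASCII letter is treated as a consonant. The language-dependent vowel rule is left out because its limits are not given in this topic.

```python
import re

# Assumed character classes; the topic does not define them.
_VOWELS = "aeiou"
_CONSONANTS = "bcdfghjklmnpqrstvwxyz"

_CONSONANT_RUN = re.compile(f"[{_CONSONANTS}]{{7,}}")  # 7 or more in a row
_VOWEL_RUN = re.compile(f"[{_VOWELS}]{{6,}}")          # 6 or more in a row


def is_rejected(word: str) -> bool:
    """Return True when the word matches one of the sketched rejection rules."""
    lowered = word.lower()
    # Rule: more than 4 numbers (assumed to mean more than 4 digits).
    if sum(ch.isdigit() for ch in lowered) > 4:
        return True
    # Rule: 7 or more consecutive consonants.
    if _CONSONANT_RUN.search(lowered):
        return True
    # Rule: 6 or more consecutive vowels.
    if _VOWEL_RUN.search(lowered):
        return True
    return False
```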

  • The query correction is done on a word-by-word basis, so the correction of a word is not affected by the other words in the query.

  • A suggested word must have a high degree of similarity with the searched word, that is, a small edit distance: as few character permutations as possible must differentiate it from the original word. A missing or added character is considered a permutation. The edit distance for a compatible permutation (such as k replaced by q) is smaller than for an incompatible permutation (such as x replaced by r).

  • For a word to be suggested, its number of occurrences in the index must be greater than that of the original word by at least one order of magnitude for each permutation.

    Example: A user inverts two characters in a keyword, such as typing enterpirse rather than the correct enterprise. Because there are two permutations between the wrong and correct spellings, enterprise must have at least two orders of magnitude (100 times) more occurrences in the index than enterpirse to be suggested as a correction.
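
    Example: The frequency rule can be illustrated with a plain Levenshtein distance, in which each replaced, missing, or added character counts as one permutation. This is a simplified sketch: the function names and the counts dictionary are hypothetical, the reduced weight of compatible permutations is not modeled, and treating a word absent from the index as having one occurrence is an assumption.

```python
def edit_count(original: str, candidate: str) -> int:
    """Levenshtein distance: replacements, insertions, deletions cost 1 each."""
    previous = list(range(len(candidate) + 1))
    for i, ch_o in enumerate(original, 1):
        current = [i]
        for j, ch_c in enumerate(candidate, 1):
            current.append(min(
                previous[j] + 1,                  # character missing in candidate
                current[j - 1] + 1,               # character added in candidate
                previous[j - 1] + (ch_o != ch_c)  # character replaced (or kept)
            ))
        previous = current
    return previous[-1]


def frequent_enough(original: str, candidate: str, counts: dict) -> bool:
    """Check the 'one order of magnitude per permutation' rule."""
    edits = edit_count(original, candidate)
    # Assumption: an unindexed original word counts as 1 occurrence.
    base = max(counts.get(original, 0), 1)
    return counts.get(candidate, 0) >= base * 10 ** edits
```

    With this sketch, edit_count("enterpirse", "enterprise") is 2, so with 3 occurrences of enterpirse in the index, enterprise needs at least 300 occurrences to qualify.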

  • The suggested spelling of a query word is determined based on both the frequency of the alternative words in the lexicon (the higher the better) and their degree of similarity with the original word (the closer the better). Thus, with two alternative spellings having the same edit distance, the word that is more frequent in the index is suggested.
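
    Example: The selection described above can be sketched as picking the best entry from a list of candidate words. The topic only states that both criteria matter and that frequency breaks ties between equal edit distances; giving the edit distance strict priority over frequency is an assumption of this sketch, and the function name and tuple layout are hypothetical.

```python
from typing import List, Optional, Tuple

# Each candidate is (word, edit_distance, occurrences_in_index).
Candidate = Tuple[str, float, int]


def pick_suggestion(candidates: List[Candidate]) -> Optional[str]:
    """Return the closest candidate, preferring the more frequent one on ties."""
    if not candidates:
        return None
    # Smaller edit distance first, then larger number of occurrences.
    best = min(candidates, key=lambda c: (c[1], -c[2]))
    return best[0]
```

    For instance, given two candidates at the same edit distance, one with 300 occurrences and one with 40, the function returns the one with 300 occurrences.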

  • As an administrator, you can configure how the search interfaces take advantage of the query corrections:
