
Index Ranking Phases

The CES ranking engine is the component responsible for ordering query results. Based on your settings, it ensures that the most relevant results are shown before less relevant ones.

The mechanism behind the ranking process can be compared to a funnel. Starting with all documents, when the index receives a query from a user, it first isolates the documents whose permission groups include the user identity (see Security Control Levels in CES and Document Permissions), and then keeps only the documents that match the query.
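The two filtering steps at the top of the funnel can be sketched as follows. This is a minimal illustration, not how the CES index is implemented: the `Document` class, the set-of-identities permission model, whitespace tokenization, and the rule that every query term must appear are all hypothetical simplifications.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Document:
    doc_id: int
    text: str
    # Hypothetical permission model: identities allowed to see the document.
    allowed_identities: Set[str] = field(default_factory=set)


def candidate_documents(index: List[Document], user_identities: Set[str],
                        query_terms: List[str]) -> List[Document]:
    """Return the documents that pass the security filter and match the query."""
    # Step 1: keep documents whose permissions include one of the user's identities.
    visible = [d for d in index if d.allowed_identities & user_identities]
    # Step 2: keep documents containing every query term (case-insensitive).
    terms = {t.lower() for t in query_terms}
    return [d for d in visible if terms <= set(d.text.lower().split())]


index = [
    Document(1, "iPhone product guide", {"alice"}),
    Document(2, "iPhone pricing", {"bob"}),
    Document(3, "Product FAQ", {"alice"}),
]
print([d.doc_id for d in candidate_documents(index, {"alice"}, ["iPhone"])])  # [1]
```

Only the documents that survive both steps enter the ranking phases described below.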

The ranking process is divided into five phases, each working on the documents sorted by the preceding phase.

CES natively uses 16 pre-tuned ranking weight factors during these phases. Among them, the criteria with the biggest impact on relevance are term proximity, document modified date (most recent), and term frequency. These 16 criteria have been optimized over years of experience with a wide variety of indexed content so that, in most cases, they produce highly satisfying relevance scores out of the box. When needed, you can still carefully tune these parameters (see Customizing the Ranking for a .NET Search Interface). You can also troubleshoot ranking when a factor score seems too high or too low (see Troubleshooting Ranking).

Important: While you can use several parameters to tune the index ranking engine, you must make changes carefully to avoid negative side effects on performance or ranking. Contact Coveo Support for recommendations on how to address your index ranking issues.

Phase | Ranking factor (UI label) | Applies to the n best ranked documents from the previous phase
1 | Term in title (Title) | All documents matching security and the query
  | Term in concepts (Concept) |
  | Term in summary (Summary) |
  | Terms in address (URI) |
  | Term has formatting (Formatted) |
  | Term casing (Casing) |
  | Term correlation within stemming classes (Relation) |
2 | Documents modified recently (Date) | 50,000 [2]
  | Document quality evaluation (Quality) |
  | Document in user language (QRE) |
  | Document title match (Title) [1] |
  | Source rating (Source) |
  | Custom ranking weight (Custom) |
3 | Collaborative rating weight (Collaborative Rating) | 100 [2]
4 | Term frequency (Frequency) | 100 [2]
5 | Term proximity (Adjacency) | 100 [2]

[1] Not configurable in the UI.

[2] Default value, which is configurable.
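The funnel summarized in the table can be sketched as a loop over phases, each re-sorting the surviving documents and truncating the list to the default cutoff. This is a conceptual sketch only: the phase scoring functions are hypothetical placeholders, and treating each phase's score as an addition to a running total is an assumption, not the documented CES formula.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# (phase name, per-document score function, number of top documents kept after the phase)
PhaseSpec = Tuple[str, Callable[[int], float], Optional[int]]


def run_funnel(doc_ids: Iterable[int], phases: List[PhaseSpec]) -> List[int]:
    """Apply phases in order; each adds to a running score, re-sorts, then truncates."""
    ranked = list(doc_ids)
    scores = {d: 0.0 for d in ranked}
    for _name, score_fn, keep in phases:
        for d in ranked:
            scores[d] += score_fn(d)  # assumption: phase scores are additive
        # Highest score first; equal scores fall back to descending docID.
        ranked.sort(key=lambda d: (scores[d], d), reverse=True)
        if keep is not None:
            ranked = ranked[:keep]
    return ranked


def no_op(doc_id: int) -> float:
    return 0.0  # placeholder: the real factor formulas are not documented here


# Default cutoffs from the table: 50,000 after phase 1, then 100 for phases 2 to 5.
DEFAULT_PHASES: List[PhaseSpec] = [
    ("term weighting", no_op, 50_000),
    ("document weighting", no_op, 100),
    ("collaborative rating", no_op, 100),
    ("TF-IDF", no_op, 100),
    ("adjacency", no_op, 100),
]
```

With the default cutoffs, `run_funnel(range(1000), DEFAULT_PHASES)` returns at most 100 document IDs, however many documents entered the funnel.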

Note: The relative importance of each ranking criterion is difficult to establish, since each criterion's score depends on many factors, such as the number of terms in the query, the type of indexed sources, the individual terms in the query, and the number of documents in the index.

Phase 1: Term weighting

The first phase attributes a score to documents based on each term of the user query. Seven factors are used to rank the indexed documents that the user has permission to access and that match the query. These factors cover areas such as the position of the query terms in those documents (in the title, in the summary, in the concepts, etc.). Once the ranking is done, the 50,000 highest scored documents are kept.

CES 7.0.6339+ (January 2014) In addition to these ranking factors, query ranking expressions (QRE) are taken into account during this phase. QREs are custom expressions that modify the ranking score by a specified amount when documents match certain conditions (see What Are Query Ranking Expressions?).
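The following sketch illustrates the idea of phase 1: each query term earns weight for every field it appears in, QRE modifiers are added for documents matching their condition, and the top documents are kept. The field names, weights, and the `(condition, modifier)` representation of a QRE are hypothetical; only four of the seven factors are modeled (Formatted, Casing, and Relation are omitted), and the real QRE syntax is different.

```python
from typing import Callable, Dict, List, Tuple

Doc = Dict[str, str]  # hypothetical: field name -> field text

# Hypothetical weights for four of the seven phase 1 factors (where the term occurs).
# Formatted, Casing, and Relation are omitted to keep the sketch short.
FIELD_WEIGHTS = {"title": 4.0, "concepts": 2.0, "summary": 1.5, "uri": 1.0}

# A QRE is modeled here as (condition, score modifier).
QRE = Tuple[Callable[[Doc], bool], float]


def phase1_score(doc: Doc, query_terms: List[str], qres: List[QRE]) -> float:
    score = 0.0
    for term in query_terms:
        needle = term.lower()
        for field_name, weight in FIELD_WEIGHTS.items():
            if needle in doc.get(field_name, "").lower():
                score += weight
    # Each QRE shifts the score of the documents that satisfy its condition.
    for condition, modifier in qres:
        if condition(doc):
            score += modifier
    return score


def phase1(docs: List[Doc], query_terms: List[str], qres: List[QRE],
           keep: int = 50_000) -> List[Doc]:
    """Rank by term weight plus QRE modifiers and keep the best `keep` documents."""
    return sorted(docs, key=lambda d: phase1_score(d, query_terms, qres),
                  reverse=True)[:keep]
```

For a document with "iPhone" in its title, summary, and URI, `phase1_score(doc, ["iphone"], [])` returns 4.0 + 1.5 + 1.0 = 6.5 with these made-up weights; a QRE whose condition matches would add its modifier on top.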

Notes:

  • A CES administrator can fine-tune the importance of each of the factors, but this should be done with care because it affects all results in all search interfaces (see Customizing Search Results Ranking).

  • For each document, the score attributed for each factor is shown under Term Weights (see Troubleshooting Ranking).

Phase 2: Document weighting

The second phase attributes a score to documents based on their freshness (last modification date) and quality. Performed on the 50,000 highest scored documents returned by phase one, this phase uses six ranking factors, covering areas such as the document language (same language as the user query or not) and source rating (reputation from lowest to highest), to further adjust the relevance score of these 50,000 documents. Once the ranking is done, the 100 highest scored documents are kept, and the next three index ranking phases are performed on these documents.
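Phase 2 can be pictured as adding document-level scores on top of the phase 1 score, then keeping the top 100. In this minimal sketch, the exponential freshness decay, the 180-day half-life, the 1 to 5 source rating scale, and all weights are hypothetical; the Quality and Custom factors are omitted.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class DocInfo:
    doc_id: int
    phase1_score: float
    modified: date
    language: str
    source_rating: int  # hypothetical scale: 1 (lowest) to 5 (highest)


def freshness(modified: date, today: date, half_life_days: float = 180.0) -> float:
    """Hypothetical decay: 1.0 for a document modified today, halved every half-life."""
    age_days = max((today - modified).days, 0)
    return 0.5 ** (age_days / half_life_days)


def phase2(docs: List[DocInfo], user_language: str, today: date,
           keep: int = 100) -> List[DocInfo]:
    def total(d: DocInfo) -> float:
        score = d.phase1_score
        score += 10.0 * freshness(d.modified, today)          # Date factor
        score += 5.0 if d.language == user_language else 0.0  # user language factor
        score += 2.0 * d.source_rating                         # Source factor
        return score

    return sorted(docs, key=total, reverse=True)[:keep]
```

With everything else equal, a recently modified document in the user's language from a highly rated source moves ahead of an older one.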

Notes:

  • A CES administrator can fine-tune the importance of each of the factors, but this should be done with care because it affects all results in all search interfaces (see Customizing Search Results Ranking).

  • A CES administrator can modify the rating of a source, and thus the rating of the documents it contains (see Modifying General Source Parameters).

  • This phase involves loading document-specific information, such as whether the documents were modified recently.

  • For each document, the score attributed for each factor is shown under Document Weights (see Troubleshooting Ranking).

Phase 3: Collaborative ranking

When collaborative rating is enabled, the third ranking phase attributes a score to documents by considering the average of the personal appreciations given to each of those documents by members of the querying user's group, if any (see What Is Collaborative Rating?). This phase is performed on the 100 documents returned by phase two. Once the ranking is done, the documents are reordered from the highest to the lowest score.

Notes:

  • A CES administrator can enable collaborative rating (see Configuring Collaborative Rating).

  • Personal appreciation takes precedence over collaborative rating: once a user rates a search result, the collaborative rating score of this document is no longer taken into account (see Rating a Search Result).

  • For each document, the score attributed for Collaborative rating is shown under Document Weights (see Troubleshooting Ranking).
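The averaging and the precedence rule above can be sketched as follows. The `(user, doc_id) -> rating` dictionary, the numeric rating scale, and the `weight` parameter are hypothetical; the sketch only shows that the group average is ignored once the querying user has rated the document, and does not model how the personal appreciation itself affects ranking.

```python
from statistics import mean
from typing import Dict, Iterable, List, Optional, Tuple

Ratings = Dict[Tuple[str, int], float]  # hypothetical: (user, doc_id) -> rating


def collaborative_score(doc_id: int, user: str, group: Iterable[str],
                        ratings: Ratings) -> Optional[float]:
    """Average rating given to doc_id by the user's group members, if any."""
    if (user, doc_id) in ratings:
        return None  # the user's personal appreciation prevails
    others = [ratings[(m, doc_id)] for m in group
              if m != user and (m, doc_id) in ratings]
    return mean(others) if others else None


def phase3(scores: Dict[int, float], user: str, group: List[str],
           ratings: Ratings, weight: float = 1.0) -> List[int]:
    """Add the weighted collaborative score, then reorder from highest to lowest."""
    updated = {}
    for doc_id, score in scores.items():
        collab = collaborative_score(doc_id, user, group, ratings)
        updated[doc_id] = score + (weight * collab if collab is not None else 0.0)
    return sorted(updated, key=lambda d: updated[d], reverse=True)
```

For example, if two group members rated a document 4 and 2, the querying user gets a collaborative score of 3 for it, unless that user has already rated the document personally.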

Phase 4: Term frequency–inverse document frequency (TF-IDF)

The purpose of the fourth phase is to weight queried terms while taking their number of occurrences in documents into account.

The ranking engine evaluates the importance of a query term for a document based on the number of occurrences of this term in the document, but also inversely on the number of occurrences of the term in the index (TF-IDF). The more frequent a term is in the index, the less informative the term becomes since the significance and meaning are to a certain extent diluted.

Example: A common term such as product is worth less than a rare one such as iPhone.

Based on this methodology, each of the 100 documents returned from phase three receives an additional score, and then their ranks are adjusted accordingly.
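The product versus iPhone example can be made concrete with the textbook TF-IDF formula, tf × log(N / df), where N is the number of indexed documents and df the number of documents containing the term. This is a stand-in: CES's exact formula is not documented here, and the source speaks of occurrences in the index, which may be counted differently from document frequency. The index size and frequencies below are made up.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(term: str, doc_tokens: List[str], doc_freq: Dict[str, int],
           index_size: int) -> float:
    """Classic tf * log(N / df); CES's exact formula may differ."""
    tf = Counter(doc_tokens)[term]
    df = doc_freq.get(term, 0)
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log(index_size / df)


def phase4_bonus(query_terms: List[str], doc_tokens: List[str],
                 doc_freq: Dict[str, int], index_size: int) -> float:
    """Additional phase 4 score: sum of TF-IDF over the query terms."""
    return sum(tf_idf(t, doc_tokens, doc_freq, index_size) for t in query_terms)


# Hypothetical index of 1,000 documents: "product" is common, "iphone" is rare.
doc_freq = {"product": 500, "iphone": 10}
tokens = ["iphone", "product", "review"]
print(round(tf_idf("product", tokens, doc_freq, 1000), 3))  # 0.693
print(round(tf_idf("iphone", tokens, doc_freq, 1000), 3))   # 4.605
```

Each occurrence of the rare term is worth several times more than an occurrence of the common one, which is the dilution effect described above.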


Phase 5: Adjacency ranking

The last phase computes the proximity of query terms, giving more weight to documents in which the terms appear close together in the text. This step fine-tunes the order of the documents received from phase four and, once the reordering is done, the documents are returned to the user in the search interface as the response to the submitted query.
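One common way to measure term proximity is the length of the shortest span of text containing every query term, found with a sliding window. The sketch below uses that technique and scores a document as the number of distinct terms divided by the window length; both the window approach and the scoring formula are hypothetical illustrations, not the documented CES adjacency computation. Tokens are assumed to be already lowercased.

```python
from typing import Dict, List, Optional


def smallest_window(tokens: List[str], terms: List[str]) -> Optional[int]:
    """Length in tokens of the shortest span containing every query term, or None."""
    wanted = set(terms)
    if len(wanted) < 2:
        return None  # proximity does not apply to one-term queries
    counts: Dict[str, int] = {}
    have = 0
    best: Optional[int] = None
    left = 0
    for right, tok in enumerate(tokens):
        if tok in wanted:
            counts[tok] = counts.get(tok, 0) + 1
            if counts[tok] == 1:
                have += 1
        # Shrink the window from the left while it still contains every term.
        while have == len(wanted):
            span = right - left + 1
            if best is None or span < best:
                best = span
            left_tok = tokens[left]
            if left_tok in wanted:
                counts[left_tok] -= 1
                if counts[left_tok] == 0:
                    have -= 1
            left += 1
    return best


def adjacency_score(tokens: List[str], terms: List[str]) -> float:
    """Hypothetical score: 1.0 when the terms are contiguous, smaller when spread out."""
    window = smallest_window(tokens, terms)
    return 0.0 if window is None else len(set(terms)) / window
```

For the query `apple iphone`, the text "apple iphone case" scores 1.0, while "iphone with a red apple sticker" scores 2 / 5 = 0.4.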

Notes:

  • Term proximity does not apply to one-term queries and, by default, is only calculated on a maximum of 100 documents. Contact Coveo Support for assistance on modifying this value, which can be 400, 300, 200, 150, or 100. If the value is bigger than 400, it is reduced to the one set in the Optimization box (see Modifying or Using Advanced Index Parameters).

  • For each document, when ranking information is enabled, the score attributed for Adjacency is shown under Document Weights (see Troubleshooting Ranking).

  • The docID value is used to break ties (if any), ensuring that the same results order is respected if the same query is performed in the future. Documents with the same ranking score are sorted by descending docID.

  • By default, your search interface shows ten results per page. Since the last three phases only process the 100 best ranked documents, results past the tenth page were not processed by these phases.
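The tie-breaking rule and the page arithmetic in the notes above can be checked with a short sketch. The function names are hypothetical; the sort key simply encodes "highest score first, then descending docID", and the page count divides the default 100 documents of the last three phases by the default ten results per page.

```python
from typing import Dict, List


def final_order(scores: Dict[int, float]) -> List[int]:
    """Highest score first; ties broken by descending docID so the order is repeatable."""
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], -doc_id))


def pages_fully_ranked(docs_in_last_phases: int = 100, results_per_page: int = 10) -> int:
    """Number of result pages whose documents all went through phases 3 to 5."""
    return docs_in_last_phases // results_per_page


print(final_order({10: 1.0, 42: 2.0, 7: 1.0}))  # [42, 10, 7]
print(pages_fully_ranked())                      # 10
```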

This is how ranking contributes to relevance. However, the ranking process is not limited to these phases: CES comes with many features that you can use to personalize or customize the way your documents are ranked. Top results, personal appreciation, and ranking functions are among the other factors influencing the relevance of search results.
