
About the Index

The unified index is the heart of the Coveo Platform Back-End. The index contains references to the full content of the documents indexed from the crawled repositories.

Facts about the index:

  • The index is organized in sources and collections (see Understanding Coveo .NET Components Hierarchy).

  • The index records the occurrences and positions of all term variants including those containing accented characters and all small common words.

  • The index records the presence in documents of terms with casing variants (first, all, or some letters in uppercase) or special formatting (bold, italic, underline, and so on), but does not record their positions.

  • The index detects and saves the encoding and the language of each indexed document for a large number of languages (see Supported Languages).

  • At query time, the index expands queried terms using language-specific stemming algorithms to return a more complete set of results (see About Stemming). You can configure which language to use to perform the expansion (see Configuring the Culture of a Search Hub With the .NET Interface Editor).

  • The index maintains a word correction lexicon that sorts indexed terms by their number of occurrences and is used by the query spelling suggestion algorithm to find more frequent spelling variants and propose a correction (see How Are Misspelled Words Handled?).

  • The index minimizes possible stemming errors by calculating a correlation factor between the searched term and every possible expansion. In search results, documents containing highly correlated expansions are ranked higher than ones containing poorly correlated expansions.

    Example: When you search for universe, because of the way the stemming algorithm works, the index expands your query using terms from the univer stem class, which can include university. When the terms universe and university rarely co-occur in your indexed documents, documents containing university are ranked lower.

    Note: Correlation computations for queried terms are performed during off-peak hours. You can, however, launch this calculation from the Administration Tool (see Modifying or Using Advanced Index Parameters).

  • The index can process wildcards within phrase searches (see Using Wildcards in Queries) and with the NEAR operator (see Search Prefixes and Operators). You can enable or disable the use of wildcards in queries (see Modifying or Using Advanced Index Parameters).

  • The index continuously and automatically cleans up references to deleted documents (see About the Index Self-Optimization Process).
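
The correlation-based ranking of stemming expansions described above can be illustrated with a small sketch. This is not the Coveo implementation: the documentation does not say how the correlation factor is computed, so the sketch assumes a simple Jaccard co-occurrence measure (documents containing both terms divided by documents containing either term). The function names correlation and rank_expansions and the sample documents are hypothetical.

```python
from typing import Dict, List, Set


def correlation(term: str, expansion: str, docs: List[Set[str]]) -> float:
    """Assumed measure: Jaccard co-occurrence of two terms across documents."""
    both = sum(1 for d in docs if term in d and expansion in d)
    either = sum(1 for d in docs if term in d or expansion in d)
    return both / either if either else 0.0


def rank_expansions(term: str, expansions: List[str],
                    docs: List[Set[str]]) -> Dict[str, float]:
    """Weight the searched term and each expansion, highest weight first."""
    scores = {term: 1.0}  # the searched term itself is fully correlated
    for expansion in expansions:
        scores[expansion] = correlation(term, expansion, docs)
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical corpus: each document is reduced to its set of terms.
docs = [
    {"universe", "universes", "galaxy"},
    {"universe", "universes", "star"},
    {"university", "campus"},
    {"university", "universe", "physics"},
    {"universes", "cosmology"},
]

# Expansions that a stemmer might return for the stem "univer".
weights = rank_expansions("universe", ["universes", "university"], docs)
for word, weight in weights.items():
    print(f"{word}: {weight:.2f}")
```

In this sample corpus, universes co-occurs with universe more often than university does, so it receives the higher weight. A ranking function could multiply a document's score by the weight of the expansion it matched, so that documents matching only poorly correlated expansions appear lower in the results.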
