Modifying Advanced Source Parameters

The advanced parameters determine more precisely how a source is crawled, converted and indexed. The available advanced parameters vary according to the type of source.

To modify the advanced source parameters

  1. On the Coveo server, access the Administration Tool (see Opening the Administration Tool).

  2. In the Administration Tool, select Index > Sources and Collections.

  3. In the Sources and Collections page: 

    1. In the Collections section, select the collection containing the source that you want to modify.

    2. In the Sources section, select the source that you want to modify.

    3. In the navigation panel on the left, select Advanced.

  4. In the Advanced page, refer to the following table to configure the available advanced source parameters, and then click Apply Changes.

    Note: The available parameters vary depending on the source type.

    Section Applies to Description

    Crawling

    All sources

    Note: The crawling options differ from source to source.

    Determines which elements of a source are indexed (see What Is the Structure of SharePoint?).

    Index hidden lists and their items: Indexes SharePoint hidden lists. Note that this option, which applies to SharePoint Legacy sources, is not selected by default.

    Index redundant issue list items: Indexes duplicate SharePoint Issues items. Note that this option, which applies to SharePoint Legacy sources, is not selected by default.

    Index survey responses: Indexes the responses to SharePoint Surveys. Note that this option, which applies to SharePoint Legacy sources, is not selected by default.

    Index documents uploaded with WebDAV: Indexes elements uploaded in SharePoint using WebDAV (i.e. elements invisible in the interface). Note that this option, which applies to SharePoint Legacy sources, is selected by default.

    Restrict crawling to X levels: Limits the crawling depth. This option applies to all sources and is not selected by default.

    Example: With Restrict crawling to 2 levels, only the main address (\\CoveoServer\Help\AdminTool\) and the addresses directly below it (\\CoveoServer\Help\AdminTool\Sources\) are indexed. Deeper subfolders or subsites (\\CoveoServer\Help\AdminTool\Sources\Local\) are not indexed.
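The level counting in this example can be illustrated with a minimal Python sketch. It is not the connector's implementation: it assumes that the starting address counts as level 1 and that each folder below it adds one level. The function names `crawl_level` and `is_within_limit` are hypothetical.

```python
from pathlib import PureWindowsPath


def crawl_level(root: str, address: str) -> int:
    """Return the level of `address` relative to `root`.

    Assumption: the starting address is level 1 and every folder
    below it adds one level.
    """
    # relative_to raises ValueError when address is not located under root.
    relative = PureWindowsPath(address).relative_to(PureWindowsPath(root))
    return 1 + len(relative.parts)


def is_within_limit(root: str, address: str, max_levels: int) -> bool:
    """Return True if `address` falls inside a limit of `max_levels`."""
    return crawl_level(root, address) <= max_levels


ROOT = r"\\CoveoServer\Help\AdminTool"
for candidate in (ROOT, ROOT + r"\Sources", ROOT + r"\Sources\Local"):
    print(candidate, is_within_limit(ROOT, candidate, max_levels=2))
```

Under these assumptions, the first two addresses fall within a limit of 2 levels and the third does not, which matches the example.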

    Allow crawling of external links: Indexes external Web pages directly linked to a website (but not their subpages). Note that this option, which applies to Web Pages sources, is not selected by default.

    Example: If http://www.coveo.com contains a link to http://microsoft.com, the latter page is indexed; however, http://www.microsoft.com/careers is not.
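The rule in this example can be sketched in Python as follows. This is a simplified illustration, not the Web connector's logic: it assumes that a page belongs to the crawled site when its host name matches exactly, and that an external page is indexed only when the link to it appears on a page of the crawled site. The names `same_site` and `should_index` are hypothetical.

```python
from urllib.parse import urlsplit


def same_site(url: str, site_host: str) -> bool:
    """Return True if `url` is hosted on the crawled site (exact host match)."""
    host = urlsplit(url).hostname or ""
    return host.lower() == site_host.lower()


def should_index(url: str, found_on: str, site_host: str, allow_external: bool) -> bool:
    """Decide whether `url`, discovered as a link on page `found_on`, is indexed."""
    if same_site(url, site_host):
        return True
    # External page: indexed only if the option is selected and the link
    # sits on a page of the crawled site. Links found on external pages are
    # not followed, so subpages of the external site are left out.
    return allow_external and same_site(found_on, site_host)


print(should_index("http://microsoft.com", "http://www.coveo.com",
                   "www.coveo.com", allow_external=True))
print(should_index("http://www.microsoft.com/careers", "http://microsoft.com",
                   "www.coveo.com", allow_external=True))
```

The first call returns `True` because the link appears directly on the crawled site. The second returns `False` because the link was found on an external page.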

    Disable cookies: Rejects cookies. Use this option when cookies keep the Coveo Platform from crawling a site (e.g. when they redirect the connector elsewhere). Note that this option, which applies to Web Pages and SharePoint Legacy sources, is not selected by default.

    Expand sites and lists before applying filters: Builds the tree of SharePoint sites and lists before applying filters. This option allows the indexing of non-excluded children of excluded parent items. Note that this option, which applies to SharePoint Legacy sources, is not selected by default.
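The effect of expanding before filtering can be sketched in Python. This is one interpretation of the description above, not the SharePoint connector's code: the tree is represented as a dictionary mapping each item to its children, and the function name `collect` and the sample paths are hypothetical.

```python
def collect(tree, node, is_excluded, expand_first):
    """Return the sorted list of items kept for indexing.

    tree maps each item to the list of its children.
    """
    if expand_first:
        # Build the complete tree first, then filter every item on its own.
        everything = []
        stack = [node]
        while stack:
            current = stack.pop()
            everything.append(current)
            stack.extend(tree.get(current, []))
        return sorted(item for item in everything if not is_excluded(item))
    # Filter while walking: an excluded parent hides its whole subtree.
    if is_excluded(node):
        return []
    kept = [node]
    for child in tree.get(node, []):
        kept.extend(collect(tree, child, is_excluded, expand_first))
    return sorted(kept)


tree = {"/": ["/hr", "/eng"], "/hr": ["/hr/policies"]}
excluded = {"/hr"}
print(collect(tree, "/", excluded.__contains__, expand_first=False))
print(collect(tree, "/", excluded.__contains__, expand_first=True))
```

In this sketch, `/hr/policies` is kept only when `expand_first` is `True`, because its excluded parent `/hr` no longer hides it.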

    Use the author extracted from the document instead of the SharePoint author: Extracts the authors of SharePoint documents using a conversion script. If this option is not selected, the metadata author name is used instead. Note that this option, which applies to SharePoint Legacy sources, is not selected by default.

    Use UTF-8 addresses: Indicates that the addresses of the documents are in UTF-8 instead of ANSI format. Note that this option, which applies to Web Pages sources, is not selected by default (see What Is the Difference Between ANSI and UTF-8 URI Formats?).

    Preserve file last access date: Preserves the last access date of the file. After the file is indexed, its date is reset to the last access date it had before indexing. This matters for backups, where the date of a file must not change if the file has not been accessed.

    Support robot exclusions: Indicates which robot exclusion rules (i.e. commands forbidding the crawling of a site) are respected by the Web connector. By default, all such rules are respected. However, it is possible to respect only Robots.txt or HTML META Tags. Moreover, it is possible to disregard all exclusion rules. Note that this option applies to Web Pages sources.
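The four possible settings can be illustrated with a short Python sketch built on the standard library's `urllib.robotparser` and `html.parser` modules. It is not the Web connector's implementation. The mode names (`"all"`, `"robots_txt"`, `"meta_tags"`, `"none"`), the user agent string, and the function `may_index` are assumptions made for the illustration, and only the `noindex` directive is checked.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(part.strip().lower() for part in content.split(","))


def may_index(url, html, robots_txt, mode, user_agent="ExampleCrawler"):
    """Return True if the page may be indexed under the given mode.

    mode: "all", "robots_txt", "meta_tags" or "none" (hypothetical names).
    """
    if mode in ("all", "robots_txt"):
        rules = RobotFileParser()
        rules.parse(robots_txt.splitlines())
        if not rules.can_fetch(user_agent, url):
            return False
    if mode in ("all", "meta_tags"):
        meta = RobotsMetaParser()
        meta.feed(html)
        if "noindex" in meta.directives:
            return False
    return True


robots_txt = "User-agent: *\nDisallow: /private/\n"
page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(may_index("http://example.com/private/a.html", "<html></html>", robots_txt, "all"))
print(may_index("http://example.com/public/a.html", page, robots_txt, "robots_txt"))
```

The first call returns `False` because Robots.txt disallows the path. The second returns `True` because the META tag is ignored when only Robots.txt is respected.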

    Download

    Web Pages

    Timeout: Determines the number of seconds after which the Web connector disconnects from a source which is not responding. Note that the value entered in this field must be at least 1.

    Delay Between Downloads: Determines the number of seconds elapsed between each download made by the Web connector. Pausing between downloads keeps the site from being queried continuously; a 10-second delay is the standard for Web connectors on the Web. Note that the value entered in this field must be between 0 and 60.
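The two Download constraints and the way the delay spaces out requests can be sketched in Python. This is an illustration only: the class `DownloadSettings`, the function `download_all`, and the injected `fetch` callable are hypothetical and do not correspond to a Coveo API.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class DownloadSettings:
    timeout_seconds: int  # must be at least 1
    delay_seconds: int    # must be between 0 and 60

    def __post_init__(self):
        if self.timeout_seconds < 1:
            raise ValueError("Timeout must be at least 1 second")
        if not 0 <= self.delay_seconds <= 60:
            raise ValueError("Delay Between Downloads must be between 0 and 60 seconds")


def download_all(urls, fetch, settings, sleep=time.sleep):
    """Fetch each URL in turn, pausing between consecutive downloads.

    fetch(url, timeout) is supplied by the caller (hypothetical signature).
    """
    results = []
    for position, url in enumerate(urls):
        if position > 0:
            # Pause so that the site is not queried continuously.
            sleep(settings.delay_seconds)
        results.append(fetch(url, settings.timeout_seconds))
    return results


settings = DownloadSettings(timeout_seconds=30, delay_seconds=10)
print(settings)
```

A caller could pass a function wrapping `urllib.request.urlopen(url, timeout=timeout)` as `fetch`; the 30-second timeout used here is an arbitrary sample value.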

    Server Name Alias

    Local/Network Files

    Web Pages

    SharePoint Legacy

    Indicates the name of the server used to retrieve documents during queries, if it is different from the server crawled during indexing.

    Example: It is possible to index documents on a staging computer but open them from a production one. In this case, the name of the production server must be entered in the Server Name Alias box.

    Tip: The second generation SharePoint connector does not have this option, but uses a mapping file which allows you to override the Clickable and Printable URIs (see Creating and Using a Custom SharePoint Mapping File). You need to create a metadata containing only the path part (i.e. no scheme and no server) of the original document URI to be able to replace it with a server alias.
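The idea behind a server alias, and the path-only value mentioned in the tip, can be illustrated with a short Python sketch. It only shows the URI manipulation and is not how the Coveo Platform or a mapping file performs it. The server names `staging01` and `production01` and both function names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def apply_server_alias(document_uri: str, server_alias: str) -> str:
    """Replace the server part of a URI, keeping scheme, path and query.

    Note: any port in the original URI is dropped unless the alias includes one.
    """
    parts = urlsplit(document_uri)
    return urlunsplit(parts._replace(netloc=server_alias))


def path_part(document_uri: str) -> str:
    """Return the URI without its scheme and server."""
    parts = urlsplit(document_uri)
    return urlunsplit(("", "", parts.path, parts.query, parts.fragment))


indexed_uri = "http://staging01/docs/guide.pdf?rev=2"
print(apply_server_alias(indexed_uri, "production01"))
print(path_part(indexed_uri))
```

The first call prints `http://production01/docs/guide.pdf?rev=2` and the second prints `/docs/guide.pdf?rev=2`.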

    Priority

    All sources

    Determines the order in which sources are indexed: sources with Highest priority are indexed first, whereas sources with Lowest priority are indexed last. Note that if sufficient memory and CPU resources are available, all sources can be indexed simultaneously.

    Performance

    All sources

    Determines whether additional analysis of the document content is performed during the indexing process or not.

    Disable advanced text layout analysis for PDF documents: Disables the advanced analysis of PDF documents in order to save CPU resources and speed up indexing. The purpose of this advanced analysis is to improve ranking and summarization by determining the reading order of columns in PDF documents (ranking and summarization are affected by the order and proximity of words). Note that advanced analysis of PDF documents is enabled by default.

    Disable advanced duplicate document filtering: Disables the filtering of duplicate documents in order to save CPU resources and speed up indexing. The purpose of this advanced filtering is to display only one copy of each document in the result list. Note that the filtering of duplicate documents is enabled by default.

    Conversion Timeout

    All sources

    Determines the number of minutes after which the converter proceeds to another document even if the conversion is not complete (a document whose conversion has not finished is considered corrupted). By default, the conversion timeout is 10 minutes.
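The timeout behavior can be sketched in Python with the standard `subprocess` module. This is only an analogy for the setting, not the Coveo converter: it assumes the conversion runs as an external command, and the function name `finished_in_time` is hypothetical.

```python
import subprocess
import sys


def finished_in_time(command, timeout_minutes):
    """Run a conversion command and report whether it finished in time.

    Returns False when the timeout expires; the caller would then treat
    the document as corrupted and move on to the next one.
    """
    try:
        subprocess.run(
            command,
            timeout=timeout_minutes * 60,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run stops the child process before raising.
        return False
    return True


# A stand-in "conversion" that completes immediately.
print(finished_in_time([sys.executable, "-c", "pass"], timeout_minutes=10))
```

Moving on after the timeout keeps a single problematic document from blocking the rest of the indexing queue.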