Configuring and Indexing a File Connector Source

A source defines a set of configuration parameters for one or more file shares or file share sections.

Note: Create two or more sources when file shares or file share sections need different parameter sets. A source uses one or more starting addresses to determine the locations to crawl and index.

To configure and index a File connector source

  1. On the Coveo server, access the Administration Tool (see Opening the Administration Tool).

  2. Select Index > Sources and Collections.

  3. In the Collections section:

    1. Select an existing collection in which you want to add the new source.

      OR

    2. Click Add to create a new collection (see Adding a Collection).

  4. In the Sources section, click Add.

    The Add Source page that appears is organized in three sections.

  5. In the General Settings section of the Add Source page:

    1. Enter the appropriate value for the following required parameters:

      Name

      Enter a descriptive name of your choice for the connector source.

      Example: Corporate network file share

      Source Type

      Select the connector used by this source. In this case, select Files.

      Addresses

      The list of starting address URIs indicating locations to index, one entry per line. You can specify the URIs as local or network paths. Addresses can represent a file system folder or file, a mail archive, or even a folder within a mail archive.

      Examples:
      Network folder: file://svr-fileshare/root
      Local folder: file:///c:/fileshare/root/
      Local file: file:///c:/fileshare/root/docs/work.doc
      Mail archive: file://svr-fileshare/mails/jsmith.pst
      Folder in a mail archive: file://svr-fileshare/mails/jsmith.pst/work
      IP address: file://192.168.1.2/share

      Important: When you use paths containing drive letters as starting addresses (for example, C:\fileshare), users cannot open the resulting links from the search results page. It is therefore better practice to index network file shares (for example, \\Intranet\fileshare).
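
      The address formats above can be illustrated with a minimal Python sketch. It is not part of CES; the helper names unc_to_file_uri and uses_drive_letter are hypothetical. The sketch converts a UNC path to the file:// form shown in the examples and flags addresses that contain a drive letter. It assumes paths without characters that need escaping.

```python
import re

# Matches a drive-letter path, with or without the file:/// prefix.
DRIVE_LETTER = re.compile(r"^(?:file:///)?[A-Za-z]:[\\/]")


def unc_to_file_uri(unc_path: str) -> str:
    # Hypothetical helper: turn a UNC path (two leading backslashes)
    # into the file://server/share form used in the examples.
    if not unc_path.startswith("\\\\"):
        raise ValueError("expected a UNC path starting with two backslashes")
    # Drop the two leading backslashes and switch to forward slashes.
    return "file://" + unc_path[2:].replace("\\", "/")


def uses_drive_letter(address: str) -> bool:
    # Hypothetical helper: True for addresses such as C:\fileshare
    # or file:///c:/fileshare/root/.
    return bool(DRIVE_LETTER.match(address))


print(unc_to_file_uri(r"\\svr-fileshare\root"))
print(uses_drive_letter("file:///c:/fileshare/root/"))
```

      A check like this could be run over a list of candidate addresses before entering them in the Addresses box.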

      Refresh Schedule

      Time interval at which the source is automatically refreshed to keep the index content up-to-date. The recommended Every day option instructs CES to refresh the source every day at 12 AM.

      Note: You can create new or modify existing source refresh schedules (see Creating or Modifying a Source Schedule).

    2. Review the value for the following parameters that often do not need to be modified:

      Rating

      Change this value only when you want to globally change the ranking associated with all items in this source relative to the rating of other sources (see Understanding Search Results Ranking).

      Example: If this source was for a legacy system, you may want to set this parameter to Low, so that in the search interface, results from this source appear later in the list compared to those from other sources.

      Document Types

      If you created a custom document type set for this source, select it (see Creating a Document Type Set). Otherwise, select Default.

      Active Languages

      If you defined custom active language sets, make sure to select the most appropriate one for this source (see Adding and Configuring a Language Set).

      Fields

      If you defined custom field sets, make sure to select the most appropriate one for this source (see What Are Field Sets?).

  6. In the Specific Connector Parameters & Options section of the Add Source page:

    1. Enter the appropriate values for the following optional parameters only when you want to index the content of mail archive files:

      Mapping Archives Configuration File

      When you decide to use a mail archive mapping file, enter the absolute path to your mapping file (see Mail Archive Indexing with the File Connector and Creating a Mail Archive Mapping File).

      Example: C:\CES7\Config\Coveo.CES.CustomCrawlers.File.MailArchives.config

      Expand Mail Archives

      Select to index the content of mail archives (.pst). The default is false.

    2. The default values for the following parameters generally do not need to be changed:

      Number of Live Monitoring Threads

      Determines the number of file system changes that the connector live monitoring can process simultaneously. The default and recommended value is 1.

      Max Number of Retries

      Number of retries to perform when indexing fails for a file that is opened by another application. The default and recommended value is 2.

      Number of Refresh Threads

      Determines the number of files that the connector can refresh simultaneously. The default and recommended value is 2.

      Expand Before Filtering

      By default, this option is not selected, so the crawler applies inclusion and exclusion filters to folders as well as files before crawling and expands only the folders that you want to index. In the rare cases where an inclusion or exclusion filter should apply only to files (for example, *.tif), select this option so that the crawler fully expands folders, sees all files, and then applies the filters.

      Note: Selecting this option can have a significant performance cost. The best practice is to use inclusion or exclusion filters to specify folders, not file types. Use document type sets instead to specify the file types to index (see What Are Document Type Sets?).
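
      The difference between the two modes can be pictured with a toy Python model. This is only an illustration of the behavior described above, not how CES is implemented: the folder tree, the crawl function, and the use of fnmatch-style patterns for an inclusion filter are all assumptions made for the sketch.

```python
from fnmatch import fnmatch

# Hypothetical folder tree: a folder maps to its children, a file maps to None.
TREE = {"scans": {"a.tif": None, "b.doc": None}, "c.tif": None}


def crawl(node, path, pattern, expand_before_filtering):
    # Return the files that pass a single inclusion pattern in this toy model.
    found = []
    for name, child in node.items():
        full = f"{path}/{name}"
        if child is not None:
            # Folder: without the option, the filter is also applied to the
            # folder itself, so a non-matching folder is never expanded.
            if expand_before_filtering or fnmatch(full, pattern):
                found += crawl(child, full, pattern, expand_before_filtering)
        elif fnmatch(full, pattern):
            # File: the filter always applies.
            found.append(full)
    return found


print(crawl(TREE, "root", "*.tif", expand_before_filtering=False))
print(crawl(TREE, "root", "*.tif", expand_before_filtering=True))
```

      In this model, the file-type filter hides the scans folder unless folders are expanded first, which is why the option exists and why it costs more: every folder must be visited.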

      Index Share Permissions

      By default this option is cleared. Select this option to index both the share and NTFS permissions (see the Microsoft document Share and NTFS Permissions on a File Server).

      Parameters

      Click Add Parameter when you want to show advanced hidden source parameters (see Modifying Hidden File Connector Source Parameters).

    3. The Option check boxes generally do not need to be changed:

      Index Subfolders

      Check to index all subfolders below the specified starting addresses.

      Note: You can control more precisely which folders or files to crawl using inclusion or exclusion filters (see Adding or Modifying Source Filters).

      Index the document's metadata

      When selected, CES indexes all the document metadata, even metadata that are not associated with a field. The orphan metadata are added to the body of the document so that they can be searched using free text queries.

      When cleared (default), only the values of system and custom fields that have the Free Text Queries attribute selected will be searchable without using a field query (see Adding a Field to Search On and What Are Field Queries and Free Text Queries?).

      Example: A document has two metadata entries:

      • LastEditedBy containing the value Hector Smith

      • Department containing the value RH

      In CES, the custom field CorpDepartment is bound to the metadata Department and its Free Text Queries attribute is selected.

      When the Index the document's metadata option is cleared, searching for RH returns the document because a field is indexing this value. Searching for hector does not return the document because no field is indexing this value.

      When the Index the document's metadata option is selected, searching for hector also returns the document because CES indexed orphan metadata.
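
      The example above can be restated as a toy Python model. It only mirrors the described outcome and is not CES code: the searchable_terms function and the way free text terms are collected are assumptions made for the sketch.

```python
def searchable_terms(metadata, free_text_metadata, index_all_metadata):
    # Toy model: collect the lowercase words that a free text query can match.
    terms = set()
    for name, value in metadata.items():
        # A value is searchable when its metadata is bound to a field with
        # Free Text Queries selected, or when all metadata are indexed.
        if index_all_metadata or name in free_text_metadata:
            terms.update(value.lower().split())
    return terms


document = {"LastEditedBy": "Hector Smith", "Department": "RH"}
# Department is bound to the CorpDepartment field (Free Text Queries selected).
free_text = {"Department"}

print("hector" in searchable_terms(document, free_text, index_all_metadata=False))
print("hector" in searchable_terms(document, free_text, index_all_metadata=True))
```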

      Document's addresses are case-sensitive

      Leave the check box cleared. This parameter needs to be checked only in rare cases for case-sensitive systems in which distinct documents may have the same file name but with different casing.

      Generate a cached HTML version of indexed documents

      When you select this check box (recommended), at indexing time CES creates HTML versions of indexed documents and saves them in the unified index. In the search interfaces, users can then more rapidly review the content by clicking the Quick View link to open the HTML version of the item rather than opening the original document with the original application.

      When the source includes mail archive files, you must select this option to ensure users can view the content of mail archive items.

      Consider clearing this check box only if you do not want to use Quick View links or if you want to save resources when building the source.

      Open results with cached version

      Leave this check box cleared (recommended) so that in the search interfaces, the main search result link opens the original document with the original application. Consider selecting this check box only when you do not want users to be able to open the original document but only see the HTML version of the document as a Quick View. When this option is selected, you must also select the Generate a cached HTML version of indexed documents check box.

      Note: When you index mail archive files, a custom document type set handles how mail archive items are opened from the search interfaces (see Setting up a Document Type for Mail Archive Indexing).
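
      The dependencies between the cached HTML options described above can be summarized in a short Python sketch. It is a hypothetical checklist helper, not a CES API: the function name and its boolean arguments are assumptions that stand for the check boxes and for whether the source includes mail archives.

```python
def check_cached_html_options(generate_cached_html: bool,
                              open_with_cached_version: bool,
                              source_has_mail_archives: bool) -> list:
    # Return a list of problems with the chosen option combination.
    problems = []
    if open_with_cached_version and not generate_cached_html:
        problems.append(
            "Open results with cached version requires "
            "Generate a cached HTML version of indexed documents."
        )
    if source_has_mail_archives and not generate_cached_html:
        problems.append(
            "Mail archive items cannot be viewed without "
            "Generate a cached HTML version of indexed documents."
        )
    return problems


for problem in check_cached_html_options(False, True, True):
    print(problem)
```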

  7. In the Security section of the Add Source page:

    1. In the Security Provider drop-down list, select Active Directory or a custom Active Directory security provider that you created for a specific domain (see Configuring an Active Directory Security Provider).

    2. In the Authentication drop-down list, when you chose to use a specific account to crawl the file system (see Setting up a File System Crawling Account), select the user identity that you created for this account. Leave this parameter empty when you want the connector to crawl the file system using the CES service identity (see About the CES Service Logon Account).

    3. Click Save and Start to save the source configuration and start indexing this source.

  8. Validate that the source building process is executed without errors:

    • In the navigation panel on the left, click Status, and then validate that the indexing proceeds without errors.

      OR

    • Open the CES Console to monitor the source building activities (see Using the CES Console).
