

Configuring and Indexing a Sitecore Source for the Legacy Connector

Deprecated

A Sitecore source targets only one Sitecore website. It is recommended to configure one source for each Sitecore website that you want to index.

To configure and index a Sitecore source

  1. On the Coveo server, access the Administration Tool (see Opening the Administration Tool).

  2. Select Index > Sources and Collections.

  3. In the Collections section:

    1. Select an existing collection to which you want to add a new source.

      OR

    2. Click Add to create a new collection.

  4. In the Sources section, click Add.

  5. In the General Settings section of the Add Source page:

    1. Enter the appropriate value for the following required parameters:

      Name

      Enter a descriptive name of your choice for the connector source.

      Example: Sitecore Website (English)

      Source Type

      CES 7.0.7814+ (August 2015) The connector used by this source. In this case, select Sitecore (deprecated).

      Notes:

      • If you do not see Sitecore (deprecated) in the Source Type list, your environment does not meet the requirements (see Sitecore Legacy Connector Requirements).

      • CES 7.0.5935+ (September 2013) Select Sitecore when you want to use the second generation Sitecore connector (see Sitecore Connector).

      • CES 7.0.7711– (June 2015) The Sitecore Legacy connector appeared in the list as Sitecore (Legacy).

      • CES 7.0.5785– (August 2013) The Sitecore Legacy connector appeared in the list as Sitecore while the second generation Sitecore connector (see Sitecore Connector) appeared as Sitecore2.

      Addresses

      The base address of the Sitecore installation. Enter one address in the following form:

      http://SitecoreWebsite

      The connector supports both http and https.

      Important: While the value in the Addresses box points to your Sitecore server, by default the Target Site box specifies that the site named website hosted on this server is indexed. In Sitecore, website is the default name for a site. When the site you want to index has a different name, for example when your server hosts more than one site, you must specify the site name in the Target Site box.

      You can also use the Content Start Path box to restrict indexing to one or more branches of the content tree.

      Tip: If, after indexing your Sitecore content, you obtain clickable URIs that contain http twice, such as http://http/www.MyServer.com, ensure that the hostName attribute of the site definition in your Sitecore web.config file does not contain http://. To explicitly specify the protocol, use the scheme attribute (example: <site name="WWWPortal" hostName="www.mysite.com" scheme="http" rootPath="/sitecore/content/Home" startItem="/Portal" contentStartItem="/Portal" />).
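If you have many site definitions to check, a short script can flag the misconfiguration described in this tip. The following Python sketch is a hypothetical helper, not a Coveo or Sitecore tool: it parses a <sites> excerpt (the BadPortal entry is an invented example), flags any hostName that contains ://, and suggests the hostName and scheme values to use instead.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Hypothetical <sites> excerpt: "BadPortal" reproduces the misconfiguration,
# "WWWPortal" uses the scheme attribute as recommended in the tip.
SITES = """<sites>
  <site name="BadPortal" hostName="http://www.mysite.com" rootPath="/sitecore/content/Home" startItem="/Portal" />
  <site name="WWWPortal" hostName="www.mysite.com" scheme="http" rootPath="/sitecore/content/Home" startItem="/Portal" />
</sites>"""


def hostname_fixes(sites_xml: str) -> dict:
    """For each <site> whose hostName embeds a protocol, suggest hostName/scheme values."""
    fixes = {}
    for site in ET.fromstring(sites_xml).iter("site"):
        host = site.get("hostName", "")
        if "://" in host:
            # urlsplit separates "http" (scheme) from "www.mysite.com" (netloc).
            parts = urlsplit(host)
            fixes[site.get("name")] = {"hostName": parts.netloc, "scheme": parts.scheme}
    return fixes


print(hostname_fixes(SITES))
# {'BadPortal': {'hostName': 'www.mysite.com', 'scheme': 'http'}}
```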

    2. Review the values of the following parameters, which often do not need to be modified:

      Rating

      Change this value only when you want to globally change the rating associated with all items in this source relative to the rating of other sources (see Understanding Search Results Ranking).

      Example: If this source was for a legacy website, you may want to set this parameter to Low, so that in the search interface, results from this source appear later in the list compared to those from other sources.

      Document Types

      If you defined custom document type sets, make sure to select the most appropriate one for this source (see What Are Document Type Sets?).

      Active Languages

      If you defined custom active language sets, make sure to select the most appropriate one for this source (see Adding and Configuring a Language Set).

      Fields

      If you defined custom field sets, make sure to select the most appropriate one for this source (see What Are Field Sets?).

      Refresh Schedule

      Time interval at which the index is automatically refreshed to keep the index content up-to-date. By default, the Every day option instructs CES to refresh the source every day at 12 AM.

      Note: You can create new or modify existing source refresh schedules (see Creating or Modifying a Source Schedule).

  6. In the Specific Connector Parameters & Options section of the Add Source page, review whether you need to change the default parameter values:

    1. Enter the appropriate value for the following optional parameters:

      Content Start Path

      The starting point of indexing in the Sitecore content tree. When left blank, the default value corresponds to the default root path of the Target Site. You can specify one or more starting paths by separating multiple root nodes with a semicolon (;).

      Example: /sitecore/content/home/MyNewRootNode;/sitecore/content/Resources

      Tip: You can determine the default root path from the Sitecore web.config file by concatenating the rootPath and startItem attributes of the target site.
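To illustrate this tip, the following Python sketch concatenates the rootPath and startItem attributes of a named <site> element to obtain its default root path, and splits a Content Start Path value on semicolons. The helper names (default_root_path, split_start_paths) are hypothetical and are not part of CES or the connector; the site definition reuses the one shown in the Addresses tip.

```python
import xml.etree.ElementTree as ET

# Excerpt of the <sites> section of a Sitecore web.config file,
# reusing the site definition shown in the Addresses tip.
WEB_CONFIG_SITES = """
<sites>
  <site name="WWWPortal" hostName="www.mysite.com" scheme="http"
        rootPath="/sitecore/content/Home" startItem="/Portal" />
</sites>
"""


def default_root_path(sites_xml: str, target_site: str) -> str:
    """Concatenate rootPath and startItem for the named site (hypothetical helper)."""
    root = ET.fromstring(sites_xml)
    for site in root.iter("site"):
        if site.get("name") == target_site:
            return site.get("rootPath", "") + site.get("startItem", "")
    raise KeyError(f"no <site> named {target_site!r}")


def split_start_paths(content_start_path: str) -> list:
    """Split a semicolon-separated Content Start Path value, ignoring empty entries."""
    return [p.strip() for p in content_start_path.split(";") if p.strip()]


print(default_root_path(WEB_CONFIG_SITES, "WWWPortal"))
# /sitecore/content/Home/Portal
print(split_start_paths("/sitecore/content/home/MyNewRootNode;/sitecore/content/Resources"))
# ['/sitecore/content/home/MyNewRootNode', '/sitecore/content/Resources']
```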

      Content Admin

      The Windows user that can see all indexed documents. By default, it is impossible to see the indexed content of a document when a source is using a security provider to index the permissions. Use this parameter when the source uses a security provider and you need to see the indexed document content in the Index Browser.

      Enter the user name in the following form: DomainName\UserName

      Languages

      Indicates the indexed languages. You can specify the languages to crawl by entering one or more language codes separated by a semicolon (;). Enter the * wildcard character to index all languages. A document is indexed for each language. By default, when the box is empty, a single document is indexed using the default language of the Target Site. If there is no language set on the site, English is used.

      Example: en;fr-CA

      Note: Items of the media library are always indexed with the site default language.
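The rules for the Languages box can be summarized as the following Python sketch. It models the documented behavior and is not the connector's code: the function name, the available_languages argument (standing in for the languages that exist in your Sitecore instance), and the use of the code en for the English fallback are assumptions.

```python
from typing import Optional


def languages_to_index(languages_box: str, site_language: Optional[str],
                       available_languages: list) -> list:
    """Return the languages for which a document is indexed (hypothetical model)."""
    value = languages_box.strip()
    if value == "*":
        # Wildcard: one document per available language.
        return list(available_languages)
    if value:
        # Explicit language codes separated by semicolons, e.g. "en;fr-CA".
        return [code.strip() for code in value.split(";") if code.strip()]
    # Empty box: a single document in the Target Site default language,
    # falling back to English when the site defines no language.
    return [site_language or "en"]


print(languages_to_index("en;fr-CA", "en", []))       # ['en', 'fr-CA']
print(languages_to_index("", None, ["en", "da-DK"]))  # ['en']
```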

      Mapping File

      Enter the path of a valid XML mapping file that defines how the connector handles metadata.

      A complete mapping file is key to leveraging the Sitecore metadata to produce a feature-rich search interface (see Defining a Sitecore Mapping File for the Legacy Connector).

    2. When your Sitecore website contains secured sections, use the following forms authentication parameters to allow the connector to authenticate itself and gain access to secured pages: 

      Note: When using forms authentication, the Body Format source option must be set to Web Page and Generate a cached HTML version of indexed documents must be selected.

      Login Page

      URL of the page where users log on for forms authentication.

      Username Control

      ID of the control where users enter their username for forms authentication.

      Tip: You can get the ID by inspecting the corresponding input HTML tag from the source of the web page using your browser inspection features.

      Some websites use dynamic content (AJAX), in which case the page source may not be enough to retrieve the control ID. You can then use an external web debugger such as Fiddler to find the values passed to the server when the login command is invoked.

      Example: In Internet Explorer, select View > Source, locate the corresponding input tag, and extract the id (ctl00_ctlContentPlaceHolder_ctl00_ctlLogonControl_ctlPanelBar_txtUserName in the sample code below):

      <input name="ctl00$ctlContentPlaceHolder$ctl00$ctlLogonControl$ctlPanelBar$txtUserName" type="text" id="ctl00_ctlContentPlaceHolder_ctl00_ctlLogonControl_ctlPanelBar_txtUserName" class="FormInputText" Focus="True" style="width:" />  
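When the page source is static, you can also extract the control IDs with a script instead of reading the source manually. This Python sketch uses the standard html.parser module to collect the id of the first text and password input on a saved login page. The password input in the sample is a hypothetical companion to the username control above; for AJAX pages, use a web debugger as described above.

```python
from html.parser import HTMLParser


class InputTagCollector(HTMLParser):
    """Records the attributes of every <input> tag found in the page."""

    def __init__(self) -> None:
        super().__init__()
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing tags such as
        # <input ... /> reach this method through handle_startendtag's default.
        if tag == "input":
            self.inputs.append(dict(attrs))


def login_control_ids(page_html: str) -> dict:
    """Map "text" and "password" to the id of the first input of that type."""
    collector = InputTagCollector()
    collector.feed(page_html)
    collector.close()
    ids = {}
    for attrs in collector.inputs:
        input_type, input_id = attrs.get("type"), attrs.get("id")
        if input_type in ("text", "password") and input_id:
            ids.setdefault(input_type, input_id)
    return ids


# Saved page source: the username input is the one from the example above;
# the password input is hypothetical.
PAGE = """
<input name="ctl00$ctlContentPlaceHolder$ctl00$ctlLogonControl$ctlPanelBar$txtUserName" type="text" id="ctl00_ctlContentPlaceHolder_ctl00_ctlLogonControl_ctlPanelBar_txtUserName" class="FormInputText" Focus="True" style="width:" />
<input name="ctl00$ctlContentPlaceHolder$ctl00$ctlLogonControl$ctlPanelBar$txtPassword" type="password" id="ctl00_ctlContentPlaceHolder_ctl00_ctlLogonControl_ctlPanelBar_txtPassword" />
"""

print(login_control_ids(PAGE))
```

Enter the text id in Username Control and the password id in Password Control.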

      Password Control

      ID of the control where users enter their password for forms authentication.

      Login Command

      Login command sent by the forms authentication page.

    3. Review the default values of the following parameters:

      Number of Refresh Threads

      Determines the number of simultaneous downloads handled by the connector for this source. The default value is 2.

      Index if no Layout

      By default, items with no layout cannot be reached directly from a web browser and are therefore not indexed. Select the check box to index items that have no defined layout. This is useful, for example, to index the content of blog posts created with the Sitecore blog module.

      Tip: For blog post items, you can change the clickable URL using a mapping file (see Defining a Sitecore Mapping File for the Legacy Connector).

      Example: When using the Sitecore blog module in the Printers sample site, the following mapping file can index blog posts when the Index if no Layout option is selected on the source.
      <?xml version="1.0" encoding="utf-8" ?>
      <Sitecore>
        <CommonMappings>
          <Fields>
            <Title>%[_CESSCDisplayName]</Title>
          </Fields>
        </CommonMappings>
        <Mapping template="{5CF2ED9B-6C32-4FA3-9549-2AB77085B131}"> <!--UserBlog-->
          <Fields>
            <ClickableUri>%[_CESSCServerBaseUrl]/Company/Blogs.aspx?blog=%[Blog Title]</ClickableUri>
            <PrintableUri>%[_CESSCServerBaseUrl]/Company/Blogs.aspx?blog=%[Blog Title]</PrintableUri>
          </Fields>
        </Mapping>
        <Mapping template="{1FBDD65D-5029-46F1-8D75-AF3E68810B25}"> <!--Article-->
          <Fields>
            <ClickableUri>%[_CESSCServerBaseUrl]/Company/Blogs.aspx?post=%[Title]&amp;blog=%[_CESSCParentID.Blog Title]</ClickableUri>
            <PrintableUri>%[_CESSCServerBaseUrl]/Company/Blogs.aspx?post=%[Title]&amp;blog=%[_CESSCParentID.Blog Title]</PrintableUri>
          </Fields>
        </Mapping>
        <Mapping template="{FB71F255-31D5-417A-BD5C-12D458EB8FDB}"> <!--Comment-->
          <Fields>
            <ClickableUri>%[_CESSCServerBaseUrl]/Company/Blogs.aspx?post=%[_CESSCParentID.Title]&amp;blog=%[_CESSCParentID._CESSCParentID.Blog Title]</ClickableUri>
            <PrintableUri>%[_CESSCServerBaseUrl]/Company/Blogs.aspx?post=%[_CESSCParentID.Title]&amp;blog=%[_CESSCParentID._CESSCParentID.Blog Title]</PrintableUri>
          </Fields>
        </Mapping>
      </Sitecore>  
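The placeholders in this mapping file chain _CESSCParentID to climb the content tree: the Article mapping reads Blog Title from the parent item, and the Comment mapping reads it from the grandparent. The following Python sketch models that substitution under stated assumptions: items are plain dictionaries, _CESSCParentID holds the key of the parent item, and no URL encoding is applied. It is not the connector's implementation, and the sample item values are invented.

```python
import re

# Matches %[...] placeholders as used in the mapping file above.
PLACEHOLDER = re.compile(r"%\[([^\]]+)\]")
PARENT = "_CESSCParentID"


def expand(template: str, item: dict, items_by_id: dict) -> str:
    """Substitute %[...] placeholders from an item's values (hypothetical model).

    Each leading "_CESSCParentID." segment moves one level up to the parent
    item; the last segment is the field read on the item reached.
    """
    def resolve(match):
        *hops, field = match.group(1).split(".")
        current = item
        for hop in hops:
            if hop != PARENT:
                raise ValueError(f"unsupported placeholder segment: {hop!r}")
            current = items_by_id[current[PARENT]]
        return str(current.get(field, ""))

    return PLACEHOLDER.sub(resolve, template)


# Invented sample tree: blog -> post -> comment.
ITEMS = {
    "blog-1": {"Blog Title": "Printers"},
    "post-1": {"Title": "Welcome", PARENT: "blog-1"},
}
COMMENT = {PARENT: "post-1", "_CESSCServerBaseUrl": "http://printers"}
# Comment mapping from above, with &amp; already decoded to &.
COMMENT_URI_TEMPLATE = ("%[_CESSCServerBaseUrl]/Company/Blogs.aspx"
                        "?post=%[_CESSCParentID.Title]"
                        "&blog=%[_CESSCParentID._CESSCParentID.Blog Title]")

print(expand(COMMENT_URI_TEMPLATE, COMMENT, ITEMS))
# http://printers/Company/Blogs.aspx?post=Welcome&blog=Printers
```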

      Include Media Library

      By default, this check box is selected to index all the content of the media library. This has the same effect as adding /sitecore/media library to the Content Start Path value.

      Note: When media items are referenced from content items that are indexed, these media items are also indexed even when this check box is cleared.

      Database

      The name of the Sitecore database to index. You can also enter master to index the non-published content of the target site. When left blank, the default value corresponds to the database defined for the Target Site.

      Note: When you specify a value other than the default and use a security provider, you must set the security provider database parameter to the same value (see Configuring a Sitecore Security Provider for the Legacy Connector).

      Target Audience

      Indicates to the connector the target audience of the source. This option affects how items open when a user clicks a search result:

      • Web: Opens the results as a standard Web page. Default value.

      • Content Editors: Opens the results for editing in the Sitecore Content Editor.

      Body Format

      Specifies how the cached HTML version of an indexed document is generated.

      • Web Page: Sends the HTML version of an item as rendered by Sitecore. This default value produces a well-formatted Quick View.

        However, make sure to set the body field in the mapping file. Otherwise, the navigation and other peripheral elements of the pages are indexed and become searchable (see Defining a Sitecore Mapping File for the Legacy Connector).

      • Metadata: Only sends Sitecore metadata and values. The Quick View presents an unformatted list of all Sitecore metadata and values.

        This option is useful for an administrator to review all the metadata gathered by the connector and to help configure the mapping file. You can use this option in conjunction with Target Audience set to Content Editors.

      Target Site

      Specifies the targeted Sitecore site to index. The default value is website. When the Sitecore website does not use the default name (website), you must use this parameter and provide the appropriate name. You can get the name of the site from the Sitecore web.config file.

      Example: The following excerpt shows a Sitecore web.config file defining five websites. Each website hosted in a single Sitecore installation is defined by a <site> element under the <sites> node, and the string to enter in the Target Site parameter is the value of its name attribute.

      <sites>
       ...
       <site name="danish" hostName="da.printers" language="da-DK" virtualFolder="/" ... />
       <site name="german" hostName="de.printers" language="de-DE" virtualFolder="/" ... />
       <site name="english" hostName="en.printers" language="en" virtualFolder="/" ... />
       <site name="british" hostName="gb.printers" language="en-GB" virtualFolder="/" ... />
       <site name="website" virtualFolder="/" physicalFolder="/" ... />
        ...
      </sites>
    4. Click Add Parameter when you want to show advanced source parameters (see Modifying Hidden Sitecore Source Parameters for the Legacy Connector).

    5. The Option check boxes generally do not need to be changed.

      Index Subfolders

      Keep this check box selected (recommended) so that all subfolders under the specified starting address are indexed.

      Index the document's metadata

      When selected, CES indexes all the document metadata, even metadata that are not associated with a field. The orphan metadata are added to the body of the document so that they can be searched using free text queries.

      When cleared (default), only the values of system and custom fields that have the Free Text Queries attribute selected are searchable without using a field query (see Adding a Field to Search On and What Are Field Queries and Free Text Queries?).

      Example: A document has two metadata items:

      • LastEditedBy containing the value Hector Smith

      • Department containing the value RH

      In CES, the custom field CorpDepartment is bound to the metadata Department and its Free Text Queries attribute is selected.

      When the Index the document's metadata option is cleared, searching for RH returns the document because a field is indexing this value. Searching for hector does not return the document because no field is indexing this value.

      When the Index the document's metadata option is selected, searching for hector also returns the document because CES indexed orphan metadata.
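The example above can be modeled in a few lines of Python. This is a simplified sketch, assuming that a plain free text query matches a lowercase, whitespace-separated term from a searchable metadata value; it ignores the document body, stemming, and the other processing that CES performs. The function and argument names are hypothetical.

```python
def free_text_terms(metadata: dict, free_text_bindings: set,
                    index_document_metadata: bool) -> set:
    """Terms matched by a plain free text query, under a simplified model.

    metadata: metadata name -> value on the document.
    free_text_bindings: metadata names bound to fields whose Free Text Queries
        attribute is selected.
    index_document_metadata: state of the "Index the document's metadata" option.
    """
    terms = set()
    for name, value in metadata.items():
        # Bound metadata is always searchable; orphan metadata only when the
        # option is selected (CES then adds it to the document body).
        if name in free_text_bindings or index_document_metadata:
            terms.update(value.lower().split())
    return terms


doc = {"LastEditedBy": "Hector Smith", "Department": "RH"}
bindings = {"Department"}  # the CorpDepartment field is bound to Department

print(sorted(free_text_terms(doc, bindings, False)))  # ['rh']
print(sorted(free_text_terms(doc, bindings, True)))   # ['hector', 'rh', 'smith']
```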

      Document's addresses are case-sensitive

      Leave this check box cleared. Select it only in the rare case of systems in which distinct documents can have the same name but different casing.

      Generate a cached HTML version of indexed documents

      When you select this check box (recommended), CES creates HTML versions of indexed documents at indexing time. In the search interfaces, users can then review the content more rapidly by clicking the Quick View link rather than opening the original document with the original application. Consider clearing this check box only if you do not want to use Quick View links or want to save resources when building the source.

      Open results with cached version

      Leave this check box cleared (recommended) so that in the search interfaces, the main search result link opens the original document with the original application. Consider selecting this check box only when you do not want users to be able to open the original document but only see the HTML version of the document as a Quick View. In this case, you must also select Generate a cached HTML version of indexed documents.
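The dependencies between these options, together with the forms authentication requirement stated earlier for the Body Format option, can be captured as a quick pre-save checklist. The following Python sketch is hypothetical: SourceOptions and option_conflicts are illustration names, not a CES API, and the checks cover only the rules stated on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceOptions:
    """Hypothetical snapshot of the source settings discussed on this page."""
    body_format: str = "Web Page"            # "Web Page" (default) or "Metadata"
    generate_cached_html: bool = True        # recommended setting
    open_results_with_cached_version: bool = False
    login_page: Optional[str] = None         # set when forms authentication is used


def option_conflicts(opts: SourceOptions) -> list:
    """List the documented option dependencies that the settings violate."""
    problems = []
    if opts.open_results_with_cached_version and not opts.generate_cached_html:
        problems.append("Open results with cached version requires "
                        "Generate a cached HTML version of indexed documents.")
    if opts.login_page:
        if opts.body_format != "Web Page":
            problems.append("Forms authentication requires Body Format set to Web Page.")
        if not opts.generate_cached_html:
            problems.append("Forms authentication requires "
                            "Generate a cached HTML version of indexed documents.")
    return problems


print(option_conflicts(SourceOptions(login_page="https://intranet.example/login.aspx",
                                     body_format="Metadata")))
```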

    6. If you created a Sitecore user identity for this source, select it in the Authentication drop-down list.

  7. In the Security section of the Add Source page:

    1. In the Active Directory Security Provider drop-down list, select Active Directory or a custom Active Directory security provider that you created for a specific domain (see Configuring an Active Directory Security Provider).

    2. In the Sitecore Security Provider drop-down list, select the security provider that you created for this source (see Configuring a Sitecore Security Provider for the Legacy Connector).

    3. In the Authentication drop-down list, select the user identity that you created for this Sitecore source.

  8. Click Save to save the source configuration.

  9. Before indexing the source, consider the following optional steps:

    1. Consider showing and modifying advanced source parameters (see Modifying Hidden Sitecore Source Parameters for the Legacy Connector).

    2. Consider using a custom mapping file (see Defining a Sitecore Mapping File for the Legacy Connector).

  10. On the button bar, click Rebuild to start indexing the source.

  11. Validate that the source building process is executed without errors:

    • In the navigation panel on the left, click Status, and then validate that the indexing proceeds without errors.

      OR

    • Open the CES Console to monitor the source building activities (see Using the CES Console).

What's Next?

When incremental refresh is enabled (see Enabling Incremental Refresh on a Sitecore Database for the Legacy Connector), set an incremental refresh schedule for your source (see Scheduling a Source Incremental Refresh).

Optionally integrate the Coveo search interface in your Sitecore website (see Integrating the Coveo .NET Search Interface in a Sitecore Website).
