Configuring and Indexing an RSS Source

A source defines a set of configuration parameters for one or more RSS feeds.

To configure and index a source with the RSS connector

  1. Ensure that your environment meets the RSS source requirements.

  2. On the Coveo server, access the Administration Tool (see Opening the Administration Tool).

  3. CES 7.0.8225+ (March 2016) Create an RSS field set to take advantage of the available RSS metadata.

    1. Import the default RSS field set file ([CES_Path]\Bin\Coveo.CES.CustomCrawlers.RSSCrawler.FieldSet.xml) to create fields for all the metadata available by default from RSS documents (see Exporting and Importing a Field Set).

    2. If you created custom metadata for your RSS documents, add corresponding fields to the field set (see Adding or Modifying Custom Fields).
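
To see which metadata a feed exposes before deciding which custom fields to add, you can list the element names used by its items. The following is a minimal Python sketch, not part of CES; the sample feed content and the `item_metadata_names` helper are hypothetical and serve only as an illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real feed would be read from its URL or file.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <link>http://example.com/</link>
    <description>Illustration only</description>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
      <pubDate>Mon, 07 Mar 2016 10:00:00 GMT</pubDate>
      <category>news</category>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2</link>
      <author>editor@example.com</author>
    </item>
  </channel>
</rss>"""


def item_metadata_names(feed_xml: str) -> list[str]:
    """Return the sorted, de-duplicated element names used by the feed items."""
    root = ET.fromstring(feed_xml)
    names = set()
    for item in root.iter("item"):
        for child in item:
            names.add(child.tag)
    return sorted(names)


print(item_metadata_names(SAMPLE_FEED))
```

Any name in the resulting list that is not covered by the default RSS field set is a candidate for a custom field.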

  4. Select Index > Sources and Collections.

  5. In the Collections section:

    1. Select an existing collection in which you want to add the new source.

      OR

    2. Click Add to create a new collection (see Adding a Collection).

  6. In the Sources section, click Add.

  7. In the General Settings section of the Add Source page:

    1. Enter the appropriate value for the following required parameters:

      Name

      Enter a descriptive name of your choice for the connector source.

      Example: CNN Technology RSS Feed

      Source Type

      The connector used by this source. In this case, select RSS.

      Addresses

      Enter the URL of the RSS feed to index by copying and pasting the corresponding RSS feed link in either the file:/// or http:// form.

      Example: To index the Stack Overflow feed, the URL is:

      http://stackoverflow.com/feeds

      You can enter more than one RSS feed address on separate lines, but you must ensure that all source parameters apply to all RSS feeds. Otherwise, create other sources for other feeds.
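
The address rules above (one URL per line, in either the file:/// or http:// form) can be checked before pasting the values into the Addresses box. This is a minimal Python sketch under that assumption; the `check_addresses` helper is hypothetical and not part of CES. This topic names only those two forms, so the sketch flags everything else for manual review.

```python
from urllib.parse import urlparse


def check_addresses(text: str) -> tuple[list[str], list[str]]:
    """Split a multi-line Addresses value into (accepted, rejected) URLs."""
    accepted, rejected = [], []
    for line in text.splitlines():
        url = line.strip()
        if not url:
            continue  # ignore blank lines
        parts = urlparse(url)
        if parts.scheme == "http":
            ok = bool(parts.netloc)  # an http URL needs a host name
        elif parts.scheme == "file":
            ok = bool(parts.path)  # a file URL needs a path
        else:
            ok = False  # form not named in this topic
        (accepted if ok else rejected).append(url)
    return accepted, rejected
```

For example, passing `"http://stackoverflow.com/feeds\nstackoverflow.com/feeds"` accepts the first line and rejects the second, which lacks the `http://` prefix.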

      Refresh Schedule

      Unless your RSS feed supports OpenSearch (see OpenSearch extension), select (none) when you want the source to keep previously indexed RSS items that are no longer available from the feed. A full refresh, like a rebuild, deletes from the source the old items that are no longer available from the feed.

      Select an interval such as every day when you want only the latest feed items to be searchable.

      Note: Configure an incremental refresh schedule on your source to keep the source continuously up to date (see Scheduling a Source Incremental Refresh).
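
The effect of the refresh choice on old items can be pictured with sets of item identifiers. This is a simplified Python model, not CES code: the function names are hypothetical, and the incremental behavior (new items added, old items kept) is an assumption inferred from the description above.

```python
def full_refresh(indexed: set[str], feed: set[str]) -> set[str]:
    """Full refresh or rebuild: only items still listed by the feed remain."""
    return set(feed)


def incremental_refresh(indexed: set[str], feed: set[str]) -> set[str]:
    """Assumed behavior: new feed items are added, old items are kept."""
    return indexed | feed


indexed = {"item-1", "item-2"}  # items already in the source
feed = {"item-2", "item-3"}     # items the feed currently lists

print(sorted(full_refresh(indexed, feed)))
print(sorted(incremental_refresh(indexed, feed)))
```

In this model, `item-1` survives only when no full refresh runs, which is why (none) is the setting to choose when old items must stay searchable.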

    2. Review the value for the following parameters that often do not need to be modified:

      Rating

      Change this value only when you want to globally change the rating associated with all items in this source relative to the rating of other sources (see Understanding Search Results Ranking).

      Example: If this source is for an important RSS feed, you may want to set this parameter to High so that, in the search interface, results from this source appear earlier in the search result list than those from other sources.

      Document Types

      If you defined custom document type sets, ensure that you select the most appropriate one for this source (see What Are Document Type Sets?).

      Active Languages

      If you defined custom language sets, ensure that you select the most appropriate one for this source (see Adding and Configuring a Language Set).

      Fields

      CES 7.0.8225+ (March 2016) Select the field set that you created earlier (see RSS field set).

      Note: CES 7.0.8047– (December 2015) If you created a custom RSS field set for this source, select it. Otherwise, leave the Default Scheme (see What Are Field Sets?).

  8. In the Specific Connector Parameters & Options section of the Add Source page:

    1. Review if you need to change the default values for the following parameters:

      Number of Refresh Threads

      Determines the number of simultaneous downloads handled by the connector for this source. The default value is 2.
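
As an illustration of what a limit on simultaneous downloads means, the following Python sketch runs simulated downloads through a pool of worker threads and records how many run at once. It is a generic model, not the connector's implementation; `simulate_downloads` and its fake download step are hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def simulate_downloads(urls, refresh_threads=2):
    """Run fake downloads; return the results and the peak concurrency."""
    lock = threading.Lock()
    running = 0
    peak = 0

    def fake_download(url):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.05)  # stand-in for network time
        with lock:
            running -= 1
        return url

    # The pool size caps how many downloads can be in progress together.
    with ThreadPoolExecutor(max_workers=refresh_threads) as pool:
        results = list(pool.map(fake_download, urls))
    return results, peak
```

With the default of 2, the peak never exceeds two downloads in progress, however many feed addresses are queued.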

      Mapping File

      The path to the mapping file. Leave the default value to use the default mapping file that comes with the connector (Coveo.CES.CustomCrawlers.RSSCrawler.MappingFile.xml). If you create a custom mapping file, enter the full path to your custom mapping file. Contact Coveo Support for assistance if you need to customize the mapping file.

      Index RSS feed URL CES 7.0.9167+ (December 2017)

      Whether to index the URL of the RSS feed. By default, the RSS feed URL is not indexed.

    2. In the Option section, review the default value of the following check boxes:

      Index Subfolders

      Check to index all subfolders below the specified RSS server address. Selected by default.

      Index the document's metadata

      When selected, CES indexes all the document metadata, even metadata that are not associated with a field. The orphan metadata are added to the body of the document so that they can be searched using free text queries.

      When cleared (default), only the values of system and custom fields that have the Free Text Queries attribute selected will be searchable without using a field query (see Adding a Field to Search On and What Are Field Queries and Free Text Queries?).

      Example: A document has two metadata:

      • LastEditedBy containing the value Hector Smith

      • Department containing the value RH

      In CES, the custom field CorpDepartment is bound to the metadata Department and its Free Text Queries attribute is selected.

      When the Index the document's metadata option is cleared, searching for RH returns the document because a field is indexing this value. Searching for hector does not return the document because no field is indexing this value.

      When the Index the document's metadata option is selected, searching for hector also returns the document because CES indexed orphan metadata.
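
The example above can be modeled in a few lines of Python. This is a simplified sketch of the described behavior, not how CES implements it: the `free_text_match` helper is hypothetical, it matches whole words only, and it ignores the document body.

```python
def free_text_match(term, metadata, free_text_fields, index_all_metadata):
    """Return True when a free text query for `term` would match the document.

    metadata: metadata name -> value
    free_text_fields: metadata names bound to a field whose
        Free Text Queries attribute is selected
    index_all_metadata: state of the "Index the document's metadata" option
    """
    searchable = [
        value
        for name, value in metadata.items()
        if index_all_metadata or name in free_text_fields
    ]
    term = term.lower()
    return any(term in value.lower().split() for value in searchable)


doc = {"LastEditedBy": "Hector Smith", "Department": "RH"}
bound = {"Department"}  # CorpDepartment is bound to Department

print(free_text_match("RH", doc, bound, index_all_metadata=False))
print(free_text_match("hector", doc, bound, index_all_metadata=False))
print(free_text_match("hector", doc, bound, index_all_metadata=True))
```

The three calls correspond to the three searches in the example: RH with the option cleared, hector with the option cleared, and hector with the option selected.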

      Generate a cached HTML version of indexed documents

      When you select this check box (recommended), at indexing time, CES creates HTML versions of indexed documents. In the search interfaces, users can then more rapidly review the content by clicking the Quick View link rather than opening the original document with the original application. Consider clearing this check box only if you do not want to use Quick View links or to save resources when building the source.

      Open results with cached version

      Leave this check box cleared (recommended) so that in the search interfaces, the main search result link opens the original document with the original application. Consider selecting this check box only when you do not want users to be able to open the original document but only see the HTML version of the document as a Quick View. In this case, you must also select Generate a cached HTML version of indexed documents.

    3. Click Save to save the source configuration.

  9. Because RSS feeds are not secured, the RSS connector does not index permissions and you must change the default Permissions option to set the permissions globally on the source:

    Note: The following error message appears in the CES Console when the default Index security permissions option is left selected:

    Permissions indexing is not provided by the RSS crawler. You must manually configure the permissions for the source '[Source_Name]'.

    1. In the navigation panel on the left, select Permissions.

    2. In the Permissions page:

      1. Select the Specifies the security permissions to index option.

      2. Optionally, in the Allowed Users list, add or remove users or groups to precisely specify who has access to the content from this source.

        By default, the Active Directory everyone group specifies that any Active Directory user can see all the content from this source.

      3. Optionally, in the Denied Users list, add users or groups to specify who does not have access to the content from this source.

      4. Click Apply Changes.

  10. On the toolbar, click Start/Rebuild to start indexing your source.

  11. Validate that the source building process is executed without errors:

    • In the navigation panel on the left, click Status, and then validate that the indexing proceeds without errors.

      OR

    • Open the CES Console to monitor the source building activities (see Using the CES Console).

What's Next?

Set an incremental refresh schedule for your source to keep it up to date with the RSS feed (see Scheduling a Source Incremental Refresh).
