Configuring and Indexing a Google Sites Source
Note: Create separate Google Sites sources when:
-
You have more than one Google Apps account to manage your Google Sites domains.
-
One of your Google Sites is associated with a private Google account.
To configure and index a Google Sites source
-
On the Coveo server, access the Administration Tool (see Opening the Administration Tool).
-
Select Index > Sources and Collections.
-
In the Collections section:
-
Select an existing collection in which you want to add the new source.
OR
-
Click Add to create a new collection (see Adding a Collection).
-
-
In the Sources section, click Add.
The Add Source page that appears is organized in three sections.
-
In the General Settings section of the Add Source page:
-
Enter the appropriate value for the following required parameters:
-
Name
-
Enter a descriptive name of your choice for the connector source.
Example: Google Sites
-
Source Type
-
Select the connector used by this source. In this case, select Google Sites.
Note: If you do not see Google Sites, your environment does not meet the requirements (see Google Sites Connector Requirements).
-
Enter the address of one or more specific Google Sites in one of the following formats:
-
For a private account: https://sites.google.com/site/<my_site>
-
For a domain account: https://sites.google.com/a/<my_domain>/<my_site>
OR
Enter a starting address to use auto-discovery to crawl all sites accessible to the connector using one of the following formats:
-
For a private account: https://sites.google.com/site
-
For a domain account: https://sites.google.com/a/<my_domain>
Notes:
-
The Google Sites returned by the auto-discovery feature are the ones to which the connector was granted access when the OAuth2 refresh token was generated.
-
Auto-discovery is supported only for all accessible sites within a single domain, not across all domains of a specific Google Apps account.
-
Auto-discovery only returns sites whose sharing permissions explicitly allow the crawling user. It does not return sites that allow everyone from the domain, nor does it consider administrator permissions, which grant access to all websites of the Google Apps domain.
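The four address formats above can be checked before you enter them in the source. The following is a minimal sketch, assuming Python is available on an administration workstation. The function name `classify_address` and the regular expressions are illustrative assumptions derived from the formats listed above, not the connector's own validation logic.

```python
import re

# Patterns derived from the four address formats listed above.
# Assumption: site and domain names are any run of characters without "/".
_PATTERNS = [
    ("private", "auto-discovery",
     re.compile(r"^https://sites\.google\.com/site/?$")),
    ("private", "specific site",
     re.compile(r"^https://sites\.google\.com/site/[^/]+/?$")),
    ("domain", "auto-discovery",
     re.compile(r"^https://sites\.google\.com/a/[^/]+/?$")),
    ("domain", "specific site",
     re.compile(r"^https://sites\.google\.com/a/[^/]+/[^/]+/?$")),
]


def classify_address(address):
    """Return (account_type, mode) for a Google Sites address, or None."""
    candidate = address.strip()
    for account_type, mode, pattern in _PATTERNS:
        if pattern.match(candidate):
            return account_type, mode
    return None
```

For example, `classify_address("https://sites.google.com/a/example.com")` reports a domain account in auto-discovery mode, while an address that matches none of the formats returns `None`.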
-
-
Fields
-
Select the field set that you created earlier (see Google Sites Connector Deployment Overview and What Are Field Sets?).
-
Refresh Schedule
-
Time interval at which the index is automatically refreshed to keep the index content up to date. By default, the Every day option instructs CES to refresh the source every day at 12 AM. Because the incremental refresh keeps the source up to date, you can select a longer interval such as Every Sunday (see What Should Be the Frequency of Source Refresh Schedules?).
-
-
Review the value for the following parameters that often do not need to be modified:
-
Rating
-
Change this value only when you want to globally change the rating associated with all items in this source relative to the rating of other sources (see Understanding Search Results Ranking).
Example: When a source replaces a legacy system, you may want to set this parameter to High, so that in the search interface, results from this source appear earlier in the list compared to those from legacy system sources.
-
Document Types
-
If you defined a custom document type set for this source, select it (see What Are Document Type Sets?).
-
Active Languages
-
If you defined custom active language sets, ensure that you select the most appropriate one for this source (see Adding and Configuring a Language Set).
-
-
-
In the Specific Connector Parameters & Options section of the Add Source page:
-
In the Mapping File box, the path to the default mapping file that defines how the connector handles metadata often does not need to be changed.
Notes:
-
CES 7.0.7256– (December 2014) Enter the path to the default mapping file that defines how the connector handles metadata. You can leave this box empty, in which case no Google Sites metadata will be indexed.
Example: D:\Program Files\Coveo Enterprise Search 7\Bin\Coveo.CES.CustomCrawlers.GoogleSites.MappingFile.xml
-
CES 7.0.7104– (October 2014) If you create a custom mapping file, enter the path where you saved your file on the Coveo server (see Creating a Custom Google Sites Connector Mapping File). You can leave this box empty, in which case no Google Sites metadata will be indexed.
Example: D:\CES7\Config\MyGoogleSitesMappingFile.xml
-
-
Using the following parameters, authorize the Coveo crawler to access the Google Sites:
-
Client's id
-
Enter the Client ID value that you got earlier (see Getting Google Sites Client ID and Client Secret values).
-
Client's secret
-
Enter the Client Secret value that you got earlier (see Getting Google Sites Client ID and Client Secret values).
-
Client's refresh token
-
Enter the OAuth2 refresh token value that you got earlier (see Getting a Google Sites OAuth2 Refresh Token).
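The three values above are the inputs of a standard OAuth2 refresh-token grant, in which a long-lived refresh token is exchanged for a short-lived access token. The following minimal sketch builds such a request without sending it, which can help you confirm that none of the values is empty before saving the source. The function name `build_refresh_request` is hypothetical, and the endpoint shown is Google's publicly documented token endpoint. How the connector performs this exchange internally is not described in this document.

```python
from urllib.parse import urlencode

# Google's documented OAuth 2.0 token endpoint (assumption: the connector
# may use a different or older endpoint internally).
TOKEN_ENDPOINT = "https://oauth2.googleapis.com/token"


def build_refresh_request(client_id, client_secret, refresh_token):
    """Return (url, form_body) for a standard OAuth2 refresh-token grant."""
    values = {
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
    }
    # Reject empty or whitespace-only values early.
    for name, value in values.items():
        if not value or not value.strip():
            raise ValueError(f"{name} must not be empty")
    form = {"grant_type": "refresh_token"}
    form.update({name: value.strip() for name, value in values.items()})
    return TOKEN_ENDPOINT, urlencode(form)
```

The returned body is meant to be sent as an `application/x-www-form-urlencoded` POST. A successful response from the token endpoint contains an `access_token` value.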
-
-
Click Add Parameter when you want to show and change the value of advanced source parameters (see Modifying Hidden Google Sites Source Parameters).
-
The Option check boxes generally do not need to be changed:
-
Index Subfolders
-
This parameter is not taken into account for this connector.
-
Index the document's metadata
-
When selected, CES indexes all the document metadata, even metadata that are not associated with a field. The orphan metadata are added to the body of the document so that they can be searched using free text queries.
When cleared (default), only the values of system and custom fields that have the Free Text Queries attribute selected will be searchable without using a field query (see Adding a Field to Search On and What Are Field Queries and Free Text Queries?).
Example: A document has two metadata:
-
LastEditedBy containing the value Hector Smith
-
Department containing the value RH
In CES, the custom field CorpDepartment is bound to the metadata Department and its Free Text Queries attribute is selected.
When the Index the document's metadata option is cleared, searching for RH returns the document because a field is indexing this value. Searching for hector does not return the document because no field is indexing this value.
When the Index the document's metadata option is selected, searching for hector also returns the document because CES indexed orphan metadata.
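The behavior in this example can be modeled with a short sketch. It is a simplified illustration of the rule described above, not how CES implements indexing. The function names and the data layout are assumptions made for the illustration.

```python
def searchable_terms(metadata, field_bindings, index_all_metadata):
    """Return the lowercase free-text terms under a simplified model.

    metadata: metadata name -> value
    field_bindings: metadata name -> (field name, free_text_enabled)
    index_all_metadata: state of the "Index the document's metadata" option
    """
    terms = set()
    for name, value in metadata.items():
        binding = field_bindings.get(name)
        is_orphan = binding is None
        bound_with_free_text = binding is not None and binding[1]
        # Bound fields are searchable only with Free Text Queries selected;
        # orphan metadata are searchable only when the option is selected.
        if bound_with_free_text or (index_all_metadata and is_orphan):
            terms.update(value.lower().split())
    return terms


def matches(query, metadata, field_bindings, index_all_metadata):
    """Return True when a single-word free-text query finds the document."""
    terms = searchable_terms(metadata, field_bindings, index_all_metadata)
    return query.lower() in terms


# Data from the example above.
document = {"LastEditedBy": "Hector Smith", "Department": "RH"}
bindings = {"Department": ("CorpDepartment", True)}
```

With the option cleared, `matches("RH", document, bindings, False)` is true and `matches("hector", document, bindings, False)` is false. With the option selected, `matches("hector", document, bindings, True)` becomes true.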
-
-
Document's addresses are case-sensitive
-
Leave the check box cleared. This parameter needs to be checked only in rare cases for systems in which distinct documents may have the same name but different casing.
-
Generate a cached HTML version of indexed documents
-
When you select this check box (recommended), at indexing time, CES creates HTML versions of indexed documents. In the search interfaces, users can then more rapidly review the content by clicking the Quick View link rather than opening the original document with the original application. Consider clearing this check box only when you do not want to use Quick View links or to save resources when building the source.
-
Open results with cached version
-
Leave this check box cleared (recommended) so that in the search interfaces, the main search result link opens the original document with the original application. Consider selecting this check box only when you do not want users to be able to open the original document but only see the HTML version of the document as a Quick View. In this case, you must also select Generate a cached HTML version of indexed documents.
-
-
-
In the Security section of the Add Source page:
-
If you chose to index Google Sites permissions, in the Security Provider drop-down list, select the Google Sites security provider that you created for this source (see Configuring a Google Sites Security Provider).
-
In the User Identity drop-down list, select the user identity that you created for this source (see Google Sites Connector Deployment Overview).
-
-
Click Save to save the source configuration.
-
If you chose not to index Google Sites permissions, you can set source-level permissions that apply to all documents in the source:
-
In the navigation panel on the left, click Permissions.
-
In the Permissions page, select Specify the security permissions to index.
-
In the Allowed Users and Denied Users boxes, enter the users and groups that you respectively want to allow to see, or prevent from seeing, search results from this source. The default is to allow everyone (Active Directory Group).
-
Click Apply Changes.
-
-
When you are ready to start indexing the Google Sites source, click Rebuild.
-
Validate that the source building process is executed without errors:
-
In the navigation panel on the left, click Status, and then validate that the indexing proceeds without errors.
OR
-
Open the CES Console to monitor the source building activities (see Using the CES Console).
-
What's Next?
Set an incremental refresh schedule for your source (see Scheduling a Source Incremental Refresh).
Consider modifying some hidden source parameters to try to resolve issues (see Modifying Hidden Google Sites Source Parameters).