Configuring and Indexing an Atlassian Confluence V2 Source
To configure and index a Confluence V2 source
-
On the Coveo server, access the Administration Tool (see Opening the Administration Tool).
-
Select Index > Sources and Collections.
-
In the Collections section:
-
Select an existing collection to which you want to add the new source.
OR
-
Click Add to create a new collection (see Adding a Collection).
-
-
In the Sources section, click Add.
The Add Source page that appears is organized into three sections.
-
In the General Settings section of the Add Source page:
-
Enter the appropriate value for the following required parameters:
-
Name
-
A descriptive name of your choice for the connector source.
Example: Corporate Confluence Wiki
-
Source Type
-
The connector used by this source. In this case, select Confluence v2.
Note: If you do not see Confluence v2 in the Source Type list, ensure that your environment meets the requirements (see Atlassian Confluence V2 Connector Requirements).
-
Addresses
-
List of starting points for the connector, one address per line.
Examples: Depending on your Confluence environment and use case, use one of the following URL formats:
-
To index a complete Confluence (on-premises) site, add the Confluence server root URL:
http://MyConfluenceServer:8090/
-
To index specific on-premises spaces, add their URL:
http://MyConfluenceServer:8090/display/space1
http://MyConfluenceServer:8090/display/space2
-
To index a complete Confluence Cloud site, add the Confluence server root URL:
https://MyConfluenceServer.atlassian.net/wiki/
-
To index specific Confluence Cloud spaces, add their URL:
https://MyConfluenceServer.atlassian.net/wiki/display/space1
https://MyConfluenceServer.atlassian.net/wiki/display/space2
where you replace MyConfluenceServer with your Confluence instance name, and space1 and space2 with the desired Confluence space keys.
Notes:
-
To be able to index document permissions, all your starting points must be located on a single Confluence site. Create separate sources for separate sites.
-
You can enter specific space addresses for deployments where Confluence is not installed at the server root, respecting the following format: http://server/MyConfluence/display/spacename.
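The address formats above can be generated and sanity-checked with a short script before you paste them into the Addresses box. This is only a sketch: the helper names space_addresses and same_site are hypothetical and not part of the connector, and the script simply applies the /display/<space key> pattern and the single-site rule described in the notes.

```python
from urllib.parse import urlsplit


def space_addresses(root_url, space_keys):
    """Build one starting address per space key using the /display/<key> format."""
    base = root_url if root_url.endswith("/") else root_url + "/"
    return [base + "display/" + key for key in space_keys]


def same_site(addresses):
    """Return True when every address shares the same scheme and host (one Confluence site)."""
    sites = {(urlsplit(a).scheme.lower(), urlsplit(a).netloc.lower()) for a in addresses}
    return len(sites) <= 1


# Placeholder server names and space keys taken from the examples above.
on_prem = space_addresses("http://MyConfluenceServer:8090/", ["space1", "space2"])
cloud = space_addresses("https://MyConfluenceServer.atlassian.net/wiki", ["space1", "space2"])

print("\n".join(on_prem))
print("\n".join(cloud))
print("Single site:", same_site(on_prem))
```

Mixing the on-premises and Cloud lists in one source would fail the same_site check, which mirrors the note about creating separate sources for separate sites.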
-
-
-
The following parameters often do not need to be changed:
-
Rating
-
Change this value only when you want to globally change the rating associated with all items in this source relative to the rating of other sources (see Understanding Search Results Ranking).
Example: When the source indexes a legacy repository, you may want to set this parameter to Low, so that in the search interface, results from this source appear lower in the list compared to those from active repository sources.
-
Document Types
-
If you defined a custom document type set for this source, select it (see What Are Document Type Sets?).
-
Active Languages
-
If you defined custom active language sets, ensure that you select the most appropriate one for this source (see Adding and Configuring a Language Set).
-
Fields
-
Select the field set that you created earlier (see Atlassian Confluence V2 Connector Deployment Overview).
-
Refresh Schedule
-
Time interval at which the index is automatically refreshed to keep the index content up to date. By default, the Every day option instructs CES to refresh the source every day at 12 AM. Because the incremental refresh (supported when the plugin is installed) keeps the source up to date, you can select a longer interval such as Every Sunday (see What Should Be the Frequency of Source Refresh Schedules?).
-
-
-
In the Specific Connector Parameters & Options section of the Add Source page:
-
Review whether you need to change the default values for the following parameters:
-
Filter Space Regex
-
The regex to use to filter spaces when you want to index only a subset of Confluence.
Note: This parameter is useful when you have a large number of spaces to index that have an element in common in their space keys.
Example: You want to index all spaces with keys starting with an uppercase letter followed by a number, so you enter the following regex:
^[A-Z][0-9].*$
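Before saving the source, you may want to preview which space keys such a regex would keep. The following is a minimal sketch that uses Python's re module on a made-up list of space keys; the keys are hypothetical, and the connector's own regex engine may differ from Python's for more advanced patterns, although a simple pattern like this one should behave the same way.

```python
import re

# The filter from the example above: an uppercase letter, then a digit, then anything.
SPACE_FILTER = re.compile(r"^[A-Z][0-9].*$")

# Hypothetical space keys, for illustration only.
space_keys = ["A1DOCS", "B2", "HR", "a1docs", "9LIVES", "C3-archive"]

# Keep only the keys that the filter accepts.
matching = [key for key in space_keys if SPACE_FILTER.match(key)]
print(matching)  # ['A1DOCS', 'B2', 'C3-archive']
```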
-
Number of Refresh Threads
-
Determines the number of refresh threads that allow the connector to crawl web pages in parallel. The default value is 2 threads.
Note: Increasing this value may improve source refresh speed but puts more load on the Confluence server.
-
Mapping File
-
The path to the mapping file that defines how the connector handles metadata. Leave the default value to use the default mapping file that comes with the connector (Coveo.CES.CustomCrawlers.Confluence2.MappingFile.xml). If you create a custom mapping file, enter its full path. Contact Coveo Support for assistance if you need to customize the mapping file.
-
Index Only Global Spaces
-
When selected, only global spaces are indexed, meaning that personal spaces are ignored.
-
Index Only Personal Spaces
-
When selected, only personal spaces are indexed, meaning that global spaces are ignored.
-
Index Comments
-
When selected, comments on blog posts and pages are indexed. Comments are indexed as metadata of the page, not as documents.
-
Index Attachments
-
When selected, binary files attached to a page, blog post, or comment are indexed. Attachments are indexed with the same level and sets as their parent.
-
-
Review the Option check boxes, which generally do not need to be changed:
-
Index Subfolders
-
Keep this check box selected (recommended). By doing so, all subfolders from the specified server address are indexed.
-
Index the document's metadata
-
When selected, CES indexes all the document metadata, even metadata that are not associated with a field. The orphan metadata are added to the body of the document so that they can be searched using free text queries.
When cleared (default), only the values of system and custom fields that have the Free Text Queries attribute selected will be searchable without using a field query (see Adding a Field to Search On and What Are Field Queries and Free Text Queries?).
Example: A document has two metadata:
-
LastEditedBy containing the value Hector Smith
-
Department containing the value RH
In CES, the custom field CorpDepartment is bound to the metadata Department and its Free Text Queries attribute is selected.
When the Index the document's metadata option is cleared, searching for RH returns the document because a field is indexing this value. Searching for hector does not return the document because no field is indexing this value.
When the Index the document's metadata option is selected, searching for hector also returns the document because CES indexed orphan metadata.
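The behavior in this example can be modeled with a small script. This is a toy illustration of the rule described above, not how CES is implemented: the function names and the set-of-words matching are assumptions made only for the sketch.

```python
def free_text_terms(metadata, free_text_metadata, index_all_metadata):
    """Collect the lowercase words that a free text query could match for one document.

    metadata: metadata name -> value for the document.
    free_text_metadata: names of metadata bound to a field with Free Text Queries selected.
    index_all_metadata: state of the "Index the document's metadata" option.
    """
    terms = set()
    for name, value in metadata.items():
        # Orphan metadata becomes searchable only when the option is selected.
        if index_all_metadata or name in free_text_metadata:
            terms.update(value.lower().split())
    return terms


def is_returned(query, metadata, free_text_metadata, index_all_metadata):
    """Return True when the single-word query matches a searchable word."""
    return query.lower() in free_text_terms(metadata, free_text_metadata, index_all_metadata)


# The document from the example: Department is bound to the CorpDepartment field.
document = {"LastEditedBy": "Hector Smith", "Department": "RH"}
bound = {"Department"}

print(is_returned("RH", document, bound, index_all_metadata=False))      # True
print(is_returned("hector", document, bound, index_all_metadata=False))  # False
print(is_returned("hector", document, bound, index_all_metadata=True))   # True
```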
-
-
Document's addresses are case-sensitive
-
Leave the check box cleared. This parameter needs to be checked only in rare cases for systems in which distinct documents may have the same name but different casing.
-
Generate a cached HTML version of indexed documents
-
When you select this check box (recommended), at indexing time, CES creates HTML versions of indexed documents. In the search interfaces, users can then more rapidly review the content by clicking the Quick View link rather than opening the original document with the original application. Consider clearing this check box only when you do not want to use Quick View links or to save resources when building the source.
-
Open results with cached version
-
Leave this check box cleared (recommended) so that in the search interfaces, the main search result link opens the original document with the original application. Consider selecting this check box only when you do not want users to be able to open the original document but only see the HTML version of the document as a Quick View. In this case, you must also select Generate a cached HTML version of indexed documents.
-
-
In the Parameters section, click Add Parameter to be able to change the default value of hidden parameters (see Modifying Hidden Atlassian Confluence V2 Source Parameters).
Note: If you implemented single sign-on with Okta (CES 7.0.8691+ (December 2016)) or Atlassian Crowd SSO (CES 7.0.8850+ (March 2017)) on your Confluence instance, you must add the UseRequestParametersAuth hidden parameter and set it to true in both the source and security provider configurations (see Configuring an Atlassian Confluence V2 Security Provider).
-
-
In the Security section of the Add Source page:
-
In the Authentication drop-down list, if you chose to index permissions, select the Confluence crawling user identity that you created for this source. Otherwise, select None.
-
In the Security Provider drop-down list, if you chose to index permissions, select the Confluence security provider that you created for this source (see Configuring an Atlassian Confluence V2 Security Provider). Otherwise, select None.
Note: When you select None in the Authentication and Security Provider drop-down lists, only your public (unsecured) Confluence content is indexed.
-
Click Save and Start to save the source configuration and build the source.
-
-
If you chose not to index permissions, you must set the permissions globally for the source:
-
In the navigation menu on the left, select Permissions.
-
Next to Permissions, select the Specifies the security permissions to index option.
-
Next to Allowed Users, ensure that a well-known everyone group such as the Active Directory S-1-1-0 is added.
-
Click Apply Changes.
-
-
Validate that the source building process is executed without errors:
-
In the navigation panel on the left, click Status, and then validate that the indexing proceeds without errors.
OR
-
Open the CES Console to monitor the source building activities (see Using the CES Console).
-