Add or edit a SharePoint Online source
Members with the required privileges can index SharePoint Online or OneDrive content and make it searchable. In a Coveo-powered search interface, the source content is accessible to everyone, to specific users and groups, or to the same users and groups as in the content system.
SharePoint Online tenants typically hold large volumes of content. Follow the SharePoint Online source configuration leading practices to optimize indexing performance.
Source key characteristics
Features | Supported | Additional information
---|---|---
SharePoint Online version | Latest cloud version |
Indexable content | Sites, sub-sites, user profiles, personal websites, lists, list items, list item attachments, document libraries, document sets, documents, web parts, and microblog posts and replies. |
 | | Takes place every hour by default.
 | | Takes place every week by default.
Content security options | |
Authentication and site access
A SharePoint Online source uses the OAuth 2.0 authorization protocol to access your SharePoint Online site content, and the source must authenticate through an Azure Active Directory application. You can choose to authenticate the Azure Active Directory application via a client certificate using app-only permissions, or using a delegated SharePoint Online user account (crawling account). The authentication method you choose depends mainly on your individual needs and corporate policy.
Before creating a new SharePoint Online source:
-
Determine the authentication method you want to use. The following subsections highlight the main advantages of each authentication method.
-
Perform the related prerequisites (see Certificate authentication prerequisites or Delegated authentication prerequisites).
Certificate authentication (recommended)
The main advantages of certificate authentication are:
-
Provides a higher throttling rate limit than delegated authentication, and is recommended for indexing large amounts of data.
-
Enables parallel refreshes, which improve content freshness by indexing the latest changes even while the source is performing a long rescan operation.
-
Allows more flexibility than delegated authentication in terms of content access.
-
You can grant the source permission to access all site content, personal sites, and user profiles in your SharePoint Online tenant without having to provide individual access to each site. The content that’s actually crawled and indexed depends on your source Content to include settings.
-
You can grant the source access to only a subset of site collections using the Sites.Selected permission (instead of Sites.FullControl.All).
-
-
Provides easier setup, as you don’t need to create and manage a crawling account in SharePoint Online, assign the account appropriate roles and permissions, and grant the account access to content.
Certificate authentication prerequisites
If you choose to authenticate using a certificate, you must perform the following before creating your SharePoint Online source:
-
Create a client certificate.
-
Create the Azure Active Directory application and assign the required permissions.
-
Add the client certificate to the Azure Active Directory application.
Create a client certificate
Your SharePoint Online source uses the client certificate to authenticate the Azure Active Directory application to crawl your SharePoint Online tenant.
-
Create a CA-signed certificate using a trusted certificate authority (recommended), or a self-signed certificate using the method of your choice.
Note
The certificate file format must be .cer, .pem, or .cert. You’ll need the certificate file when adding the certificate to your Azure Active Directory application.
-
Export the certificate as a password-protected .pfx file. Depending on how you created the certificate file, the .pfx file may be created for you automatically.
Note
You’ll need the .pfx file and password when creating your SharePoint Online source.
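As a quick sanity check before you upload anything, the file-format requirements above can be sketched in a few lines of Python. This is only an illustration: the function name `check_certificate_files` and the file names are hypothetical, and the check looks at file extensions only, not at the certificate contents.

```python
from pathlib import Path
from typing import List

# Extensions named in the text above: the public certificate goes to Azure,
# the password-protected .pfx file goes to the SharePoint Online source.
PUBLIC_CERT_EXTENSIONS = {".cer", ".pem", ".cert"}
PFX_EXTENSION = ".pfx"


def check_certificate_files(public_cert: str, pfx_file: str) -> List[str]:
    """Return a list of problems with the two file names (empty if none)."""
    problems = []
    if Path(public_cert).suffix.lower() not in PUBLIC_CERT_EXTENSIONS:
        problems.append(f"{public_cert}: expected a .cer, .pem, or .cert file")
    if Path(pfx_file).suffix.lower() != PFX_EXTENSION:
        problems.append(f"{pfx_file}: expected a .pfx file")
    return problems
```

Calling `check_certificate_files("coveo-crawler.cer", "coveo-crawler.pfx")` returns an empty list, which means both file names have an accepted extension.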
Create the Azure Active Directory application
The Azure Active Directory application that you create for use with your source grants Coveo the permissions to crawl your SharePoint Online tenant. Create the application and assign the required permissions as follows:
-
Access your Azure portal with an administrator account, and create (register) an Azure Active Directory application.
Notes
-
Select Accounts in this organizational directory only for the Supported account type option when creating the application.
-
Once you register the application, you’re taken to the application Overview page in Azure. Take note of the Application (client) ID and Directory (tenant) ID as you’ll need them when creating your SharePoint Online source.
-
-
Grant the Azure Active Directory application the required crawling permissions as follows:
-
If you’re currently on your application’s page in Azure, proceed to the next step. Otherwise, access your Azure portal with an administrator account, click App registrations, and then click the application you created previously.
-
Click API permissions.
-
If the User.Read permission is added by default, click the permission, and then click Remove permission.
-
For each of the following required permissions, click Add a permission, and then in the Microsoft APIs tab:
-
Click Microsoft Graph, click Application permissions, and then add the following permissions:
-
Sites.Read.All (recommended) or Sites.Selected
-
Click SharePoint, click Application permissions, and then add the following permissions:
-
Sites.FullControl.All (recommended) or Sites.Selected
Sites.Selected isn’t compatible with the All sites and Hub sites content retrieval options.
If you select the Sites.Selected scope, you need to grant the application FullControl access on a per-site basis, for each site you want to index. See Controlling app access on specific SharePoint site collections is now available in Microsoft Graph and Use Sites.Selected Permission with FullControl rather than Write or Read for further details.
Create only one application per tenant when using Sites.Selected. Then reuse the certificate for all sites to crawl across all sources that target the same tenant.
You can’t target a set of sites using one application and another set of sites in the same tenant using another application. A security provider for the tenant is created when the first source is created. If you create a second source to capture content from the same tenant, the second source will use the same security provider as the first.
-
-
-
-
Once you’ve added all the required permissions, grant tenant-wide admin consent to the application.
Note
You must have the appropriate user role to consent on behalf of the organization.
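The compatibility rule stated in the permission steps (Sites.Selected can’t be combined with the All sites or Hub sites content retrieval options) can be expressed as a small check. The following is a minimal sketch, not part of any Coveo or Microsoft API; the function name and the idea of passing the option names as plain strings are assumptions made for illustration.

```python
# Content retrieval options that the text says are incompatible with Sites.Selected.
INCOMPATIBLE_WITH_SITES_SELECTED = {"All sites", "Hub sites"}


def is_valid_combination(sharepoint_permission: str, content_option: str) -> bool:
    """Return True if the SharePoint permission can be used with the content option."""
    if sharepoint_permission == "Sites.Selected":
        return content_option not in INCOMPATIBLE_WITH_SITES_SELECTED
    # Sites.FullControl.All is tenant-wide, so it works with every option.
    return sharepoint_permission == "Sites.FullControl.All"
```

For example, `is_valid_combination("Sites.Selected", "Hub sites")` is false, whereas the same permission with the Selected sites list option is accepted.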
Add the client certificate to the Azure Active Directory application
Follow the Microsoft documentation to add your certificate to the Azure Active Directory application.
(Optional) Create a selected sites list in SharePoint Online
With certificate authentication, if you want to use the Selected sites list option, you need to create a list of the selected site collections you’ve granted your application access to. You can then reference the URL of this list in your SharePoint Online source.
To create the list in SharePoint Online
-
In your SharePoint Online tenant, access one of the sites that your application has access to.
-
Go to Site Contents.
-
Click + New > List.
-
Enter a descriptive Name for the list (for example, selected-sites-list), and optionally, a Description. Then click Create.
-
In the list, click + Add column.
-
Select the Hyperlink column type, and then click Next.
-
Enter a descriptive Name (for example, Site URL), and then click Save.
-
For each site collection that you want to index, add a new list item, and then enter the site URL in the Site URL column.
Azure application permissions with certificate authentication
To work with Microsoft APIs (CSOM and REST), Coveo must authenticate via an Azure Active Directory application that has the proper permissions. The access token is then limited to these permissions, which are necessary to successfully crawl SharePoint Online.
You must provide tenant-wide admin consent for the permissions in the Azure Active Directory application that’s used to authenticate your source. Typically, you provide consent when creating the Azure Active Directory application for use with Coveo, but you can do so at a later time (see Grant admin consent in App registrations).
The following table provides a description of the permissions that you must grant the application when using certificate authentication.
API | Permission | Justification
---|---|---
SharePoint | | Allows Coveo to retrieve permissions of crawled items, such as sites, users, lists, and documents.
 | | Grants Coveo the permission to access only a specified subset of site collections. You need to grant the application the … Example: If you select the Hub sites content to retrieve option, the application needs to have access to all targeted hub sites and all sites linked to these hub sites.
 | | Grants Coveo the permission to crawl user profiles.
Microsoft Graph | | Grants Coveo the permission to crawl site content.
 | | Grants Coveo the permission to access only a specified subset of site collections (see Microsoft Graph permissions reference).
 | | Coveo requires this permission to fetch:
 | | Coveo uses this permission to obtain the ID of a group (represents an Azure Active Directory group, which can be an Office 365 group, or a security group), and then a list of the group members (see Get group and List members).
Delegated authentication
For delegated authentication, the Azure Active Directory application is automatically created in your SharePoint tenant when you create the source, and is linked to the permissions of the crawling account that you create.
Note
The Azure Active Directory application appears as SharePoint Online Connector in your Azure portal Enterprise applications page.
The main advantages of delegated authentication are:
-
Provides a way to give the crawling account, and by association your source, access to crawl only specific sites and user profiles.
-
Provides a way to grant the crawling account minimal permissions when accessing site content.
However, an important drawback of delegated authentication is that it’s more prone to throttling than certificate authentication.
Delegated authentication prerequisites
If you decide to use delegated authentication, you must perform the following before creating your SharePoint Online source:
-
Create a SharePoint Online user account (crawling account) with appropriate roles and permissions.
Azure application permissions with delegated authentication
A SharePoint Online source uses the OAuth 2.0 authorization protocol. To work with Microsoft APIs (CSOM and REST), Coveo must authenticate via an Azure Active Directory application that has the proper permissions. The access token is then limited to these permissions, which are necessary to successfully crawl SharePoint Online.
You must provide tenant-wide admin consent for the permissions in the Azure Active Directory application that’s used to authenticate your source. Provide admin consent directly from your SharePoint Online source panel when creating your source (requires SharePoint Global Admin credentials), or from your Azure portal after creating your source (see Grant tenant-wide admin consent in Enterprise apps).
Note
The Azure Active Directory application that’s automatically created in your SharePoint Online tenant after you create your source appears as SharePoint Online Connector in your Azure portal’s Enterprise applications page.
The following table lists the permissions that are automatically assigned to the application when using delegated authentication.
API | Permission | Justification
---|---|---
SharePoint | | Allows Coveo to retrieve permissions of crawled items, such as sites, users, lists, and documents.
 | | Grants Coveo the permission to crawl user profiles.
Microsoft Graph | | Grants Coveo the permission to crawl site content.
 | | Coveo requires this permission to fetch:
 | | Coveo uses this permission to obtain the ID of a group (represents an Azure Active Directory group, which can be an Office 365 group, or a security group), and then a list of the group members (see Get group and List members).
Domain Name System records configuration for Microsoft 365
Regardless of the chosen authentication method, if you’re using custom domains in SharePoint Online, you must configure your Domain Name System (DNS) records for Microsoft 365.
-
Access the Domains page of your Office 365 admin center.
-
Select the checkbox for your corporate domain (not company.onmicrosoft.com).
-
On the domain page, in the DNS records section, take note of the DNS records.
-
On the domain page, in the DNS records section, click Check health to ensure that the DNS records were correctly configured.
Add or edit a SharePoint Online source
This section details how to add or edit a SharePoint Online source.
-
Determine the authentication method you want to use.
-
Perform the related prerequisites (see Certificate authentication prerequisites or Delegated authentication prerequisites).
-
If applicable, configure your DNS records for Microsoft 365.
-
On the Sources (platform-ca | platform-eu | platform-au) page, do one of the following:
-
To create a new source, click Add source, and then click SharePoint Online.
-
To edit an existing source, click your SharePoint Online source, and then click Edit in the Action bar.
Leading practice
It’s best to create or edit your source in your sandbox organization first. Once you’ve confirmed that it indexes the desired content, you can copy your source configuration to your production organization, either with a snapshot or manually.
See About non-production organizations for more information and best practices regarding sandbox organizations.
-
-
Specify your source settings on the Add/Edit a SharePoint Online Source subpage. Refer to the following sections for detailed information on the source settings:
Note
You can save your source settings at any time by clicking Add and build source/Add source, or Save and rebuild source/Save.
-
Build or rebuild your source.
"Configuration" tab
On the Add/Edit a SharePoint Online Source subpage, the Configuration tab is selected by default. It contains your source general and content information, as well as other parameters.
General information
Source name
Enter a name for your source.
Leading practice
A source name can’t be modified once it’s saved, therefore be sure to use a short and descriptive name, using letters, numbers, hyphens ( |
Optical Character Recognition (OCR)
If you want Coveo to extract text from image files or PDF files containing images, enable the appropriate option.
The extracted text is processed as item data, meaning that it’s fully searchable and will appear in the item Quick View. See Enable optical character recognition for details on this feature.
"Authentication" section
You can authenticate the source to access your SharePoint Online content using a certificate or a delegated user account.
Note
Your source authentication access token can potentially expire and become invalid. See Update a SharePoint Online Access Token for information on what causes the access token to expire and how to update an expired access token.
-
Select whether to use Certificate or OAuth2 (delegated) authentication.
-
Specify the corresponding settings:
-
For Certificate authentication:
-
Enter your SharePoint Online Tenant name or tenant address.
Examples
-
SharePoint Online tenant name:
mycompany
-
SharePoint Online tenant address:
https://mycompany.sharepoint.com
-
-
Enter your SharePoint Online Tenant id.
-
Enter the Client id for the Azure Active Directory application that you created for your source.
-
Enter your Certificate password.
-
Click Certificate file to upload your .pfx certificate.
-
-
For OAuth2 (delegated) authentication:
-
Click Authorize account.
-
Enter your SharePoint Online Tenant name or tenant address, and then click Sign In.
Examples
-
SharePoint Online tenant name:
mycompany
-
SharePoint Online tenant address:
https://mycompany.sharepoint.com
-
-
Provide admin consent using SharePoint Online user credentials that have the Global Admin role by following the steps detailed here, or proceed to the next step if you wish to provide consent from your Azure portal after creating your source.
Note
You can switch your source to the crawling account after you provide admin consent.
-
Enter the Email and Password of a SharePoint account with the Global Admin role.
-
Select Consent on behalf of your organization.
-
Click Accept.
-
To switch the source to the crawling account, click Authorize account again, enter your SharePoint Online Tenant name, click Sign In, and then proceed to the next step.
-
-
Enter the Email and Password of the crawling account that you created earlier and that has access to the desired SharePoint Online content, and then click Sign in.
Note
When you create two SharePoint Online sources that retrieve content from the same tenant, they share their security providers, which increases the speed of the security identities refresh operation. You must, however, use the same limited administrator credentials for both sources.
-
-
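The Tenant name and tenant address examples above suggest a simple relationship between the two forms. The sketch below illustrates it, assuming the address always follows the `https://<tenant name>.sharepoint.com` pattern shown in the examples; the function name `tenant_address` is hypothetical and this isn’t how the source panel itself is documented to process the value.

```python
from urllib.parse import urlparse


def tenant_address(tenant: str) -> str:
    """Return the tenant address for a tenant name or an already complete address."""
    tenant = tenant.strip().rstrip("/")
    parsed = urlparse(tenant)
    if parsed.scheme and parsed.netloc:
        # Already an address such as https://mycompany.sharepoint.com
        return tenant
    # Assumption: a bare tenant name maps to <name>.sharepoint.com over HTTPS.
    return f"https://{tenant}.sharepoint.com"
```

With this helper, both `mycompany` and `https://mycompany.sharepoint.com` resolve to the same tenant address.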
"Content to include" section
Specify the content that your source indexes and makes searchable to users in a Coveo-powered search interface.
Note
For implementations using the quickview component in a Coveo JavaScript Search Framework result template, and ASPX list items in SharePoint Online, quick view is supported only for ASPX list items (pages) of type |
-
Select whether you want to index SharePoint Online or OneDrive content:
-
OneDrive: Index only the document libraries in OneDrive, including the My Files content, of users' personal sites. If you want to index all content of users' personal sites (all site collections), in addition to document libraries, select SharePoint Online, and then choose the Personal sites option. However, if you only want to index user documents, we recommend using the OneDrive option to limit the crawling scope.
-
SharePoint Online: You can choose to index all or specific SharePoint Online sites, lists, all content in user personal sites, or user profiles.
Note
User access to the indexed items through a Coveo-powered search interface depends on your source Content Security setting. Personal/OneDrive documents and folders are private unless they’re shared with others.
-
-
Specify the corresponding options.
-
For OneDrive, your source indexes the OneDrive documents for the users to which the source has access. You can select Folders to also index folders and document sets.
Note
For certificate authentication, your source has access to all user content. For delegated authentication, the crawling account must be set as an owner in all personal sites that you want to index.
-
-
Select the content to retrieve:
-
All sites
For certificate authentication, all sites in your SharePoint Online tenant will be indexed and searchable. For delegated authentication, only the sites that the crawling account is allowed to access will be indexed and searchable.
Note
This option corresponds only to top-level site collections and their associated content. It doesn’t include personal-site content.
Use the default * wildcard inclusion filter scope on your source. No items will be indexed if you change the default inclusion filtering.
-
Hub sites
You can choose to index the content of all sites that are associated with a SharePoint hub site. This includes all the associated site’s subsites and lists.
Note
For delegated authentication, the crawling account must have access to the hub site and the associated sites. If the crawling account has access only to a subset of the associated sites, just those sites will be indexed and searchable.
In the URL field, enter the URL corresponding to the desired hub site. Each URL must include the protocol and tenant name.
Example
Sites https://site:8080/sites/support and https://site:8080/sites/hr are associated with your SharePoint Online hub site (https://site:8080/sites/Main), so you enter https://site:8080/sites/Main in the URL field to index the content of both associated sites.
Use the default * wildcard inclusion filter scope on your source. No items will be indexed if you change the default inclusion filtering.
-
Specific items
You can choose to make only certain items searchable, such as specific sites, lists, websites, and subsites, by entering the corresponding URLs in the URL field. Each URL must include the protocol and tenant name.
Notes
-
For delegated authentication, the crawling account must have access to the specified sites.
-
Indexing a specific folder in a list isn’t supported.
Examples
-
For a specific site:
https://site:8080/sites/support
-
For a specific website:
https://site:8080/sites/support/subsite
-
For a specific list:
https://site:8080/sites/support/lists/contacts/allItems.aspx
-
-
Personal sites
You can choose to index only the content of personal sites, which includes site collections and OneDrive documents, from your SharePoint Online tenant.
Note
For delegated authentication, the crawling account must be set as an owner in all personal sites that you want to index.
-
Selected sites list
When using certificate authentication with the Sites.Selected application permission, you can index content you’ve granted the application access to by referencing a custom SharePoint Online list of site collections.
To reference a SharePoint Online list of site collections
-
In SharePoint Online, create a list of the site collections the application has access to and that you want to index.
The list must be in a site that the application has access to.
-
On the Add/Edit a SharePoint Online Source page, in the URL field, enter the URL of the list of site collections (for example, https://tenant.sharepoint.com/sites/sitea/Lists/selectedsites/AllItems.aspx).
-
-
User profiles
You can choose to index only the user profiles in your SharePoint Online tenant.
Note
For delegated authentication, the crawling account must be set as an owner in the personal sites for the user profiles that you want to index.
-
-
If you selected All sites, Hub sites, Specific items, Personal sites, or Selected sites list, under Additional content, select whether to index the following:
-
Folders
Select this option to index list folders and document sets.
-
Unapproved items
Select this option to retrieve unapproved items, which are items with a Draft or Pending approval status, from lists where moderation is activated. If an unapproved version exists for an item that’s already Approved, your source indexes the unapproved item instead of the approved item. As a result, the unapproved item appears in Coveo search results. If this option is disabled, your source indexes only Approved items.
Example
In a list where moderation is active, a document named Meeting Notes is Approved and indexed by Coveo. This document version is 1.0. However, a coworker edits Meeting Notes, thereby creating version 1.1, and the document status becomes Draft. Then, your SharePoint Online source is rescanned. If Unapproved items is enabled in your source, version 1.0 is deleted from the Coveo index and is replaced with the draft version 1.1. If Unapproved items is disabled in your source, Coveo keeps version 1.0 in the index because version 1.1 isn’t yet Approved.
In lists where moderation is deactivated, Coveo indexes the latest version of an item, be it Approved, Draft, or Pending. In this case, this option doesn’t apply.
Note
For SharePoint lists that require documents to be checked out before editing, Coveo doesn’t index a document while it’s checked out, regardless of the Unapproved items option and the list moderation setting in SharePoint. If a checked-out item is checked in and its status changes to Draft or Pending, the unapproved item is indexed only if the Unapproved items option is enabled in your source or if moderation is deactivated for the list.
-
-
-
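The way the Unapproved items option interacts with list moderation can be summarized as a decision function. The following Python sketch restates the rules described above and is not taken from the connector itself: the function and parameter names are hypothetical, and the checked-out case from the note is deliberately left out.

```python
from typing import Optional


def version_to_index(
    moderation_active: bool,
    unapproved_items_enabled: bool,
    latest_version: str,
    latest_status: str,
    last_approved_version: Optional[str],
) -> Optional[str]:
    """Return the item version that ends up in the index, or None if there is none."""
    if not moderation_active:
        # Without moderation, the latest version is indexed whatever its status.
        return latest_version
    if latest_status == "Approved" or unapproved_items_enabled:
        # Draft and Pending versions replace the approved one when the option is on.
        return latest_version
    # Option disabled: only an Approved version can be indexed.
    return last_approved_version
```

Applied to the Meeting Notes example, a Draft version 1.1 with an approved version 1.0 yields 1.1 when the option is enabled and 1.0 when it’s disabled.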
"Filters" section
Use this section to ignore specific items when indexing. There are four ways to filter out content:
-
By configuring an item modification filtering window.
-
Using a metadata value condition.
-
Using URL inclusion and exclusion filters.
-
By specifying list template types to ignore.
Item modification filtering window
You can configure your SharePoint Online source to index only items that were modified within a specified time range.
To configure this rolling window, set the amount and period.
The supported periods are Day, Week, Month, and Year.
For example, to index only items that were modified within the last 2 years, set the amount to 2 and the period to Year.
To disable the item modification filtering window, set the amount to 0.
Configuring an item modification filtering window has the following effects during source updates:
Update type | Effects
---|---
rebuild | Your source is emptied, then only items modified within the rolling window are added to your source.
rescan | The connector crawls the entire content your source targets, and:
refresh | The connector crawls items that have been modified/added since the last source update and items whose last modified date is outside the rolling window, and:
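To make the rolling window concrete, the sketch below computes the window start from an amount and a period, and tests whether an item’s last modified date falls inside it. It’s an illustration under assumptions: the source doesn’t specify how the connector computes the boundary, so the calendar arithmetic for Month and Year and the function names are choices made here.

```python
import calendar
from datetime import datetime, timedelta
from typing import Optional


def window_start(now: datetime, amount: int, period: str) -> Optional[datetime]:
    """Return the start of the rolling window, or None when the window is disabled."""
    if amount == 0:
        return None  # an amount of 0 disables the filtering window
    if period == "Day":
        return now - timedelta(days=amount)
    if period == "Week":
        return now - timedelta(weeks=amount)
    if period in ("Month", "Year"):
        months = amount * (12 if period == "Year" else 1)
        total = now.year * 12 + (now.month - 1) - months
        year, month = divmod(total, 12)
        month += 1
        # Clamp the day so that, for example, March 31 minus 1 month is a valid date.
        day = min(now.day, calendar.monthrange(year, month)[1])
        return now.replace(year=year, month=month, day=day)
    raise ValueError(f"unsupported period: {period}")


def is_in_window(last_modified: datetime, now: datetime, amount: int, period: str) -> bool:
    """Return True if the item would be kept by the rolling window."""
    start = window_start(now, amount, period)
    return start is None or last_modified >= start
```

With an amount of 2 and a period of Year, an item last modified three years before `now` falls outside the window, while any item passes when the amount is 0.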
Metadata exclusion condition
You can define a condition based on metadata values to prevent items from being crawled.
Conditions must reference metadata names using the %[METADATA_NAME] syntax, where METADATA_NAME is replaced with the actual metadata name.
Metadata names are case-sensitive.
The View Metadata subpage lists metadata names available in your source.
Because metadata-based exclusion is applied at the crawling stage of the Coveo indexing pipeline, make sure you only select metadata whose Origin value is Crawler.
The condition may be a single expression or a combination of expressions.
The following operators are supported: AND, OR, Exists, NOT, ==, >, and <.
Parentheses are also supported to specify operation order.
The |
The following table gives examples of conditions and their effects:
Condition | Matches items that | Indexing result |
---|---|---|
|
Have a |
All items with a |
|
Don’t have a |
All items that don’t have a |
|
Have the |
All items with a |
|
Have the |
All items whose |
URL inclusion and exclusion filters
Inclusion filters
Your source indexes only the pages that match a URL expression specified in this section.
-
Enter a URL expression to apply as the inclusion filter.
-
Select whether the URL expression uses a Wildcard or a Regex (regular expression) pattern.
Leading practice
You can test your regexes to ensure that they match the desired URLs with tools such as Regex101. You can customize regexes to meet your use case focusing on aspects such as:
For example, you want to index HTML pages on your company staging and dev websites without taking the case sensitivity or the trailing slash (/) into account, so you use the following regex:
The regex matches the following URLs:
but doesn’t match the following ones:
|
The www.mycompany.com website you crawl contains versions in several languages and you want to have one source per language.
For the US English source, if the source URL is www.mycompany.com/en-us/welcome.html, the inclusion filter would be www.mycompany.com/en-us/*.
Exclusion filters
Your source ignores content from pages that match a URL expression specified in this section.
Note
A source URL must not be part of the exclusion filter scope, otherwise the corresponding content won’t be indexed.
For example, if you entered |
-
Enter a URL expression to apply as the exclusion filter.
Notes
-
Exclusion filters also apply to shortened and redirected URLs.
-
By default, if pages are only accessible via excluded pages, those pages will also be excluded.
-
Exclusion filters for SharePoint Online sources aren’t case sensitive when using a Regex (regular expression). For example, (company-(dev|staging)).*html.?$ will match http://ComPanY-dev/important/document.html without adding any additional symbols to account for case sensitivity. Exclusion filters are case sensitive when using Wildcard expressions.
-
-
Select whether the URL expression uses a Wildcard or a Regex (regular expression) pattern.
-
There’s no point in indexing the search page of your website, so you exclude its URL:
www.mycompany.com/en-us/search.html
-
You don’t want to index ZIP files that are linked from website pages:
www.mycompany.com/en-us/*.zip
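The interplay between inclusion and exclusion filters can be sketched with Python’s standard `fnmatch` and `re` modules. This only approximates the behavior described above: `fnmatchcase` stands in for the connector’s wildcard matching (an assumption, since the exact wildcard semantics aren’t specified), and `matches` and `is_indexed` are hypothetical names.

```python
import re
from fnmatch import fnmatchcase
from typing import Iterable, Tuple

Filter = Tuple[str, str]  # (expression, "Wildcard" or "Regex")


def matches(url: str, expression: str, kind: str, ignore_case: bool = False) -> bool:
    """Return True if the URL matches the filter expression."""
    if kind == "Wildcard":
        # Wildcard expressions are case sensitive, so ignore_case doesn't apply.
        return fnmatchcase(url, expression)
    if kind == "Regex":
        flags = re.IGNORECASE if ignore_case else 0
        return re.search(expression, url, flags) is not None
    raise ValueError(f"unknown pattern type: {kind}")


def is_indexed(url: str, inclusions: Iterable[Filter], exclusions: Iterable[Filter]) -> bool:
    """An item is indexed if it matches an inclusion filter and no exclusion filter."""
    included = any(matches(url, expr, kind) for expr, kind in inclusions)
    # Regex exclusion filters aren't case sensitive for SharePoint Online sources.
    excluded = any(matches(url, expr, kind, ignore_case=True) for expr, kind in exclusions)
    return included and not excluded
```

Using the inclusion filter `www.mycompany.com/en-us/*` together with the two exclusion examples above, the welcome page is kept while the search page and any linked ZIP file are skipped.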
List template types to ignore
You can configure your SharePoint Online source to ignore specific SharePoint list template types when indexing items.
Enter the list template types to ignore by adding a separate entry for each template type.
For example, if you don’t want your source to index DocumentLibrary and Tasks template-type items, add one entry for DocumentLibrary and another entry for Tasks.
Note
Once configured, the list template types to ignore appear in the
|
"Content security" tab
Select who will be able to access the source items through a Coveo-powered search interface. For details on this parameter, see Content security.
Note
When using the Same users and groups as in your content system content security option, you can map Microsoft 365 email aliases to their corresponding primary email addresses so that your repository’s content permissions are respected when a user logs in to a Coveo search interface using an email alias.
When using the Everyone content security option, see Safely Apply Content Filtering for information on how to ensure that your source content is safely filtered and only accessible by intended users.
"Access" tab
In the Access tab, set whether each group (and API key, if applicable) in your Coveo organization can view or edit the current source.
For example, when creating a new source, you could decide that members of Group A can edit its configuration while Group B can only view it.
See Custom access level for more information.
Completion
-
Finish adding or editing your source:
-
When you want to save your source configuration changes without starting a build/rebuild, such as when you know you’ll make other changes soon, click Add source/Save.
-
When you’re done editing the source and want to make changes effective, click Add and build source/Save and rebuild source.
Note
On the Sources (platform-ca | platform-eu | platform-au) page, you must click Launch build or Start required rebuild in the source Status column to add the source content or to make your changes effective, respectively.
Back on the Sources (platform-ca | platform-eu | platform-au) page, you can follow the progress of your source addition or modification.
Once the source is built or rebuilt, you can review its content in the Content Browser.
Note
If you selected Specific Items or User Profiles in the Content to Include section, some additional items will appear in the Content Browser. To retrieve user profiles, Coveo must dig through your SharePoint Online instance, including your host site collection and the documents it contains. The items it encounters in the process are retrieved as well and therefore appear in the Content Browser.
-
-
Once your source is done building or rebuilding, review the metadata Coveo is retrieving from your content.
-
On the Sources (platform-ca | platform-eu | platform-au) page, click your source, and then click More > View metadata in the Action bar.
-
If you want to use metadata that isn’t currently indexed in a facet or result template, map it to a field.
-
Click the metadata and then, at the top right, click Add to Index.
-
In the Apply a mapping on all item types of a source panel, select the field you want to map the metadata to, or add a new field if none of the existing fields are appropriate.
Notes
-
For details on configuring a new field, see Add or edit a field.
-
For advanced mapping configurations, like applying a mapping to a specific item type, see Manage mappings.
-
-
Click Apply mapping.
-
-
Depending on the source type you use, you may be able to extract additional metadata from your content. You can then map that metadata to a field, just like you did for the default metadata.
More on custom metadata extraction and indexing
Some source types let you define rules to extract metadata beyond the default metadata Coveo discovers during the initial source build.
For example:
Source type | Custom metadata extraction methods |
---|---|
Push API | Define metadata key-value pairs in the addOrUpdate section of the PUT request payload used to upload push operations to an Amazon S3 file container. |
REST API and GraphQL API | In the JSON configuration (REST API or GraphQL API) of the source, define metadata names (REST API or GraphQL API) and specify where to locate the metadata values in the JSON API response Coveo receives. |
Database | Add <CustomField> elements in the XML configuration. Each element defines a metadata name and the database field used to populate the metadata. |
Web | Configure web scraping configurations that contain metadata extraction rules using CSS or XPath selectors. Extract metadata from JSON-LD <script> tags. |
Sitemap | Configure web scraping configurations that contain metadata extraction rules using CSS or XPath selectors. Extract JSON-LD <script> tag metadata. Extract <meta> tag content using the IndexHtmlMetadata JSON parameter. |
Some source types automatically map metadata to default or user-created fields, making the mapping process unnecessary. Others automatically create mappings and fields for you when you configure metadata extraction.
See your source type documentation for more details.
-
-
When you’re done reviewing and mapping metadata, return to the Sources (platform-ca | platform-eu | platform-au) page.
-
To reindex your source with your new mappings, click Launch rebuild in the source Status column.
-
Once the source is rebuilt, you can review its content in the Content Browser.
-
Safely apply content filtering
The best way to ensure that your indexed content is seen only by the intended users is to enforce content security by selecting the Same users and groups as in your content system option. Should this option be unavailable, select Specific users and groups instead.
However, if you need to configure your source so that the indexed source content is accessible to Everyone, you should adhere to the following leading practices. These practices ensure that your source content is safely filtered and only accessible by the appropriate users:
-
Configure query filters: Apply filter rules on a query pipeline to filter the source content that appears in search results when a query goes through that pipeline.
-
Use condition-based query pipeline routing: Apply a condition on a query pipeline to make sure that every query originating from a specific search hub is routed to the right query pipeline.
-
Configure the search token: Authenticate user queries via a search token that’s generated server side and that enforces a specific search hub.
Following the above leading practices results in a workflow whereby the user query is authenticated server side via a search token that enforces the search hub from which the query originates. Therefore, the query can’t be modified by users or client-side code. The query then passes through a specific query pipeline based on a search hub condition, and the query results are filtered using the filter rules.
Configure query filters
Filter rules allow you to enter hidden query expressions to be added to all queries going through a given query pipeline.
They’re typically used to add a field-based expression to the constant query expression (cq).
For example, you apply the @objectType=="Solution" query filter to the pipeline to which the traffic of your public support portal is directed.
As a result, the @objectType=="Solution" query expression is added to any query sent via this support portal.
Therefore, if a user types Speedbit watch wristband in the search box, the items returned are those that match these keywords and whose objectType has the Solution value.
Items matching these keywords but having a different objectType value aren’t returned in the user’s search results.
To learn how to configure query pipeline filter rules, see Manage filter rules.
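The effect of a filter rule can be pictured with a small Python sketch. This is a conceptual model only, not how Coveo implements query pipelines: the function name apply_filter_rules is hypothetical, and joining expressions with AND and parentheses is an assumption made for illustration.

```python
def apply_filter_rules(cq: str, filter_rules: list[str]) -> str:
    """Combine an existing constant query expression with pipeline filter rules.

    Hypothetical helper: every non-empty expression is wrapped in parentheses
    and the results are joined with AND, so all of them must match.
    """
    parts = [f"({expr})" for expr in [cq, *filter_rules] if expr.strip()]
    return " AND ".join(parts)


# The user only controls the basic query expression (q).
query = {"q": "Speedbit watch wristband", "cq": ""}

# The pipeline adds its hidden filter to the constant query expression (cq).
query["cq"] = apply_filter_rules(query["cq"], ['@objectType=="Solution"'])
print(query)
```

In this model, the user-entered keywords stay in q, while the filter is carried separately in cq, which is why the user can't remove it by changing what they type.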
Note
You can also enforce a filter expression directly in the search token.
Use condition-based query pipeline routing
Condition-based routing is the recommended and most flexible query pipeline routing mechanism.
When using this routing mechanism, you ensure that search requests are routed to a specific query pipeline according to the search interface from which they originate, and the authentication is done server side.
To accomplish this:
-
Apply a condition to a query pipeline based on a search hub value, such as Search Hub is Community Search or Search Hub is Agent Panel. This condition ensures that all queries that originate from a specific search hub go through that query pipeline.
-
Authenticate user queries via a search token that’s generated server side and that contains the search hub parameter that you specified in the query pipeline.
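The routing logic described above can be sketched as follows. This is a simplified illustration, not Coveo's routing engine: the Pipeline class, the route_query function, and the pipeline names are all hypothetical, and real pipeline conditions are configured in the Administration Console rather than in code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pipeline:
    name: str
    # Search hub value the pipeline condition requires; None marks the default pipeline.
    required_search_hub: Optional[str] = None


def route_query(search_hub: str, pipelines: list[Pipeline]) -> Pipeline:
    """Return the first pipeline whose condition matches the search hub,
    falling back to the default pipeline when no condition matches."""
    for pipeline in pipelines:
        if pipeline.required_search_hub == search_hub:
            return pipeline
    for pipeline in pipelines:
        if pipeline.required_search_hub is None:
            return pipeline
    raise LookupError("No matching pipeline and no default pipeline")


pipelines = [
    Pipeline("community", required_search_hub="Community Search"),
    Pipeline("agents", required_search_hub="Agent Panel"),
    Pipeline("default"),
]

# The search hub comes from the server-generated search token, not from the client.
print(route_query("Agent Panel", pipelines).name)
```

Because the search hub is enforced in the token, a client can't pick a different hub to reach a pipeline with looser filters.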
Configure the search token
When using query filters to secure content, the safest way to enforce content security is to authenticate user queries using a search token that’s generated server side. For instance, when using this approach, you can enforce a search hub value in the search token. This makes every authenticated request that originates from a component use the specified search hub, and therefore be routed to the proper query pipeline. Because this configuration is stored server side and encrypted in the search token, it can’t be modified by users or client-side code.
Implementing search token authentication requires you to add server-side logic to your website or application. Therefore, the actual implementation details will vary from one project to another.
The following procedure provides general guidelines:
Note
If you’re using the Coveo In-Product Experience (IPX) feature, see Implement advanced search token authentication.
-
Authenticate the user.
-
Call a service exposed through Coveo to request a search token for the authenticated user.
-
Specify the userIDs for the search token, and enforce a searchHub parameter in the search token.
Note
You can specify other parameters in the search token, such as a query filter.
For more information and examples, see Search token authentication.
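As a rough server-side sketch of the last two steps, the following Python code builds, but doesn't send, a token request. Several details are assumptions to check against Search token authentication: the endpoint URL, the userIds body field and its name, provider, and type properties, and the Email Security Provider value. The build_token_request function and the sample values are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint; the host may differ depending on your organization's region.
TOKEN_ENDPOINT = "https://platform.cloud.coveo.com/rest/search/v2/token"


def build_token_request(api_key: str, user_email: str, search_hub: str) -> urllib.request.Request:
    """Build a POST request asking for a search token for an authenticated user."""
    payload = {
        # Assumed body shape: the identity the token impersonates.
        "userIds": [
            {"name": user_email, "provider": "Email Security Provider", "type": "User"}
        ],
        # Enforced search hub, used for condition-based pipeline routing.
        "searchHub": search_hub,
    }
    return urllib.request.Request(
        TOKEN_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; the API key must stay on the server.
request = build_token_request("<API_KEY>", "alice@example.com", "Community Search")
print(request.get_method(), request.full_url)
```

The request would then be sent from your server, for example with urllib.request.urlopen, and only the resulting token would be passed to the search page. The API key is never exposed to the browser.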
Update a SharePoint Online access token
Your SharePoint Online source uses the OAuth 2.0 authorization protocol to access your SharePoint Online site content via an Azure Active Directory application that has the required permissions (see Authentication and site access).
The access token is linked to the certificate or SharePoint Online user account (crawling account) that you specified in your source configuration, and you must update the access token manually if it’s no longer valid. The access token becomes invalid when:
-
(certificate authentication) the certificate expires
-
(delegated authentication) the SharePoint Online crawling account’s credentials (email and/or password) are modified
An Authentication issue error appears for your source on the Sources (platform-ca | platform-eu | platform-au) page when your SharePoint Online source access token is no longer valid.
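For certificate authentication, you may prefer to renew the certificate before this error appears. The following Python sketch shows one way to flag an upcoming expiry. It assumes you already know the certificate's expiry date (for example, from your Azure Active Directory application); the function names and the 30-day threshold are hypothetical choices, not Coveo recommendations.

```python
from datetime import datetime, timezone
from typing import Optional


def days_until_expiry(not_after: datetime, now: Optional[datetime] = None) -> int:
    """Return the number of whole days left before the certificate expires."""
    now = now or datetime.now(timezone.utc)
    return (not_after - now).days


def needs_renewal(not_after: datetime, warn_days: int = 30, now: Optional[datetime] = None) -> bool:
    """Return True when the certificate expires in fewer than warn_days days."""
    return days_until_expiry(not_after, now) < warn_days


# Hypothetical expiry date, expressed in UTC.
expiry = datetime(2025, 3, 1, tzinfo=timezone.utc)
today = datetime(2025, 2, 1, tzinfo=timezone.utc)

if needs_renewal(expiry, now=today):
    print(f"Renew the certificate: {days_until_expiry(expiry, today)} days left")
```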
Note
A source Authentication issue error may also appear due to configuration or connectivity issues. If the certificate hasn’t expired, or the crawling account’s credentials haven’t changed, verify the following:
|
To update the access token
-
For certificate authentication:
-
Add your certificate to the Azure Active Directory application that you created for use with your source.
-
On the Sources (platform-ca | platform-eu | platform-au) page, click your SharePoint Online source, and then click Edit in the Action bar.
-
In the Authentication section, click Certificate file to upload your new certificate.
-
Enter the Certificate password.
-
Click Save or Save and rebuild source.
-
For delegated authentication:
-
On the Sources (platform-ca | platform-eu | platform-au) page, click your SharePoint Online source, and then click Edit in the Action bar.
-
In the Authentication section, click Authorize Account.
-
Enter your SharePoint Online Tenant name or tenant address, and then click Sign In.
-
Enter the Email and Password of the crawling account, and then click Sign in.
-
Click Save or Save and rebuild source.
-
Required privileges
You can assign privileges to allow access to specific tools in the Coveo Administration Console. The following table indicates the privileges required to view or edit elements of the Sources (platform-ca | platform-eu | platform-au) page and associated panels. See Manage privileges and Privilege reference for more information.
Note
The Edit all privilege isn’t required to create sources. When granting privileges for the Sources domain, you can grant a group or API key the View all or Custom access level, instead of Edit all, and then select the Can Create checkbox to allow users to create sources. See Can Create ability dependence for more information.
Actions | Service | Domain | Required access level |
---|---|---|---|
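View sources, view source update schedules, and subscribe to source notifications | Content | Fields | View |
View sources, view source update schedules, and subscribe to source notifications | Content | Sources | View |
View sources, view source update schedules, and subscribe to source notifications | Organization | Organization | View |
Edit sources, edit source update schedules, and view the View Metadata subpage | Content | Fields | Edit |
Edit sources, edit source update schedules, and view the View Metadata subpage | Content | Sources | Edit |
Edit sources, edit source update schedules, and view the View Metadata subpage | Content | Source metadata | View |
Edit sources, edit source update schedules, and view the View Metadata subpage | Organization | Organization | View |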
What’s next?
-
If you have the Enterprise edition, group your implementation resources together by adding your source to a project.