Adding or Modifying Source Filters
Note: When a filter pattern is a regular expression, use the ECMAScript regular expression syntax, as implemented, for example, by JavaScript (see Using Regular Expressions with JavaScript).
Exclusion filters are commonly used with any source type to prevent the indexing of one or more subsections under the starting address.
Example: The starting address of a File connector source for human resources files is file://corp.MyCompany.com/dfs/dept/HR. You can use the following exclusion filter to prevent the indexing of retired employees' documents, which are all under the same folder:
file://corp.MyCompany.com/dfs/dept/HR/employees/retired/*
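The following Python sketch illustrates how a wildcard pattern such as the one above can be read as a regular expression. It is not the product's matching code: the `wildcard_to_regex` and `matches` helpers are hypothetical, the sample document addresses are invented for illustration, and the sketch assumes that `*` matches any run of characters (including `/`), that `?` matches exactly one character, and that matching is case-sensitive. Python's `re` module is not an ECMAScript engine, but the simple constructs used here behave the same in both.

```python
import re

def wildcard_to_regex(pattern: str) -> str:
    """Translate a wildcard pattern (* and ?) into a regular expression string."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")           # assumption: * matches any run of characters
        elif ch == "?":
            parts.append(".")            # assumption: ? matches exactly one character
        else:
            parts.append(re.escape(ch))  # every other character is taken literally
    return "".join(parts)

def matches(pattern: str, address: str) -> bool:
    """Return True when the whole address matches the wildcard pattern."""
    return re.fullmatch(wildcard_to_regex(pattern), address) is not None

RETIRED = "file://corp.MyCompany.com/dfs/dept/HR/employees/retired/*"

# Hypothetical document addresses under the HR starting address.
print(matches(RETIRED, "file://corp.MyCompany.com/dfs/dept/HR/employees/retired/2009/smith.doc"))  # True
print(matches(RETIRED, "file://corp.MyCompany.com/dfs/dept/HR/employees/active/jones.doc"))        # False
```

Under these assumptions, only addresses below the retired folder match the pattern, so only those documents would be excluded.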
Inclusion filters are typically used with Web connector sources to include web pages that the site links to but that are located outside of the starting address.
Example: The starting address of a Web connector source is http://www.MyCompany.com. This website links to some pages on the related career website (http://career.MyCompany.com) that you also want to index. You can add the following inclusion filter pattern to also index job opportunities:
http://career.MyCompany.com/jobs/*
Note: You can also use inclusion filters with other source types to index only a few subsections under the starting address, but in this case, your source may waste resources by crawling a large number of folders that will not be indexed. A better practice is to create one source per subsection or, when the source type allows it, to enter more than one starting address for one source.
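As a rough illustration of how inclusion and exclusion filters combine, the Python sketch below treats an address as indexed when it matches at least one inclusion pattern and no exclusion pattern. This decision rule is an assumption inferred from the descriptions on this page, not a statement of how the crawler is implemented. The `wildcard_match` and `is_indexed` helpers, the `http://www.MyCompany.com/*` inclusion pattern, the `archive` exclusion pattern, and the sample page addresses are all hypothetical.

```python
import re

def wildcard_match(pattern: str, address: str) -> bool:
    """Match an address against a wildcard pattern (* = any run, ? = one character)."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, address) is not None

def is_indexed(address: str, inclusions: list[str], exclusions: list[str]) -> bool:
    """Assumed rule: indexed if some inclusion filter matches and no exclusion filter does."""
    included = any(wildcard_match(p, address) for p in inclusions)
    excluded = any(wildcard_match(p, address) for p in exclusions)
    return included and not excluded

inclusions = [
    "http://www.MyCompany.com/*",          # hypothetical filter covering the starting address
    "http://career.MyCompany.com/jobs/*",  # added inclusion filter from the example above
]
exclusions = [
    "http://www.MyCompany.com/archive/*",  # hypothetical exclusion filter
]

for url in (
    "http://www.MyCompany.com/products.html",
    "http://www.MyCompany.com/archive/2001.html",
    "http://career.MyCompany.com/jobs/1234",
    "http://career.MyCompany.com/about",
):
    print(url, is_indexed(url, inclusions, exclusions))
```

With these sample filters, the job page on the career site is indexed, while the career site's other pages and the archived pages are not.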
To add or modify a source filter
-
On the Coveo server, access the Administration Tool (see Opening the Administration Tool).
-
In the Administration Tool, select Index > Sources and Collections.
-
In the Sources and Collections page, select the collection containing the source for which you want to add or modify source filters.
-
In the Sources section, select the source.
-
In the navigation panel on the left, select Filters.
-
In the Filters page:
-
To add or modify an exclusion filter:
-
Click Add an Exclusion Filter to create a new filter or click an existing exclusion filter to modify it.
-
In the Edit an Exclusion Filter page, in the Excluded Patterns box, enter one or more address patterns to exclude (one pattern per line), using either wildcard characters (* or ?) or a regular expression.
Exclusion patterns must match addresses that are covered by an existing inclusion filter; otherwise, they have no effect.
-
In the Type box, select the search pattern option used in the patterns you entered in Excluded Patterns.
-
Click Save/Apply Changes.
-
To add or modify an inclusion filter:
Note: Inclusion filters that are already listed correspond to entries in the Addresses box of the General page. In this box, whether or not you end the starting address path of a file system folder (as opposed to a file) with a slash has no effect on what is crawled.
It does, however, affect what is indexed, because of the inclusion filter that is automatically created for each starting address. When the trailing slash is omitted, the last folder of the starting address is truncated from the inclusion filter.
Example: The starting address file:///C:/temp/ creates the inclusion filter file:///C:/temp/*, while the starting address file:///C:/temp creates the inclusion filter file:///C:/*.
-
Click Add an Inclusion Filter to create a new filter or click an existing inclusion filter to modify it.
-
In the Edit an Inclusion Filter page, in the Allowed Patterns box, enter one or more address patterns to include (one pattern per line), using either wildcard characters (* or ?) or a regular expression.
Note: A filter pattern can include a file extension to filter by file type, but it is a better practice to define an appropriate document type set for this source (see Creating a Document Type Set).
-
In the Type box, select the search pattern option used in Allowed Patterns.
-
Click Save/Apply Changes.
-
If you are using inclusion filters to index subsections under the starting address:
Note: Using inclusion filters in this way can have a significant performance cost. A better practice is to create one source per subsection to include or, when the source type allows it, to enter more than one starting address for one source.
-
If the default wildcard * inclusion filter is present, select it, and then click Delete to remove it and prevent everything below the starting address from being indexed.
-
For a File connector source, in the General page, ensure that the Expand Before Filtering option is selected; otherwise, nothing will be crawled.
-
For a SharePoint connector source, in the Advanced page, ensure that the Expand sites and lists before applying filter option is selected; otherwise, nothing will be crawled.
-
Refresh the source for the modifications to take effect.
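The effect of the trailing slash on the automatically created inclusion filter, described in the inclusion filter note of the procedure above, can be sketched in Python as follows. The `default_inclusion_filter` function is a hypothetical helper that only reproduces the two documented examples; it is not the product's code and assumes that the starting address contains at least one slash.

```python
def default_inclusion_filter(starting_address: str) -> str:
    """Derive the inclusion filter that would be created for a starting address.

    With a trailing slash, the whole folder path is kept.
    Without one, the last folder is truncated from the filter.
    """
    if starting_address.endswith("/"):
        return starting_address + "*"
    # No trailing slash: drop the last path segment, keep its parent.
    parent, _, _ = starting_address.rpartition("/")
    return parent + "/*"

print(default_inclusion_filter("file:///C:/temp/"))  # file:///C:/temp/*
print(default_inclusion_filter("file:///C:/temp"))   # file:///C:/*
```

In this sketch, omitting the trailing slash widens the filter to the parent folder, which is why ending a folder starting address with a slash gives the narrower, usually intended, inclusion filter.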