Predefined Text Analytics Job Plugins

A job simply runs a piece of code. Unlike a run, it does not work on a set of documents. Many of the following predefined text analytics job plugins take advantage of the Coveo tagging mechanism, which often allows efficient batch operations on text analytics tag fields.

CreateTagField

The CreateTagField plugin creates the specified tag fields in the index. Unlike the outputter of runs, jobs cannot automatically create the tag fields corresponding to extracted metadata names because jobs are not part of a document pipeline. You must therefore run this job plugin to create the necessary tag fields before running job plugins that save metadata values in these tag fields.

Example: The following job definition creates the @txtanexample1 and @txtanexample2 tag fields.

<Job Name="CreateTagFieldExample">
  <Impl>Coveo.TextAnalytics.Implementations.CreateTagFieldJob, Coveo.TextAnalytics.Implementations</Impl>  	  
  <Configuration>
    <CreateTagField>@txtanexample1</CreateTagField>
    <CreateTagField>@txtanexample2</CreateTagField>
    <!-- Create runs from templates using TAnGO to automatically fill the following parameters with default values -->
    <TagFieldCreatorName>MyDomain\MyTextAnalyticsAccount</TagFieldCreatorName>
    <TagFieldCreatorPassword Encrypted="True">bQxhRBprtSuC4UZcDzE3Dw==</TagFieldCreatorPassword>
  </Configuration>
</Job>  
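
When many tag fields must be created, the job definition can be generated rather than typed by hand. The following is a minimal sketch using only the element names shown in the example above. The helper name build_create_tag_field_job and the check that each field name starts with "@" are assumptions made for illustration, and the TagFieldCreatorName and TagFieldCreatorPassword elements are deliberately left out, so they still need to be added before the job can run.

```python
import xml.etree.ElementTree as ET

# Implementation string copied from the CreateTagField example above.
IMPL = ("Coveo.TextAnalytics.Implementations.CreateTagFieldJob, "
        "Coveo.TextAnalytics.Implementations")


def build_create_tag_field_job(job_name, tag_fields):
    """Return a <Job> element with one <CreateTagField> per field name.

    Hypothetical helper: it only mirrors the structure of the example
    job definition and does not add the creator credentials.
    """
    job = ET.Element("Job", {"Name": job_name})
    ET.SubElement(job, "Impl").text = IMPL
    config = ET.SubElement(job, "Configuration")
    for field in tag_fields:
        # Assumption: tag field names are written with a leading "@",
        # as in the example above.
        if not field.startswith("@"):
            raise ValueError(f"tag field name must start with '@': {field!r}")
        ET.SubElement(config, "CreateTagField").text = field
    return job


job = build_create_tag_field_job(
    "CreateTagFieldExample", ["@txtanexample1", "@txtanexample2"]
)
xml_text = ET.tostring(job, encoding="unicode")
print(xml_text)
```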

ClearFieldJob

The ClearFieldJob plugin clears the values of a specified tag field for the indexed documents returned by a specified query. Internally, it first lists all values found in the tag field, and then uses one tagging query for each value. This plugin is available in Text Analytics version 2.0.11+.

Note: When the number of available tag field values is large, building the list can take minutes, during which nothing appears in the job logs. The tagging queries for each value then start appearing in the job logs as the tag field values are progressively cleared in the index.

Example: With the following job definition, the values for the @txtantheme tag field are deleted for all indexed documents returned by the @syslanguage=English query.

<Job Name="ClearFieldJob">
  <Impl>Coveo.TextAnalytics.Implementations.ClearFieldJob, Coveo.TextAnalytics.Implementations</Impl>  	
  <Configuration>
    <Name>ClearFieldJob</Name>
    <TagField>@txtantheme</TagField>
    <ScopeQuery>@syslanguage=English</ScopeQuery>
    <CESSearchHost>localhost</CESSearchHost>
    <CESSearchPort>52800</CESSearchPort>
    <CESCertificatePath>D:\CES7\Config\Certificates\cert-iis.p12</CESCertificatePath>	  
    <SuperUserToken>6ca0af68-3749-4c62-8854-8f4b70ba43c5</SuperUserToken>	  
  </Configuration>
</Job>  
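
The one-query-per-value behavior described above explains why the job duration grows with the number of distinct values. The following conceptual sketch shows the shape of that loop. It is not the plugin's actual code: the function name build_clear_queries is hypothetical, and the way the scope query is combined with an exact-match expression on the field is an assumption about the query syntax, with no escaping of special characters in values.

```python
def build_clear_queries(scope_query, tag_field, values):
    """Build one query per distinct tag field value, limited to the scope.

    Hypothetical illustration only: the real plugin lists the values from
    the index and issues tagging queries itself.
    """
    queries = []
    for value in sorted(set(values)):
        # Assumption: an exact-match expression such as @field=="value",
        # placed next to the scope query, restricts the result set to
        # scoped documents that carry this value.
        queries.append(f'({scope_query}) ({tag_field}=="{value}")')
    return queries


# Three occurrences but only two distinct values, so two queries.
field_values = ["energy", "health", "energy"]
queries = build_clear_queries("@syslanguage=English", "@txtantheme", field_values)
for query in queries:
    print(query)
```

With millions of distinct values, the same loop would produce millions of queries, which is consistent with the long silent phase followed by a long stream of log entries mentioned in the note.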

MasterFieldMoverJob

The MasterFieldMoverJob plugin creates a new tag field containing the most frequent values from another tag field. The goal of this migration is to build a top list from a long list of values, and to assign the top list tag field to a facet instead of the original field to avoid long facet loading times. This plugin is available in Text Analytics version 2.0.11+.

Example: Several million themes were extracted from a document set. With the following job definition, the @txtanmastertheme tag field is created and will contain the 10,000 most frequent themes found in the @txtantheme tag field.

<Job Name="MasterFieldMoverJob">
  <Impl>Coveo.TextAnalytics.Implementations.MasterFieldMoverJob, Coveo.TextAnalytics.Implementations</Impl>  	      
  <Configuration>
    <Name>MasterFieldMoverJob</Name>			
    <!-- Leave TagFieldCreatorName empty if you do not want to dynamically create the tag fields -->
    <TagFieldCreatorName></TagFieldCreatorName>
    <!-- If TagFieldCreatorPassword is empty, Coveo.TextAnalytics.Setup automatically asks for this field value and encrypts it -->
    <TagFieldCreatorPassword Encrypted="True"></TagFieldCreatorPassword>
    <TagField>@txtantheme</TagField>
    <MasterTagField>@txtanmastertheme</MasterTagField>
    <MasterListSize>10000</MasterListSize>	
  </Configuration>
</Job>  
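
The selection performed for the master field amounts to keeping the N most frequent values, where N is the MasterListSize. The sketch below illustrates that idea on a small in-memory table of counts. The function name master_list and the sample counts are made up for the illustration, and the plugin itself works on the index rather than on a Python dictionary.

```python
from collections import Counter


def master_list(value_counts, master_list_size):
    """Return the master_list_size most frequent values, most frequent first."""
    counter = Counter(value_counts)
    return [value for value, _count in counter.most_common(master_list_size)]


# Hypothetical document counts per theme value.
theme_counts = {"energy": 1200, "health": 950, "squash": 3, "budget": 410}
top_themes = master_list(theme_counts, 2)
print(top_themes)
```

A facet bound to the resulting short list only has to load the retained values instead of the full list.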

BlacklistJob

The BlacklistJob plugin uses tagging queries to remove blacklisted values defined in a specified flat text file from one or more specified tag fields for indexed documents returned by a specified query.

The format of the file containing the blacklisted expressions is the same as the one for the MetadataBlacklister normalizer used in run stages so you can share the file between them (see MetadataBlackLister).

Example: With the following job definition, the BlacklistJob plugin removes values defined in the D:\TextAnalytics\Config\normalizations\blacklist-example.txt file that are found in the @txtantheme and @txtanplace tag fields for indexed documents returned by the @uri="gov" query.

<Job Name="BlacklistJobExample">
  <Impl>Coveo.TextAnalytics.Implementations.BlacklistJob, Coveo.TextAnalytics.Implementations</Impl>  	
  <Configuration>
    <Name>BlacklistJobExample</Name>
    <TagField>@TXTANTheme</TagField>
    <TagField>@TXTANPlace</TagField>
    <ScopeQuery>@uri="gov"</ScopeQuery>
    <FilePath>D:\TextAnalytics\Config\normalizations\blacklist-example.txt</FilePath>
  </Configuration>
</Job>  

NormalizationJob

The NormalizationJob plugin uses tagging queries to normalize values in one or more specified tag fields as defined in a specified flat text file for indexed documents returned by a specified query.

The format of the file containing the normalization values is the same as the one for the MetadataNormalizer normalizer used in run stages so you can share the file between them (see MetadataNormalizer).

Example: With the following job definition, the NormalizationJob plugin homogenizes values found in the @txtantheme tag field using normalized values defined in the D:\TextAnalytics\Config\normalizations\normalization-example.txt file for indexed documents returned by the @syslanguage=English query.

<Job Name="TestNormalizerJob">
  <Impl>Coveo.TextAnalytics.Implementations.NormalizationJob, Coveo.TextAnalytics.Implementations</Impl>  	
  <Configuration>
    <Name>TestNormalizerJob</Name>
    <TagField>@TXTANTheme</TagField>
    <ScopeQuery>@syslanguage=English</ScopeQuery>
    <FilePath>D:\TextAnalytics\Config\normalizations\normalization-example.txt</FilePath>
  </Configuration>
</Job>  

WhitelistBasicMatcherJob

The WhitelistBasicMatcherJob plugin uses tagging queries to add values defined in a specified flat text file to a specified tag field when found in indexed documents returned by a specified query.

The format of the whitelist file is the same as the one for the Whitelister plugin used in run stages so you can share the file between them (see Whitelister).

Example: With the following job definition, the WhitelistBasicMatcherJob plugin adds values defined in the D:\TextAnalytics\Config\whitelists\wizards-example.txt file to the @txtantheme tag field when found in indexed documents returned by the @syslanguage=English query.

<Job Name="TestWhitelistBasicMatcherJob">
  <Impl>Coveo.TextAnalytics.Implementations.WhitelistBasicMatcherJob, Coveo.TextAnalytics.Implementations</Impl>  	
  <Configuration>
    <Name>TestWhitelistBasicMatcherJob</Name>
    <TagField>@TXTANTheme</TagField>
    <ScopeQuery>@syslanguage=English</ScopeQuery>
    <FilePath>D:\TextAnalytics\Config\whitelists\wizards-example.txt</FilePath>
  </Configuration>
</Job>  

WhitelistQueryMatcherJob

The WhitelistQueryMatcherJob plugin reads a flat text file that specifies one or more queries, each with a corresponding tag value. The plugin evaluates the indexed documents returned by a specified query against these queries and, when there is a match, adds the corresponding value to the specified tag field.

Example: With the following job definition, the WhitelistQueryMatcherJob plugin reads the query-based-job-example.txt file. When a document contains any of the words cucurbita, squash, pumpkin, or courgette, the value fruits is added to the @txtantheme tag field.

<Job Name="TestWhitelistQueryMatcherJob">
  <Impl>Coveo.TextAnalytics.Implementations.WhitelistQueryMatcherJob, Coveo.TextAnalytics.Implementations</Impl>  	
  <Configuration>
    <Name>TestWhitelistQueryMatcherJob</Name>
    <TagField>@TXTANTheme</TagField>
    <ScopeQuery>@syslanguage=English</ScopeQuery>
    <FilePath>D:\TextAnalytics\Config\whitelists\query-based-job-example.txt</FilePath>
  </Configuration>
</Job>  

The query-based-job-example.txt file contains:

cucurbita OR squash OR pumpkin OR courgette	fruits

The file can contain one or more queries. Each query is on its own line, followed by a tab character and the tag field value to add. The file format is not compatible with any module used for runs.

Note: The expression in the first column of the file is used as is for the query and must therefore be a complete and valid query, including double-quote characters when needed.
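
Because a missing tab or an empty column makes a line unusable, it can help to check the file before running the job. The following is a minimal sketch of such a check, based only on the format described above (one query, a tab, then one tag value per line). The function name parse_query_whitelist is hypothetical, skipping blank lines is an assumption, and the sketch does not verify that each query is valid.

```python
def parse_query_whitelist(text):
    """Parse lines of the form '<query><TAB><tag value>' into pairs."""
    pairs = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # Assumption: blank lines are ignored.
        # Split on the last tab so the value is the final column.
        query, separator, value = line.rpartition("\t")
        if not separator or not query.strip() or not value.strip():
            raise ValueError(
                f"line {line_number}: expected '<query><TAB><tag value>'"
            )
        pairs.append((query.strip(), value.strip()))
    return pairs


sample = "cucurbita OR squash OR pumpkin OR courgette\tfruits\n"
pairs = parse_query_whitelist(sample)
print(pairs)
```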