
Text Analytics Global Configuration Parameters

Text analytics global configuration parameters apply to the run(s) and job(s) defined in the current configuration file. The values of several global configuration parameters are automatically set when you use a template to create a new configuration file (see Creating a Custom Run or Job from a Template).

Global configuration parameters:

<Continuous>

Set this parameter to True to launch the run at a regular time interval specified in the <SleepBetweenRuns> parameter. Activate the continuous mode when you want to process new indexed documents at regular intervals. The launch is canceled when the run is already active.

<SleepBetweenRuns>

This parameter specifies the time interval in milliseconds (ms) between runs when the <Continuous> parameter is set to True. Select a time interval that is long enough for a typical incremental run to complete within it, and short enough that new indexed documents are processed promptly, keeping the text analytics data fresh without wasting CPU resources.
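
As an illustration, the two parameters above might be combined as follows in the <Configuration> section. This is only a fragment, and the five-minute interval (300000 ms) is an assumed value chosen for the example; tune it to the duration of your own incremental runs.

```xml
<Configuration>
  <!-- Relaunch the run at regular intervals -->
  <Continuous>True</Continuous>
  <!-- Assumed example value: 300000 ms = 5 minutes between runs -->
  <SleepBetweenRuns>300000</SleepBetweenRuns>
</Configuration>
```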

<NbThreads>

This parameter specifies the number of threads that the pipeline can use. Text analytics processing such as theme and named entity extraction can be CPU intensive, so using more than one thread is particularly useful with CPU-intensive extractors such as the SalienceMetadataExtractor. Set the number of threads equal to the number of CPU cores that you can afford to devote to text analytics.
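
For example, on a hypothetical server where four CPU cores can be devoted to text analytics, the setting could look like the fragment below. The value 4 is an assumption for the sake of the example.

```xml
<Configuration>
  <!-- Assumed example: 4 cores are reserved for text analytics on this server -->
  <NbThreads>4</NbThreads>
</Configuration>
```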

<StateDir>

For runs only, this parameter specifies the folder where the pipeline state is saved in a cookie file. When using the CESIQuerierFetcher, the state contains the rowid of the last processed document. The default value is [Text_Analytics_Path]\Config\state\.

<ThemeMetaName>

For runs only, this parameter is required to specify the name to use for the Themes metadata when the pipeline includes the SalienceMetadataExtractor plugin (see SalienceMetadataExtractor).

<SentimentMetaName>

For runs only, this parameter is required to specify the name to use for the Sentiment metadata when the pipeline includes the SalienceMetadataExtractor plugin (see SalienceMetadataExtractor).
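
A minimal sketch of the two metadata name parameters for a run whose pipeline includes the SalienceMetadataExtractor plugin. The value Theme matches the run template sample in this topic. The value Sentiment is an assumed name used here for illustration only.

```xml
<Configuration>
  <!-- The name of the metadata for themes (value used in the run template) -->
  <ThemeMetaName>Theme</ThemeMetaName>
  <!-- The name of the metadata for sentiment (assumed example value) -->
  <SentimentMetaName>Sentiment</SentimentMetaName>
</Configuration>
```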

<CESCertificatePath>

This parameter specifies the folder and the filename of the valid Coveo Enterprise Search (CES) search security certificate that the module must use to be able to fetch documents from the unified index. The default value is [Index_Path]\Config\Certificates\cert-iis.p12.

When the Text Analytics module is installed on a server other than the Coveo Master server, ensure that the path and file name correspond to where you copied the file (see Performing the First Time Setup of the Text Analytics Module).

Example: When the Text Analytics module is installed on the Coveo Master server, the default is:

<CESCertificatePath>C:\ces7\Config\Certificates\cert-iis.p12</CESCertificatePath>

<CESSearchHost>

This parameter specifies the address of the Coveo Master server to use to fetch and tag indexed documents. You can enter localhost when the Text Analytics module is installed on the Coveo Master server.

<CESSearchPort>

This parameter specifies the port to use on the Coveo Master server to fetch and tag indexed documents. The default value is 52800 (see About the CES Service Port).
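
Example: When the Text Analytics module is installed on the Coveo Master server and the CES service port has not been changed from its default, the host and port parameters could be set as in this fragment:

```xml
<Configuration>
  <!-- Text Analytics module and Coveo Master server are on the same machine -->
  <CESSearchHost>localhost</CESSearchHost>
  <!-- Default CES service port -->
  <CESSearchPort>52800</CESSearchPort>
</Configuration>
```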

<SuperUserToken>

This parameter specifies the super user ID that the user running the Coveo Job Scheduling service must pass to CES to be able to fetch all indexed documents. Paste the super user ID that you created (see Text Analytics Module - Deployment Overview).

Example: Your super user ID is a hexadecimal GUID similar to this one: <SuperUserToken>e401b92d-0f40-4b44-a85e-0eb56d9e06c2</SuperUserToken>

<FetchBatchSize>

This parameter specifies the number of documents fetched from the unified index by the CESIQuerierFetcher plugin for each batch. The default value is 100 and the maximum value is 1000. This parameter is available in Text Analytics version 2.0.11+.
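
As a sketch, a configuration that fetches larger batches than the default could look like the fragment below. The value 500 is an assumed example that stays within the documented maximum of 1000.

```xml
<Configuration>
  <!-- Assumed example value; default is 100, maximum is 1000 -->
  <FetchBatchSize>500</FetchBatchSize>
</Configuration>
```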

Text Analytics 2.0.13+ (November 2012)

This parameter specifies the level of detail that is logged. The default value is All. The other options are WarningsAndErrors and ErrorsOnly.

Example: The following configuration file sample shows the global configuration section as it appears in a run template. Placeholders in the %%[Parameter value]%% format are replaced by appropriate values when you create a pipeline configuration file using TAnGO (see Creating a Custom Run or Job from a Template).

<?xml version="1.0" encoding="utf-8"?>
<TextAnalyticsService>
  <!-- Global configuration parameters -->
  <Configuration>
    <!-- The run will execute continuously, looking for new documents to process after the first pass is completed. Waits SleepBetweenRuns (in ms) before checking for new results to process -->
    <Continuous>True</Continuous>
    <FetchBatchSize>100</FetchBatchSize>
    <SleepBetweenRuns>30000</SleepBetweenRuns>
    <NbThreads>2</NbThreads>
    <!-- The name of the metadata for themes -->
    <ThemeMetaName>Theme</ThemeMetaName>
    <!-- Location of the file used to save the value of the ID of the latest processed document -->
    <StateDir>%%TextAnalyticsRootDirectory%%\Config\state</StateDir>
    <CESSearchHost>%%CESSearchHost%%</CESSearchHost>
    <CESSearchPort>%%CESSearchPort%%</CESSearchPort>
    <CESCertificatePath>%%CESCertificateFile%%</CESCertificatePath>
    <!-- Super user token used to provide read access to all indexed documents to the text analytics processes. -->
    <SuperUserToken>%%CESSuperUserToken%%</SuperUserToken>
  </Configuration>
  <!-- Definition of the run -->
  <Run Name="MainRun">
    ...
  </Run>
</TextAnalyticsService>  

What's Next?

Look at the available predefined run and job plugins (see Text Analytics Run Plugins and Predefined Text Analytics Job Plugins).
