
Predefined Text Analytics Outputters

One outputter stage completes the text analytics pipeline by sending the extracted information to the desired output.

Note: You can create a pipeline with no outputter stage to review the logs of the process without saving any results.

CESITaggerOutputter

The CESITaggerOutputter plugin saves the results of the text analytics pipeline back to the Coveo unified index using the tagging mechanism. For each processed document, it saves the metadata values extracted by the pipeline in tag fields. When a tag field does not already exist in the index, it is automatically created and named by concatenating the specified prefix with the metadata name. You can optionally clear all or specified tag fields to ensure that the output of the current pipeline replaces existing values rather than being appended to them.

Example: With the following outputter definition, the values of the @txtantheme, @txtancompany, @txtanperson, and @txtanplace tag fields in the Coveo unified index are cleared for all processed documents, and the metadata extracted by this pipeline is then associated with the processed documents in the corresponding tag fields.

<Outputter>
  <Impl>Coveo.TextAnalytics.Implementations.CESITaggerOutputter, Coveo.TextAnalytics.Implementations</Impl>
  <Configuration>
    <TagNamePrefix>@TXTAN</TagNamePrefix>
    <ClearAllTextAnalyticsTags>False</ClearAllTextAnalyticsTags>
    <!-- Clear specified tag fields. Enter the tag field name, not the metadata name. -->
    <ClearTagField>@txtantheme</ClearTagField>
    <ClearTagField>@txtancompany</ClearTagField>
    <ClearTagField>@txtanperson</ClearTagField>
    <ClearTagField>@txtanplace</ClearTagField>
    <!-- Following parameters automatically filled when created from templates using TAnGO -->
    <CertificatePath>D:\CES7\Config\Certificates\cert-iis.p12</CertificatePath> 
    <TagFieldCreatorName>MyDomain\MyTextAnalyticsAccount</TagFieldCreatorName>
    <TagFieldCreatorPassword Encrypted="True">dQxhR3dFtSuC4UZbDyE3Dw==</TagFieldCreatorPassword>
  </Configuration>
</Outputter>  

Available parameters are:

<TagNamePrefix>

This required parameter specifies the prefix concatenated to the metadata names to build tag field names. The prefix can only include alphanumerical characters.

Example: With <TagNamePrefix>@TXTAN</TagNamePrefix>, Theme metadata values are injected in the unified index in the @txtantheme tag field.

<ClearAllTextAnalyticsTags>

Set this optional parameter to True to delete all values from the tag fields whose names start with the prefix specified in the <TagNamePrefix> parameter. The default value is False.

Important: Be careful. The <ClearAllTextAnalyticsTags> parameter also deletes values created by other runs and jobs in any tag field whose name starts with the value specified in the <TagNamePrefix> parameter.

Example: Suppose a first run has the <TagNamePrefix> parameter set to TXTAN2 and a second run has it set to TXTAN. Because TXTAN2 starts with TXTAN, setting <ClearAllTextAnalyticsTags> to True on the second run also deletes the tag field values produced by the first run.
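One way to avoid the collision described above is to choose prefixes such that neither one starts with the other. The following is only a sketch: the @TXTANA and @TXTANB prefixes are hypothetical names, written with the leading @ as in the example definition above, and the two fragments are assumed to belong to two separate pipeline configuration files. The credential parameters are left out for brevity but are still required.

```xml
<!-- Run A (hypothetical prefix). Clears only tag fields whose names start with @txtana. -->
<Outputter>
  <Impl>Coveo.TextAnalytics.Implementations.CESITaggerOutputter, Coveo.TextAnalytics.Implementations</Impl>
  <Configuration>
    <TagNamePrefix>@TXTANA</TagNamePrefix>
    <ClearAllTextAnalyticsTags>True</ClearAllTextAnalyticsTags>
    <!-- TagFieldCreatorName and TagFieldCreatorPassword omitted in this sketch -->
  </Configuration>
</Outputter>

<!-- Run B (hypothetical prefix). @TXTANB does not start with @TXTANA, and vice versa. -->
<Outputter>
  <Impl>Coveo.TextAnalytics.Implementations.CESITaggerOutputter, Coveo.TextAnalytics.Implementations</Impl>
  <Configuration>
    <TagNamePrefix>@TXTANB</TagNamePrefix>
    <ClearAllTextAnalyticsTags>True</ClearAllTextAnalyticsTags>
    <!-- TagFieldCreatorName and TagFieldCreatorPassword omitted in this sketch -->
  </Configuration>
</Outputter>
```

With prefixes chosen this way, each run should clear only the tag fields it produced.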

<ClearTagField>

This optional parameter clears the values of the specified tag field. You can specify multiple instances of this parameter.

<TagFieldCreatorName>
<TagFieldCreatorPassword>

These required parameters contain the username and password of a user that has permissions to create tag fields in the Coveo unified index.

Note: These parameters will be automatically filled with default values (encrypted for the password) when creating your pipeline configuration files from TAnGO using a template (see Creating a Custom Run or Job from a Template).

<TagFieldCreatorSecurityProvider>

This optional parameter specifies the security provider in which the user specified in the <TagFieldCreatorName> parameter is defined. The default value is Active Directory.

<TagFieldSIDName>

This optional parameter specifies the security ID (SID) of the user or group that has permission to see the content of the tag field. Use it to restrict access to the content of a tag field to a specific user or group of users. The default value is S-1-1-0 (Everyone).

<TagFieldSIDType>

This optional parameter specifies the type of the security ID (SID) set in the <TagFieldSIDName> parameter. The default value is Unknown because Everyone is the default <TagFieldSIDName> value. Other valid values are User and Group.

<TagFieldSIDSecurityProvider>

This optional parameter specifies the security provider in which the security ID (SID) specified in the <TagFieldSIDName> parameter is defined. The default value is Active Directory.
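The three SID parameters are meant to be used together. The fragment below is a minimal sketch of restricting tag field content to a group. The S-1-5-32-545 value is the well-known Windows SID for the built-in Users group and is used purely as an illustration. Whether your security provider resolves it is an assumption, so replace it with the SID of your own group. The creator credentials reuse the placeholder values from the example definition above.

```xml
<Outputter>
  <Impl>Coveo.TextAnalytics.Implementations.CESITaggerOutputter, Coveo.TextAnalytics.Implementations</Impl>
  <Configuration>
    <TagNamePrefix>@TXTAN</TagNamePrefix>
    <TagFieldCreatorName>MyDomain\MyTextAnalyticsAccount</TagFieldCreatorName>
    <TagFieldCreatorPassword Encrypted="True">dQxhR3dFtSuC4UZbDyE3Dw==</TagFieldCreatorPassword>
    <!-- Illustrative SID only: replace with the SID of the group allowed to see the tag field content -->
    <TagFieldSIDName>S-1-5-32-545</TagFieldSIDName>
    <!-- Group, because the SID above designates a group rather than a single user -->
    <TagFieldSIDType>Group</TagFieldSIDType>
    <!-- Same as the documented default; shown explicitly for clarity -->
    <TagFieldSIDSecurityProvider>Active Directory</TagFieldSIDSecurityProvider>
  </Configuration>
</Outputter>
```

When these parameters are omitted, the documented defaults apply and the tag field content is visible to Everyone (S-1-1-0).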

Note: A run stops when the CESITaggerOutputter plugin encounters a document that cannot be tagged in the Coveo unified index. This can happen when the index switches to read-only mode. A continuous run restarts at the specified time interval and resumes where it left off once the index is back in read-write mode.

FSDumpMetadataPrinter

The FSDumpMetadataPrinter plugin saves extracted metadata to a specified comma-separated value (CSV) text file. A line is created in the file for each extracted metadata value and is in the following format:

[DocumentID] [DocumentTitle] [MetadataName] [ExtractedMetadataValue]

This outputter is typically useful for pipeline debugging and fine-tuning purposes. You can inspect the file to review the exact output of the pipeline, identifying problems as well as unwanted or missing output values.

Example: With the following outputter definition, the extracted values of the Theme, Company, and Person metadata are saved to the C:\Temp\Debug-AllMetadata.csv file.

<Outputter>
  <Impl>Coveo.TextAnalytics.Implementations.FSDumpMetadataPrinter, Coveo.TextAnalytics.Implementations</Impl>
  <Configuration>
    <DirectoryPath>C:\Temp\</DirectoryPath>
    <Prefix>Debug</Prefix>
    <WantedField>Theme</WantedField>
    <WantedField>Company</WantedField>
    <WantedField>Person</WantedField>
  </Configuration>
</Outputter>  

Available parameters are:

<DirectoryPath>

This required parameter specifies the folder where the CSV file is saved.

<Prefix>

This required parameter specifies the prefix of the CSV file name. The file name is built by appending -AllMetadata.csv to this prefix.

<WantedField>

At least one instance of this parameter is required. Each instance specifies one metadata whose values are written to the file, so use one instance per metadata that you want to output.

FSDumpResultProcessor

The FSDumpResultProcessor plugin creates one text file per fetched document. Each file contains only the fetched content of the document, not its original or extracted metadata. The files are saved in the specified folder with a name of the form [DocID].txt, where [DocID] is the value of the @sysrowid field of the document when the CESIQuerierFetcher fetcher is used. Otherwise, [DocID] is the identifier set by the fetcher used to retrieve the documents. This outputter is typically useful for pipeline debugging, to validate the content extracted by the fetcher.

Example: With the following outputter definition, the fetched text content of each processed document is saved in the C:\Temp\ folder in a file named after its [DocID].

<Outputter>
  <Impl>Coveo.TextAnalytics.Implementations.FSDumpResultProcessor, Coveo.TextAnalytics.Implementations</Impl>
  <Configuration>
    <DirectoryPath>C:\Temp\</DirectoryPath>
  </Configuration>
</Outputter>  