Text Analytics Pipeline Configuration
Text analytics pipelines are registered in the Coveo Job Scheduler (CJS) service using TAnGO (see Managing Text Analytics Pipeline Configurations). The CJS service launches a pipeline once or at specified regular intervals.
A run is a text analytics pipeline that sequentially applies a set of stages to a set of documents.
Example: Typically, a set of documents is fetched from the Coveo unified index, text analytics metadata is extracted, and the metadata is injected back into the index in the form of tag fields.
The pipeline is composed of the following stage types, applied in this order:

1. A run always starts with a fetcher plugin that retrieves the documents to be processed by the pipeline (see Predefined Text Analytics Fetcher). A pipeline can contain only one fetcher plugin.

2. The pipeline can contain one or more filter plugins that exclude specific types of content from the fetched documents before they are processed further (see Predefined Text Analytics Filters).

3. At the center of the text analytics process, one or more extractor plugins create and attach metadata to the processed documents (see Predefined Text Analytics Extractors).

4. One or more normalizer plugins clean up the metadata created by the extractors (see Predefined Text Analytics Normalizers).

5. At the end of the pipeline, an outputter plugin saves the text analytics results, typically back to the index (see Predefined Text Analytics Outputters). A pipeline can contain only one outputter plugin.
The pipeline structure for a run is shown in the following XML configuration file sample.
<?xml version="1.0" encoding="utf-8"?>
<TextAnalyticsService>
  <!-- Global configuration parameters -->
  <Configuration>
    ...
  </Configuration>
  <!-- Definition of the run -->
  <!-- Set the name of your run -->
  <Run Name="MainRun">
    <!-- Plugin used to fetch the documents to process -->
    <Fetcher>
      ...
    </Fetcher>
    <!-- Extract metadata -->
    <Extractors>
      <!-- First extractor -->
      <Extractor>...</Extractor>
      ...
      <!-- Nth extractor -->
      <Extractor>...</Extractor>
    </Extractors>
    <!-- Normalize metadata names and values -->
    <Normalizers>
      <!-- First normalizer -->
      <Normalizer>...</Normalizer>
      ...
      <!-- Nth normalizer -->
      <Normalizer>...</Normalizer>
    </Normalizers>
    <!-- Plugin used to output the result of the text analytics run -->
    <Outputter>
      ...
    </Outputter>
  </Run>
</TextAnalyticsService>
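The sample above does not show the optional filter stage. Assuming filters follow the same list pattern as the extractor and normalizer sections (the Filters/Filter element names here are an assumption by analogy, not confirmed schema), a filter section would sit between the fetcher and the extractors:

```xml
<!-- Hypothetical filter section; element names assumed by analogy
     with the Extractors and Normalizers sections -->
<Filters>
  <!-- First filter -->
  <Filter>...</Filter>
  ...
  <!-- Nth filter -->
  <Filter>...</Filter>
</Filters>
```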
A job is a one-stage pipeline that you can use when you need to execute general tasks that should not be executed on each individual document.
Example: You can use a job when you want to use CES tagging queries, copy files, programmatically change a configuration in CES, perform a maintenance task, etc.
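Jobs can presumably be declared in the same configuration file as runs. The following sketch assumes a Job element analogous to Run, wrapping a single plugin stage; the element names, attribute names, and the job name used below are illustrative assumptions, not confirmed schema:

```xml
<?xml version="1.0" encoding="utf-8"?>
<TextAnalyticsService>
  <!-- Global configuration parameters -->
  <Configuration>
    ...
  </Configuration>
  <!-- Hypothetical one-stage job definition; verify the actual element
       names against the product schema before use -->
  <Job Name="MaintenanceJob">
    <!-- Single plugin that performs the general task, for example
         running CES tagging queries or a maintenance task -->
    <Plugin>...</Plugin>
  </Job>
</TextAnalyticsService>
```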
Review the procedure to create, run, and fine-tune text analytics pipelines (see Managing Text Analytics Pipeline Configurations).