Predefined Text Analytics Normalizers

You can use normalizer plugins in post-extraction stages to clean up metadata. Normalizers either replace or eliminate values to produce a more homogeneous set of metadata values.

MetadataBlackLister

The MetadataBlackLister plugin removes from a specified metadata the values that are listed in a blacklist. The blacklist is a flat text file containing one blacklist value per line.

Example: With the following normalizer definition, values listed in the blacklist.txt file are removed from the Theme metadata when they appear there with the same capitalization.

<Normalizer>
  <Impl>Coveo.TextAnalytics.Implementations.MetadataBlackLister, Coveo.TextAnalytics.Implementations</Impl>
  <Configuration>
    <FilePath>D:\TextAnalytics\Config\BlackLists\blacklist.txt</FilePath>
    <CaseSensitive>True</CaseSensitive>
    <TypeRestriction>Theme</TypeRestriction>
  </Configuration>
</Normalizer>
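The effect of the CaseSensitive setting can be illustrated with a minimal Python sketch. This is not the plugin's code: the function names, the sample blacklist lines, and the sample Theme values are hypothetical, and trimming whitespace and skipping empty lines are assumptions rather than documented behavior.

```python
def load_blacklist(lines, case_sensitive=True):
    """Build a lookup set from blacklist lines, one value per line.

    Assumption: surrounding whitespace is trimmed and empty lines are ignored.
    """
    entries = (line.strip() for line in lines)
    return {(e if case_sensitive else e.lower()) for e in entries if e}


def remove_blacklisted(values, blacklist, case_sensitive=True):
    """Return the values that are not in the blacklist."""
    key = (lambda v: v) if case_sensitive else str.lower
    return [v for v in values if key(v) not in blacklist]


# Hypothetical content of blacklist.txt and hypothetical Theme values.
blacklist_lines = ["Untitled\n", "Home page\n"]
themes = ["Untitled", "untitled", "Quarterly results"]

# Case-sensitive: only the value with the same capitalization is removed.
strict = remove_blacklisted(themes, load_blacklist(blacklist_lines, True), True)

# Case-insensitive: both capitalizations are removed.
loose = remove_blacklisted(themes, load_blacklist(blacklist_lines, False), False)

print(strict)
print(loose)
```

With CaseSensitive set to True, as in the definition above, a value such as untitled would survive because only Untitled is listed.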

MetadataRegexBlacklister

The MetadataRegexBlacklister plugin removes from a specified metadata the values that match at least one of the specified regular expressions. This normalizer is generic and powerful, but complex regular expressions can require significant CPU resources.

Example: With the following normalizer definition, values made up only of digits and values starting with an apostrophe (') are removed from the Theme and Place metadata.

<Normalizer>
  <Impl>Coveo.TextAnalytics.Implementations.MetadataRegexBlacklister, Coveo.TextAnalytics.Implementations</Impl>
  <Configuration>
    <CaseSensitive>False</CaseSensitive>
    <TypeRestriction>Theme</TypeRestriction>
    <TypeRestriction>Place</TypeRestriction>
    <Regex>^[0-9]+$</Regex>
    <Regex>^'.*$</Regex>
  </Configuration>
</Normalizer>
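Before deploying a pattern, it can help to try it against sample values. The following Python sketch applies the two expressions from the definition above to a few hypothetical values. It only approximates the plugin: the function names and sample values are made up for illustration, and Python's re module is assumed to behave like the plugin's regular expression engine for these simple anchored patterns.

```python
import re

# Patterns copied from the <Regex> elements of the example definition.
PATTERNS = [r"^[0-9]+$", r"^'.*$"]


def is_blacklisted(value, patterns=PATTERNS, case_sensitive=False):
    """Return True when the value matches at least one pattern."""
    flags = 0 if case_sensitive else re.IGNORECASE
    return any(re.search(p, value, flags) for p in patterns)


def apply_blacklist(values):
    """Keep only the values that match none of the patterns."""
    return [v for v in values if not is_blacklisted(v)]


# Hypothetical Theme or Place values.
sample = ["2011", "'s report", "Quebec", "Route 66"]
kept = apply_blacklist(sample)
print(kept)
```

Because both patterns are anchored with ^ and $, a value such as Route 66 is kept: it contains digits but is not made up only of digits.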

MetadataFilter

When you use the SalienceMetadataExtractor entity discovery plugin, it extracts named entities for all categories it knows (see SalienceMetadataExtractor). You can use the MetadataFilter plugin to eliminate metadata for specific unwanted named entity categories.

Examples:

With the following normalizer definition, only the Person, Company, and Place metadata are kept.

<Normalizer>
  <Impl>Coveo.TextAnalytics.Implementations.MetadataFilter, Coveo.TextAnalytics.Implementations</Impl>
  <Configuration>
    <InverseMode>True</InverseMode>
    <FilteredName>Company</FilteredName>
    <FilteredName>Person</FilteredName>
    <FilteredName>Place</FilteredName>
  </Configuration>
</Normalizer>

With the following normalizer definition, only the Person metadata are removed; all other named entity metadata are kept.

<Normalizer>
  <Impl>Coveo.TextAnalytics.Implementations.MetadataFilter, Coveo.TextAnalytics.Implementations</Impl>
  <Configuration>
    <InverseMode>False</InverseMode>
    <FilteredName>Person</FilteredName>
  </Configuration>
</Normalizer>
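The two examples above differ only in the InverseMode value. A minimal Python sketch of that logic follows, as it is described in this topic. The function name, the dictionary representation of the metadata, and the Product category are hypothetical and do not come from the plugin.

```python
def filter_metadata(metadata, filtered_names, inverse_mode):
    """Filter metadata by name.

    inverse_mode=True keeps only the listed names.
    inverse_mode=False removes the listed names and keeps the rest.
    """
    names = set(filtered_names)
    if inverse_mode:
        return {k: v for k, v in metadata.items() if k in names}
    return {k: v for k, v in metadata.items() if k not in names}


# Hypothetical extracted metadata for one document.
extracted = {
    "Person": ["Jane Doe"],
    "Company": ["Coveo"],
    "Place": ["Quebec"],
    "Product": ["Widget"],
}

# First example: InverseMode True keeps only Company, Person, and Place.
kept_only = filter_metadata(extracted, ["Company", "Person", "Place"], True)

# Second example: InverseMode False removes only Person.
without_person = filter_metadata(extracted, ["Person"], False)

print(sorted(kept_only))
print(sorted(without_person))
```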

MetadataNormalizer

The MetadataNormalizer plugin loads one specified text file or all text files found in the specified folder. The text files must contain tab-separated expressions. The first column contains a unique expression to look for. The second column contains the replacement expression, or several replacement expressions separated by a semicolon (;). You can restrict the normalization to one or more metadata names; otherwise, the normalization expressions from all files apply to all metadata.

Tip: In a pipeline, you can use more than one MetadataNormalizer instance, typically each applying the content of one normalization file to one metadata.

Example: With the following normalizer definition, the plugin loads the normalization expression pairs from the D:\TXTAN\Config\Normalizations\PeopleNameNormalization.txt file.

<Normalizer>
  <Impl>Coveo.TextAnalytics.Implementations.MetadataNormalizer, Coveo.TextAnalytics.Implementations</Impl>
  <Configuration>
    <FilePath>D:\TXTAN\Config\Normalizations\PeopleNameNormalization.txt</FilePath>
    <CaseSensitive>False</CaseSensitive>
    <TypeRestriction>People</TypeRestriction>
  </Configuration>
</Normalizer>

Suppose the file contains the following lines:

RKennedy	Robert F. Kennedy
R. Kennedy	Robert F. Kennedy
Bob Kennedy	Robert F. Kennedy
B. Obama	Barack Obama;President

When found in the People metadata, Robert Kennedy's specified name variants are replaced by Robert F. Kennedy. When B. Obama is found, it is replaced by two values: Barack Obama and President.
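The file format and the replacement behavior described above can be sketched in Python. This is an illustration under stated assumptions, not the plugin's implementation: the function names are hypothetical, whole-value matching is assumed (not substring replacement), and the case-insensitive lookup mirrors the CaseSensitive value of False in the definition.

```python
def parse_normalization(lines, case_sensitive=False):
    """Parse tab-separated lines into {expression: [replacements]}."""
    table = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if "\t" not in line:
            continue  # assumption: lines without a tab are ignored
        source, replacement = line.split("\t", 1)
        key = source if case_sensitive else source.lower()
        # A semicolon separates multiple replacement values.
        table[key] = [r.strip() for r in replacement.split(";") if r.strip()]
    return table


def normalize(values, table, case_sensitive=False):
    """Replace each known value by its replacement(s); keep the others."""
    result = []
    for value in values:
        key = value if case_sensitive else value.lower()
        result.extend(table.get(key, [value]))
    return result


# The four lines from the example file.
lines = [
    "RKennedy\tRobert F. Kennedy",
    "R. Kennedy\tRobert F. Kennedy",
    "Bob Kennedy\tRobert F. Kennedy",
    "B. Obama\tBarack Obama;President",
]
table = parse_normalization(lines)

# Hypothetical People values; "Jane Doe" has no entry and is left unchanged.
people = normalize(["bob kennedy", "B. Obama", "Jane Doe"], table)
print(people)
```

In this sketch, a value with no entry in the file passes through unchanged, and a single value can expand into several values when the second column contains a semicolon.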
