Associating the OCR Open Converter to Document Types

After adding an open converter to Coveo Enterprise Search (CES), you must associate the converter with the appropriate document types in a document type set. A typical configuration includes Adobe Acrobat documents (PDF) and image formats.

To associate the OCR open converter to document types

  1. On the Coveo server, access the Administration Tool (see Opening the Administration Tool).

  2. Select Index > Sources and Collections.

  3. In the Sources and Collections page, select the source on which the OCR open converter script will be used. The Status page is displayed.

  4. In the navigation panel on the left, click Document Types.

  5. In the Document Types page, click Add.

  6. In the Add Document Type Set page that appears:

    1. In the Name box, enter a name for the document type set that will be used to index documents with the OCR module.

      Example: OCR Document Types

    2. In the Description box, optionally describe how the document type set is used.

    3. Click Save.

  7. Back in the Document Types page:

    1. In the Document Type Set drop-down list, ensure that the document type set you just created is selected.

    2. Click Edit.

  8. In the page that appears, for each document type to be indexed using the OCR module:

    1. Click the document type.

      Example: Click Adobe Acrobat Documents.

      Note: The document formats supported by the OCR module are: .tiff, .tiff-fx, .pcx, .dcx, .bmp, .jpeg, .png, .max, .gif, .pbm, and .pdf.

    2. In the page that appears for this document type:

      1. In the Action drop-down list, select Index entire document.

      2. In the Converter section, select Use an open converter, and then select the name of the converter that you created for this purpose (see Adding an OCR Open Converter).

      3. Click Apply Changes.

  9. Open the CES Console (see Using the CES Console).

  10. Back in the Administration Tool, rebuild the sources that use the new document type set.

  11. In the CES Console, follow the rebuild activities. Documents are crawled and converted, and transactions are applied to the index.

    End users can search the OCR-indexed document content once the transactions are applied to the index.
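
Before choosing which document types to associate with the converter, it can help to check which files in a source have an extension that appears in the supported-formats note in step 8. The following is a minimal sketch in Python, not part of CES or the OCR module. The names `OCR_EXTENSIONS`, `is_ocr_candidate`, and `split_by_ocr_support` are hypothetical helpers introduced for illustration. The extension list is copied literally from the note, so variants that the note does not list (for example `.jpg` or `.tif`) are reported as unsupported here.

```python
from pathlib import PurePath

# Extensions copied literally from the supported-formats note in step 8.
# Assumption: matching on the file extension alone is good enough for a
# quick inventory; CES itself decides the document type during indexing.
OCR_EXTENSIONS = frozenset({
    ".tiff", ".tiff-fx", ".pcx", ".dcx", ".bmp", ".jpeg",
    ".png", ".max", ".gif", ".pbm", ".pdf",
})


def is_ocr_candidate(filename: str) -> bool:
    """Return True if the file extension is in the list above (case-insensitive)."""
    return PurePath(filename).suffix.lower() in OCR_EXTENSIONS


def split_by_ocr_support(filenames):
    """Partition file names into (supported, unsupported), preserving order."""
    supported, unsupported = [], []
    for name in filenames:
        (supported if is_ocr_candidate(name) else unsupported).append(name)
    return supported, unsupported
```

Calling `split_by_ocr_support` on a list of file names from the source returns two lists: the files whose document types are candidates for the OCR open converter, and the files that would keep their default converter.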
