Virtualized Server Guidelines

Coveo products can operate on virtual machines as they do on physical hardware. As with physical hardware, the key to a successful Coveo deployment on virtualized hardware is to respect the Coveo Platform requirements for the size of your index (see Coveo Platform Hardware and Software Requirements). Virtualized environments vary greatly from one implementation to another, so it is not appropriate to state specific virtual hardware requirements.

Virtualized environments optimize the usage of physical hardware by sharing its resources among several virtual servers. This goes against the ideal setup, in which the required resources are dedicated to a server. Like many processes, Coveo processes such as indexing documents or serving queries put a varying load on server resources over time (see Relation Between CES Features and Hardware Resources).

You or some of your colleagues are experts on your hypervisor and virtual environment. You have the responsibility to ensure that your Coveo virtual server implementation maximizes the chances that required resources will be available when they are needed.

Guidelines

  • Dedicate to each Coveo server a virtual machine (VM) that meets the Coveo Platform requirements for your index size (see Coveo Platform Hardware and Software Requirements and Coveo Platform Deployment Overview).

  • Minimize overcommitment of CPU and memory resources on a host where a virtual Coveo server is running.

  • For large indexes, when your virtual environment does not allow you to create a virtual server that meets the requirements (see Index From 40 to 80 Million Documents), consider the following options:

    • Use geographically distributed indexing (GDI) to split the index into two or more CES instances, each on its own dedicated virtual server (see About Geographically Distributed Indexing).

    • Commission a dedicated hardware Coveo server meeting the requirements.

  • Disk management

    Many Coveo processes are disk I/O intensive. The Coveo server requirements specify using separate dedicated disks for specific Coveo server process categories (operating system and programs, index, other Coveo files, and near-real time indexing) to optimize disk I/O performance and minimize interference. In a virtual environment, the pool of available storage resources is shared not only among the various processes of one server, but also with processes from several other servers.

    Performance issues with virtual Coveo servers are often linked to poor disk performance.

    Example: The Coveo server VM shares a disk resource with other host VMs, including a VM on which a large repository is hosted. The disk resource is able to respond to the average traffic.

    However, when the Coveo server indexes the large repository, the disk resource throughput quickly reaches its limit because both servers perform significantly more disk I/O: the Coveo server to index the content, and the repository server to respond to the Coveo crawler. The performance of both systems (and of any other host VMs sharing the same disk resource) drops significantly while indexing takes place.

    You are the expert with your hypervisor and virtual environment:

    • Avoid sharing the same storage resources between a Coveo server and a repository that is indexed by Coveo.

    • Preferably use disk resources from a low-latency storage area network (SAN).

    • Attach available virtual disk resources (such as logical unit number [LUN] storage) that best match the requirements for your index size.

  • Schedule Coveo intensive processes so that they are distributed over the periods when shared resources are most available (see Administration Tool - Schedules Menu).

    • When you index more than one repository, avoid starting all source refreshes at the same time. Define source schedules that distribute source refreshes over time during off-peak hours (see Creating or Modifying a Source Schedule and Scheduling Source Refresh Actions).

      Example: The default Every day source schedule starts every day at midnight. If all your daily refreshed sources use this schedule, they all start at midnight, potentially overloading your shared resources.

      Instead, create source-specific (like Repository1 daily, Repository2 daily, ...) or time-specific (like Daily at 2:00 AM, Weekdays at 3:00 AM, Saturdays at 9:00 PM, ...) source schedules that you can assign to your sources to distribute the processes over the off-peak period.

    • Avoid scheduling Coveo intensive processes at the same time as other intensive processes (such as backups) from other systems.
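
The scheduling guideline above can be planned on paper before you create the schedules in the Administration Tool. The following Python sketch is one possible way to spread daily refresh start times evenly across an off-peak window. It is only an illustration: the function name stagger_start_times, the 22:00 to 05:00 window, and the source names are assumptions, and the script does not call any Coveo API. You would still create and assign the resulting schedules manually.

```python
from datetime import datetime, timedelta


def stagger_start_times(source_names, window_start="22:00", window_end="05:00"):
    """Return one evenly spaced start time (HH:MM) per source.

    Hypothetical planning helper, not part of any Coveo product.
    The default window is an assumed off-peak period; adjust it to yours.
    """
    if not source_names:
        return {}

    fmt = "%H:%M"
    start = datetime.strptime(window_start, fmt)
    end = datetime.strptime(window_end, fmt)
    if end <= start:
        # The window crosses midnight, so the end falls on the next day.
        end += timedelta(days=1)

    # Divide the window into as many equal slots as there are sources.
    step = (end - start) / len(source_names)
    return {
        name: (start + index * step).strftime(fmt)
        for index, name in enumerate(source_names)
    }


if __name__ == "__main__":
    plan = stagger_start_times(["Repository1", "Repository2", "Repository3"])
    for name, start_time in plan.items():
        print(f"{name} daily: {start_time}")
```

With three sources and the assumed seven-hour window, the refreshes start 2 hours and 20 minutes apart. Equal spacing is the simplest choice; if one repository takes much longer to refresh than the others, giving it a longer slot would be a reasonable refinement.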
