Thursday 5 July 2012

VMware Fault Tolerance


The feature works by creating a secondary VM on another ESX host that shares the same virtual disk file as the primary VM. The CPU and virtual device inputs are then transferred from the primary VM (record) to the secondary VM (replay) over an FT logging NIC, so the secondary stays in sync with the primary and is ready to take over if a failure occurs. While both the primary and secondary VMs receive the same inputs, only the primary VM produces output such as disk writes and network transmits. The secondary VM’s output is suppressed by the hypervisor, and the secondary is not on the network until it becomes the primary VM, so essentially both VMs function as a single VM.
FT relies on lockstep technology in the CPUs to replicate all of a VM’s instructions from one host to another (VMware calls it vLockstep).
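
The record/replay idea can be illustrated with a toy model. The sketch below is not VMware code: ToyVM, run_lockstep and the arithmetic inside apply are made up for illustration. It only assumes that a VM behaves deterministically once it is fed the same inputs in the same order.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyVM:
    """Hypothetical deterministic 'VM': its state depends only on its inputs."""
    name: str
    is_primary: bool
    state: int = 0
    emitted: list = field(default_factory=list)  # output that actually left the VM

    def apply(self, value: int) -> None:
        # Deterministic transition: the same inputs in the same order
        # always lead to the same state.
        self.state = (self.state * 31 + value) % 1_000_003
        if self.is_primary:
            # Only the primary produces visible output (disk writes, network).
            self.emitted.append(self.state)
        # On the secondary the result is computed but its output is suppressed.


def run_lockstep(n_events: int, seed: int = 0):
    rng = random.Random(seed)
    primary = ToyVM("primary", is_primary=True)
    secondary = ToyVM("secondary", is_primary=False)
    log = []  # stands in for the FT logging NIC

    for _ in range(n_events):
        event = rng.randrange(256)  # an input seen by the primary
        log.append(event)           # record
        primary.apply(event)

    for event in log:               # replay on the secondary
        secondary.apply(event)

    return primary, secondary


primary, secondary = run_lockstep(100)
print("in sync:", primary.state == secondary.state)
print("outputs emitted by secondary:", len(secondary.emitted))
```

In this model a failover is simply flipping is_primary on the secondary: its state already matches, so it can start emitting output where the primary stopped.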

FT cluster level requirements (Common)

  1. HA must be enabled
  2. At least two ESXi hosts in the cluster should run the same FT version
  3. Host certificate checking must be enabled
  4. EVC should be enabled to use FT with DRS; otherwise DRS will be disabled

FT ESXi host requirements (Common)

  1. Shared datastore
  2. CPU must be FT compatible
  3. An FT license is required
  4. Hardware virtualization must be enabled
  5. FT Logging should be enabled on a VMkernel port

FT VM requirements (Common)

  1. Only VMs with one vCPU are supported by FT
  2. VM guest OS must be supported by FT
  3. VM files must be on shared storage
  4. Virtual disks should be thick provisioned eager zeroed VMDKs or virtual mode RDMs
  5. VM must not have snapshots
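
The VM requirements above can be turned into a small pre-flight check. This is a minimal sketch, not a vSphere API: the VMSpec fields, the disk type strings and ft_vm_blockers are hypothetical names chosen for illustration, and the values would have to be filled in from your own inventory data.

```python
from dataclasses import dataclass


@dataclass
class VMSpec:
    """Hypothetical description of a VM; fields mirror the list above."""
    name: str
    vcpus: int
    guest_os_supported: bool   # is the guest OS on the FT support list?
    on_shared_storage: bool
    disk_type: str             # e.g. "eager_zeroed_thick", "thin", "rdm_virtual"
    snapshot_count: int


# Disk layouts accepted by the list above (labels are assumptions).
FT_DISK_TYPES = {"eager_zeroed_thick", "rdm_virtual"}


def ft_vm_blockers(vm: VMSpec) -> list:
    """Return the unmet FT requirements for a VM (empty list means none found)."""
    problems = []
    if vm.vcpus != 1:
        problems.append("VM must have exactly one vCPU")
    if not vm.guest_os_supported:
        problems.append("guest OS is not supported by FT")
    if not vm.on_shared_storage:
        problems.append("VM files must be on shared storage")
    if vm.disk_type not in FT_DISK_TYPES:
        problems.append("disk must be eager zeroed thick or a virtual mode RDM")
    if vm.snapshot_count > 0:
        problems.append("VM must not have snapshots")
    return problems


candidate = VMSpec("web01", vcpus=2, guest_os_supported=True,
                   on_shared_storage=True, disk_type="thin", snapshot_count=1)
for problem in ft_vm_blockers(candidate):
    print(f"{candidate.name}: {problem}")
```

Returning a list of problems rather than a single yes/no makes it easier to fix everything in one pass before turning FT on.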

Limitations (Common)

  • vMotion can be used, but Storage vMotion (SVM) cannot
  • The hot-add feature isn't supported
  • Power capping at the BIOS level should be disabled
  • Backup technologies relying on snapshots can't be used

Note: You can use the VMware SiteSurvey tool to verify whether FT can run in your environment

To enable FT, navigate to Inventory > VMs and Templates > Right-Click VM > Fault Tolerance > Turn On Fault Tolerance.

Note: Some CPUs require powering off the VM to enable FT

Once FT is enabled, a pop-up window warns that the following changes will take effect:

  1. The virtual hard disk will be changed to Thick Provisioned Eager Zeroed.
  2. DRS will be disabled if EVC isn't enabled.
  3. The VM will reserve its full allocated memory.

Fault Tolerance with HA

If the primary host fails, HA won't restart the primary VM, because the secondary VM takes over automatically.

If the hosts of both the primary and secondary FT VMs go down, HA restarts the primary VM on a third host in the cluster, and a secondary VM is later recreated on another host in the cluster.

Don't forget that HA is a prerequisite for FT.

Similarly, assuming that HA VM monitoring is enabled, if the guest OS fails on the primary VM, FT won't trigger any action, since the secondary VM's guest OS fails as well (the VMs are in full sync). In this case, HA detects the guest OS failure and restarts the primary VM. Once the restart completes, a secondary VM is recreated.
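
The three scenarios above can be summarised as a small decision table. The sketch below only encodes what this post describes; the Failure enum and recovery_steps are hypothetical names for illustration, not part of any VMware tool.

```python
from enum import Enum


class Failure(Enum):
    PRIMARY_HOST = "host of the primary VM fails"
    BOTH_HOSTS = "hosts of both primary and secondary VMs fail"
    GUEST_OS = "guest OS fails on the primary VM"


def recovery_steps(failure: Failure, vm_monitoring_enabled: bool = True) -> list:
    """Return the expected FT/HA reactions, in order, for a failure scenario."""
    if failure is Failure.PRIMARY_HOST:
        return [
            "FT: secondary VM takes over automatically",
            "HA: does not restart the primary VM",
        ]
    if failure is Failure.BOTH_HOSTS:
        return [
            "HA: restarts the primary VM on a third host",
            "FT: recreates the secondary VM on another host",
        ]
    # Guest OS failure: the secondary is in full sync, so it fails too.
    steps = ["FT: no action (the secondary guest OS fails as well)"]
    if vm_monitoring_enabled:
        steps.append("HA: detects the guest OS failure and restarts the primary VM")
        steps.append("FT: recreates the secondary VM")
    return steps


for scenario in Failure:
    print(scenario.value)
    for step in recovery_steps(scenario):
        print("  -", step)
```

The vm_monitoring_enabled flag reflects the assumption stated above: without HA VM monitoring, nothing in this model reacts to a guest OS failure.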
