Saturday 17 November 2012

CPU Virtualization - Basics


Now it's time to go into CPU virtualization. I have to mention here that ESXi 5.x was used as the testing environment for the concepts covered in this and the next posts.
 
Since ESXi is a NUMA-aware OS, the way CPU virtualization works differs based on the underlying system (NUMA or non-NUMA).

For non-NUMA SMP

In this type, the ESXi CPU scheduler will split the load from all VMs across all cores or HT logical processors in a round-robin manner (similar to how other operating systems, such as Windows, distribute threads across cores, as we mentioned in the previous posts). Keep in mind that each core or HT logical processor serves one VM at a time. What does this mean?

Assume we have an SMP system with 4 cores running 2 VMs, each with 1 vCPU. In this case the ESXi CPU scheduler will split the load of both VMs across the 4 cores in a round-robin manner (Thread01 (VM1 + VM2) on cores 0 & 1, Thread02 (VM1 + VM2) on cores 2 & 3, Thread03 (VM1 + VM2) on cores 1 & 3, and so on).
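To make this hunting pattern concrete, here is a minimal Python sketch. It is only an illustration (the VM names, core numbering, and interval count are assumptions taken from the example above, not the real ESXi scheduler logic): each scheduling interval, every 1-vCPU VM world is shifted to the next physical core.

```python
# Minimal illustration of round-robin "hunting": every interval, each 1-vCPU VM
# world is moved to the next physical core. Names and interval count are
# assumptions for this example, not taken from ESXi itself.

CORES = [0, 1, 2, 3]        # 4-core non-NUMA SMP host from the example above
VMS = ["VM1", "VM2"]        # two 1-vCPU VMs

def round_robin_placement(intervals=4):
    """Print which core runs each VM during each scheduling interval."""
    for t in range(intervals):
        placement = {vm: CORES[(i + t) % len(CORES)] for i, vm in enumerate(VMS)}
        print(f"interval {t}: {placement}")

round_robin_placement()
```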

Below is a test done by loading the vCPU of the MS-AD VM. From ESXTOP we can see that the %PCPU usage is hunting across cores in a round-robin manner.

If you have 8 VMs, each with 1 vCPU, running on top of this system, the scheduler will split the load of those VMs across the 4 cores. Since each core serves one VM at a time, you will find the %RDY counter increased on all VMs. The %RDY counter describes the amount of time a VM spends waiting for a free core, as decided by the CPU scheduler. During this waiting time, the VM is frozen.
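As a rough back-of-the-envelope sketch, you can estimate why %RDY climbs once single-vCPU VMs outnumber the cores. The numbers are hypothetical and assume all VMs are equally busy, ignoring scheduler overhead and relaxed co-scheduling details:

```python
# Rough estimate of ready time when equally busy 1-vCPU VMs outnumber cores.
# This is a simplification; real %RDY also depends on the workload mix.

def estimated_ready_pct(num_vms, num_cores):
    """Approximate %RDY per VM when every VM wants a core 100% of the time."""
    if num_vms <= num_cores:
        return 0.0
    run_share = num_cores / num_vms     # fraction of time each VM holds a core
    return (1.0 - run_share) * 100.0

print(estimated_ready_pct(8, 4))        # 50.0 -> each VM waits about half the time
print(estimated_ready_pct(2, 4))        # 0.0  -> enough cores, no ready time
```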

Let's consider another example: a vSMP VM on a non-NUMA SMP system.

Assume that we have 2 VMs, one with 1 vCPU and one with 2 vCPUs, running on top of a 4-core SMP system. We said previously that the load from all VMs will be spread across all cores. Does this mean that VM1 will use core-01 while VM2 will use core-02?

In fact, no! The ESXi CPU scheduler will spread a vSMP VM across multiple cores based on its number of vCPUs. In our example, VM01 will use core-01 while VM02 will use cores 02 & 03. Then the round-robin hunting will start (VM01 to core-02 and VM02 to cores 03 & 04, and so on). This is done to give the VM the extra cycles you expect when adding vCPUs. On the other hand, there is a major drawback: you need two free cores (or HT logical processors) simultaneously to execute VM02. Until then, the VM is frozen, which shows up as a higher %RDY count. Therefore, adding extra vCPUs isn't always an advantage and needs to be considered properly.
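The sketch below illustrates that co-scheduling requirement in its simplest, strict form (all vCPUs must find free cores at the same moment). It is an assumption-laden toy model, not the actual ESXi algorithm, which uses a more forgiving relaxed co-scheduling scheme:

```python
# Illustration of the co-scheduling constraint: a vSMP VM can only be dispatched
# when it finds as many simultaneously free cores as it has vCPUs.
# Simplified "strict" model; ESXi actually uses relaxed co-scheduling.

def can_dispatch(vm_vcpus, free_cores):
    """Return True if the VM's vCPUs all fit on currently free cores."""
    return len(free_cores) >= vm_vcpus

print(can_dispatch(1, {2, 3}))   # True  -> the 1-vCPU VM runs right away
print(can_dispatch(2, {2, 3}))   # True  -> the 2-vCPU VM fits, both cores free
print(can_dispatch(2, {3}))      # False -> only one free core: VM waits, %RDY grows
```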

Note: Also keep in mind that extra vCPUs need to be considered from a DRS and HA point of view, to make sure that other hosts in the cluster have the same CPU capacity.

Below is a test done by loading a VM named VM-01, which has 2 vCPUs. We can see that two %PCPUs are loaded at a time, with round-robin hunting. Note that the total %USED is almost 200%, which is the sum of the usage of the 2 vCPUs (you may refer to the ESXTOP Bible to understand %USED in more detail and see how it is exactly calculated).
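The arithmetic behind that observation is simply the sum of the per-vCPU usage. The figures below are made up for illustration, not the actual readings from the test:

```python
# The VM-level %USED is roughly the sum of its vCPU worlds' usage, so a 2-vCPU
# VM loaded on both vCPUs reports close to 200%. Sample values are hypothetical.

vcpu_used = [99.2, 98.7]                  # hypothetical per-vCPU %USED readings
vm_used = sum(vcpu_used)
print(f"VM %USED ~= {vm_used:.1f}%")      # ~197.9%, i.e. almost 200%
```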
Last in this section, I thought I would add this example. Two VMs, named MS-AD (1 vCPU) and VM-01 (4 vCPUs), are loaded at the same time on top of a 4-core system. Let's look at the ESXTOP output.
If you think that VM-01 (%USED) is 400% and MS-AD (%USED) is 100%, then you are wrong!!

The total number of vCPUs is 5 while the total number of physical cores is 4. This means the CPU scheduler will give one core to MS-AD (during this time VM-01 is frozen; look at its %RDY). Next, the CPU scheduler will give 4 cores to VM-01 (during this time MS-AD is frozen; look at its %RDY). We just noted that the drawback of extra vCPUs is that you need the same number of physical cores free simultaneously to run the VM.
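To visualize that alternation, here is a toy time-slicing sketch. It is purely illustrative under the strict co-scheduling assumption used above; the real scheduler works at a much finer granularity and with relaxed co-scheduling, and the slice count is arbitrary:

```python
# Toy model of 5 vCPUs competing for 4 cores: VM-01 needs all 4 cores at once,
# so the scheduler alternates between MS-AD and VM-01, and each VM accumulates
# ready (%RDY) time while the other runs.

CORES = 4
VMS = {"MS-AD": 1, "VM-01": 4}   # VM name -> vCPU count

def alternate(slices=6):
    order = list(VMS)            # simple alternation between the two VMs
    for t in range(slices):
        running = order[t % len(order)]
        waiting = [vm for vm in VMS if vm != running]
        print(f"slice {t}: running={running} "
              f"({VMS[running]} of {CORES} cores), ready={waiting}")

alternate()
```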

Note: From the guest OS, you will find that CPU utilization is 100% for both VMs, since each guest is utilizing the maximum CPU cycles it is given.