After a few changes (creating new parent and child resource pools and removing VM memory and CPU reservations), the "Runtime Info" for my Server Cluster went from 17 to 546 "Available Slots". My Workstation Cluster went from 35 to 1683 "Available Slots". That alone doesn't necessarily mean I gained performance; it means I now have room for failover slots and for more VMs if I choose to add them. I did run "Performance" monitoring on some of my heavy-user VMs and noticed a clear improvement. Prior to the changes many of my VM workstations were maxing out CPU and memory resources; since making those changes none of my VMs have pegged out. I've had one or two hit the warning alarm, but none have hit the critical alarm anymore.
Note:
We run a unique environment in that our VDI cluster demands CPU and memory resources because of the work being performed. Keep in mind that these aren't everyday Office and web (internet) type users. These are aerospace and mechanical engineers running different modeling and simulation programs, along with creating and gathering statistical analyses. Yes, these VMs are really being used and pushed to their limits.
Is there any difference between resource pool level and virtual machine level memory reservation?
To keep it short, VM-level reservation can be rather evil: it will hoard memory once the virtual machine has used it. Even if the virtual machine becomes idle, the VMkernel will not reclaim this memory and return it to the free memory set. This means that ESX can start swapping and ballooning if no free memory is available for other virtual machines, while the owning VMs aren’t using their claimed reserved memory. It also influences the slot size of High Availability; for more information about HA slot sizes, please visit the HA deep dive page at yellow-bricks.com. For more information about virtual machine level memory reservation, please read the article “Impact of memory management”.
Behavior of resource pool memory reservation
Now setting a memory reservation on a resource pool level has its own weaknesses, but it is much fairer and more along the whole idea of consolidation and sharing than virtual machine memory reservations. RP level reservations are immediately active, but are not claimed. This means it will only subtract the specified amount of memory from the unreserved capacity of the cluster.
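As a rough illustration of that subtraction, here is a minimal sketch. The pool names and sizes are made up for the example, and the function is only a model of the bookkeeping, not anything ESX or vCenter exposes.

```python
def unreserved_capacity_mb(cluster_capacity_mb, pool_reservations_mb):
    """Memory still available for new reservations in this simplified model.

    Resource pool reservations are subtracted from the cluster's capacity
    as soon as they are set, whether or not any VM is using that memory.
    """
    reserved = sum(pool_reservations_mb.values())
    if reserved > cluster_capacity_mb:
        raise ValueError("reservations exceed cluster capacity")
    return cluster_capacity_mb - reserved


# Hypothetical cluster with 128 GB of memory and two resource pools.
pools = {"Servers": 32768, "Workstations": 65536}
print(unreserved_capacity_mb(131072, pools))  # 32768
```

Note that the result only says how much can still be reserved; it says nothing about how much memory is actually in use.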
RP reservations are used when children of the resource pool use memory and the system is under contention. Reservations are not wasted and the resources can be used by other virtual machines. Be aware, using and reserving are two distinct concepts! Virtual machines can use the resource, but they cannot reserve it as well if it is already reserved by another item.
It appears that resource pool memory reservations work much like CPU reservations: they won’t let any resource go to waste. And to top it off, resource pool reservations don’t flow to virtual machines, so they will not influence HA slot sizes. There is one downside, which can lead to (temporary) performance loss when a host failover occurs: when virtual machines are restarted by HA, they are not restarted in their original resource pool but in the root resource pool, which can lead to starvation. Until DRS is invoked, the virtual machines have to make do without any memory reservation.
Memory reservation technique
Let’s get back to memory reservation. How does ESX handle memory reservation? Page 17 of the Resource Management Guide states the following:
Memory Reservation
If a virtual machine has a memory reservation but has not yet accessed its full reservation, the unused memory can be reallocated to other virtual machines.
Memory Reservation Used
Used for powered‐on virtual machines, the system reserves memory resources according to each virtual machine’s reservation setting and overhead. After a virtual machine has accessed its full reservation, ESX Server allows the virtual machine to retain this much memory, and will not reclaim it, even if the virtual machine becomes idle and stops accessing memory.
To recap the info stated in the Resource Management Guide, when a VM hits its full reservation, ESX will never reclaim that amount of reserved memory even if the machine idles and drops below its guaranteed reservation. It cannot reallocate that machine memory to other virtual machines.
Full reservation
But when will a VM hit its full reservation exactly? Popular belief is that the VM will hit its full reservation when it is pushing workloads, but that is not entirely true. It also depends on the guest OS being used by the VM. Linux plays rather well with others: when Linux boots it only addresses the memory pages it needs. This gives ESX the ability to reallocate memory to other machines. Only after its application or OS generates load can the Linux VM hit its full reservation. Windows, on the other hand, zeroes all of its memory during boot, which results in hitting the full reservation at boot time.
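The difference between the two guests can be put into a small model. This is only a sketch of the behavior described in the Resource Management Guide quote, with made-up numbers; the function name and the idea of tracking "peak touched" memory are assumptions for illustration, not VMkernel internals.

```python
def reallocatable_reserved_mb(reservation_mb, peak_touched_mb):
    """Part of a VM's reservation that other VMs can still use.

    Simplified model: reserved memory the guest has never accessed can be
    handed to other VMs, but whatever it has touched (up to the reservation)
    is kept for it, even if the guest later becomes idle.
    """
    return max(0, reservation_mb - peak_touched_mb)


# Both hypothetical VMs have a 4 GB reservation.
linux_after_boot = reallocatable_reserved_mb(4096, 1024)    # touched only what it needed
windows_after_boot = reallocatable_reserved_mb(4096, 4096)  # zeroed all memory at boot

print(linux_after_boot)    # 3072
print(windows_after_boot)  # 0
```

The key point of the model is that the input is the peak amount touched, not the current active memory, so an idle Windows VM still returns 0.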
Full reservation and admission control
This behavior has an impact on admission control. Admission control on the ESX server checks the amount of available unreserved CPU and memory resources. Because Windows will hit its full reservation at startup, ESX cannot reallocate this memory to other VMs, thereby diminishing the amount of available unreserved memory resources and restricting the capacity for VM placement on the ESX server. But memory reclamation, especially TPS, will help in this scenario. TPS (transparent page sharing) reduces redundant guest pages by mapping them to a single machine memory page. Because memory reservation “lives” at machine memory level and not at virtual machine physical level, TPS will reduce the amount of reserved machine memory pages, the memory pages that admission control checks when starting a VM.
Transparent Page Sharing
TPS cannot collapse pages immediately when starting a VM in ESX 3.5. TPS is a process in the VMkernel; it runs in the background and searches for redundant pages. By default, TPS has a cycle of 60 minutes (Mem.ShareScanTime) to scan a VM for page sharing opportunities. The speed of TPS mostly depends on the load and specs of the server. By default, TPS will scan 4 MB/sec per 1 GHz (Mem.ShareScanGHz). A slow CPU equals a slow TPS process. (But it’s not a secret that a slow CPU will offer less performance than a fast CPU.) TPS defaults can be altered, but it is advised to keep the defaults. VMware optimized memory management in ESX 4; pages which Windows initially zeroes will be page-shared by TPS immediately.
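The memory side of that check can be sketched as follows. This is a simplified illustration, not the actual admission control logic: the function name, the overhead figure and the host numbers are all assumptions made for the example.

```python
def can_power_on(host_unreserved_mb, vm_reservation_mb, vm_overhead_mb):
    """Simplified memory check: the VM's reservation plus its overhead
    must fit in the host's remaining unreserved memory."""
    return vm_reservation_mb + vm_overhead_mb <= host_unreserved_mb


# Hypothetical host with 6 GB of unreserved memory left.
unreserved = 6144
print(can_power_on(unreserved, vm_reservation_mb=4096, vm_overhead_mb=200))  # True
print(can_power_on(unreserved, vm_reservation_mb=6144, vm_overhead_mb=200))  # False
```

In this model a VM without a reservation only needs room for its overhead, which is why per-VM reservations are what restrict placement.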
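To get a feel for the 4 MB/sec per GHz figure, here is a back-of-the-envelope sketch. It treats the rate as a plain upper bound and ignores host load and the 60 minute scan cycle, so the result is only the fastest a full pass could complete under that assumption. The function names and the example VM size are made up.

```python
SCAN_MB_PER_SEC_PER_GHZ = 4  # default rate quoted above (Mem.ShareScanGHz)


def max_scan_rate_mb_per_sec(cpu_ghz):
    """Upper bound on the TPS scan rate for a given CPU speed."""
    return SCAN_MB_PER_SEC_PER_GHZ * cpu_ghz


def minutes_to_scan(vm_memory_mb, cpu_ghz):
    """Minutes needed to scan a VM's memory once at the maximum rate."""
    return vm_memory_mb / max_scan_rate_mb_per_sec(cpu_ghz) / 60


# Hypothetical 8 GB VM on a 1 GHz and on a 2 GHz CPU.
for ghz in (1.0, 2.0):
    print(f"{ghz} GHz: {minutes_to_scan(8192, ghz):.1f} minutes")
```

Doubling the clock speed halves the time in this model, which is the "slow CPU equals slow TPS" point in numbers.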
TPS and large pages
One caveat: TPS will not collapse large pages when the ESX server is not under memory pressure. ESX will back large pages with machine memory, but installs page sharing hints. When memory pressure occurs, the large page will be broken down and TPS can do its magic. More info on large pages and ESX can be found at Yellow Bricks: http://www.yellow-bricks.com/2009/05/31/nehalem-cpu-and-tps-on-vsphere/
Use resource pools
Setting a memory reservation has an impact on the VM itself and its surroundings. Setting a reservation per VM is not best practice; it is advised to create resource pools instead of per-VM reservations. Setting reservations on a granular level leads to increased administrative and operational overhead. But when the situation demands a per-VM reservation, how can it be set to guarantee as much performance as possible without wasting physical memory and with as little impact as possible? The answer: set the reservation equal to the average Guest Memory Usage of the VM.
Guest Memory Usage
Guest Memory Usage shows the active memory use of the VM. Which memory is considered active memory? If a memory page is accessed within mem.sampleperiod (60 sec), it is considered active. To determine the average you need to monitor each VM, and this is where vCenter comes to the rescue. vCenter logs performance data and keeps it for a period of time. The problem is that the average, minimum and maximum active memory counters are not captured at the default vCenter statistics level. The vCenter statistics level needs to be raised to at least level 4. After setting the new level, vCenter starts to log the data. The statistics setting can be changed under Administration > VirtualCenter Management Server Configuration > Statistics.
To display the average active memory of the VM, open the performance tab of the VM, change the chart options and select memory.
Select the counters consumed memory and average-, minimum- and maximum active memory. The performance chart of most VMs will show these values close to each other. As a rule the average active memory figure can be used as input for the memory reservation setting, but sometimes the SLA of the VM will determine that it’s better to use the maximum active memory usage.
Consumed memory is the amount of host memory that is being used to back guest memory. The image shows that consumed memory slowly decreases.
The active memory use does not change that much during the monitored 24 hours. By setting the reservation equal to the maximum average active memory value, enough physical pages will be backed to meet the VM’s requests.
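The rule of thumb above can be written down as a small helper. This is a sketch only: the sample values are invented, and in practice they would come from the active memory counters in the vCenter performance chart. The function name and the use_maximum switch are assumptions for the example.

```python
import math
from statistics import mean


def suggested_reservation_mb(active_samples_mb, use_maximum=False):
    """Suggest a per-VM memory reservation from active memory samples.

    By default the average active memory is used; set use_maximum=True
    when the SLA of the VM calls for the maximum active memory instead.
    """
    if not active_samples_mb:
        raise ValueError("need at least one active memory sample")
    value = max(active_samples_mb) if use_maximum else mean(active_samples_mb)
    return math.ceil(value)  # round up to a whole MB


# Hypothetical active memory samples (MB) over the monitored period.
samples = [1200, 1350, 1280, 1410, 1300]
print(suggested_reservation_mb(samples))                    # 1308
print(suggested_reservation_mb(samples, use_maximum=True))  # 1410
```

When the two values are close together, as the charts of most VMs show, the choice between average and maximum makes little difference.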