How Memory Ballooning in VMware Works



Memory Ballooning

Memory ballooning is handled through a driver (vmmemctl.sys) that is installed as part of the VMware Tools. This driver is loaded in the guest OS, where it interacts with the VMkernel and is used to reclaim memory pages when ESX host memory is in demand and the available physical pages cannot meet requirements. When memory demands rise on the ESX host, the VMkernel instructs the balloon driver to "inflate" and consume memory in the running guest OS, forcing the guest operating system to use its own native memory management techniques to handle the changing conditions. Free pages are typically released first, but the guest OS may decide to page some memory out to its pagefile on the virtual disk. The reclaimed memory is then used by ESX to satisfy the memory demands of other running workloads; when memory demands decrease, the balloon "deflates" and the memory is relinquished back to the guest OS. Balloon driver activity can be viewed either through VirtualCenter performance monitoring graphs or through esxtop on the local host.

Ballooning

Ballooning is a completely different memory reclamation technique compared to page sharing. Before describing the technique, it is helpful to review why the hypervisor needs to reclaim memory from virtual machines. Due to the virtual machine’s isolation, the guest operating system is not aware that it is running inside a virtual machine and is not aware of the states of other virtual machines on the same host. When the hypervisor runs multiple virtual machines and the total amount of free host memory becomes low, none of the virtual machines will free guest physical memory, because the guest operating system cannot detect the host’s memory shortage. Ballooning makes the guest operating system aware of the low memory status of the host.

In ESX, a balloon driver is loaded into the guest operating system as a pseudo-device driver. It has no external interfaces to the guest operating system and communicates with the hypervisor through a private channel. The balloon driver polls the hypervisor to obtain a target balloon size. If the hypervisor needs to reclaim virtual machine memory, it sets a proper target balloon size for the balloon driver, making it “inflate” by allocating guest physical pages within the virtual machine. Figure 6 illustrates the process of the balloon inflating.
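Before walking through Figure 6, here is a rough C sketch of that poll-and-adjust cycle. The function names (balloon_get_target, balloon_inflate, balloon_deflate), the polling interval, and the stubbed hypervisor side are all hypothetical, not the actual vmmemctl or VMkernel interfaces; it is a sketch of the idea, not the real driver.

    /* Hypothetical sketch of the balloon driver's poll-and-adjust loop.
     * The real vmmemctl driver talks to the VMkernel over a private
     * channel; here the hypervisor side is reduced to a stub. */
    #include <stdio.h>
    #include <unistd.h>

    static unsigned long balloon_pages;  /* guest pages currently held by the balloon */

    /* Stub: in the real driver this target comes from the hypervisor. */
    static unsigned long balloon_get_target(void) { return 2; }

    static void balloon_inflate(unsigned long n) { balloon_pages += n; } /* allocate + pin n pages */
    static void balloon_deflate(unsigned long n) { balloon_pages -= n; } /* unpin + free n pages */

    int main(void)
    {
        for (int i = 0; i < 5; i++) {   /* the real driver polls indefinitely */
            unsigned long target = balloon_get_target();
            if (target > balloon_pages)
                balloon_inflate(target - balloon_pages);
            else if (target < balloon_pages)
                balloon_deflate(balloon_pages - target);
            printf("balloon size: %lu pages (target %lu)\n", balloon_pages, target);
            sleep(1);                   /* polling interval chosen arbitrarily */
        }
        return 0;
    }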
In Figure 6 (a), four guest physical pages are mapped in the host physical memory. Two of the pages are used by the guest application and the other two pages (marked by stars) are in the guest operating system free list. Note that since the hypervisor cannot identify the two pages in the guest free list, it cannot reclaim the host physical pages that are backing them. Assuming the hypervisor needs to reclaim two pages from the virtual machine, it will set the target balloon size to two pages. After obtaining the target balloon size, the balloon driver allocates two guest physical pages inside the virtual machine and pins them, as shown in Figure 6 (b). Here, “pinning” is achieved through the guest operating system interface, which ensures that the pinned pages cannot be paged out to disk under any circumstances. Once the memory is allocated, the balloon driver notifies the hypervisor of the page numbers of the pinned guest physical memory so that the hypervisor can reclaim the host physical pages that are backing them. In Figure 6 (b), dashed arrows point at these pages. The hypervisor can safely reclaim this host physical memory because neither the balloon driver nor the guest operating system relies on the contents of these pages. This means that no processes in the virtual machine will intentionally access those pages to read or write any values, so the hypervisor does not need to allocate host physical memory to store the page contents. If any of these pages are re-accessed by the virtual machine for some reason, the hypervisor will treat it as a normal virtual machine memory allocation and allocate a new host physical page for the virtual machine. When the hypervisor decides to deflate the balloon by setting a smaller target balloon size, the balloon driver deallocates the pinned guest physical memory, releasing it for the guest’s applications.
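The inflate step itself can be pictured with a small user-space analogy in C: page-sized buffers stand in for guest physical pages, mlock() stands in for the guest operating system's pinning interface, and the printed addresses stand in for the page numbers that the real driver would report to the hypervisor over its private channel. This is a sketch under those assumptions, not how vmmemctl is actually implemented.

    /* User-space analogy of balloon inflation (Figure 6 (b)): allocate
     * page-sized buffers, pin them with mlock() so they cannot be paged
     * out, and record which pages were pinned. In the real driver, the
     * page numbers of the pinned pages are reported to the hypervisor,
     * which then reclaims the host memory backing them. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define BALLOON_TARGET 2  /* pages to reclaim, as in the Figure 6 example */

    int main(void)
    {
        long page_size = sysconf(_SC_PAGESIZE);
        void *pinned[BALLOON_TARGET];

        for (int i = 0; i < BALLOON_TARGET; i++) {
            /* aligned_alloc stands in for the guest OS page allocator. */
            pinned[i] = aligned_alloc(page_size, page_size);
            if (pinned[i] == NULL || mlock(pinned[i], page_size) != 0) {
                perror("balloon inflate");
                return 1;
            }
            /* Stand-in for notifying the hypervisor of the pinned page number. */
            printf("pinned page %d at %p\n", i, pinned[i]);
        }
        return 0;
    }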



Figure 6: Inflating the balloon in a virtual machine

Typically, the hypervisor inflates the virtual machine balloon when it is under memory pressure. By inflating the balloon, a virtual machine consumes less physical memory on the host but more physical memory inside the guest. As a result, the hypervisor offloads some of its memory pressure to the guest operating system while placing a modest extra load on the virtual machine. That is, the hypervisor transfers the memory pressure from the host to the virtual machine. Ballooning induces guest memory pressure: the balloon driver allocates and pins guest physical memory, and the guest operating system determines whether it needs to page out guest physical memory to satisfy those allocation requests. If the virtual machine has plenty of free guest physical memory, inflating the balloon induces no paging and does not impact guest performance; in this case, as illustrated in Figure 6, the balloon driver allocates the free guest physical memory from the guest free list, so guest-level paging is not necessary. However, if the guest is already under memory pressure, the guest operating system decides which guest physical pages to page out to the virtual swap device in order to satisfy the balloon driver’s allocation requests. The genius of ballooning is that it allows the guest operating system to intelligently make the hard decision about which pages to page out, without the hypervisor’s involvement.
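One way to picture the "free list first, paging only under pressure" behavior is a hypothetical two-stage allocation inside the balloon driver, sketched below in C. The allocation modes and the guest_alloc_page() stub are invented for illustration; the real driver simply uses the guest operating system's native allocation interfaces and lets the guest make the paging decision.

    /* Hypothetical two-stage allocation, illustrating why inflating the
     * balloon causes no guest paging while free memory is plentiful.
     * guest_alloc_page() is a stub simulating the guest allocator; the
     * real driver relies on the guest OS's native memory management. */
    #include <stdio.h>
    #include <stdlib.h>

    enum alloc_mode {
        ALLOC_FREE_ONLY,    /* take a page from the free list, or fail */
        ALLOC_MAY_RECLAIM   /* the guest may page memory out to satisfy this */
    };

    static int free_pages = 1;  /* pretend the guest free list holds one page */

    static void *guest_alloc_page(enum alloc_mode mode)
    {
        if (free_pages > 0) {             /* a free page is available: no paging */
            free_pages--;
            return malloc(4096);
        }
        if (mode == ALLOC_MAY_RECLAIM) {  /* guest would page something out here */
            puts("guest pages memory out to the virtual swap device");
            return malloc(4096);
        }
        return NULL;                      /* free-only request cannot be satisfied */
    }

    int main(void)
    {
        for (int i = 0; i < 2; i++) {
            void *page = guest_alloc_page(ALLOC_FREE_ONLY);
            if (page == NULL)
                page = guest_alloc_page(ALLOC_MAY_RECLAIM);
            printf("balloon page %d allocated at %p\n", i, page);
        }
        return 0;
    }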
For ballooning to work as intended, the guest operating system must have the balloon driver installed and enabled, and it must have sufficient virtual swap space configured so that guest paging is possible. Even then, ballooning might not reclaim memory quickly enough to satisfy host memory demands, and the upper bound of the target balloon size may be imposed by various guest operating system limitations.


The vmmemctl Driver

When you install VMware Tools into a VM, along with an improved network and mouse driver, a memory control driver is installed as well. Its file name is vmmemctl, but it is normally referred to by VMware as the “balloon driver,” because VMware uses the analogy of a balloon to explain how the driver works.
The most important thing to know about vmmemctl is that it is engaged only when memory is scarce and VMs are fighting over that resource—in other words, when contention is occurring. By itself, vmmemctl doesn’t fix the problem of a lack of resources or unexpected peak demands for memory. In fact, its biggest use is as an indicator that there is a memory problem: it is a counter that draws the administrator’s attention to potential problems.

NOTE Of course, one way of reducing the chance of running out of memory is to buy more physical resources. But remember this important fact: ESX delivers the memory to the VM only as it needs it, on an on-demand basis. The whole point of virtualization is to use resources efficiently and effectively, and escape the whole “let’s just throw more resources (money) at the problem” approach that got us in such a mess in the first place.

How does the vmmemctl driver work? Well, during normal operation where memory is plentiful and VMs are not in contention, the driver does nothing. It sits there inside the VM, deflated like a saggy balloon at the end of the party. However, when memory is scarce and the VMs are fighting over the resource, the vmmemctl driver begins to inflate. In other words, it begins to make demands for pages of memory. This generally occurs in a VM that you have marked as having a low priority on the system. The guest OS follows its internal memory management techniques, freeing up RAM by flushing old data to its virtual memory (page file or swap partition) in order to give the vmmemctl driver ranges of memory. Next comes the clever bit—rather than hanging on to this newly allocated memory, the vmmemctl driver hands it over to the VMkernel. The VMkernel in turn hands this memory over to the other VMs that really need it. When memory demands return to normal and memory is no longer scarce, the balloon driver deflates and gracefully hands back the memory it claimed to the guest OS.
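Deflation is simply the reverse of the earlier inflate sketch: unpin the pages the driver was holding and release them back to the guest, so the guest OS can hand them to its own applications again. The C sketch below uses munlock() and free() as user-space stand-ins for those steps; it is an analogy, not the actual driver code.

    /* User-space analogy of balloon deflation: unpin the pages that were
     * pinned during inflation and release them back to the guest, making
     * the memory available to the guest's applications again. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        long page_size = sysconf(_SC_PAGESIZE);
        void *pinned[2];

        /* Inflate first (as in the earlier sketch) so there is something
         * to deflate. */
        for (int i = 0; i < 2; i++) {
            pinned[i] = aligned_alloc(page_size, page_size);
            if (pinned[i] == NULL || mlock(pinned[i], page_size) != 0)
                return 1;
        }

        /* Deflate: unpin and free each page. In the real driver this
         * happens when the hypervisor sets a smaller target balloon size. */
        for (int i = 0; i < 2; i++) {
            munlock(pinned[i], page_size);
            free(pinned[i]);
            puts("page returned to the guest OS");
        }
        return 0;
    }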

Again, on its own, the vmmemctl driver doesn’t fix the problem, which is a lack of memory. It does, on the other hand, give us a clear indicator of a potential problem. As with TPS, the Resource Allocation tab of a VM and the esxtop utility (set to show memory statistics) will show ballooning activity if it is occurring.
Additionally, vmmemctl allows us to configure the system for worst-case scenarios by offering guaranteed levels of service to the VMs that need it if memory becomes low.