Thursday, January 16, 2025

Improve Server Consolidation with VMware vSphere 8 U3 Memory Tiering Feature

 

With the price increase of VMware licenses, you might want to get more out of your existing hardware and virtual infrastructure. By upgrading to the latest vSphere and ESXi 8.0 U3, you’ll unlock an interesting feature that we have recently blogged about – Exploring the “Memory Tiering over NVMe” Feature in vSphere 8.0 Update 3.

While some of you are already leaving (or thinking of leaving) the VMware ecosystem for an alternative, most of you are probably still reflecting on what to do next.

Adding more DRAM is possible, but it’s costly. In fact, DRAM can account for 50 to 80 percent of the total cost of a server. By adding NVMe to the equation, you save a lot of money while increasing the server’s RAM capacity and consolidation ratio.

The testing setup VMware used

VMware has done its testing with industry-standard tools: Login VSI for VDI testing, HammerDB for SQL Server testing, and the DVD Store benchmark for Oracle DB testing. They tested ESXi 8 U3 with the Memory Tiering feature activated, configured with a 1:1 DRAM-to-NVMe ratio. What this means is that for every gigabyte of DRAM, you get one gigabyte of NVMe-backed memory. That’s the 1:1 ratio.

As you can see in the screenshot below, the total system memory is 1 terabyte: 512 gigabytes coming from DRAM, the faster tier, and 512 gigabytes coming from the NVMe tier. Simple, right?
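If you’re wondering how that ratio is set, in the 8.0 U3 tech preview it is controlled through an advanced host setting, Mem.TierNvmePct, which sizes the NVMe tier as a percentage of the installed DRAM. A minimal sketch from the ESXi shell, assuming Memory Tiering is already enabled on the host:

# Size the NVMe tier at 100% of DRAM, i.e. a 1:1 ratio
esxcli system settings advanced set -o /Mem/TierNvmePct -i 100

# Verify the current value
esxcli system settings advanced list -o /Mem/TierNvmePct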

Behind the scenes, ESXi divides that memory into two tiers. Tier 1 is DRAM, the faster memory; Tier 2 is the cheaper and bigger memory, backed by slower devices, typically NVMe SSDs or CXL memory devices.

Screenshot from VMware below.

1 TB of total memory composed of 512 GB of DRAM and 512 GB of NVMe SSD

How does the system work?

Inside the system, there is a completely new, very intelligent page classification scheme. It figures out which memory pages are hot and which are cold within a specific time window.

For example: over the past minute or so, which pages were accessed most frequently? The most frequently accessed pages within that period are classified as hot pages, and obviously you want hot pages to land in the faster tier, which is DRAM, while the colder pages go to the slower tier. And once the VMs are running, you’ll see that in typical environments the workload characteristics keep changing.

VMware intelligent page classification mechanism in Memory Tiering Feature

 

The system keeps monitoring and updating this classification in a continuous feedback loop. Your system goes through phase changes: sometimes you’re running database transactions, sometimes you’re not running anything at all. The system automatically adjusts to all of this, so you can really optimize your memory usage.

What VMware does with memory tiering is unlock this additional memory by using NVMe as the second tier. And if you look at typical environments, at average VM sizes and average host sizes, and at what memory consumption and activeness look like, the activeness is usually pretty low.

If your applications are memory bound, you hit the memory capacity limit first and leave some CPU cores sitting idle.

VMware’s Performance Testing

The VDI – The first result is the baseline, using only DRAM: 1 terabyte of memory. You get an end-user score of about 8.5, which is an excellent score. Now, if you enable memory tiering, you can run twice as many VMs.

In the baseline, you are able to run 120 VMs without performance degradation. When you enable memory tiering, you can run 240 VMs, and the end-user score is not degraded.

You’ll get a 2x increase in VM density with less than 3% performance loss.

 

The SQL Server – The SQL Server test showed a 2x increase in VM density with less than 10% performance loss.

SQL Server density increase comparison graph

 

Oracle DB – The Oracle DB test showed a 2x increase in VM density with less than 5% performance loss.

Oracle density increase using the DVD Store benchmark

 

You can watch the video in which Todd Muirhead talks with Qasim Ali, both from VMware. They walk through all the graphs, the performance degradation (very small compared to the benefits), and the possibilities.

They also show where you can monitor the memory consumption and activeness.
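If you want a quick look from the ESXi shell yourself, these two commands are a simple starting point (a sketch, not the monitoring views shown in the video):

# List the NVMe devices claimed as tier devices
esxcli system tierdevice list

# Show the host's installed physical memory, to compare with the combined total reported in the vSphere Client
esxcli hardware memory get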

Note: You don’t have to use a 1:1 ratio. In my lab, I’ve tested other scenarios where I allocated more NVMe SSD capacity than there is DRAM.

However, VMware’s recommendation is not to configure more NVMe capacity than you have DRAM. But hey, this is a lab.

On a host with only 16 GB of RAM, we added a 30 GB NVMe SSD tier that is also used as RAM, giving us 46 GB of overall RAM capacity on an ESXi host that normally runs with just 16 GB. You can easily test this in a VMware Workstation nested lab.
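If you want to reproduce an over-100% ratio like this, the same Mem.TierNvmePct advanced setting mentioned earlier accepts values above 100. An illustrative sketch with my lab numbers (not the exact commands from my lab):

# 30 GB of NVMe against 16 GB of DRAM is roughly 188% of DRAM capacity
esxcli system settings advanced set -o /Mem/TierNvmePct -i 188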

Look at this screenshot.

Tier 0 and Tier 1 from my nested lab

Final Words

Memory Tiering in vSphere 8 U3 (VCF 9) is a true game changer: you can increase server density by adding some NVMe SSDs to each of the hosts in your cluster, while keeping excellent performance. The configuration in ESXi 8 U3 requires some CLI work, which I documented in my previous blog post, but I assume the next version of vSphere will get a UI for this particular feature.
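For quick reference, the tech-preview CLI boils down to just a few commands (see the previous blog post for the full walkthrough; the device path below is a placeholder for your own empty NVMe device):

# Enable the Memory Tiering kernel setting
esxcli system settings kernel set -s MemoryTiering -v TRUE

# Claim an unused local NVMe device as a tier device
esxcli system tierdevice create -d /vmfs/devices/disks/<your-NVMe-device>

# Then reboot the host to bring the NVMe tier online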

You can see that even with a big increase in server density, the CPU overhead only rises slightly. In the video, they mentioned a worst-case scenario of 10%, while the usual performance hit at the CPU level was 2-5%.

Memory tiering helps reduce TCO, optimizes memory usage, and, of course, improves VM consolidation (up to 2x with a 1:1 ratio across different workloads and applications).

This is extremely encouraging for organizations running existing VMware environments. This feature can buy you some additional time for reflection, to decide whether you’ll continue with VMware or start transitioning to another hypervisor platform.

The reason for a transition is almost always cost driven. On the other hand, you must weigh all the factors, including the time needed for V2V migrations, reconfiguration, training, and so on. For many, the transition might not start before sometime next year.



from StarWind Blog https://ift.tt/3nUV5SD
