Disclaimer:
This article has been updated with the most recent information relevant to VMware vSphere, including features and functionality as of 2025. It provides an overview of ESXi vSphere vSwitch load balancing options, highlighting their pros and cons. While the content is based on the latest version of vSphere at the time of publication, we recommend consulting VMware’s official documentation or release notes for any updates or changes. The material is intended for informational purposes and serves as an introductory guide. If you have suggestions for improving this guide, feel free to share your feedback!
Introduction
Previously, I discussed addressing NIC load balancing issues on an ESXi host and the utility of ESXCLI in that context. Since then, many colleagues have inquired about the differences between various load balancing methods and which one is optimal. Let’s explore and clarify the concepts of network load balancing at the infrastructure level.
For starters, let’s quickly revisit what load balancing is all about. Here’s the deal: don’t mix up load balancing network traffic with balancing workloads for optimal performance; the latter is the job of DRS (Distributed Resource Scheduler).
NIC teaming technology in VMware combines two or more physical NICs into a single logical interface to increase the bandwidth of a vSphere virtual switch or a group of ports, thereby enhancing reliability. By configuring the failover procedure, you can choose how exactly traffic will be redirected in case of a failure of one of the NICs. Configuring the load balancing policy allows you to decide how exactly a vSwitch will load balance the traffic between NICs.
So, what’s the takeaway? Load balancing is essentially the technology of uniting physical interfaces into one seamless logical connection. Although aggregation increases channel bandwidth, you shouldn’t really count on perfect load balancing between all interfaces in the aggregated channel. Put simply, this tech is about smartly directing traffic from virtual machines (VMs) through the vSwitch down to the pNICs. Whether you’re working with a standard or a distributed vSwitch, there are five tried-and-true methods to balance traffic:
- Route based on originating virtual port ID
- Route based on IP hash
- Route based on source MAC hash
- Route based on physical NIC load
- Use explicit failover order
Curious? Let’s dig deeper into each method and break them down in simple terms.
Route Based on Originating Virtual Port ID
This method is the default option for both standard and distributed vSwitches. It assigns an uplink based on the virtual port ID of the VM’s vNIC. Each VM is connected to a specific virtual port on the vSwitch, and the vSwitch maps this virtual port to a specific physical NIC (pNIC). This method ensures that one vNIC uses only one pNIC at any given time – simple and straightforward.
Here’s how it works: each VM’s vNIC connects to a virtual port with a unique identifier on the vSwitch. To assign an uplink for that VM, the vSwitch maps this port identifier to one of the physical NICs in the team. Once the uplink is assigned, the vSwitch keeps sending the VM’s traffic through the same uplink for as long as the VM stays on that virtual port.
The virtual switch calculates the uplink only once, and the port identifier for a VM stays fixed; if the VM is moved to a different port group, the vSwitch recalculates and assigns a new uplink.
However, things don’t always stay static: a VM could be migrated, powered off, or even deleted. When that happens, its port identifier on the vSwitch becomes available again, the vSwitch stops sending traffic to that port, and the overall load on the associated uplink drops. If the VM is powered back on or migrated, it may land on a different virtual port and start using another uplink.
When all pNICs in the team are active, VM traffic is spread across all of them, with each vNIC pinned to a single uplink at a time.
Now, let’s add a practical touch. Power off VM 2 and VM 5, then power VMs on in the following order: VM 8, VM 9, VM 2, VM 5. Guess what happens? The port identifiers on Port Group 1 and Port Group 2 keep their mapping to the pNIC uplink ports, so VM 8 and VM 9 end up on the uplink ports previously used by VM 2 and VM 5. It’s like musical chairs, but for VMs and uplink ports!
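If you prefer to see the idea as code, here’s a minimal Python sketch, assuming a simple “virtual port ID modulo the number of active uplinks” mapping. That mapping and the function names are my own illustration, not ESXi’s actual internal algorithm, but they capture the key behavior: each vNIC sticks to exactly one pNIC.

```python
# Illustrative sketch only: models "one vNIC pins to one pNIC", assuming a
# simple virtual-port-ID modulo mapping (VMware's real algorithm is internal
# to ESXi and not documented at this level).

def uplink_for_port(virtual_port_id: int, active_pnics: list[str]) -> str:
    """Pick one uplink for a virtual port; the choice stays fixed
    for as long as the VM keeps that virtual port."""
    return active_pnics[virtual_port_id % len(active_pnics)]

pnics = ["vmnic0", "vmnic1"]
for port_id in range(4):  # four vNICs on the vSwitch
    print(f"virtual port {port_id} -> {uplink_for_port(port_id, pnics)}")
# Each vNIC lands on exactly one pNIC and never exceeds that pNIC's bandwidth.
```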
Pros:
- Simple physical switch configuration: no need for uplink binding (EtherChannel); only independent ports of the switch require configuration, keeping things simple and manageable.
- Equal distribution of bandwidth: when the number of vNICs exceeds the number of pNICs, this method ensures that each vNIC gets its fair share of bandwidth.
- Physical NIC redundancy: even if all pNICs are in active use, when one pNIC fails, the other pNICs in the team continue to balance traffic, ensuring your network stays up and running.
- Traffic balancing across multiple switches: traffic from the physical NIC team can be distributed between several physical switches, reducing the impact of a hardware failure and improving overall reliability.
- Beacon probing for failover detection: this load balancing type may use a network failover detection mechanism called beacon probing, enhancing the stability of your network environment.
- Load balancing in multi-VM environments: in environments with several VMs, the load is distributed across all active network cards, increasing overall performance.
Cons:
- Limited Bandwidth per vNIC: A single vNIC cannot use the combined bandwidth of multiple pNICs. For example, if there are four pNICs in a group (1 Gb/s each), a VM with one vNIC can only utilize 1 Gb/s bandwidth through one pNIC.
- Not suitable for high client request volumes: this method isn’t ideal for virtual servers that handle many requests from different clients, where the traffic of a single VM (with one vNIC) would need to be balanced across several pNICs.
- No support for 802.3ad aggregation: this method doesn’t support 802.3ad channel aggregation technology and may cause issues with accessing IP storage (e.g., iSCSI, NFS) since VMkernel can also use only one pNIC to work with different iSCSI targets.
Route Based on IP Hash
This load balancing method distributes traffic by creating a hash (a fixed-size value) derived from the source and destination IP addresses of each packet. This hashing mechanism ensures that traffic between a single VM and multiple clients, including traffic passing through a router, can be balanced across different pNICs. To enable this functionality, you’ll need to activate 802.3ad support on the physical switch connected to your ESXi server.
Among load balancing algorithms, IP hash is a star performer when it comes to efficiency. However, with great power comes complexity. The server shoulders a significant computational load since it calculates the hash for every IP packet. The hash calculation relies on the XOR algorithm and uses this formula:
(LSB(SrcIP) xor LSB(DestIP)) mod (number of pNICs)
How evenly the load is balanced largely depends on the number of TCP/IP sessions between the host and different clients, as well as on the number of pNICs. When many connections are in play, this method ensures more even traffic distribution and avoids the pitfalls inherent to the port ID-based option.
However, there’s a catch: if your host connects to multiple physical switches, you’ll need to aggregate all ports into a stack (EtherChannel). Without support for this mode on your physical switches, IP hash won’t be an option. In such situations, you might find yourself connecting all pNICs in the vSwitch to one physical switch.
Here’s where you need to tread carefully. Relying on a single switch means introducing a single point of failure – if the switch goes down, the entire system follows suit. Think of it in advance.
Another critical detail: when applying IP hash as the load balancing algorithm, you configure it at the vSwitch level and must not override it at the port group level. In other words, ALL devices connected to a vSwitch with IP hash load balancing should use IP hash load balancing.
IP hash works best when there is a significant number of destination IP addresses in play. Otherwise, you risk a situation where two or more flows, instead of being balanced, end up loading the same pNIC.
For instance, consider a scenario where a VM uses iSCSI-connected disks from two SANs. If these two SANs have IP addresses that hash to the same modulo value (see the table below), all traffic will load a single pNIC, which reduces the efficiency of IP hash load balancing to a minimum.
| VM | Source IP | Destination IP | XOR (SrcIP, DestIP) | Modulo | pNIC |
|------|----------|----------|--------------------------------|---|---|
| VM 1 | x.x.x.10 | z.z.z.20 | 10 xor 20 = 30; 30 mod 2 = 0 | 0 | 1 |
| VM 1 | x.x.x.10 | z.z.z.30 | 10 xor 30 = 20; 20 mod 2 = 0 | 0 | 1 |
This approach works well when there’s a large number of destination IP addresses, but be mindful of the limitations when the distribution of IPs is not as diverse.
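To make the arithmetic tangible, here’s a small Python sketch that applies the simplified LSB/XOR formula above to the two SAN targets from the table. The helper names and the sample IP addresses are purely illustrative, and this follows the article’s simplified formula rather than a full ESXi implementation.

```python
# Sketch of the simplified IP-hash formula from the article:
# (LSB(SrcIP) xor LSB(DestIP)) mod (number of pNICs).
# Helper names and sample addresses are illustrative only.

def last_octet(ip: str) -> int:
    return int(ip.split(".")[-1])

def iphash_uplink(src_ip: str, dst_ip: str, num_pnics: int) -> int:
    return (last_octet(src_ip) ^ last_octet(dst_ip)) % num_pnics

src = "10.0.0.10"                       # stands in for x.x.x.10
for dst in ("10.0.1.20", "10.0.1.30"):  # the two SAN targets
    print(dst, "-> pNIC index", iphash_uplink(src, dst, num_pnics=2))
# Both destinations hash to index 0, so all iSCSI traffic rides one pNIC.
```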
Pros:
- Improved performance for multi-VM communication: when a VM communicates with multiple other VMs, it can theoretically utilize a bandwidth greater than what a single pNIC supports.
- Physical NIC redundancy: if a pNIC or uplink fails, the remaining NICs in the group will continue balancing traffic, ensuring uninterrupted network performance. However, synchronization is key: both the ESXi host and the physical switch must recognize the channel member as inactive for failover to work properly. If there is any inconsistency, traffic won’t be able to switch to the other pNICs in the group.
Cons:
- Less flexible switch configuration: physical switch configuration demands that ports be set up for EtherChannel static connections, which limits adaptability. Additionally, many switches don’t support EtherChannel across multiple physical switches, confining the pNIC group to a single switch.
Note: exceptions exist, such as stacked or modular switches that can form a channel across several members or modules. Technologies like Cisco vPC (Virtual Port Channel) can address this issue, provided the switches support it. Talk to your vendor for more information.
- Lacks beacon probing: this load balancing option lacks beacon probing for error detection. Instead, it relies solely on uplink port failure notifications, which may not provide as comprehensive a failover mechanism.
Route Based on Source MAC Hash
Now, let’s talk about a simpler yet equally intriguing load balancing method: Route Based on Source MAC Hash. With this approach, the vSwitch selects an uplink port for a VM based on the VM’s MAC address. To calculate the uplink, the vSwitch takes the least significant byte (LSB) of the source MAC address (the vNIC’s MAC address), computes it modulo the number of active pNICs on the vSwitch, and uses the result as an index into the pNIC array.
Let’s break it down with an example: consider a setup with three pNICs and a vNIC with the MAC address 00:15:5D:99:96:0B. The LSB of the MAC address is 0x0B, or 11 in decimal. For the modulo operation, you divide the LSB by the number of pNICs (11 / 3) and take the remainder, in this case 2. The physical NIC array is zero-based, which means that 0 = pNIC 1, 1 = pNIC 2, 2 = pNIC 3, so this vNIC lands on pNIC 3.
| Name | MAC LSB | Modulo | pNIC |
|------|-----------|---|---|
| VM 1 | :39 = 57  | 0 | 1 |
| VM 2 | :6D = 109 | 1 | 2 |
| VM 3 | :0E = 14  | 2 | 3 |
| VM 4 | :5A = 90  | 0 | 1 |
| VM 5 | :97 = 151 | 1 | 2 |
| VM 6 | :F5 = 245 | 2 | 3 |
| VM 7 | :A2 = 162 | 0 | 1 |
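For the curious, here’s a quick Python sketch that reproduces the first few rows of the table by taking the last byte of each MAC address modulo the number of active pNICs. The full MAC addresses are made up for illustration; only their last bytes match the table.

```python
# Sketch of source-MAC-hash uplink selection: last byte of the vNIC MAC
# modulo the number of active pNICs (function names are illustrative).

def mac_lsb(mac: str) -> int:
    return int(mac.split(":")[-1], 16)

def mac_hash_uplink(mac: str, num_pnics: int) -> int:
    return mac_lsb(mac) % num_pnics     # 0 = pNIC 1, 1 = pNIC 2, ...

macs = {
    "VM 1": "00:50:56:aa:bb:39",  # made-up MACs whose last bytes
    "VM 2": "00:50:56:aa:bb:6d",  # match the first table rows
    "VM 3": "00:50:56:aa:bb:0e",
}
for vm, mac in macs.items():
    idx = mac_hash_uplink(mac, num_pnics=3)
    print(f"{vm}: LSB {mac_lsb(mac)} -> pNIC {idx + 1}")
# VM 1 -> pNIC 1, VM 2 -> pNIC 2, VM 3 -> pNIC 3, matching the table.
```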
Pros:
- More balanced load distribution: compared to the “Route Based on Originating Port ID” method, this approach distributes load more evenly, because the vSwitch calculates the uplink for every packet rather than once per virtual port.
- Consistent uplink port assignment: all VMs use the same uplink port because their MAC addresses are static, meaning that powering a VM on or off doesn’t disrupt its uplink port assignment.
- No physical switch changes needed: this method eliminates the need for any configuration adjustments on physical switches, simplifying deployment and reducing setup time.
Cons:
- Bandwidth limited by uplink port speed: the speed of the uplink port connected to a specific port identifier determines the bandwidth available to the VM unless the VM utilizes multiple vNICs with different MAC addresses.
- Higher resource consumption: this method is more resource-intensive than the routing based on originating port ID, as the vSwitch must calculate the uplink port for each packet.
- Potential uplink port overload: the virtual switch does not monitor the current load of uplink ports, increasing the risk of some ports becoming overloaded while others remain underutilized.
Route Based on Physical NIC Load
This load balancing method is exclusive to distributed switches, and while it may seem similar to the routing based on originating port ID, it brings some notable differences to the table. The primary distinction lies in how the pNIC for traffic balancing is selected. Instead of a static assignment, the choice is dynamically determined based on the current load on the pNIC.
The system evaluates the load on each pNIC every 30 seconds. If the load on a specific pNIC exceeds 75%, the VM port identifier with the highest I/O operations switches to another uplink port of a less-loaded pNIC. Unlike other load-balancing methods where the port remains fixed once assigned, this approach adapts to changing traffic conditions.
In simpler terms, this method isn’t traditional load balancing. It’s more like a smart failover scenario, redirecting traffic to the least busy uplink port from the list of active pNICs whenever necessary.
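To picture the 30-second / 75% rule, here’s a conceptual Python sketch of the rebalancing decision. This is my own simplified model, not ESXi’s scheduler, and the data structures are assumptions.

```python
# Conceptual sketch of "route based on physical NIC load": every 30 seconds,
# if an uplink runs above 75% utilization, the busiest VM port on it is moved
# to the least-loaded uplink. Data structures are illustrative assumptions.

THRESHOLD = 0.75

def rebalance(uplink_load: dict[str, float],
              port_traffic: dict[str, dict[str, float]]) -> None:
    for uplink, load in uplink_load.items():
        if load <= THRESHOLD:
            continue
        # Busiest VM port on the overloaded uplink...
        busiest_port = max(port_traffic[uplink], key=port_traffic[uplink].get)
        # ...moves to the currently least-loaded uplink.
        target = min(uplink_load, key=uplink_load.get)
        print(f"moving {busiest_port} from {uplink} to {target}")

rebalance(
    uplink_load={"vmnic0": 0.9, "vmnic1": 0.2},
    port_traffic={"vmnic0": {"vm-web": 0.6, "vm-db": 0.3},
                  "vmnic1": {"vm-app": 0.2}},
)
# Prints: moving vm-web from vmnic0 to vmnic1
```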
Pros:
- Low resource consumption: the distributed switch calculates the uplink port for the VM only once, and periodic uplink checks minimally impact performance.
- Efficient load redistribution: the distributed switch actively monitors the uplink port load and shifts traffic to maintain balance where possible.
- No physical switch configuration required: this method works seamlessly without needing adjustments on the physical network side.
Cons:
- Bandwidth constraints: the bandwidth available to a single VM is still limited to that of the one uplink port it is currently assigned on the distributed switch.
Use Explicit Failover Order
This policy takes a more straightforward approach, although it might come as a surprise to some – it essentially eliminates true load balancing. Here’s how it works: the vSwitch always selects the highest-priority uplink port from the list of available active NICs. If the first uplink port becomes unavailable, traffic shifts to the next one in the list, and so on.
The failover order parameter is key here, defining the Active/Standby pNIC mode for the vSwitch. While simple, this method sacrifices the flexibility and efficiency of dynamic load balancing in favor of a more rigid, predictable behavior.
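The whole policy boils down to “first healthy NIC in the Active list wins, Standby only takes over on failure.” Here’s a minimal Python sketch of that rule; the names and data shapes are illustrative.

```python
# Sketch of "use explicit failover order": always use the highest-priority
# healthy uplink from the Active list; Standby NICs only take over on failure.
# Names and data shapes are illustrative.

def pick_uplink(active_order: list[str], standby: list[str],
                link_up: dict[str, bool]):
    for nic in active_order + standby:
        if link_up.get(nic, False):
            return nic
    return None  # all uplinks down

status = {"vmnic0": False, "vmnic1": True, "vmnic2": True}
print(pick_uplink(["vmnic0", "vmnic1"], ["vmnic2"], status))  # -> vmnic1
```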
Comparison of Load Balancing Policies
| Method | Pros | Cons |
|--------|------|------|
| Route Based on Originating Virtual Port ID | Simplicity, even distribution, redundancy, multiple switches | Bandwidth limitation, not ideal for high-traffic VMs, no 802.3ad support |
| Route Based on IP Hash | Enhanced performance, redundancy | Complex configuration, no beacon probing, potential imbalance |
| Route Based on Source MAC Hash | Improved distribution, consistent assignment, no physical switch configuration needed | Bandwidth limitation, resource intensive |
| Route Based on Physical NIC Load | Dynamic load distribution, automatic adjustment | vSphere Distributed Switch requirement, potential for frequent reassignments |
| Use Explicit Failover Order | Predictability, simplicity | No load balancing, manual configuration |
Conclusion
Each load balancing policy comes with its own set of advantages and drawbacks, and the best choice depends entirely on your specific needs. If you’re new to this topic, starting with the originating port ID method (the default option) is a good idea – it’s simple, effective, and a great introduction to how load balancing works.
As your understanding grows, you can experiment with other methods to find the one that aligns best with your workload and infrastructure requirements.
I hope this explanation helps clarify these load balancing methods for you. If you’re eager to dive deeper, VMware’s official guides are an excellent next step. And if you have suggestions or ideas for improving this material, feel free to share them – I’m all ears!