Wednesday, August 21, 2024

StarWind Virtual SAN (VSAN) vs Microsoft Storage Spaces Direct (S2D): Hyper-V HCI Performance Comparison

Introduction

Choosing the right data storage solution can make or break your system’s performance and reliability. If you’re working in a Hyper-V environment, you’ve probably heard of StarWind Virtual SAN (VSAN) and Microsoft Storage Spaces Direct (S2D). But which one should you go for? In this article, we dive deep into these two solutions, comparing their performance, capacity efficiency, and practical application in a 2-node Hyper-V cluster setup. By the end, you’ll have a clearer picture of which solution might be your perfect match.

To compare these two solutions fairly, we set up a 2-node Hyperconverged Infrastructure (HCI) Hyper-V cluster under two different configurations:

  • StarWind VSAN NVMe-oF over TCP
    • Host Mirroring + MDRAID-5.
  • Microsoft Storage Spaces Direct over TCP
    • Mirror-accelerated parity, workload placed in the mirror tier.
    • Mirror-accelerated parity, workload placed in both tiers – mirror and parity.

Solution diagram:

StarWind VSAN for Hyper-V NVMe-oF over TCP scenario:

In this setup, each Hyper-V node is equipped with 5x NVMe drives passed through to the StarWind Controller Virtual Machine (CVM). Inside the CVM, the drives are assembled into an MDRAID5 array. On top of this array, two StarWind High Availability (HA) devices are created, ensuring data replication and continuous availability. StarWind NVMe-oF Initiator, chosen due to the lack of a native Microsoft NVMe-oF initiator (Microsoft is expected to introduce support for NVMe-oF in Windows Server 2025 but with TCP support only), connects these devices to the nodes. Cluster Shared Volumes (CSVs) are then created on these connected devices.

Microsoft Storage Spaces Direct over TCP scenario – Mirror-accelerated parity:

This scenario tested two configurations of S2D with mirror-accelerated parity, which aims to balance performance and capacity efficiency:

  1. Workload placed in the mirror tier: Maximizes performance by keeping data in the faster mirror tier.
  2. Workload placed in both tiers: Simulates a more balanced scenario where data moves between the mirror and parity tiers, reflecting real-world conditions (when the workload does not fit in the mirror tier and Resilient File System (ReFS) begins to move data to the parity tier). We also tried to achieve a behavior where writes were sent directly to the parity tier – the worst-case scenario.

In reality, with production workloads, the performance will likely fall somewhere between these two cases.

Two storage tiers are created with different resiliency settings – Mirror for performance and Parity for capacity – with the following parameters:

New-StorageTier -StoragePoolFriendlyName s2d-pool -FriendlyName NestedPerformance -ResiliencySettingName Mirror -MediaType SSD -NumberOfDataCopies 4

New-StorageTier -StoragePoolFriendlyName s2d-pool -FriendlyName NestedCapacity -ResiliencySettingName Parity -MediaType SSD -NumberOfDataCopies 2 -PhysicalDiskRedundancy 1 -NumberOfGroups 1 -FaultDomainAwareness StorageScaleUnit -ColumnIsolation PhysicalDisk -NumberOfColumns 4

Volumes are allocated with 20% in the mirror tier and 80% in the parity tier, adhering to Microsoft’s recommendations:

New-Volume -StoragePoolFriendlyName s2d-pool -FriendlyName Volume01 -StorageTierFriendlyNames NestedPerformance, NestedCapacity -StorageTierSizes 820GB, 3276GB

New-Volume -StoragePoolFriendlyName s2d-pool -FriendlyName Volume02 -StorageTierFriendlyNames NestedPerformance, NestedCapacity -StorageTierSizes 820GB, 3276GB

ReFS manages the data movement between these tiers to optimize performance. The threshold value, at which ReFS starts moving data between the tiers, was left at the default – 85%.

Capacity efficiency:

Capacity efficiency is a big deal when evaluating storage solutions:

  • StarWind VSAN for Hyper-V NVMe-oF
    Achieves a capacity efficiency of 40%, thanks to its combination of host mirroring and MDRAID-5.
  • Microsoft S2D Mirror-Accelerated Parity
    Delivers a capacity efficiency of 35.7% (20% mirror, 80% parity), though this can vary depending on the percentage of the volume allocated to the mirror tier. For more details on how to calculate capacity efficiency for mirror-accelerated parity, please refer to the provided link.
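The two efficiency figures above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only. It assumes the StarWind layout is RAID-5 across the 5 drives in each node, mirrored between the 2 nodes, and that the S2D mirror tier is 25% efficient (four data copies, as in the `New-StorageTier` command). The 40% efficiency used for the nested parity tier is an assumption chosen to match the published 35.7%, not a value stated in this article.

```python
# Capacity efficiency sketch for the two layouts compared in this article.

def starwind_efficiency(drives_per_node: int, mirror_copies: int = 2) -> float:
    """RAID-5 inside each node (one drive's worth of parity), mirrored across nodes."""
    raid5 = (drives_per_node - 1) / drives_per_node
    return raid5 / mirror_copies


def map_efficiency(mirror_share: float, mirror_eff: float, parity_eff: float) -> float:
    """Blend two tiers: each tier consumes share / efficiency raw bytes per usable byte."""
    raw_per_usable = mirror_share / mirror_eff + (1 - mirror_share) / parity_eff
    return 1 / raw_per_usable


starwind = starwind_efficiency(drives_per_node=5)
# parity_eff=0.40 is an assumed value for the nested parity tier (see lead-in).
s2d = map_efficiency(mirror_share=0.20, mirror_eff=0.25, parity_eff=0.40)

print(f"StarWind VSAN: {starwind:.1%}")
print(f"S2D mirror-accelerated parity: {s2d:.1%}")
```

Changing `mirror_share` shows how a larger mirror tier lowers the overall efficiency of a mirror-accelerated parity volume.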

Microsoft recommends leaving some capacity in the storage pool unallocated to give volumes space to repair “in-place” after drive failure. If sufficient capacity exists, an immediate, in-place, parallel repair can restore volumes to full resiliency even before the failed drives are replaced. This happens automatically. So, in our setup, the recommended reserve space is 5.82 TB (20% of the total pool size):

Capacity | NVMe storage in "s2d-pool"
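As a rough cross-check of the reserve figure, the sketch below converts the raw pool size to binary units. It assumes the pool holds all 10 drives (5 per node) and that the 5.82 value is expressed in TiB, which is how the numbers line up; both points are assumptions rather than statements from the article.

```python
TIB = 2**40               # bytes in one tebibyte

drive_bytes = 3.2e12      # one 3.2 TB NVMe drive, in decimal terabytes
drives_in_pool = 10       # assumed: 5 drives per node, 2 nodes

pool_tib = drives_in_pool * drive_bytes / TIB
reserve_tib = 0.20 * pool_tib

print(f"pool: {pool_tib:.2f} TiB, reserve: {reserve_tib:.2f} TiB")
```

Under these assumptions the reserve equals the capacity of two drives, that is, one drive per node.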

When planning your solution, consider these factors to ensure you get the best performance and efficiency for your needs.

Testbed overview:

When it comes to evaluating storage solutions like StarWind VSAN for Hyper-V NVMe-oF over TCP and Microsoft S2D over TCP, we didn’t cut any corners. Our testbed was robust and meticulously configured to simulate real-world environments, ensuring our findings are relevant and reliable. Here’s a breakdown of the hardware and software setups that powered our tests:

Hardware:

Server model: Supermicro SYS-220U-TNR
CPU: Intel(R) Xeon(R) Platinum 8352Y @2.2GHz
Sockets: 2
Cores/Threads: 64/128
RAM: 256GB
NICs: 2x Mellanox ConnectX®-6 EN 200GbE (MCX613106A-VDA)
Storage: 5x NVMe Micron 7450 MAX: U.3 3.2TB

Software:

Windows Server: Windows Server 2022 Datacenter 21H2, OS build 20348.2527
StarWind VSAN: Version V8 (build 15469, CVM 20240530) (kernel – 5.15.0-113-generic)
StarWind NVMe-oF Initiator: StarWind NVMe-oF Initiator.2.0.0.672(rev 674).Setup.486

StarWind CVM parameters:

CPU: 24 vCPU
RAM: 32GB
NICs: 1x network adapter for management, 4x network adapter for client IO and synchronization
Storage: MDRAID5 (5x NVMe Micron 7450 MAX: U.3 3.2TB)

Testing methodology:

The benchmarks were conducted using the FIO utility in the client/server mode. We configured a total of 20 virtual machines (VMs), with 10 VMs hosted on each server node. Each VM was allocated 4 vCPUs, 8GB of RAM, and three RAW virtual disks connected to separate SCSI controllers.

Test Scenarios:

  • Microsoft Storage Spaces Direct (S2D)
    • Mirror-accelerated parity (Mirror-only): For scenarios where the workload is placed entirely in the mirror tier, each virtual disk size was 10GB.
    • Mirror-accelerated parity (Both tiers): For scenarios utilizing both the mirror and parity tiers, each virtual disk size was 100GB.
  • StarWind VSAN NVMe-oF
    • For all tests, each virtual disk size was 100GB.
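The disk sizes above determine whether the S2D working set stays inside the mirror tier. The following sketch compares the total test data with the combined mirror tier of the two volumes. It assumes every virtual disk is fully written (the disks were filled with random data before testing), that the disks are spread across both volumes, and that the 85% ReFS threshold applies to mirror tier fill; VHDX overhead is ignored.

```python
VMS = 20
DISKS_PER_VM = 3
MIRROR_TIER_GB = 820 * 2      # two volumes, 820 GB of mirror tier each
REFS_THRESHOLD = 0.85         # default fill level quoted in the article


def working_set_gb(disk_size_gb: int) -> int:
    """Total data written by all test VMs for a given virtual disk size."""
    return VMS * DISKS_PER_VM * disk_size_gb


for label, size in [("mirror-only", 10), ("both tiers", 100)]:
    ws = working_set_gb(size)
    fits = ws <= MIRROR_TIER_GB * REFS_THRESHOLD
    print(f"{label}: {ws} GB working set, stays below threshold: {fits}")
```

With 10GB disks the data fits comfortably in the mirror tier, while 100GB disks exceed it several times over, which is what pushes the second scenario into the parity tier.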

Data Patterns Tested:

  • 4k random read
  • 4k random read/write (70/30)
  • 4k random write
  • 64k random read
  • 64k random write
  • 1M read
  • 1M write

Pre-Test Warm-Up:

Before running specific tests, we filled the virtual disks with random data and warmed them up using the corresponding patterns to ensure stable performance:

  • 4k random read/write (70/30) and 4k random write: VM disks were warmed up with a 4k random write pattern for 4 hours.
  • 64k random write: VM disks were warmed up with a 64k random write pattern for 2 hours.

Test Execution:

  • Duration: Read tests were conducted for 600 seconds, and write tests lasted 1800 seconds.
  • Repetition: All tests were repeated three times, and the average value was used as the final result.

Specific Configurations:

  • Microsoft Storage Spaces Direct (S2D)
    Following Microsoft’s recommendations for the S2D scenario, test VMs were placed on the CSV owner node to avoid redirecting requests to another node, ensuring local data reads without using the network stack and providing less network utilization on writes. Each VHDX file was placed in different subdirectories to optimize ReFS metadata operations and reduce latency.
  • StarWind VSAN for Hyper-V NVMe-oF
    VMs were evenly distributed across hosts without being pinned to the node that owns the volume. Each VHDX file was placed in different subdirectories to maintain consistent performance.

Benchmarking local NVMe performance:

Before diving into the full evaluation, we ran a series of tests to check whether the NVMe drives lived up to their vendor’s promises. Here is the image with vendor-claimed performance:

Using the FIO utility in client/server mode, we checked how well the NVMe SSDs in our server performed in a local storage setup. Our local storage tests used different patterns to see how the NVMe SSDs handled different kinds of data. The following results have been achieved:

1x NVMe Micron 7450 MAX: U.3 3.2TB
Pattern Numjobs IOdepth IOPs MiB/s Latency (ms)
4k random read 6 32 997,000 3,894 0.192
4k random read/write 70/30 6 16 531,000 2,073 0.142
4k random write 4 4 385,000 1,505 0.041
64k random read 8 8 92,900 5,807 0.688
64k random write 2 1 27,600 1,724 0.072
1M read 1 8 6,663 6,663 1.200
1M write 1 2 5,134 5,134 0.389

Our tests showed that the NVMe drives lived up to what the vendor promised. Whether handling small 4k reads or large 1M writes, they delivered on speed and consistency.
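The IOPS and MiB/s columns in the table are two views of the same measurement, linked by the block size. A minimal sketch of the conversion, using three rows from the table above (small differences come from rounding in the published IOPS values):

```python
def mib_per_s(iops: float, block_kib: int) -> float:
    """Throughput in MiB/s for a given IOPS rate and block size in KiB."""
    return iops * block_kib / 1024


rows = [
    # (pattern, IOPS, block size in KiB, MiB/s reported in the table)
    ("4k random read", 997_000, 4, 3_894),
    ("64k random read", 92_900, 64, 5_807),
    ("1M read", 6_663, 1024, 6_663),
]

for pattern, iops, block_kib, reported in rows:
    print(f"{pattern}: computed {mib_per_s(iops, block_kib):,.0f} MiB/s, reported {reported:,} MiB/s")
```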

Benchmark results in a table:

The benchmarking results are presented in tables to illustrate performance metrics such as IOPS, throughput (MiB/s), latency (ms), and CPU usage. An additional metric, “IOPS per 1% CPU usage,” highlights the performance dependency on the CPU usage for 4k random read/write patterns. This parameter is calculated using the following formula:

IOPS per 1% CPU usage = IOPS / Node count / Node CPU usage

Where:

  • IOPS represents the number of I/O operations per second for each pattern.
  • Node count is 2 nodes in our case.
  • Node CPU usage denotes the CPU usage of one node during the test.

By incorporating this additional metric, we aimed to provide deeper insights into how CPU usage correlates with IOPS, offering a more nuanced understanding of performance characteristics.
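The formula can be written as a short function. The sketch below applies it to two rows from the result tables later in this article (4k random read at IO depth 4); the function name is ours, not part of any benchmarking tool.

```python
def iops_per_cpu_percent(iops: float, node_cpu_percent: float, node_count: int = 2) -> float:
    """IOPS per 1% CPU usage = IOPS / node count / node CPU usage (in percent)."""
    return iops / node_count / node_cpu_percent


# StarWind VSAN, 4k random read, IO depth 4: 420,000 IOPS at 45% CPU
print(round(iops_per_cpu_percent(420_000, 45)))

# S2D (workload in mirror tier), 4k random read, IO depth 4: 833,000 IOPS at 27% CPU
print(round(iops_per_cpu_percent(833_000, 27)))
```

Both results match the last column of the corresponding table rows (4,667 and 15,426).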

Now let’s delve into the detailed benchmark results for each storage configuration.

StarWind VSAN for Hyper-V NVMe-oF over TCP scenario

The table illustrates StarWind VSAN’s performance under various workload patterns and configurations. For 4k random reads, IOPS range from 420,000 at IO depth 4 to 881,000 at IO depth 128. In the mixed 4k random read/write (70/30) test, it achieves up to 561,000 IOPS, showcasing its prowess in handling mixed workloads.

In the 64k and 1M read/write patterns, StarWind VSAN NVMe-oF reaches up to 15,600 MiB/s (1M reads at IO depth 16), demonstrating its ability to handle large-block workloads effectively.

VM count: 20
Pattern Numjobs IOdepth IOPs MiB/s Latency (ms) Node CPU usage % IOPs per 1% CPU usage
4k random read 3 4 420,000 1,641 0.570 45.00% 4,667
4k random read 3 8 307,000 1,200 1.515 35.00% 4,386
4k random read 3 16 546,000 2,134 1.736 50.00% 5,460
4k random read 3 32 741,000 2,895 2.586 57.00% 6,500
4k random read 3 64 836,000 3,265 4.567 58.00% 7,207
4k random read 3 128 881,000 3,442 8.827 60.00% 7,342
4k random read/write (70%/30%) 3 2 241,500 943 0.582 39.00% 3,096
4k random read/write (70%/30%) 3 4 334,000 1,305 0.843 45.00% 3,711
4k random read/write (70%/30%) 3 8 301,200 1,177 1.683 42.00% 3,586
4k random read/write (70%/30%) 3 16 416,000 1,625 2.507 48.00% 4,333
4k random read/write (70%/30%) 3 32 534,000 2,086 4.002 53.00% 5,038
4k random read/write (70%/30%) 3 64 561,000 2,191 7.768 52.00% 5,394
4k random write 3 2 139,000 541 0.859 33.00% 2,106
4k random write 3 4 192,000 751 1.246 39.00% 2,462
4k random write 3 8 238,000 928 2.018 44.00% 2,705
4k random write 3 16 260,000 1,015 3.689 44.00% 2,955
4k random write 3 32 167,000 653 11.476 27.00% 3,093
64k random read 3 2 160,000 10,000 0.749 35.00%
64k random read 3 4 200,000 12,500 1.205 39.00%
64k random read 3 8 210,000 13,125 2.299 40.00%
64k random read 3 16 228,000 14,250 4.203 41.00%
64k random read 3 32 233,000 14,562 8.343 41.00%
64k random write 3 1 44,000 2,751 1.350 25.00%
64k random write 3 2 51,900 3,242 2.311 27.00%
64k random write 3 4 58,300 3,645 4.108 28.00%
64k random write 3 8 62,400 3,900 7.689 29.00%
64k random write 3 16 63,600 3,975 12.070 29.00%
64k random write 3 32 63,800 3,987 30.150 29.00%
1024k read 1 1 10,000 10,000 1.998 26.00%
1024k read 1 2 12,400 12,400 3.225 29.00%
1024k read 1 4 14,100 14,100 5.668 31.00%
1024k read 1 8 15,200 15,200 10.574 32.00%
1024k read 1 16 15,600 15,600 20.625 33.00%
1024k write 1 1 3,443 3,443 5.804 24.00%
1024k write 1 2 3,903 3,903 10.241 25.00%
1024k write 1 4 4,086 4,086 19.561 25.00%
1024k write 1 8 4,156 4,156 38.492 25.00%

Overall, StarWind VSAN shows great performance at 4k random read/write patterns, consistent read and write performance regardless of VM location, and impressive capacity efficiency at 40%.

Microsoft Storage Spaces Direct over TCP scenario (Mirror-accelerated parity: Mirror-only)

The next table presents S2D’s performance with a mirror-accelerated parity configuration, focusing on workloads in the mirror tier.

For 4k random read scenarios, IOPS peak at 2,653,000, showcasing exceptional read performance due to local reading. In the 4k random read/write (70/30) pattern, results reach up to 654,000 IOPS.

The 64k random read/write and 1M read/write tests maintain high throughput, with 53,500 MiB/s for 64k reads and 52,400 MiB/s for 1M reads. S2D shows exceptional read performance when VMs are on the volume-owning node, and robust write performance within the mirror tier. However, read performance declines if local reading requirements aren’t met, and unusual performance drops occur at certain queue depths. Additionally, write performance can drop if VMs are running on a node that is not the volume owner, necessitating careful monitoring.

VM count: 20
Pattern Numjobs IOdepth IOPs MiB/s Latency (ms) Node CPU usage % IOPs per 1% CPU usage
4k random read 3 4 833,000 3,256 0.286 27.00% 15,426
4k random read 3 8 752,000 2,937 0.648 21.00% 17,905
4k random read 3 16 1,083,000 4,230 0.884 29.00% 18,672
4k random read 3 32 1,646,000 6,429 1.165 41.00% 20,073
4k random read 3 64 2,344,000 9,158 1.637 54.00% 21,704
4k random read 3 128 2,653,000 10,363 2.897 67.00% 19,799
4k random read/write (70%/30%) 3 2 324,300 1,266 0.382 20.00% 8,108
4k random read/write (70%/30%) 3 4 114,600 447 2.103 7.00% 8,186
4k random read/write (70%/30%) 3 8 62,800 245 7.659 4.00% 7,850
4k random read/write (70%/30%) 3 16 509,000 1,988 1.939 25.00% 10,180
4k random read/write (70%/30%) 3 32 614,000 2,398 3.564 31.00% 9,903
4k random read/write (70%/30%) 3 64 654,000 2,554 6.899 34.00% 9,618
4k random write 3 2 80,300 314 1.499 9.00% 4,461
4k random write 3 4 46,800 183 5.116 6.00% 3,900
4k random write 3 8 34,800 136 13.788 4.00% 4,350
4k random write 3 16 64,700 253 14.876 7.00% 4,621
4k random write 3 32 186,000 728 10.174 18.00% 5,167
64k random read 3 2 317,000 19,812 0.376 17.00%
64k random read 3 4 498,000 31,125 0.478 25.00%
64k random read 3 8 424,000 26,500 1.142 22.00%
64k random read 3 16 623,000 38,937 1.539 27.00%
64k random read 3 32 856,000 53,500 2.243 38.00%
64k random write 3 1 85,700 5,355 0.693 14.00%
64k random write 3 2 58,300 3,645 2.055 10.00%
64k random write 3 4 32,300 2,019 7.435 5.00%
64k random write 3 8 23,300 1,457 20.592 4.00%
64k random write 3 16 41,800 2,616 22.939 6.00%
64k random write 3 32 86,300 5,393 22.138 14.00%
1024k read 1 1 19,900 19,900 1.002 5.00%
1024k read 1 2 31,600 31,600 1.267 7.00%
1024k read 1 4 43,700 43,700 1.825 11.00%
1024k read 1 8 50,300 50,300 3.180 14.00%
1024k read 1 16 52,400 52,400 6.098 16.00%
1024k write 1 1 8,290 8,290 2.400 8.00%
1024k write 1 2 8,693 8,693 4.614 9.00%
1024k write 1 4 8,607 8,607 9.290 9.00%
1024k write 1 8 8,559 8,559 18.684 9.00%

Microsoft Storage Spaces Direct over TCP scenario (Mirror-accelerated parity: Both Tiers)

The performance metrics for the dual-tier configuration in S2D highlight workload management across both mirror and parity tiers.

In 4k random read patterns, IOPS reach up to 2,500,000, showcasing excellent scalability. The 4k random read/write (70/30) pattern results show up to 247,000 IOPS.

For 64k random read/write and 1M read/write tests, the system maintains strong throughput, with 52,000 MiB/s for 64k reads and 50,100 MiB/s for 1M reads, demonstrating S2D’s robust capability to handle complex data operations across tiers. However, its write performance drops when workloads exceed the mirror tier.

VM count: 20
Pattern Numjobs IOdepth IOPs MiB/s Latency (ms) Node CPU usage % IOPs per 1% CPU usage
4k random read 3 4 814,000 3,179 0.293 27.00% 15,074
4k random read 3 8 739,000 2,886 0.642 26.00% 14,212
4k random read 3 16 1,003,000 3,918 0.956 29.00% 17,293
4k random read 3 32 1,556,000 6,078 1.232 42.00% 18,524
4k random read 3 64 2,190,000 8,554 1.749 55.00% 19,909
4k random read 3 128 2,500,000 9,766 3.068 68.00% 18,382
4k random read/write (70%/30%) 3 2 126,200 492 1.245 27.00% 2,337
4k random read/write (70%/30%) 3 4 108,200 422 2.442 23.00% 2,352
4k random read/write (70%/30%) 3 8 49,800 195 9.766 10.00% 2,490
4k random read/write (70%/30%) 3 16 225,800 882 5.690 37.00% 3,051
4k random read/write (70%/30%) 3 32 247,000 965 11.251 38.00% 3,250
4k random read/write (70%/30%) 3 64 231,400 903 25.634 33.00% 3,506
4k random write 3 2 51,400 201 2.324 24.00% 1,071
4k random write 3 4 58,600 229 4.094 26.00% 1,127
4k random write 3 8 58,600 229 8.170 26.00% 1,127
4k random write 3 16 74,800 292 12.868 29.00% 1,290
4k random write 3 32 74,900 293 25.659 29.00% 1,291
64k random read 3 2 316,000 19,750 0.378 18.00%
64k random read 3 4 488,000 30,500 0.490 26.00%
64k random read 3 8 377,000 23,560 1.296 22.00%
64k random read 3 16 601,000 37,562 1.596 27.00%
64k random read 3 32 832,000 52,000 2.307 38.00%
64k random write 3 1 14,700 919 4.078 13.00%
64k random write 3 2 15,200 950 7.883 17.00%
64k random write 3 4 14,400 900 16.656 17.00%
64k random write 3 8 14,600 913 32.938 17.00%
64k random write 3 16 14,700 919 65.238 18.00%
64k random write 3 32 14,400 900 132.694 18.00%
1024k read 1 1 19,900 19,900 1.002 5.00%
1024k read 1 2 31,600 31,600 1.230 8.00%
1024k read 1 4 42,400 42,400 1.882 11.00%
1024k read 1 8 47,600 47,600 3.363 13.00%
1024k read 1 16 50,100 50,100 6.379 16.00%
1024k write 1 1 1,482 1,482 13.496 4.00%
1024k write 1 2 1,573 1,573 25.448 4.00%
1024k write 1 4 2,295 2,295 34.817 5.00%
1024k write 1 8 2,187 2,187 73.178 5.00%

Overall, S2D shows exceptional performance in both test cases. However, its storage capacity efficiency is about 35.7% and could be even lower if additional space is reserved for in-place repairs.

Benchmarking results in graphs:

With all benchmarks completed and data collected, we can now compare the results using graphical charts for a clearer understanding.

4k random read:

Figure 1: 4K RR (IOPS)

Let’s start with the 4K random read test, where Figure 1 showcases the performance in IOPS. The S2D in the mirror-accelerated parity (workload in mirror tier) configuration reaches a remarkable 833,000 IOPS at 4 IO depth, scaling up to 2,653,000 IOPS at 128 IO depth.

Comparatively, the StarWind VSAN NVMe-oF HA scenario peaks at 881,000 IOPS at 128 IO depth. Here, S2D outshines StarWind VSAN with approximately 200% more IOPS at higher depths.
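Comparisons throughout this article are quoted either as "X% more" or as "Y% lower", and the two are not interchangeable. The helper functions below are a small sketch (the names are ours) showing how each is derived from the same pair of peak values:

```python
def percent_more(a: float, b: float) -> float:
    """How much larger a is than b, as a percentage of b."""
    return (a / b - 1) * 100


def percent_less(a: float, b: float) -> float:
    """How much smaller a is than b, as a percentage of b."""
    return (1 - a / b) * 100


s2d_peak, starwind_peak = 2_653_000, 881_000   # 4k random read IOPS at IO depth 128

print(f"S2D delivers {percent_more(s2d_peak, starwind_peak):.0f}% more IOPS")
print(f"StarWind delivers {percent_less(starwind_peak, s2d_peak):.0f}% fewer IOPS")
```

The same gap reads as roughly 200% more in one direction and roughly 67% less in the other, so it matters which value is used as the baseline.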

So, what’s the magic behind S2D’s performance boost? It’s all about local reading. In a cluster shared volume (CSV) setup, S2D leverages the SMB 3.0 protocol to allow multiple hosts to access and perform I/O operations on a shared volume (if you want to explore this topic in more detail, please read here or check this article). If a VM is running on the node that owns the volume, it can read data directly from the local disk, bypassing the network stack. This local read path minimizes latency and maximizes performance, leading to impressive IOPS numbers.

However, there’s a catch. This local reading perk only works if the VM is on the volume-owning node. If not, the read operations have to go through the network to the owning node, which can slow things down. To keep things running smoothly, you need to keep an eye on where your VMs are running and move them to the appropriate nodes as necessary.
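The placement check described above boils down to comparing each VM's current node with the owner node of the volume that holds its disks. The sketch below shows only that logic on a hypothetical, hand-written inventory; the node and VM names are made up, and in a real cluster the three mappings would have to be collected from the cluster management tooling.

```python
# Hypothetical inventory; all names are illustrative, not real cluster output.
volume_owner = {"Volume01": "node1", "Volume02": "node2"}
vm_volume = {"vm01": "Volume01", "vm02": "Volume01", "vm03": "Volume02"}
vm_location = {"vm01": "node1", "vm02": "node2", "vm03": "node2"}


def misplaced_vms(vm_location, vm_volume, volume_owner):
    """Return VMs that run on a node other than the owner of their volume."""
    return sorted(
        vm for vm, node in vm_location.items()
        if volume_owner[vm_volume[vm]] != node
    )


print(misplaced_vms(vm_location, vm_volume, volume_owner))
```

Here `vm02` would be reported, since it runs on `node2` while its volume is owned by `node1`.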

Figure 2: 4K RR (Latency)

When it comes to latency, Figure 2 reveals that S2D in the mirror-accelerated parity (workload in mirror tier) also excels, starting at a low 0.286 ms at 4 IO depth and increasing to 2.897 ms at 128 IO depth. StarWind VSAN NVMe-oF begins at 0.570 ms, reaching 8.827 ms at the same depth.

This shows that S2D offers up to 67% lower latency. S2D in the mirror-accelerated parity (workload in both tiers) also maintains better latency, starting at 0.293 ms and peaking at 3.068 ms. The latency advantage of S2D is again attributed to local reads.

Figure 3: 4K RR (IOPS per 1% CPU Usage)

Switching gears to efficiency, Figure 3 compares IOPS per 1% CPU usage during a 4k random read test. S2D in the mirror-accelerated parity (workload in mirror tier) proves highly efficient, delivering up to 21,704 IOPS per 1% CPU usage at 64 IO depth, whereas StarWind VSAN NVMe-oF peaks at 7,342 IOPS per 1% CPU usage at 128 IO depth.

This indicates that S2D is about 196% more efficient in this regard. The mirror-accelerated parity (workload in both tiers) configuration of S2D also proves more efficient, reaching 19,909 IOPS per 1% CPU usage at 64 IO depth.

4k random read/write 70/30:

Figure 4: 4K RR/RW 70%/30% (IOPS)

Next, let’s dive into mixed 70/30 read-write patterns. Figure 4 is key for understanding real-world performance because pure read or write workloads are rare in actual production.

Figure 4 shows the number of IOPS during the mixed 70%/30% 4k random read/write tests with Numjobs = 3.

Interestingly, with Storage Spaces Direct, there’s a noticeable drop in performance at queue depths 4 and 8. This performance drop is not observed in StarWind VSAN NVMe-oF HA tests. StarWind VSAN maintains consistent performance, hitting 334,000 IOPS at queue depth 4 and 301,200 IOPS at queue depth 8.

With workload in the mirror tier, S2D’s performance drops to 114,600 IOPS at queue depth 4 and 62,800 IOPS at queue depth 8. This translates to a reduction of approximately 65.7% and 79.1%, respectively, compared to the StarWind VSAN NVMe-oF HA scenario.

With the workload in both tiers, S2D’s mixed-workload performance suffers because ReFS moves new data from the mirror tier to the parity tier. It delivers 126,200 IOPS at queue depth 2 and 247,000 IOPS at queue depth 32. Meanwhile, StarWind VSAN NVMe-oF hits 241,500 IOPS at queue depth 2 and 534,000 IOPS at queue depth 32, which is 91.4% and 116.2% higher, respectively, than S2D’s mirror-accelerated parity (workload in both tiers) configuration.

Figure 5: 4K RR/RW 70%/30% (Latency)

Figure 5 examines latency for the mixed 4K random 70/30 workload. S2D in the mirror-accelerated parity mode (workload in mirror tier) starts at 0.382 ms at 2 IO depth, reaching 6.899 ms at 64 IO depth. StarWind VSAN NVMe-oF, on the other hand, starts at 0.582 ms and goes up to 7.768 ms.

Figure 6: 4K RR/RW 70%/30% (IOPS per 1% CPU Usage)

Figure 6 explores the number of IOPS relative to 1% CPU utilization during the same mixed workload.

Storage Spaces Direct with workload within the mirror tier provides up to 10,180 IOPS per 1% CPU usage at 16 IO depth, while StarWind VSAN NVMe-oF peaks at 5,394 IOPS per 1% CPU usage. This makes S2D about 89% more efficient.

When the workload touches both tiers, S2D achieves a maximum of 3,506 IOPS per 1% CPU usage at 64 IO depth, about 35% less than the 5,394 IOPS per 1% CPU usage that the StarWind VSAN NVMe-oF HA scenario delivers at the same depth.

4k random write:

Figure 7: 4K RW (IOPS)

The ability to maintain consistent write performance across various queue depths is crucial for demanding virtualization environments. Figure 7 shows the amount of IOPS during 4k random write operations.

StarWind VSAN stands out with consistent performance across most queue depths, significantly outperforming S2D (workload in mirror tier) from IO depth 2 to 16.

With S2D and the workload in the mirror tier, there’s a big drop in performance at queue depths 4 and 8. At queue depth 4, Storage Spaces Direct scores about 46,800 IOPS, which is more than 4 times lower than StarWind VSAN’s 192,000 IOPS. At queue depth 8, the gap widens even more, and StarWind VSAN delivers 584% more IOPS.

Interestingly, at queue depth 32 StarWind VSAN loses the advantage, scoring 167,000 IOPS, while S2D with data in the mirror tier gains traction, achieving 186,000 IOPS. That said, when the workload hits both tiers, S2D cannot match these figures and ends up at 74,900 IOPS.

Figure 8: 4K RW (Latency)

Latency for 4K random writes is also a critical factor. Since latency corresponds to prior IOPS results, the overall picture remains consistent in Figure 8.
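The link between latency and IOPS follows from Little's law: average latency is roughly the number of outstanding I/Os divided by the IOPS rate. The sketch below assumes the total number of outstanding I/Os equals 20 VMs × Numjobs × IOdepth. That is an inference from the result tables rather than a documented fio setting, but it reproduces the reported latencies closely.

```python
def expected_latency_ms(iops: float, iodepth: int, vms: int = 20, numjobs: int = 3) -> float:
    """Little's law: latency = outstanding I/Os / IOPS, converted to milliseconds."""
    outstanding = vms * numjobs * iodepth   # assumed total queue depth across all VMs
    return outstanding / iops * 1000


# StarWind VSAN, 4k random write, IO depth 2: 139,000 IOPS, reported 0.859 ms
print(f"{expected_latency_ms(139_000, iodepth=2):.3f} ms")

# S2D (workload in mirror tier), 4k random write, IO depth 16: 64,700 IOPS, reported 14.876 ms
print(f"{expected_latency_ms(64_700, iodepth=16):.3f} ms")
```

At a fixed queue depth, any drop in IOPS therefore shows up as a proportional rise in latency.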

StarWind VSAN NVMe-oF demonstrates the lowest latency, starting at 0.859 ms at 2 IO depth and increasing to 3.689 ms at 16 IO depth. It significantly outperforms Storage Spaces Direct with the workload in the mirror tier, which starts at 1.499 ms and rises to 14.876 ms in the QD=16 test, so StarWind’s latency is about 43% lower at 2 IO depth and about 75% lower at 16 IO depth.

When comparing StarWind to S2D with the workload in both tiers, the gap is even more pronounced, with StarWind showing 63% lower latency at 2 IO depth (0.859 ms vs. 2.324 ms) and 55% lower latency at 32 IO depth (11.476 ms vs. 25.659 ms). Only against S2D with the workload in the mirror tier at 32 IO depth does StarWind VSAN NVMe-oF show higher latency: 11.476 ms compared to S2D’s 10.174 ms, which is about 11% lower.

Figure 9: 4K RW (IOPS per 1% CPU Usage)

Efficiency in 4K random write workloads is measured in IOPS per 1% CPU usage, as shown in Figure 9.

Storage Spaces Direct with workload in mirror tier achieves up to 5,167 IOPS per 1% CPU usage, while StarWind VSAN NVMe-oF peaks at 3,093 IOPS per 1% CPU usage, making S2D approximately 67% more efficient. However, when the workload utilizes both S2D tiers, efficiency drops significantly, with a maximum of only 1,291 IOPS per 1% CPU usage, making it the least efficient of the three scenarios.

64k random read:

Figure 10: 64K RR (Throughput)

Moving to larger data blocks, Figure 10 illustrates the throughput performance for 64K random reads.

Storage Spaces Direct with workload in mirror tier significantly outpaces the StarWind VSAN NVMe-oF HA scenario, achieving a peak of 53,500 MiB/s at 32 IO depth compared to StarWind’s 14,562 MiB/s. This indicates that S2D delivers approximately 267% more throughput.

When the workload utilizes both S2D tiers, it shows slightly lower throughput but still surpasses StarWind VSAN significantly. The higher performance in S2D is attributed to local reads, but remember, this efficiency is conditional on the VM running on the node that owns the volume. StarWind VSAN, in contrast, provides stable performance regardless of VM placement, eliminating the need for additional monitoring and VM binding.

Figure 11: 64K RR (Latency)

Figure 11 shows the latency for 64K random reads. The results align with the throughput data discussed earlier.

Here, S2D with the workload in the mirror tier maintains low latency due to local reads, starting at 0.376 ms and reaching 2.243 ms at 32 IO depth. The StarWind VSAN NVMe-oF scenario starts higher at 0.749 ms and peaks at 8.343 ms, so S2D’s latency is up to 73% lower.

Figure 12: 64K RR (CPU Usage)

In Figure 12, we examine CPU usage during 64K random reads.

Storage Spaces Direct with the workload in the mirror tier starts at 17% CPU usage at 2 IO depth and peaks at 38% at 32 IO depth. When the workload hits both tiers, S2D shows consistent CPU usage trends, closely following the S2D “mirror-only” test results.

The StarWind VSAN NVMe-oF scenario begins significantly higher at 35% and peaks at 41%, slightly above S2D. This indicates that S2D is more efficient in CPU usage, with StarWind VSAN using approximately 106% more CPU at IO depth 2, between 52% and 82% more at IO depths 4 to 16, and about 8% more at IO depth 32.

64k random write:

Figure 13: 64K RW (Throughput)

Figure 13 illustrates the 64K random write throughput, highlighting performance differences across three scenarios.

Storage Spaces Direct with workload in the mirror tier exhibits erratic performance, with notable drops at medium IO depths. For example, throughput falls to 2,019 MiB/s at a 4 IO depth, dips further to 1,457 MiB/s at 8 IO depth, and then rebounds to 2,616 MiB/s at 16 IO depth. This pattern mirrors the behavior observed in 4K random write tests.

In contrast, StarWind VSAN delivers more consistent performance, surpassing S2D by about 80.5% at a 4 IO depth, by 167.7% at an 8 IO depth, and by 52% at a 16 IO depth.

When workloads span both tiers, S2D shows significantly lower throughput across the board. StarWind VSAN outperforms S2D by 199% at a 1 IO depth, with the performance gap widening to 343% at a 32 IO depth.

This highlights StarWind’s capability to handle write operations with consistently high performance across varying IO depths.

Figure 14: 64K RW (Latency)

Figure 14 displays the latency for 64K random writes, showing a similar trend.

StarWind VSAN delivers faster response times at medium IO depths (4, 8, and 16), but S2D (with workloads in the mirror tier) takes the lead at a 32 IO depth, achieving a lower latency of 22.138 ms compared to StarWind’s 30.150 ms.

When compared to S2D’s configuration with workloads spread across both tiers, StarWind VSAN is significantly more efficient, providing 66.9% faster response times at a 1 IO depth and 77.3% lower latency at a 32 IO depth.

Figure 15: 64K RW (CPU usage)

Figure 15 highlights CPU usage during 64K random writes.

StarWind VSAN consistently shows higher CPU utilization compared to both Storage Spaces Direct configurations. At a 1 IO depth, StarWind uses 25% CPU, which is 79% higher than S2D’s 14% with the workload in the mirror tier and 92% higher than S2D’s 13% in the mixed-tier test. This trend persists across different IO depths, with StarWind maintaining higher CPU usage but delivering more consistent performance under varying workloads.

1M read:

Figure 16: 1024K R (Throughput)

Figure 16 presents the throughput results for 1024K reads, where S2D with the workload in the mirror tier significantly outperforms StarWind VSAN, reaching 52,400 MiB/s at a 16 IO depth compared to StarWind’s 15,600 MiB/s, about 236% higher throughput.

Even when workloads are spread across both tiers, S2D continues to outperform StarWind VSAN by a substantial margin. This impressive read performance from S2D is again due to local reads when the test VM is located on the CSV owner node.

Figure 17: 1024K R (Latency)

Figure 17 shows the latency results during the 1024K read test, reflecting a pattern similar to the throughput results.

S2D with workloads in the mirror tier demonstrates impressively low latency, benefiting from local reads. Latency starts at 1.002 ms and increases to 6.098 ms as IO depth grows.

In contrast, StarWind VSAN starts at 1.998 ms and peaks at 20.625 ms. Comparing the peak values, StarWind’s latency is about 3.4 times S2D’s, which means S2D’s latency is roughly 70% lower.

When workloads are distributed across both S2D tiers, latency remains nearly identical to that of the mirror tier tests.

Figure 18: 1024K R (CPU Usage)

 

Figure 18 highlights CPU usage during 1024K reads, where S2D demonstrates significantly lower resource consumption compared to StarWind VSAN.

With workloads in the mirror tier, S2D starts at 5% CPU usage at a 1 IO depth and increases to 16% at a 16 IO depth.

In contrast, StarWind VSAN begins at 26% and rises to 33%, meaning S2D uses about 63% less CPU on average.

Even when workloads span both tiers, S2D maintains the same CPU usage levels as in the mirror-only benchmarks. S2D’s efficiency in local reads translates to more effective CPU usage, while StarWind requires more resources to sustain consistent performance.

1M write:

Figure 19: 1024K W (Throughput)

 

When we shift our focus to 1024K sequential write throughput, Figure 19 reveals that Storage Spaces Direct (S2D) with a workload in the mirror tier holds a significant performance advantage over StarWind VSAN. Specifically, S2D reaches 8,693 MiB/s at a 2 IO depth, while StarWind VSAN manages 3,903 MiB/s. At an 8 IO depth, S2D continues to dominate, hitting 8,559 MiB/s, compared to StarWind’s 4,156 MiB/s. This means S2D delivers approximately 122% higher throughput, provided your workload fits in the mirror tier.

However, it’s important to note that this advantage is conditional. If the workload doesn’t fit into the mirror tier, S2D’s performance drops dramatically. This is evident in the multi-tiered test results, where StarWind VSAN outperforms Storage Spaces Direct by about 106% on average.

Figure 20: 1024K W (Latency)

 

Diving into the 1024K write latency as depicted in Figure 20, we see a consistent theme.

With its workload in the mirror tier, Storage Spaces Direct begins at a brisk 2.400 ms and climbs to 18.684 ms at 8 IO depth. In comparison, StarWind VSAN starts at a slower 5.804 ms and escalates to a higher peak of 38.492 ms. StarWind’s latency is therefore about 2.1 to 2.4 times S2D’s, which means S2D’s latency is roughly 51% to 59% lower.

However, when the workload spans both tiers, the scenario shifts. S2D records the highest latencies, starting at 13.496 ms and surging to 73.178 ms at 8 IO depth. This again indicates a significant performance shift depending on how well the workload is aligned with S2D’s optimal tier configuration.
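When comparing latencies, "percent higher" and "percent lower" are different quantities: one value can be more than 100% higher than another, but never more than 100% lower. The sketch below works through both for the mirror-tier figures quoted above. The function names are our own, and pairing the "start" and "peak" values of the two solutions assumes they were measured at the same IO depths:

```python
def percent_lower(fast_ms: float, slow_ms: float) -> float:
    """Percentage by which fast_ms is below slow_ms (always under 100)."""
    return (1.0 - fast_ms / slow_ms) * 100.0

def percent_higher(slow_ms: float, fast_ms: float) -> float:
    """Percentage by which slow_ms exceeds fast_ms (can be above 100)."""
    return (slow_ms / fast_ms - 1.0) * 100.0

# 1024K write latency in ms: (S2D mirror tier, StarWind VSAN), from the text.
points = {"start": (2.400, 5.804), "peak": (18.684, 38.492)}

results = {}
for label, (s2d, starwind) in points.items():
    lower = percent_lower(s2d, starwind)
    higher = percent_higher(starwind, s2d)
    results[label] = (lower, higher)
    print(f"{label}: S2D is {lower:.1f}% lower; StarWind is {higher:.1f}% higher")
```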

Figure 21: 1024K W (CPU Usage)

 

Figure 21 highlights CPU usage during 1024K writes. When running the workload in the mirror tier, Storage Spaces Direct (S2D) starts with 8% CPU usage at 1 IO depth and consistently holds at 9% across 2, 4, and 8 IO depths.

In contrast, StarWind VSAN begins at a much higher 24% and remains steady around 25% across all IO depths. Going by these figures, S2D consumes roughly 65% less CPU on average, demonstrating significantly more efficient resource utilization compared to StarWind VSAN.

When the workload spans both S2D tiers, CPU usage is even lower, starting at just 4% at 1 and 2 IO depths and rising modestly to 5% at 4 and 8 IO depths.
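The average CPU gap can be estimated from the values quoted for Figure 21. This is only a rough sketch: StarWind’s "around 25%" is taken as exactly 25% at 2, 4, and 8 IO depths, which is our assumption, so a figure computed from the unrounded measurements may differ by several points.

```python
from statistics import mean

# CPU usage (%) at IO depths 1, 2, 4 and 8 during 1024K writes.
s2d_mirror_cpu = [8, 9, 9, 9]      # values quoted in the text
starwind_cpu = [24, 25, 25, 25]    # assumption: "around 25%" taken as exactly 25

s2d_avg = mean(s2d_mirror_cpu)
starwind_avg = mean(starwind_cpu)
saving = (1 - s2d_avg / starwind_avg) * 100

print(f"S2D average: {s2d_avg:.2f}%, StarWind average: {starwind_avg:.2f}%")
print(f"S2D uses about {saving:.0f}% less CPU on average")
```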

Additional benchmarking: 1 VM 1 numjobs 1 iodepth

To gain a deeper understanding of how StarWind VSAN and Storage Spaces Direct perform under different configurations, we conducted additional benchmarks focusing on a single VM, with numjobs = 1 and an IO depth of 1. The results are presented below.

Benchmark results in a table:

StarWind VSAN NVMe-oF HA (TCP) – Host mirroring + MDRAID5 (1 VM)

Pattern | Numjobs | IOdepth | IOPS | MiB/s | Latency (ms)
4k random read | 1 | 1 | 1,112 | 4 | 0.897
4k random write | 1 | 1 | 501 | 2 | 1.991
4k random write (synchronous) | 1 | 1 | 226 | 1 | 4.415

Storage Spaces Direct (TCP) – Nested mirror accelerated parity – Data in mirror tier (1 VM)

Pattern | Numjobs | IOdepth | IOPS | MiB/s | Latency (ms)
4k random read | 1 | 1 | 7,221 | 28 | 0.137
4k random write | 1 | 1 | 5,456 | 21 | 0.182
4k random write (synchronous) | 1 | 1 | 2,887 | 11 | 0.344

Storage Spaces Direct (TCP) – Nested mirror accelerated parity – Data in mirror and parity tiers (1 VM)

Pattern | Numjobs | IOdepth | IOPS | MiB/s | Latency (ms)
4k random read | 1 | 1 | 5,920 | 23 | 0.167
4k random write | 1 | 1 | 2,517 | 10 | 0.395
4k random write (synchronous) | 1 | 1 | 1,772 | 7 | 0.562
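With numjobs = 1 and an IO depth of 1, only one request is in flight at a time. By Little’s law, average latency should therefore be close to 1/IOPS, and throughput is simply IOPS multiplied by the 4 KiB block size. The Python sketch below cross-checks the table rows against those two relations; the 2% tolerance mentioned in the note is an arbitrary choice of ours, not part of the benchmark methodology.

```python
BLOCK_KIB = 4   # 4k block size used by every pattern in the tables
IN_FLIGHT = 1   # numjobs * iodepth = 1 * 1

# (configuration, pattern): (IOPS, MiB/s, latency in ms), copied from the tables.
rows = {
    ("StarWind", "read"): (1112, 4, 0.897),
    ("StarWind", "write"): (501, 2, 1.991),
    ("StarWind", "sync write"): (226, 1, 4.415),
    ("S2D mirror", "read"): (7221, 28, 0.137),
    ("S2D mirror", "write"): (5456, 21, 0.182),
    ("S2D mirror", "sync write"): (2887, 11, 0.344),
    ("S2D both", "read"): (5920, 23, 0.167),
    ("S2D both", "write"): (2517, 10, 0.395),
    ("S2D both", "sync write"): (1772, 7, 0.562),
}

def expected_latency_ms(iops: float, in_flight: int = IN_FLIGHT) -> float:
    """Little's law: latency = requests in flight / completion rate."""
    return in_flight / iops * 1000.0

def expected_mib_s(iops: float, block_kib: int = BLOCK_KIB) -> float:
    """Throughput in MiB/s for a fixed block size."""
    return iops * block_kib / 1024.0

for (config, pattern), (iops, mib_s, latency_ms) in rows.items():
    lat = expected_latency_ms(iops)
    mib = expected_mib_s(iops)
    print(f"{config:<11} {pattern:<11} latency {lat:.3f} ms (reported {latency_ms}), "
          f"{mib:.1f} MiB/s (reported {mib_s})")
```

Every reported latency falls within 2% of 1/IOPS, and every reported MiB/s value matches the computed throughput after rounding, so the three columns are internally consistent.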

Benchmark results in graphs:

This section presents visual comparisons of the performance and latency metrics across the storage configurations under test.

4k random read:

Figure 1: 4K RR (IOPS)

 

Figure 1 demonstrates the IOPS performance for the 4K random read test at 1 IO depth and with one numjobs. S2D in the mirror-accelerated parity (workload in mirror tier) outperforms the StarWind VSAN NVMe-oF HA scenario, delivering 7,221 IOPS. This impressive 550% increase over StarWind’s 1,112 IOPS is largely due to S2D’s ability to leverage local reads and its operation at the host level. In contrast, StarWind VSAN, running inside a VM, faces a longer IO datapath, which impacts its performance. Even when S2D operates with data across both mirror and parity tiers, it maintains strong performance at 5,920 IOPS, still surpassing StarWind by 432%.

Figure 2: 4K RR (Latency)

 

Latency metrics for the 4K random read test at 1 IO depth, as shown in Figure 2, similarly favor S2D in the mirror-accelerated parity (workload in mirror tier), which records a swift 0.137 ms. StarWind VSAN’s 0.897 ms is about 6.5 times (555%) higher, so S2D’s latency is roughly 85% lower. S2D’s ability to keep latency low is again due to its use of local reads and operation at the host level. The mirror-accelerated parity (workload in both tiers) configuration of S2D maintains an edge with a latency of 0.167 ms, roughly 81% lower than StarWind’s (StarWind is 437% higher).

Figure 3: 4K RW (IOPS)

 

Figure 3 demonstrates the results for the 4K random write test at 1 IO depth and with one numjobs. S2D in the mirror-accelerated parity (workload in mirror tier) achieves 5,456 IOPS – an impressive 990% higher IOPS than StarWind VSAN’s 501 IOPS. This advantage arises because S2D writes directly to the mirror tier, bypassing the need to calculate parity, which is resource-intensive. In the scenario where S2D handles workloads across both the mirror and parity tiers, performance drops to 2,517 IOPS due to the additional step of invalidating data in the parity tier. For a more detailed explanation of how reading and writing occur in a mirror-accelerated parity scenario, please refer to the following link.

On the other hand, StarWind VSAN, which writes directly to the MDRAID5 array, suffers from performance-reducing RMW (read-modify-write) operations, further compounded by the longer IO datapath inherent in its operation inside a VM.

Figure 4: 4K RW (Latency)

 

Let’s move to Figure 4 and consider latency metrics for 4K random writes. S2D continues to exhibit lower latency due to its streamlined data handling within the mirror tier, recording an exceptionally low 0.182 ms. StarWind VSAN’s 1.991 ms is roughly 994% higher, meaning S2D’s latency is about 91% lower. Even when S2D involves the mirror and parity tiers, its latency remains favorable at 0.395 ms, about 80% lower than StarWind’s (StarWind is 404% higher). The lower latency in S2D is attributed to its efficient data handling and host-level operation.

Figure 5: 4K RW Synchronous (IOPS)

 

In 4K random write (synchronous) operations, as shown in Figure 5, S2D once again leads, with the mirror tier configuration achieving 2,887 IOPS, a 1,177% increase over StarWind VSAN’s 226 IOPS. The reasons for this performance boost are similar to those seen in the 4K random write test, where S2D benefits from direct writes to the mirror tier and the avoidance of parity calculations. In the mirror and parity tiers setup, S2D’s performance drops to 1,772 IOPS but still outpaces StarWind by 684%.

Figure 6: 4K RW Synchronous (Latency)

 

Lastly, Figure 6 demonstrates latency results for 4K random write (synchronous) operations. It reinforces S2D’s superior performance, with the mirror tier setup offering a latency of 0.344 ms. StarWind VSAN’s 4.415 ms is about 1,183% higher, so S2D’s latency is roughly 92% lower. Even in the more complex mirror and parity tiers configuration, S2D maintains a latency of 0.562 ms, about 87% lower than StarWind’s (StarWind is 686% higher). The reasons for this lower latency are consistent with those seen in the 4K random write test, where S2D’s streamlined data handling provides a significant advantage.

Conclusion

To sum it up, both Storage Spaces Direct (S2D) and StarWind VSAN NVMe-oF HA come with their own set of perks and trade-offs for your IT infrastructure.

S2D excels within the Microsoft ecosystem, offering excellent read speeds when your VMs are aligned with the right nodes. Plus, if your workloads stick to the mirror tier, S2D delivers great write performance. But, keep in mind, it needs careful monitoring to ensure optimal performance. If workloads aren’t properly managed or local reads aren’t handled well, performance can drop. S2D can also lose efficiency at certain queue depths and needs extra space for fault tolerance, impacting overall capacity efficiency.

On the other hand, StarWind VSAN NVMe-oF HA is a solid choice for high-performance needs, especially with mixed read/write workloads and large data blocks. It offers stable read and write performance regardless of VM placement and provides better capacity efficiency, making it a strong contender for cost-effective scaling. On the downside, it doesn’t support local reads, it used noticeably more CPU than S2D in the tests above (for example, 25% versus 14% at a 1 IO depth for 64K random writes), and it requires an additional VM for deployment.

So, if you’re looking for exceptional read performance and don’t mind keeping a close eye on your workloads, S2D is a great option. But if you want consistent performance with better capacity efficiency and less hassle, StarWind VSAN NVMe-oF HA is the way to go.

Stay tuned for our upcoming articles, where we’ll dive deeper into these solutions to give you a bigger picture of how they can fit into your IT strategy.



from StarWind Blog https://ift.tt/y74EeSN
via IFTTT
