HPe | MrVSAN

I have done a lot of testing on Optane SSDs in the past, but in July of 2022 Intel announced their intention to wind down the Optane business. Since that announcement I have had many questions surrounding Optane and where it leaves customers today.

First, I want to address the messaging from back in July. On the Intel earnings call it was announced that Optane had been written off at a cost of over half a billion dollars. This caused quite a storm of confusion, and many people asked me, “Does this mean I cannot buy Optane any more?”

On the contrary, Optane is still a product and will continue to be a product until at least the end of 2025. Even if you buy it on the last day it is available, you will still get a 5-year warranty.

I have never really spoken about the other side of the Optane house on this blog before, mainly because it wasn’t directly relevant to vSAN. However, there are two sides to Optane: the SSD, which you already know, and the persistent memory side of the Optane Technology.

Optane Persistent Memory (PMEM) is primarily used in VMware as a memory tiering solution. Over the past few years DRAM has become expensive, and it does not scale well to large module sizes. Memory tiering allows customers to overcome both challenges: cost and large-capacity memory modules. PMEM, for example, is available in 128GB, 256GB and 512GB modules, at a fraction of the cost of DRAM modules of the same size.

Memory tiering is very much like the Original Storage Architecture in vSAN: you have an expensive cache tier and a less expensive capacity tier, allowing you to deliver a higher memory capacity with a much improved TCO/ROI. Below are the typical configurations prior to vSphere 7.0U3.

On the horizon we have a new architecture called Compute Express Link (CXL), and CXL 2.0 will deliver a plethora of memory tiering devices. However, CXL 2.0 is a few years away, so the only memory tiering solution out there for the masses is Intel Optane. This is how it looks today and how it may look with CXL 2.0:

I recently presented at the VMUG in Warsaw, where I had a slide stating that Ford are discontinuing the Fiesta in June 2023. Does this mean you should not go and buy one of these cars today? The simple answer is that even though it is going away in the future, it still meets the needs of today. It is the same with Optane Technology: arguably it will be around for longer than the Ford Fiesta, and it meets the need to reduce costs today as a bridge to the memory tiering architectures of the future with CXL 2.0.

I like to challenge the status quo, so I challenge you to look at your vSphere, vSAN or VCF environments and check two key metrics. The first is “Consumed Memory” and the second is “Active Memory”. If you divide Consumed by Active and the result is higher than 4, then memory tiering is a perfect fit for your environment. Not only can you save a lot on memory cost, but because it is a more affordable technology it also allows you to push up your CPU core count.

Provided your “Active” memory sits within the DRAM cache, there should be little to no performance impact. Both Intel and VMware have done extensive testing on this.
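To make the Consumed versus Active check concrete, here is a minimal Python sketch. The `HostMemory` structure, the function names, the host names and the sample figures are all made up for illustration; only the ratio, the threshold of 4 and the idea that Active memory should sit within DRAM come from the text above. In practice you would feed in the values you read from your own environment.

```python
from dataclasses import dataclass


@dataclass
class HostMemory:
    """Memory figures for one host, in GB (hypothetical structure)."""
    name: str
    consumed_gb: float
    active_gb: float


def tiering_ratio(host: HostMemory) -> float:
    """Consumed Memory divided by Active Memory."""
    if host.active_gb <= 0:
        raise ValueError("active memory must be greater than zero")
    return host.consumed_gb / host.active_gb


def is_tiering_candidate(host: HostMemory, threshold: float = 4.0) -> bool:
    """True when the ratio is higher than the threshold (4 in this post)."""
    return tiering_ratio(host) > threshold


def active_fits_in_dram(host: HostMemory, dram_gb: float) -> bool:
    """True when Active memory would sit within a DRAM tier of dram_gb."""
    return host.active_gb <= dram_gb


# Sample figures are invented for illustration only.
hosts = [
    HostMemory("esx-01", consumed_gb=600, active_gb=100),
    HostMemory("esx-02", consumed_gb=500, active_gb=250),
]

for h in hosts:
    print(h.name, round(tiering_ratio(h), 2), is_tiering_candidate(h))
```

With these invented numbers the first host has a ratio of 6 and qualifies, while the second has a ratio of 2 and does not.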

Proof of Concepts
Nobody likes a PoC. They take up far too much of your time, and time is valuable. I have worked with many customers who have simply dropped a memory tiering host into their existing all-DRAM cluster and migrated real workloads to it. This means no synthetic workloads, and the workloads you migrate for the evaluation can simply be migrated back.

Conclusion
Optane is around for a few years yet, and even though it is going to go away eventually, the benefits of the technology are here today, in preparation for the architectures of the future based on CXL 2.0. Software designed to work with memory tiering will not change; it is the hardware and electronics that will change, so the investment in software is protected.

Optane technology is available from all the usual vendors; Dell, HPE, Cisco, Lenovo, Fujitsu and Supermicro are just a few. Sometimes you may have to ask them for it, but as they say, “If you do not ask, you do not receive”.

I recently had the privilege of working with a customer who had asked HPE to run some performance benchmarks, not just with HCIBench. Because they run quite a lot of Oracle workloads, they also wanted to determine whether the performance of vSAN on HPE Synergy would be sufficient to run those workloads.

Whilst agreeing on the hardware specification, the customer referenced my previous post on Optane™ performance and asked HPE to perform the tests using Optane™ as the cache tier in the Synergy configuration. This was not only to provide superior performance, but also to free up two capacity slots in the disk tray of the chassis per Synergy compute node, meaning the customer could have more capacity.

Synergy Specification:

HPE Virtual Connect SE 40Gb F8 Module for Synergy
HPE Synergy D3940 Storage Module with SAS expanders
3x HPE Synergy 480 Gen10 nodes, each equipped with:
    2x Intel® Xeon® Gold 6154 CPU @ 3.00GHz
    2x Intel® Optane™ 750 GB SSD DC P4800X Series PCIe (x4)
    768 GB Memory (24x 32 GB RDIMM @ 2666 MHz)
    2x Disk Group config with 1x Optane + 3x 800GB SAS per Disk Group
LACP based 40GbE interconnection between Compute Nodes
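
As a quick sanity check on the specification above, the raw device totals for the cluster can be worked out with a few lines of Python. This is only a sketch: the constant and function names are my own, and the figures are raw sums of the listed drives, before any vSAN storage policy or formatting overhead.

```python
# Figures taken from the Synergy specification above.
NODES = 3
DISK_GROUPS_PER_NODE = 2
CACHE_DRIVE_GB = 750             # one Optane P4800X per disk group
CAPACITY_DRIVES_PER_GROUP = 3
CAPACITY_DRIVE_GB = 800          # SAS capacity drives


def disk_groups_total() -> int:
    """Number of disk groups across the whole cluster."""
    return NODES * DISK_GROUPS_PER_NODE


def raw_capacity_gb() -> int:
    """Sum of all capacity-tier drives in the cluster, in GB."""
    return disk_groups_total() * CAPACITY_DRIVES_PER_GROUP * CAPACITY_DRIVE_GB


def raw_cache_gb() -> int:
    """Sum of all Optane cache drives in the cluster, in GB."""
    return disk_groups_total() * CACHE_DRIVE_GB


print("Disk groups:", disk_groups_total())
print("Raw capacity tier (GB):", raw_capacity_gb())
print("Raw cache tier (GB):", raw_cache_gb())
```

That gives 6 disk groups, 14,400 GB of raw capacity-tier storage and 4,500 GB of Optane across the three nodes.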

Please note: At the time of writing the 750GB U.2 Optane drives were undergoing certification for HPE Synergy.

To perform the Oracle workload testing, HPE engaged their own internal Oracle specialists to determine the correct workloads to run. With a latency target of <2ms specified by the customer, they decided to use Kevin Closson’s SLOB tool, which was configured in the following way:

128 SLOB Schemas
Each Schema was 8GB in Size
Total of 1TB Test data

For the purpose of testing HPE decided that they would perform different tests in the following way:

(A) Single Oracle VM Instance with 128 Schemas, 70% Read, 30% Write
(B) Single Oracle VM Instance with 128 Schemas, 50% Read, 50% Write
(C & D) Single Oracle VM Instance with Heavy REDO activity, Large SGA and REDO_STRESS=Heavy, 50% Read, 50% Write, with 128 Schemas (C) and 32 Schemas (D)
(E) Single Oracle VM Instance
(F & G) 2 Parallel Oracle VM Instances with 64 (F) / 128 (G) Schemas Each, 70% Read, 30% Write

Test	SGA	PGA	Schemas	Scale	REDO_STRESS
A	5G	1G	128	8G	Lite
B	5G	1G	128	8G	Lite
C	256G	100G	128	8G	Heavy
D	256G	100G	32	8G	Heavy
E	5G	1G	128	8G	Lite
F	5G	1G	64	8G	Lite
G	5G	1G	128	8G	Lite
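
To see how much test data each run works against, the table above can be expressed as data and multiplied out. This is a rough Python sketch rather than anything HPE used: the `SlobTest` structure and function names are hypothetical, the instance counts come from the test descriptions (F and G are the two-instance runs), and it assumes “Each” means every instance loads the full number of schemas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlobTest:
    """One row of the test table (hypothetical structure, sizes in GB)."""
    name: str
    sga_gb: int
    pga_gb: int
    schemas: int
    scale_gb: int
    redo_stress: str
    instances: int = 1   # F and G ran 2 parallel Oracle VM instances


TESTS = [
    SlobTest("A", 5, 1, 128, 8, "Lite"),
    SlobTest("B", 5, 1, 128, 8, "Lite"),
    SlobTest("C", 256, 100, 128, 8, "Heavy"),
    SlobTest("D", 256, 100, 32, 8, "Heavy"),
    SlobTest("E", 5, 1, 128, 8, "Lite"),
    SlobTest("F", 5, 1, 64, 8, "Lite", instances=2),
    SlobTest("G", 5, 1, 128, 8, "Lite", instances=2),
]


def data_per_instance_gb(test: SlobTest) -> int:
    """Schemas multiplied by the per-schema scale."""
    return test.schemas * test.scale_gb


def total_data_gb(test: SlobTest) -> int:
    """Data across all instances, assuming each loads its own schemas."""
    return data_per_instance_gb(test) * test.instances


for t in TESTS:
    print(t.name, data_per_instance_gb(t), total_data_gb(t))
```

For test A this reproduces the 1TB figure quoted earlier (128 x 8GB = 1,024GB), while test D works against 256GB.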

Before the tests were performed, an Oracle I/O Calibration was run. This is a feature of Oracle Database that assesses the performance of the I/O subsystem by issuing an I/O-intensive read-only workload to determine the maximum IOPS and throughput whilst maintaining close to 0ms latency.

Each test ran for 60 minutes in order to ensure enough data was filling up the write buffer, so let’s take a look at the results:

As you can see from the results, the target of <2ms was achieved. At one point two Oracle VMs achieved a staggering 250k IOPS at 1.305ms latency, which is very impressive across a 3-node cluster. Not only was the customer pleased with the results, but the Oracle specialist within HPE said they exceeded their expectations too.

So as you can see, a composable infrastructure deployment of vSAN such as HPE Synergy with Intel Optane™ can deliver the same levels of performance as standard rack mount servers and, combined with VMware Cloud Foundation, delivers a full SDDC package from both a hardware and software perspective.


What’s happening with Intel Optane?

Oracle performance on HPE Synergy, vSAN and Intel Optane

It's all about VMware vSAN