0.
This Virtual SAN 6.2 demo is a series of 'mini-demos' designed to showcase the features and benefits of Virtual SAN.
There are multiple sections accessible by the menu on the left side of this window.
This enables the user/presenter to navigate directly to a specific feature relevant to the audience's interests and the conversation.

In this first section, we will see how easy it is to enable Virtual SAN.

[Click the Hosts and Clusters icon near the Home tab]
40.
There are a few local datastores already configured in this cluster.
We can see there is 5 terabytes of capacity from these local datastores in the upper-right corner.

[Click the Manage tab]
80.
Here we see Virtual SAN is turned off (disabled).

[Click the Configure button]
160.
Deduplication and Compression is a space efficiency feature available with all-flash Virtual SAN configurations.
This feature reduces the amount of space consumed by data and effectively increases the usable capacity of the datastore.
We will see the results of deduplication and compression after we migrate virtual machines to the new Virtual SAN datastore
(next section). Fault Domain and Stretched Cluster configuration can be performed here.
It is also possible to configure these later, if needed. We will not configure them now.

[Click Next]
390.
Virtual SAN requires a VMkernel network adapter with the Virtual SAN service enabled on all hosts in the cluster.
The configuration wizard verifies this configuration on all hosts in the cluster.
We can see that all hosts have the Virtual SAN service configured.

[Click Next]
420.
Here we see some flash devices - 372GB SSDs - will be claimed for the capacity tier of Virtual SAN.
The 186GB flash devices will be claimed for the Virtual SAN cache tier.
The Seagate drives are magnetic disks, which will not be claimed for use by Virtual SAN.

[Click the scroll bar]
440.
There are 10 capacity tier flash devices and 2 cache tier flash devices on each of the four hosts in the cluster.
There will be two disk groups on each host.
Each disk group will have one flash device for cache and five flash devices for capacity.
Since this is an all-flash Virtual SAN configuration, the cache tier will be used entirely for write buffering.
Reads will come directly from the capacity tier.

[Click Next]
470.
The last step is a summary of what we just configured.
The raw capacity of the Virtual SAN datastore will be 14.56TB. The total cache size is 1.46TB.
With four simple steps, we are now ready to enable Virtual SAN.
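
(For reference, the wizard's totals follow from the disk-group layout described on the previous screens. Here is a minimal Python sketch - not part of the demo - assuming the rounded 372GB and 186GB device sizes shown earlier.)

HOSTS = 4
DISK_GROUPS_PER_HOST = 2
CAPACITY_DEVICES_PER_GROUP = 5   # capacity-tier flash devices
CACHE_DEVICES_PER_GROUP = 1      # cache-tier flash device (write buffer in all-flash)

raw_capacity_tb = HOSTS * DISK_GROUPS_PER_HOST * CAPACITY_DEVICES_PER_GROUP * 372 / 1024
total_cache_tb = HOSTS * DISK_GROUPS_PER_HOST * CACHE_DEVICES_PER_GROUP * 186 / 1024

print(round(raw_capacity_tb, 2))  # ~14.53; the wizard's 14.56TB differs slightly due to device-size rounding
print(round(total_cache_tb, 2))   # ~1.45;  likewise close to the 1.46TB shown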

[Click Finish]
690.
Now that Virtual SAN is turned on, let's review the disk configuration in one of the hosts.

[Click Disk Management]
790.
Virtual SAN utilizes disk groups, which consist of one flash device for the cache tier
and one to seven additional devices for the capacity tier.
Each host in this Virtual SAN cluster contains two disk groups.
Each disk group has one cache device and five capacity devices for a total of six disks in use.

[Click the top disk group]
820.
Looking at the lower half of the window, we see the cache device, which has a capacity of 186GB.
Three of the five 372GB capacity devices are also visible.

[Click the lower-right scroll bar]
850.
...and there are the other two 372GB capacity devices in the disk group.
Now let's take a look at the overall capacity of the Virtual SAN datastore.

[Click the Summary tab]
940.
[Click Virtual SAN Capacity in the middle of the window]
980.
Here we see the total raw capacity of the Virtual SAN datastore is 14.56TB.
Deduplication and Compression information is also displayed.
The Deduplication and Compression ratio is 1x because we have not provisioned or migrated any virtual machines.
We will look at these numbers after virtual machines have been migrated to the Virtual SAN datastore.

[Click the Home icon or Home under Navigator]
2030.
Migrating virtual machines from existing storage to a Virtual SAN datastore is simple.
In this environment, an older Fibre Channel array is being decommissioned.
The VMs on this array are being migrated to the new Virtual SAN datastore.
Most of the VMs in this environment have already been migrated.
We will migrate the remaining four VMs from LUN zero one to the Virtual SAN datastore.
This migration will happen with no downtime.

[Click hStorage]
2100.
Here we see the four VMs located on LUN zero one.
We select the four VMs...

[Click the dev22 VM]
2150.
[Click any of the VMs]
2180.
[Click Migrate]
2250.
[Click Yes]
2400.
It is possible to change the host and the storage that the VMs are running on.
There is no need to change the hosts that the VMs are running on.
The LUN and the Virtual SAN datastore are both accessible by the hosts.

[Select the Change Storage Only radio button]
2430.
[Click Next]
2640.
It is possible to select a new storage policy when migrating VMs.

[Click the VM Storage Policy drop down menu]
2660.
Selecting a storage policy will refine the list of datastores compatible with the policy.
No longer is it necessary to manually keep track of traditional LUNs and volumes.
Simply select a storage policy and the datastores that meet those requirements are listed automatically.

[Click Virtual SAN Default Storage Policy from the menu]
2900.
Naturally, the Virtual SAN datastore is compatible with the selected policy.

[Click Next]
2960.
Here we see that four VMs will be migrated to the Virtual SAN datastore.
They will be assigned the Virtual SAN Default Storage Policy.
We will take a closer look at storage policies in the next section.

[Click Finish]
3040.
With the VM migrations in progress, let's check the capacity of the Virtual SAN datastore.

[Click Home under Navigator or the Home icon]
3060.
[Click the Hosts and Clusters icon]
3300.
More than 50 Windows and Linux VMs are now running on the Virtual SAN datastore.
These VMs would consume 3.57TB of capacity without deduplication and compression.
Since deduplication and compression are enabled, the capacity consumed by these VMs was reduced to 1.16TB.
That is a reduction of slightly more than 3x resulting in a savings of 2.41TB of capacity.
Note that deduplication and compression requires some overhead.
That is why Virtual Disks under Used Capacity Breakdown is showing 1.73TB (used + overhead).
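
(For reference, the savings math quoted above as a minimal Python sketch - not part of the demo; the figures are the ones shown in the UI.)

used_without_dedup_tb = 3.57   # capacity the VMs would consume with no space efficiency
used_with_dedup_tb = 1.16      # capacity actually consumed with dedup/compression enabled

ratio = used_without_dedup_tb / used_with_dedup_tb        # ~3.08, "slightly more than 3x"
savings_tb = used_without_dedup_tb - used_with_dedup_tb   # ~2.41TB saved
overhead_tb = 1.73 - used_with_dedup_tb                   # ~0.57TB of dedup/compression metadata
print(round(ratio, 2), round(savings_tb, 2), round(overhead_tb, 2))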

[Click Home under Navigation or the Home icon]
4330.
This section showcases the performance service in Virtual SAN 6.2.
A number of metrics can be viewed including IOPS, throughput, and latency.
We will see some of these metrics at the cluster level, host level, per VM, and per virtual disk.
Let's start with enabling the performance service.

[Click the Hosts and Clusters icon]
4370.
[Click Health and Performance]
4450.
Here we see the performance service is currently turned off.

[Click Edit]
4570.
Virtual SAN performance data is stored as an object on the Virtual SAN datastore.
It is commonly protected by the Virtual SAN Default Storage Policy, which means there are two copies.
This helps avoid performance data loss in the event of a disk or host failure.
Another storage policy can be selected for the performance service data, if desired.
We will keep the default policy.

[Click OK]
4730.
We can see the performance service is turned on.
The Virtual SAN Default Storage Policy is assigned to the performance data.
The object holding the data is compliant with the storage policy.
Now let's take a look at the wide variety of performance information available.

[Click the Monitor tab]
4760.
The Virtual SAN cluster is selected in the Navigator column.
Virtual Machine Consumption and Virtual SAN Backend performance data is available at this level.
Virtual SAN Backend traffic includes data such as metadata updates and data that is transferred during object creation, rebuilds, etc.
The top graph shows IOPS consumed by all of the virtual machines in the cluster - currently around 4500.
Read throughput is trending around 20 megabytes per second. Let's scroll down for more metrics.

[Click the scroll bar on the right]
4780.
Read and write latencies are steady at less than 1 millisecond and there is no congestion.
An example of where congestion might occur is a hybrid Virtual SAN configuration sustaining a very high number of writes, where the capacity tier - spinning magnetic disks - cannot write data as fast as the cache tier - a flash device - needs to destage it.
Since this configuration is all-flash and the total number of IOPS is fairly low, it is no surprise that congestion is consistently at zero.
Next let's look at the Virtual SAN Backend graphs.

[Click Virtual SAN - Backend]
4880.
Here we see similar graphs for Virtual SAN Backend traffic. Note that IOPS and Throughput are slightly higher than virtual machine consumption. It is likely that a fair amount of data is being written by Virtual SAN, such as maintaining two consistent copies of virtual machine data based on the storage policy assigned to the virtual machines. Deduplication and compression is also enabled, which adds a small amount of overhead. However, latency for this network traffic is well below 1 millisecond.
Next let's look at a few host-level performance graphs.

[Click the first host, prmh-a09-sm-05.en...]
4990.
Similar to the graphs before, we see virtual machine consumption...

[Click Virtual SAN - Backend]
5110.
... and Virtual SAN Backend.

[Click Virtual SAN - Disk Group]
5240.
Performance information can also be observed at the Virtual SAN Disk Group level...

[Click Virtual SAN - Disk]
5370.
... and at the individual physical disk level.
Now we will quickly view Virtual Machine performance graphs.

[Click virtual machine app10]
5480.
IOPS, throughput, latency, and other metrics are also available for virtual machines.
These graphs show measurements for all of the virtual machine's virtual disks...

[Click Virtual SAN - Virtual Disk]
5570.
... and we can see graphs for individual virtual disks. This virtual disk is producing around 5000 IOPS.
If the virtual disk was assigned a storage policy with an IOPS Limit, we would see that setting, as well.
Let's scroll down and see a few more charts.

[Click the scroll bar on the right]
5590.
[Click the scroll bar on the right]
5600.
Here are Virtual SCSI IOPS and Throughput...

[Click the scroll bar on the right]
5640.
... and here is latency - again, below 1 millisecond.
We just saw how easy it is to enable the Virtual SAN performance service.
We viewed a number of metrics including IOPS, throughput, and latency.
These metrics are available at the cluster level, per host, at the disk group level, per physical disk, and at the virtual machine level.
It is also possible to change the time range. By default, the last hour of metrics are shown. The number of hours can be changed.
A custom date and time range can also be specified.

[Click Home under Navigator or the Home icon at the top]
6670.
Virtual SAN utilizes storage policies to assign a number of availability and performance services to virtual machines.
In this section, we will create a new storage policy that defines the number of disk or host failures to tolerate (FTT).
Since this is an all-flash Virtual SAN configuration, we will use the erasure coding failure tolerance method to minimize capacity consumption.
We will also set the number of stripes as two, which can improve read performance.

[Click VM Storage Policies]
6790.
There are two default storage policies - one for Virtual Volumes and one for Virtual SAN.
When a virtual machine is migrated to or created on a Virtual SAN datastore and no policy is selected,
the virtual machine is automatically assigned the Virtual SAN Default Storage Policy.
Let's create a new policy.

[Click Create a new VM storage policy icon just below the Objects tab]
6910.
We must first give the new policy a name and, optionally, a description. We will call this policy Gold.

[Click the Name field]
6920.
[Press any key]
6960.
It is a good idea to include a description of the policy rules - especially if there are multiple vSphere administrators.
This makes it easy to see what rules are defined in the storage policy.

[Click the Description field]
6970.
[Hit any key to type]
7220.
Number of Failures To Tolerate (FTT) will be set to one.
We will use the RAID-5/6 erasure coding failure tolerance method.
Number of disk stripes per object will be configured as two.

[Click Next]
7250.
[Click Next]
7270.
Keep in mind, storage policies can have rules based on Virtual SAN and Virtual Volumes.
In this case, we are creating a storage policy based on Virtual SAN data services.

[Click the Rules Based On Data Services drop-down menu]
7290.
[Click VSAN]
7310.
Next, we will add rules that govern the level of data protection.
We will set the Number of Failures to Tolerate to 1. Virtual SAN will distribute data across the cluster so that the loss of
a disk or an entire host will not cause data loss. The Failure Tolerance Method will be configured as RAID-5/6 erasure coding.
RAID-5/6 erasure coding requires less capacity than RAID-1 mirroring for the same level of protection.

[Click the drop-down menu]
7330.
[Click Number of failures to tolerate]
7510.
By default, Number Of Failures To Tolerate (FTT) is set to 1. We will keep that setting.

[Click the drop-down menu]
7530.
[Click Failure tolerance method]
7690.
The default Failure Tolerance Method is RAID-1 mirroring, which might yield slightly better performance than RAID-5/6 erasure coding,
but mirroring does consume more capacity than erasure coding (for the same level of data protection).
The Storage Consumption Model on the right side shows that mirroring a 100GB virtual disk will consume 200GB of capacity.
In this case, minimizing capacity consumption is higher priority so we will change this to RAID-5/6 erasure coding.

[Click the Failure Tolerance Method drop-down menu]
7710.
[Click RAID-5/6 (Erasure Coding) - Capacity]
7870.
After changing the Failure Tolerance Method to RAID-5/6 erasure coding, the Storage Consumption Model was updated.
A 100GB virtual disk will consume 133GB - a 33% reduction in the amount of storage capacity consumed (versus mirroring).
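
(As a quick check of those two figures, here is a minimal Python sketch - not part of the demo. The multipliers are the FTT=1 space factors behind the Storage Consumption Model numbers above.)

def consumed_gb(vmdk_gb, method):
    if method == "RAID-1 mirroring":
        return vmdk_gb * 2          # two full copies
    if method == "RAID-5 erasure coding":
        return vmdk_gb * 4 / 3      # three data components plus one parity component
    raise ValueError(method)

print(consumed_gb(100, "RAID-1 mirroring"))              # 200 GB
print(round(consumed_gb(100, "RAID-5 erasure coding")))  # ~133 GB, about 33% less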

This policy will be assigned to workloads that will likely benefit from being able to simultaneously read data from multiple flash devices.
To potentially improve read performance, we will instruct Virtual SAN to stripe data across two disks.

[Click the drop-down menu]
7890.
[Click Number of disk stripes per object]
8040.
The default number of stripes per object is 1. We will change this to 2.

[Click the Number of Disk Stripes Per Object field]
8050.
[Press any key]
8070.
There are no additional rules to add so we will continue with creating the storage policy.

[Click Next]
8270.
Here we can see which datastores are compatible with the rules we just configured.
Naturally, the Virtual SAN datastore is the only datastore in the cluster that is compatible with this policy.

[Click Next]
8330.
The last step in the workflow enables us to confirm we have the correct configuration.

[Click Finish]
8410.
The new storage policy named Gold has been added. The rules in this policy include Failures to Tolerate (FTT) = 1,
RAID-5/6 erasure coding is the fault tolerance method, and the number of stripes per object is 2.

[Click Home under Navigator or the Home icon]
9440.
We will assign the 'Gold' storage policy to a virtual machine.
The 'Gold' policy has the Failure Tolerance Method set to RAID-5/6 Erasure Coding.
We will also take a brief look at the capacity savings of erasure coding versus mirroring
while maintaining the same level of resiliency (FTT = 1).

[Click the Hosts and Clusters icon]
9620.
To assign a storage policy, we right-click on a VM...

[Click app10]
9660.
[Click VM Policies]
9680.
[Click Edit VM Storage Policies]
9790.
It is common for the same storage policy to be assigned to the VM home object and all virtual disks that belong to the VM.
In some cases, a separate storage policy might be appropriate for one of the objects, such as a virtual disk.
Virtual SAN supports separate policies for VM home and individual virtual disk objects.
As you can see in this example, we can modify the storage policy for hard disk 2.

[Click the VM Storage Policy drop-down menu for Hard disk 2]
9810.
Let's change the policy to Gold.

[Click Gold from the drop-down menu]
9960.
If we click OK, the VM home and Hard disk 1 objects would retain the Virtual SAN Default Storage Policy.
Hard disk 2 would have the Gold policy assigned.
In this case, we will assign the Gold policy to the entire VM.

[Click VM storage policy drop-down menu at the top of the window]
9980.
[Click the Gold policy in the drop-down menu]
10000.
[Click the Apply to all button]
10170.
Notice that the Gold policy is now assigned to the VM home and both Hard disk objects.

[Click OK]
10370.
The virtual machines to which a storage policy is assigned can be viewed in the VM Storage Policies user interface.
We will see that the 'Gold' policy is assigned to 'app10'.

[Click Home under Navigator]
10440.
[Click the VM Storage Policies icon]
10580.
With the Gold policy selected in the Navigator column, we see that app10 is the only VM the policy is assigned to.
The compliance status currently shows as 'Noncompliant' for all three objects.
That is because the policy was just assigned to the VM - it takes time for Virtual SAN to make the changes.

Now that some time has passed, let's trigger a policy compliance check.

[Click the Trigger VM storage policy compliance check icon]
10750.
Virtual SAN has completed the necessary operations to comply with the Gold policy. All three objects now show as 'Compliant'.
We will apply the Gold policy to other virtual machines in the cluster.
You have already seen how a policy is applied - I will not repeat that action for the rest of the VMs.
With the assumption that the policy has been applied to other VMs, let's refresh the list.

[Click the refresh icon]
10890.
With the refresh, we can see what the UI shows when a policy is applied to multiple VMs.
We just viewed how a Virtual SAN storage policy is assigned to a virtual machine.
Policies can be assigned precisely to VM home objects and individual virtual disks.

[Click Home under Navigator]
11910.
Virtual SAN stores VM configuration files and virtual disks as objects.
Each object consists of one or more components depending on size and the storage policy assigned.
Components for each object are commonly distributed across multiple disks and hosts for performance and availability.
Let's take a look at a few examples of how storage policies determine component placement.

[Click Hosts and Clusters]
12010.
Here we see the VM named app11 has the Virtual SAN Default Storage Policy assigned to it.
Number of failures to tolerate - FTT - equals 1, which means the loss of a disk or a host will not cause data loss.
The failure tolerance method is RAID-1 mirroring, which means full copies are created for redundancy and performance.
The VM has 1 virtual disk (VMDK) that is 80GB.
Let's look at how Virtual SAN creates and distributes components for this virtual disk based on the assigned policy.

[Click cluster-pa-af in the Navigator column]
12160.
The entire list of VMs and virtual disks is shown. Let's filter the list to find app11.
'app11' is already populated in the search field so we just click to narrow down the list.

[Click 'app11' in the search field]
12170.
Let's expand the VM to see the 80GB virtual disk.

[Click the small arrow next to app11 in the list of Virtual Disks]
12180.
app11 has two objects:
VM Home, which contains the VM's configuration files such as the VMX and NVRAM files.
Hard disk 1, which is a virtual disk attached to the VM.
To view components, we must select an object.

[Click Hard disk 1]
12280.
Since the virtual disk is less than 255GB, we start with one component.
The storage policy is configured with FTT=1 and RAID-1 mirroring.
The second replica component is placed on a separate host to protect against a disk or host failure.
A third 'Witness' component is also created to protect against 'split-brain' scenarios if hosts lose network connectivity.
Now let's apply another storage policy to the same VM to see how this affects the number and placement of components.
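
(Before we do, here is a minimal Python sketch - not part of the demo - of why that witness matters: an object stays accessible only where a majority of its components can be reached. Host names are illustrative.)

component_hosts = {"replica-1": "host-a", "replica-2": "host-b", "witness": "host-c"}

def accessible(reachable_hosts):
    votes = sum(1 for host in component_hosts.values() if host in reachable_hosts)
    return votes > len(component_hosts) / 2

print(accessible({"host-a", "host-c"}))  # True  - a replica plus the witness forms a majority
print(accessible({"host-b"}))            # False - a lone replica cannot claim the object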

[Click app11 in the Navigator column]
12320.
[Click VM Policies]
12360.
[Click Edit VM Storage Policies...]
12500.
Again, we see the two objects that belong to app11. Storage policies can be assigned to individual objects.
We will change the storage policy for Hard disk 1 to the 'Silver' policy.
That policy contains the rules FTT = 1 and failure tolerance method = RAID-5/6 (Erasure Coding).

[Click the VM Storage Policy drop-down menu for Hard disk 1]
12530.
[Click the Silver storage policy]
12690.
[Click OK]
12860.
[Click cluster-pa-af in the Navigator column]
12990.
The Virtual SAN Default Storage Policy is assigned to app11's VM Home object.
The Silver policy is assigned to app11's virtual hard disk.
Both objects are compliant with their assigned storage policy.
Let's take a look at component placement as a result of assigning the new policy to the virtual disk.

[Click Hard disk 1]
13100.
Since FTT is set to 1, RAID-5 (not 6) erasure coding is used. There are three data components plus a parity component.
Note that each component is on a separate host. A minimum of 4 hosts are required for FTT = 1 and RAID-5.
If any of the data components go offline (disk or host failure), the data can be rebuilt using the parity component.
RAID-5 erasure coding consumes 33% less raw capacity than RAID-1 mirroring for the same level of redundancy.
Let's change the storage policy to 'Gold', which adds a stripe width of 2 (default is 1).
Then, we will see how this new policy assignment affects the number and placement of components.

[Click the app11 VM in the Navigator column]
13160.
[Click VM Policies]
13200.
[Click Edit VM Storage Policies...]
13330.
[Click Silver in the VM Storage Policy drop-down menu]
13360.
[Click Gold on the drop-down menu]
13490.
[Click OK]
13630.
Keep in mind it may take several minutes or longer for an object to become compliant with a newly assigned or modified storage policy.
This is normal behavior as Virtual SAN makes the changes to component number and placement to satisfy policy compliance.
Let's take another look at app11's components.

[Click cluster-pa-af in the Navigator column]
13740.
As mentioned a few moments ago, the Gold policy has the rule Number of Stripes = 2 (the default is 1).
Here we see there are now two components for the virtual hard disk object on each host (in a RAID-0 configuration).
This is because of the Number of Stripes = 2 rule. Each stripe is placed on a different disk.
Adding more stripes is done in some cases to boost performance for read-intensive applications.
Having multiple stripes enables data to be read from multiple disks simultaneously.
The data is distributed across all 4 hosts to satisfy the FTT = 1 and Failure Tolerance Method = RAID-5/6 Erasure Coding rules.
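
(For reference, the component count implied by the Gold policy, as a minimal Python sketch - not part of the demo.)

erasure_coding_components = 3 + 1   # FTT=1 with RAID-5 erasure coding: three data + one parity, one per host
stripe_width = 2                    # each of those is split into a two-way RAID-0 stripe
print(erasure_coding_components * stripe_width)   # 8 components in total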

[Click the scroll bar on the right]
13760.
... and here we can see the last two components. The virtual disk now consists of 8 components.
However, these are smaller components than what we originally started with when the app11 VM had the
Virtual SAN Default Storage Policy assigned to it (FTT = 1, RAID-1 Mirroring, 1 stripe).
Let's look at one more example.

[Click Home]
13800.
If an object is larger than 255GB, it will be split up into multiple components with each being up to 255GB in size.
For example, Virtual SAN would create two components for a 500GB virtual disk (VMDK).
I have another storage policy called Bronze where FTT = 0 and Number of Stripes = 1.
While FTT = 0 offers no redundancy, it might be useful for transient workloads where capacity conservation is
a higher priority than protection from losing data due to disk or host failure. We use it here simply for demonstration.
A 600GB virtual disk was added to app11 and the Bronze policy was assigned to this new virtual disk. Let's take a look.
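
(For reference, the 255GB splitting rule described above as a minimal Python sketch - not part of the demo. This counts base components before any replicas or parity are added.)

import math

MAX_COMPONENT_GB = 255

def base_components(vmdk_gb):
    return math.ceil(vmdk_gb / MAX_COMPONENT_GB)

print(base_components(500))   # 2 components
print(base_components(600))   # 3 components - with the Bronze policy (FTT=0), that is all
# With FTT=1, RAID-1 mirroring would double this to 6 components, while
# RAID-5 erasure coding would need only 4 (three data components plus parity).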

[Click the Hosts and Clusters icon]
13910.
Here we see app11 now has two virtual disks. Hard disk 2 is the 600GB disk that was added.

[Click Hard disk 2]
14000.
Since Hard disk 2 is 600GB in size, it consists of 3 components.
(600GB divided by 255GB max component size = 2.35, round up to 3)
With FTT set to 0, there is no RAID-1 or RAID-5 redundancy. If FTT is set to 1 and RAID-1 mirroring is used, Virtual SAN would create
a copy of these 3 components on another host for a total of 6 components.
What is interesting is that RAID-5 would require only 4 components - 3 components of less than 255GB + a parity component!
We just saw some examples of how Virtual SAN distributes components based on storage policy rules. Let's return to the home screen.

[Click Home under Navigator or the Home icon]
15020.
Virtual SAN features a health check service, which checks a wide variety of factors that could cause issues.
These include items such as proper network configuration, physical disk health, component limits, and performance service status.
The health service is enabled by default and runs every 60 minutes. This interval can be changed.
Let's take a closer look at the items monitored by the Virtual SAN Health Service.

[Click the Hosts and Clusters icon]
15150.
Here is the list of Virtual SAN Health Service categories. There is a warning next to Hardware compatibility.
Let's come back to that in a moment. First, we will expand each of the other categories to see monitored items.
We will work our way down the list starting with Network.

[Click Network]
15260.
There are many checks performed for networking. For example, all hosts are checked for consistent multicast settings,
matching subnets, and a vmknic with the Virtual SAN service enabled.
Other checks include connectivity to vCenter Server, presence of unexpected Virtual SAN cluster members, and network partition issues.

[Click Network]
15360.
[Click Physical disk]
15460.
Here we see the various physical disk metrics that are monitored by Virtual SAN.
Issues with items such as disk capacity, congestion, and component metadata health are viewed here.

[Click Physical disk]
15550.
[Click Data]
15620.
Virtual SAN object health is also monitored. A warning or error might be observed here if availability of an object is reduced.
For example, an object protected by RAID-1 mirroring loses one of the mirrors due to a disk failure.

[Click Data]
15710.
[Click Cluster]
15820.
A number of cluster-level items are checked. Examples include disk format version, deduplication and compression consistency,
and how well data is balanced across hosts in the cluster. If a rebalance is needed, an administrator can start it manually or
simply wait until Virtual SAN performs this action automatically.

[Click Cluster]
15950.
[Click Limits]
16050.
The Virtual SAN health service verifies limits such as disk space or component counts
are not exceeded in the cluster's current state and in the event of a host failure.

[Click Limits]
16120.
[Click Performance service]
16200.
The Virtual SAN Performance Service stores performance data on the Virtual SAN datastore.
This category of the Health Check service verifies that data is being collected from all hosts
and that the object storing the performance data is healthy.

[Click Performance service]
16250.
Now let's see why a warning is being produced in the Hardware compatibility category.

[Click Hardware compatibility]
16330.
There are actually a few warnings in this category. They appear to point to the use of hardware that is not on the hardware compatibility list (HCL).
You should always use hardware, firmware, and drivers on the HCL to help ensure consistent performance and reliability from Virtual SAN.
Let's view the Controller Driver warning.

[Click the Controller Driver warning]
16340.
Here we see there are a few controller devices in each of the hosts and that the drivers in use need to be changed or updated.
However, more information is needed. Virtual SAN includes an 'Ask VMware' button that takes us directly to the relevant knowledge base (KB) article.

[Click Ask VMware]
16410.
The 'Ask VMware' button took us to the KB article that discusses the issue in depth and provides resolution.

[Click the X button in the upper right corner to close the KB article window]
16430.
We just reviewed the lists of items that are monitored by the Virtual SAN Health Check service.
When a warning or error is reported, Virtual SAN helps lower the amount of time spent finding a resolution.
The 'Ask VMware' button opens the KB article relevant to the issue to help speed up resolution.

[Click Home under Navigator or the Home icon]
17450.
Virtual SAN makes it easy to view capacity information such as used capacity and free space.
Deduplication and compression space efficiency metrics are available.
It is also possible to see a more detailed breakdown by object and data types.

[Click the Hosts and Clusters icon]
17640.
Here we see the capacity overview and the deduplication and compression overview.
There is a mix of 160 Windows and Linux VMs in this cluster all residing on the Virtual SAN datastore.
Deduplication and compression is enabled resulting in a savings of more than 5 terabytes.
Space efficiency features such as deduplication and compression lower the cost per usable GB.
In many cases, Virtual SAN all-flash configurations are on par with or less expensive than other hybrid solutions.
Let's scroll down for a more detailed look at used capacity. Note that the following numbers are before deduplication and compression.

[Click the scroll bar on the right]
17670.
Note that the numbers are before deduplication and compression. Virtual disks and VM Home objects contain the actual VM data.
Swap objects are used for VM memory swap, if needed. The performance database object contains the Virtual SAN performance information.
File system overhead is required to maintain the Virtual SAN file system metadata.
Deduplication and compression requires capacity for metadata such as hash, translation, and allocation maps.
Checksum overhead is used to store checksum information.
The Other category accounts for items such as templates, unregistered VMs, and ISO files.

[Click the Group By drop-down menu]
17690.
[Click Data types]
17740.
Primary VM data consists of the 'original copy' of VM objects such as VM Home and virtual disks (VMDK files).
Virtual SAN overhead shows the additional capacity required for replica copies, witness objects, RAID-5/6 parity components, etc.
As you just saw, Virtual SAN provides details on capacity consumption and the efficiency of space saving features (deduplication, compression, etc.).

[Click Home below Navigator or the Home icon]
18760.
Virtual SAN includes a set of proactive tests to help ensure correct configuration.
These tests are commonly run right after Virtual SAN is enabled (before running production workloads).
Let's take a look at these tests.

[Click the Hosts and Clusters icon]
18810.
There are three tests: VM creation, multicast performance, and storage performance.
As you can see, the first two tests have been run with a result of 'Passed'.
The VM creation test creates a VM on each host in the cluster and places the files on the Virtual SAN datastore.
Running this test verifies each host can access and create objects, as needed, on the Virtual SAN datastore.
The multicast performance test, as you would expect, verifies the performance levels of multicast network communications are acceptable.
Let's run the storage performance test.

[Click Storage performance test]
18840.
[Click the green arrow below Proactive Tests]
19030.
The duration is set to 10 minutes, but this can be adjusted. We will keep it at 10 minutes.
Note that it is possible to select the storage policy that will be used in the test.
We will use the Virtual SAN Default Storage Policy.
A number of workload profiles are available...

[Click Low stress test in the Workload drop-down menu]
19060.
For example, we can change the workload profile to 100% reads or 100% writes.
We will stay with the default 'Low stress test' profile.

[Click Low stress test]
19070.
[Click OK]
19460.
This is one of the benefits of an offline demo - we do not actually have to wait 10 minutes to see the results. :)
The test shows various metrics such as IOPS, throughput, and average latency.
Keep in mind this was the low stress workload profile. An all-flash Virtual SAN cluster can certainly achieve much higher numbers.

We just saw the proactive tests available in Virtual SAN, which are useful for verifying proper configuration and performance.

[Click Home under Navigator or the Home icon]
20480.
Virtual SAN utilizes RAID-1 mirroring and RAID-5/6 erasure coding to provide resiliency against disk and host failures.
It is also tightly integrated with vSphere High Availability (HA) which automates VM restarts and minimizes downtime when a host fails.
We are going to see how Virtual SAN distributes data across hosts to maintain accessibility when a disk or host is offline.
We will also see how vSphere HA along with Virtual SAN quickly restarts VMs on other hosts in the cluster.

[Click the Hosts and Clusters icon]
20560.
Here we see our cluster with 4 hosts. There are 160 VMs running on Virtual SAN in this cluster.
Let's take a look at the list of VMs running on host 05.

[Click prmh-a09-sm-05.eng...]
20630.
There are 10 VMs on this host. As you can see, they are all powered on and their status is 'Normal'.
Throughout the rest of this demo, we will focus on the VM named 'app10'.

[Click cluster-pa-af]
20720.
Let's examine component placement for app10's hard disk 1.

[Click app10 in the list of Virtual Disks]
20770.
[Click Hard disk 1]
20870.
Hard disk 1 has the Silver storage policy assigned to it. That policy has the following rules configured:
FTT = 1, Failure Tolerance Method is RAID-5/6 erasure coding.
There are 3 data components and a parity component (RAID-5). One of these components is on host 05.
Now we will simulate a host failure by powering off host 05.

[Hit any key to continue and switch to the IPMI window]
20880.
Using the IPMI (Intelligent Platform Management Interface) user interface for host 05, we will power off the host.

[Click Power Off Server - Immediate]
20900.
[Click Perform Action]
20940.
We can see the host is now powered off. Let's switch back to the vSphere Web Client.

[Click the minimize button]
20950.
Host 05 is displaying an error (red exclamation point).
The other 3 hosts in the cluster are displaying warnings (yellow exclamation point) as they are unable to communicate with host 05.
We also see that app10 is disconnected and not powered on, as it is one of the 10 VMs that were running on host 05.
Let's refresh the vSphere Web Client and see if vSphere HA has restarted app10.

[Click the Refresh icon]
21130.
It is clear there is an issue with host 05 based on the vCenter Server alarm notifications.
More importantly, we see that app10 has been powered on (along with the other 9 VMs that were running on host 05).
Let's click on app10 to view more details.

[Click app10]
21200.
App10 was down for only a few minutes and was automatically restarted on host 06 thanks to vSphere HA.
Virtual SAN distributes components across the cluster to avoid data loss when a disk or host goes offline.
Remember that one of the components for app10's hard disk 1 is on host05.
Let's check the current state of these components.

[Click cluster-pa-af]
21320.
[Click app10 under Virtual Disks]
21360.
[Click Hard disk 1]
21490.
Notice that Hard disk 1 is showing as healthy, which means the data is accessible.
However, it is not compliant with the assigned storage policy.
Three of the components that make up the Hard Disk 1 object are active on hosts 06, 07, and 08.

[Click the vertical scroll bar in the lower right corner]
21520.
As expected, the component on host 05 is displayed as 'absent' since host 05 is offline.
That is why the object's compliance status is 'Noncompliant'.
With RAID-5 erasure coding (FTT = 1), VM objects are still accessible when a disk or entire host is offline.

Host05 has been repaired. Let's switch to the IPMI window and power it on.

[Hit any key to continue]
21530.
We can see the host is powered off.
The Power On Server radio button is selected...

[Click Perform Action]
21570.
... and now the host is powered on.
Let's switch back to the vSphere Web Client to view the current status of the cluster.

[Click the minimize button]
21580.
The vCenter Server alerts are still visible.
One of app10's hard disk 1 components still shows as 'absent'.
This is no surprise as it takes a few moments for host 05 to boot.
Let's refresh the vSphere Web Client.

[Click the Refresh icon]
21840.
Now that host 05 has booted, we see all 4 hosts are online and there are no vCenter Server alerts.
All of the components that are part of app10's hard disk 1 object are now showing as 'active'.
The cluster is back to a healthy state. Virtual machine data remained accessible even with a host failure.
All of the virtual machines that went offline because of the failure were restarted on other hosts in just a few minutes.
vSphere and Virtual SAN provide a highly resilient infrastructure for running business critical applications.

[Click Home under Navigator or the Home icon]
22870.
Maintenance Mode simplifies the task of taking a vSphere host offline in a Virtual SAN cluster.
In most cases, every host in the cluster contributes storage to the Virtual SAN datastore.
When a host is temporarily or permanently removed from the cluster, the Virtual SAN data on that host must be managed.
Virtual SAN helps automate the process of managing data by providing three options for maintenance mode.
We will review those options, place a host in maintenance mode, and then exit maintenance mode.

[Click the Hosts and Clusters icon]
22910.
There is currently 14.5TB of raw capacity in the Virtual SAN datastore as all 4 hosts in the cluster are contributing storage.
Deduplication and compression is enabled, which has reduced the capacity consumed by more than 5TB in this all-flash Virtual SAN configuration.
Maintenance needs to be performed on host 05. Let's put the host in maintenance mode.
This action will migrate all of the running VMs on host 05 to other hosts in the cluster without downtime.

[Click prmh-a09-sm-05.eng...]
23000.
[Click Maintenance Mode]
23020.
[Click Enter Maintenance Mode]
23080.
Since Virtual SAN is enabled, options are provided for migrating data off of the host that is entering maintenance mode.
The default option is 'Ensure accessibility'. This option ensures that at least one copy of the Virtual SAN data on this host is accessible.
Consider a virtual disk (VMDK) object as an example. If there is a copy of the virtual disk on another host, the data is not migrated
from the host entering maintenance mode. If there is not another copy of the data (FTT = 0), the data is migrated off the host
before it enters maintenance mode. As you might expect, it will take longer for the host to enter maintenance mode if
there is a lot of data that needs to be migrated. The 'Ensure accessibility' option is a good choice for hosts that will be offline temporarily.

[Click Ensure accessibility]
23100.
Here are the other two options: 'Full data migration' and 'No data migration'.
Clicking the information icon (small, gray circle with the lower case i) provides more information on the three data migration options.

[Click the information icon]
23110.
To summarize, 'Full data migration' evacuates all of the Virtual SAN data from the host entering maintenance mode.
This option is commonly used when the host is taken offline permanently or for a longer period of time.
This option can take a considerable amount of time depending on the amount of data that must be moved to other hosts.
'No data migration' simply puts the host in maintenance mode without migrating any data. That, of course, is typically the fastest option.
The primary concern with 'No data migration' is any VM that is assigned a policy where FTT = 0. If the only copy of that VM's data
resides on the host in maintenance mode, the VM may be inaccessible until the host is back online (exits maintenance mode).
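
(To summarize the guidance above as a rough Python sketch - not an official decision tree, just a paraphrase of this narration.)

def migration_option(leaving_permanently_or_long_term, accept_ftt0_objects_going_offline):
    if leaving_permanently_or_long_term:
        return "Full data migration"    # evacuate everything; slowest
    if accept_ftt0_objects_going_offline:
        return "No data migration"      # fastest; sole copies (FTT = 0) may become inaccessible
    return "Ensure accessibility"       # default; migrates only data that has no other copy

print(migration_option(False, False))   # 'Ensure accessibility' - the option we use here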

[Click the Close icon in the Help popup window]
23120.
Since this host will only be offline for a short time, we will keep the default option 'Ensure accessibility'.

[Click OK]
23360.
Entering maintenance mode can take several minutes or longer depending on how many running virtual machines must be
migrated (using vMotion) to other hosts and how much data must be migrated from the host.
There were only a few VMs on this host and all VMs in the cluster are assigned a policy that contains the rule FTT = 1.
(No data migration was necessary)
However, we can see the Virtual SAN raw capacity is down to 11.1TB since host 05 is not currently contributing storage.
Maintenance on the host is complete. Let's take the host out of maintenance mode.

[Click prmh-a09-sm-05.eng...]
23380.
[Click Maintenance Mode]
23400.
[Click Exit Maintenance Mode]
23420.
Host 05 is back online. Let's check the Virtual SAN raw capacity.

[Click cluster-pa-af]
23610.
Virtual SAN capacity has returned to 14.5TB.

Maintenance mode simplifies the task of taking servers offline and bringing them back online.
It includes data migration options for various scenarios such as temporary maintenance and permanent removal.
More importantly, the process of entering and exiting maintenance mode occurs without VM downtime.

[Click Home under Navigator or the Home icon]
24640.
Virtual SAN includes 'rack awareness' to provide resiliency against rack power failure and top-of-rack switch failure.
Rack awareness is accomplished by configuring fault domains. Virtual SAN distributes object components across these fault domains.
This enables Virtual SAN to protect data against disk, host, and entire rack failures.
There are six hosts across three racks in the Virtual SAN cluster (two hosts per rack). We will configure three fault domains.

[Click the Hosts and clusters icon]
24790.
Let's review the current component placement for app10's hard disk 1 before we configure fault domains.

[Click app10 in the Virtual Disks table]
24820.
[Click Hard disk 1]
24980.
FTT = 1 so Hard disk 1 is mirrored across two hosts and there is a witness component on a third host.
Hosts 3 and 4 are in rack 1. Hosts 5 and 6 are in rack 2. Hosts 7 and 8 are in rack 3.
One of the two copies of hard disk 1 and the witness component are in the same rack - rack 2.
If rack 2 fails, hard disk 1 would be inaccessible until the other copy or the witness comes back online.
Let's configure fault domains to resolve this issue. We will name the fault domains rack01, rack02, and rack03.
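
(For reference, a minimal Python sketch - not part of the demo - of the placement problem just described. The host-to-rack mapping comes from this narration; the example placements are illustrative, not read from the UI.)

rack_of = {
    "host03": "rack01", "host04": "rack01",
    "host05": "rack02", "host06": "rack02",
    "host07": "rack03", "host08": "rack03",
}

def survives_rack_failure(component_hosts):
    # With FTT=1, losing one rack must still leave a majority (2 of 3) of components online.
    racks = [rack_of[host] for host in component_hosts]
    return all(len(racks) - racks.count(r) >= 2 for r in set(racks))

print(survives_rack_failure(["host03", "host05", "host06"]))  # False - a copy and the witness share rack02
print(survives_rack_failure(["host03", "host05", "host07"]))  # True  - one component per rack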

[Click the Manage tab]
25100.
[Click the green plus sign]
25130.
[Click the Name field]
25140.
[Press any key]
25200.
Hosts 3 and 4 will be added to rack01.

[Click prmh-a09-sm-03.eng.vmware.com]
25240.
[Click prmh-a09-sm-04.eng.vmware.com]
25290.
[Click OK]
25550.
One fault domain containing two hosts has been created. Let's create a second and third fault domain with the four remaining hosts.

[Click the green plus sign]
25580.
[Click the Name field]
25590.
[Press any key]
25650.
[Click prmh-a09-sm-05.eng.vmware.com]
25700.
[Click prmh-a09-sm-06.eng.vmware.com]
25770.
[Click OK]
26000.
[Click the green plus sign]
26050.
[Click the Name field]
26060.
[Press any key]
26120.
[Click prmh-a09-sm-07.eng.vmware.com]
26170.
[Click prmh-a09-sm-08.eng.vmware.com]
26260.
[Click OK]
26390.
We now have three fault domains each containing two hosts. Each fault domain corresponds with a server rack.
Let's take another look at component placement for app10's hard disk 1.

[Click the Monitor tab]
26420.
Here we see that two of the three components that make up the hard disk 1 object are in the same fault domain.
These two components are on separate hosts, but in the same rack.
Notice that Hard disk 1 is not compliant with its storage policy because of the two components in the same fault domain.
It can take several minutes or longer for Virtual SAN to redistribute components after fault domains have been configured.
Let's refresh the vSphere Web Client to see if the components have been redistributed.

[Click the refresh icon]
26460.
The components are distributed across the three racks (fault domains) - Hard disk 1 is compliant with its storage policy.
We just demonstrated how Virtual SAN achieves rack awareness by configuring fault domains.
This enables Virtual SAN resiliency against disk failure, host failure, and rack failure.

[Click Home below Navigator or the Home icon]
27490.
Virtual SAN supports stretched cluster configurations. Two well-connected sites provide resiliency against entire-site failure.
A copy of each VM is maintained synchronously at both sites. If a site goes offline, the VMs in that site are restarted at the other site.
A 'witness' host is placed at a third site. The witness serves as a tiebreaker when a decision must be made regarding
the availability of Virtual SAN object components when the network connection between the main two sites is lost.
We will configure a stretched cluster and observe the behavior when a site goes offline.

[Click the Hosts and Clusters icon]
27610.
Here we see 6 hosts in the Virtual SAN cluster.
Hosts 03, 04, and 05 are located at the primary or 'preferred' site. Hosts 06, 07, and 08 are located at the 'secondary' site.
DRS affinity rules state that application, e-commerce, and web servers should run at the primary/preferred site.
The Virtual SAN Stretched Cluster status is currently disabled. Let's configure the stretched cluster.

[Click the Configure button]
27710.
All of the hosts are currently in the 'Preferred' fault domain.
We will move hosts 06, 07, and 08 to the 'Secondary' fault domain.

[Click prmh-a09-sm-06.eng.vmware.com]
27750.
[Click prmh-a09-sm-08.eng.vmware.com]
27810.
[Click the >> button]
27840.
Now the three hosts in the primary/preferred site are in the Preferred fault domain.
The three hosts at the secondary site are in the 'Secondary' domain.

[Click Next]
27880.
A witness host running at a third site is required. It can be a physical machine or a virtual machine running ESXi.
A virtual machine witness is deployed from a preconfigured OVA file, which helps make stretched cluster configuration simple.
A witness has two networks configured - one for host management traffic, the other for Virtual SAN traffic.
Here we see there are two witness hosts (VMs) available. We will use the second one.

[Click prmha09vm11.eng.vmware.com]
27980.
[Click Next]
28110.
A witness host stores Virtual SAN metadata and witness objects. It does not store VM data such as VM Home and virtual disk objects.
The witness host is part of the Virtual SAN cluster. Therefore, we must select a cache device and a capacity device.

[Click the top Local VMware Disk]
28170.
[Click the bottom Local VMware Disk]
28210.
[Click Next]
28240.
A summary of the Virtual SAN stretched cluster configuration is presented.
We have 3 hosts at the primary/preferred site, 3 hosts at the secondary site, and a witness host at the third site.

[Click Finish]
28410.
Stretched cluster configuration is complete. The star on the Preferred group of hosts indicates Virtual SAN will favor
running workloads in the Preferred fault domain if there is network disconnection between the two main sites ('split brain' scenario).

With the new stretched cluster configuration, Virtual SAN will relocate data so that mirrored components are located at opposite sites.
That is how Virtual SAN provides resiliency against data loss if a site is offline.
We can see the resynchronization of data in progress in the Monitor tab.

[Click the Monitor tab]
28510.
[Click Resyncing Components]
28680.
Typically, there is little resynchronization required as a stretched cluster is usually deployed and configured before running workloads on it.
In this case, there are some existing workloads. 50 components are in the process of being resynchronized across sites.
We can get more details by looking at specific objects such as virtual disks.

[Click Virtual Disks]
28770.
Here we see Hard disk 1 of the app10 virtual machine. The compliance status is 'Noncompliant'.
This is because the object is being reconfigured to be available at both sites.
Both data components of this object are currently located on separate hosts at the primary/preferred site (Preferred fault domain).
A component is being created at the secondary site. Upon completion, one of the components at the primary/preferred site will be deleted.

[Click the Refresh icon]
28840.
Resynchronization of the Hard disk 1 components is complete.
There are two copies of the data component - one at each site.
The witness component has been relocated to the stretched cluster witness host.
Now let's review some of the VMs at the primary/preferred site.

[Click the Related Objects tab]
28920.
Here are the VMs running on host 05, which is located at the primary/preferred site. Hosts 03 and 04 are also at this site.
There are several app, web, and ecomm VMs on this host.

Now, we will simulate a site failure by powering off all three hosts at the primary/preferred site.

[Click the minimize button to switch to the IPMI window]
28930.
This is host 03. We will power it off using the IPMI user interface.

[Click Power Off Server - Immediate]
28950.
[Click Perform Action]
28990.
Host 03 is powered off.

Next, we will switch to the IPMI for Host 04.

[Click the minimize button]
29000.
We will power off Host 04.

[Click Power Off Server - Immediate]
29020.
[Click Perform Action]
29050.
Host 04 is powered off. Next, we will switch to the IPMI for Host 05.

[Click the minimize button]
29060.
We will power off Host 05.

[Click Power Off Server - Immediate]
29080.
[Click Perform Action]
29120.
... and Host 05 is powered off.

All hosts at the primary/preferred site are offline.
Let's switch back to the vSphere Web Client.

[Click the minimize button]
29130.
We need to refresh the vSphere Web Client to see the current status of the cluster.

[Click the Refresh icon]
29460.
Hosts 03, 04, and 05 are clearly offline.
Virtual machines are being recovered by vSphere HA on hosts 06, 07, and 08 at the secondary site.
Let's look at the VMs restarting on host 07.

[Click prmh-a09-sm-07.eng...]
29530.
Here we see some of our app VMs now running on host 07 at the secondary site.
All of the workloads that were running at the primary/preferred site are now running at the secondary site.
vSphere DRS distributed workloads across the hosts at the secondary site using vMotion.

Let's bring our three hosts at the primary/preferred site back online.

[Click the minimize button to go to the IPMI window for Host 03]
29540.
Host 03 is currently off...

[Click Perform Action]
29560.
Now Host 3 is on. Let's power on the other hosts at this site.

[Click the minimize button to go to the Host 04 IPMI]
29570.
Host 04 is off, we will power it on...

[Click Perform Action]
29590.
Host 4 is now on.

[Click the minimize button to go to Host 05 IPMI]
29600.
[Click Perform Action]
29620.
Host 05 is on.

Now we will go back to the vSphere Web Client to see the cluster status with all hosts online.

[Click the minimize button]
29630.
[Click the Refresh icon]
29660.
All hosts in the stretched cluster are back online.
Remember our app, ecomm, and web VMs are governed by DRS affinity rules that state those VMs
should run on hosts at the primary/preferred site. Let's see if they have been migrated back.

[Click prmh-a09-sm-03.eng...]
29890.
As expected, we see a number of the app, web, and ecomm VMs running on host 03 at the primary/preferred site.

We just configured a stretched cluster and viewed the behavior of Virtual SAN, vSphere HA, and vSphere DRS when
there is a failure at a site that takes all of the site's hosts offline. These integrated features provide significant resiliency
against site-wide unplanned downtime.

[Click Home under Navigator or the Home icon]
29910.
End of Demo
30910.
The vRealize Operations Management Pack for Storage Devices (MPSD) includes features for Virtual SAN.
We will see how vRealize Operations offers a comprehensive set of metrics to monitor Virtual SAN.
We will also use vRealize Operations to troubleshoot an issue with Virtual SAN hardware.

[Click vRealize Operations Manager]
31010.
[Click the Open vRealize Operations Manager icon]
31050.
The Cluster Insights tab naturally shows higher level items such as disk group capacity (used), throughput, and errors.
There are two Virtual SAN clusters monitored by vRealize Operations - one all-flash cluster and one hybrid cluster.
It is easy to tell which hosts belong to the hybrid cluster...

[Click the scroll bar in the Disk Group Latency graph]
31070.
... and which ones are all-flash. Most of the all-flash disk groups are reporting sub-millisecond latencies.
Let's look at the other graphs on this tab.

[Click the scroll bar on the right side of the window]
31090.
Datastore throughput and capacity used are also on this tab. Each tab can be customized, as needed.
Let's switch to Device Insights

[Click the Virtual SAN Device Insights tab]
31140.
As one might expect, this tab shows specific information about devices.
Disk capacity ranked by capacity used is displayed. Disk error information is available.
This tab also shows read cache hit rates and SSD media wearout levels, which are not available in the vSphere Web Client.

[Click the Virtual SAN Entity Usage tab]
31200.
The Virtual SAN Entity Usage tab displays performance metrics (throughput and latency) at the host adapter and disk levels.
This might be useful for determining where the bottleneck is if there is a performance issue.
It is interesting that host 04 tops the Host Adapter Write Latency graph and SSD Total Latency graph.
Perhaps there is an issue with hardware in that host. Let's do some quick troubleshooting.

[Click Virtual SAN Troubleshooting]
31230.
It is good to see the Virtual SAN clusters and hosts are healthy (all green).
It appears there are issues with some of the VMs (some yellow).
More importantly, it looks like we have a problem with host 04 in the Top Issues section.

[Click Host has one or more Magnetic Disk(s) in degraded condition]
31330.
It is becoming more clear there is a hardware issue with this host.
One or more disks are in a degraded condition.
The initial recommendation is to replace the disks. Let's see if there are other recommendations.

[Click Other Recommendations]
31370.
There is evidence that the host adapters (controllers) might be the cause of the problem.
vRealize Operations can also assist in determining root cause. Let's take a closer look.

[Click > 2 out of 11...]
31400.
[Click to expand the first issue]
31420.
[Click to expand the second issue]
31480.
The expanded view shows us multiple disks could be affected.
That means it is likely there is an issue with a controller (not an individual disk) - perhaps hardware, firmware, or a driver issue.

vRealize Operations helped narrow down the cause of an issue.
We also viewed a number of performance and capacity metrics, some of which are only available in vRealize Operations.

(End of Demo)