VMware vSphere 6 – Availability

2 March 2015 Andrea Mauro

VMware vSphere 6.0 brings several improvements to its availability-related features. Although vSphere HA itself has apparently not changed much (apart from support for larger clusters of up to 64 nodes), several aspects have been improved or changed:

New MSCS capabilities
New vSphere VM Component Protection (VMCP)
Network partition management
New VMware FT-SMP

MSCS capabilities

vSphere 5.5 already introduced several improvements in MSCS capability, described in KB 2052238 (MSCS support enhancements in vSphere 5.5).

The three main types of guest cluster supported remain:

Clustering MSCS Virtual Machines on a Single Host (CIB)
Clustering MSCS Virtual Machines Across Physical Hosts (CAB)
Clustering Physical Machines with MSCS Virtual Machines (N+1)

But now there is also support for Windows 2012 R2 and SQL 2012, in both Failover Clustering and AlwaysOn Availability Groups configurations.

Note that when using clustering across physical hosts (CAB) with Physical Compatibility Mode RDMs, the supported OSes are: Windows Server 2008, 2008 R2, 2012 and 2012 R2.

It also seems that vCenter Server can run on a failover cluster (only the Windows version, of course), but this must be confirmed in the final documentation.

VMCP

The new vSphere VM Component Protection (VMCP) protects VMs against storage connectivity failures and misconfigurations. This closes a big gap in previous versions of VMware HA, which could only detect network failures for hosts (and only on some networks).

With vSphere there are mainly two different cases of storage failure:

All Paths Down (APD): represents a transient or unknown accessibility loss or any other unidentified delay in I/O processing. This type of accessibility issue is recoverable.
Permanent Device Loss (PDL): is an unrecoverable loss of accessibility that occurs when a storage device reports the datastore is no longer accessible by the host. This condition cannot be reverted without powering off virtual machines.

Starting with vSphere 5.5 there were some big improvements in how APD and PDL are handled, but a host storage failure was still a big issue. The simple solution was to design the storage so that it does not fail.

VMware FT can now also protect a VM from storage failure, but finally it is also possible to handle this issue with vSphere HA, for all VMs and all datastores, with full customization of the responses:

APD: Terminate VM after a user-configured timeout, only if there is enough capacity; restart on a healthy host. Reset the VM if the APD clears after the APD timeout
PDL: Terminate VM immediately; restart on a healthy host
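
The two responses above can be read as a small decision table. The following Python sketch is only an illustration of that logic under simplifying assumptions: it is not a VMware API, and the names (StorageFailure, Action, vmcp_response and its parameters) are hypothetical and chosen just for this example.

```python
from enum import Enum


class StorageFailure(Enum):
    APD = "All Paths Down"
    PDL = "Permanent Device Loss"


class Action(Enum):
    NONE = "no action"
    WAIT = "keep waiting"
    RESET = "reset the VM"
    TERMINATE_AND_RESTART = "terminate the VM and restart it on a healthy host"


def vmcp_response(failure, elapsed_s, timeout_s, capacity_available, apd_cleared=False):
    """Simplified model of the APD/PDL responses listed in the post.

    All parameter names are hypothetical and used only for illustration.
    """
    if failure is StorageFailure.PDL:
        # PDL is unrecoverable: terminate immediately, restart elsewhere.
        return Action.TERMINATE_AND_RESTART

    # APD is potentially transient, so nothing drastic happens before the timeout.
    if elapsed_s < timeout_s:
        return Action.NONE if apd_cleared else Action.WAIT

    # The timeout has expired.
    if apd_cleared:
        # Storage came back after the timeout: reset the VM.
        return Action.RESET
    if capacity_available:
        return Action.TERMINATE_AND_RESTART
    # Not enough capacity to restart elsewhere: leave the VM where it is.
    return Action.WAIT


print(vmcp_response(StorageFailure.PDL, 0, 180, capacity_available=False).value)
print(vmcp_response(StorageFailure.APD, 200, 180, capacity_available=True).value)
```

The timeout value of 180 seconds in the usage lines is an arbitrary example, not a documented default.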

Conditions and the actions taken are reported in detail, including the impacted VMs, host(s) and datastore(s).
For an example of how this feature works see this video:

Network partition

When a management network failure occurs for a vSphere HA cluster, a subset of the cluster’s hosts might be unable to communicate over the management network with the other hosts. Multiple partitions can occur in a cluster. A partitioned cluster leads to degraded virtual machine protection and cluster management functionality.

Depending on where vCenter Server is and which hosts can be reached, there are several cases, explained in the Availability Guide. vSphere HA also uses datastore heartbeating to distinguish between partitioned, isolated, and failed hosts.
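
As a rough illustration of how datastore heartbeating helps tell these states apart, here is a minimal Python sketch. It is a simplified model and not the actual vSphere HA implementation: the function classify_host, its parameters and the HostState names are all hypothetical, and the real agent takes more signals into account.

```python
from enum import Enum


class HostState(Enum):
    LIVE = "live"
    PARTITIONED = "network partitioned"
    ISOLATED = "network isolated"
    FAILED = "failed"


def classify_host(network_heartbeat, datastore_heartbeat, reports_isolation=False):
    """Classify a host as seen from the rest of the cluster (simplified).

    network_heartbeat:   heartbeats still arrive over the management network
    datastore_heartbeat: the host still updates its heartbeat on a shared datastore
    reports_isolation:   the host itself has declared that it is isolated
    (all three inputs are assumptions made for this example)
    """
    if network_heartbeat:
        return HostState.LIVE
    if not datastore_heartbeat:
        # Silent on both the network and the datastore: treat it as failed.
        return HostState.FAILED
    # Still alive on storage but unreachable over the management network.
    return HostState.ISOLATED if reports_isolation else HostState.PARTITIONED


print(classify_host(False, True).value)
print(classify_host(False, False).value)
```

The point of the sketch is that without the datastore heartbeat the last three states would be indistinguishable: every one of them simply looks like a host that stopped answering on the management network.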

VMware FT

For more information see the related post.

Andrea Mauro

Virtualization, Cloud and Storage Architect. Tech Field delegate. VMUG IT Co-Founder and board member. VMware VMTN Moderator and vExpert 2010-24. Dell TechCenter Rockstar 2014-15. Microsoft MVP 2014-16. Veeam Vanguard 2015-23. Nutanix NTC 2014-20. Several certifications including: VCDX-DCV, VCP-DCV/DT/Cloud, VCAP-DCA/DCD/CIA/CID/DTA/DTD, MCSA, MCSE, MCITP, CCA, NPP.

#1 | Written by nikhil gupta about 9 years ago.

The VSAN 6.0 story is even stronger due to the comprehensive operations management capability available via VMware's vRealize Operations Management Pack for Storage Devices

http://blogs.vmware.com/management/2015/02/vsan-simplifying-sddc-storage-operations-with-vrealize-operations-management-pack-for-storage-devices.html
#2 | Written by Marc Crawford about 9 years ago.

Great post!
#3 | Written by Yuri Mendoza about 9 years ago.

It's a great improvement, especially the detection of storage failures. And as nikhil gupta pointed out, with the help of vRealize Operations you can be proactive with the different failure scenarios.

Thanks for the post.
- #4 | Written by Mike M about 9 years ago.
  
  Cool article!
#5 | Written by Zsolt Pamuki about 9 years ago.

Thanks for this article.
#6 | Written by ScoZel about 9 years ago.

Good information, especially VMCP!
#7 | Written by Alex about 9 years ago.

Good info. Thanks!
#8 | Written by Mohamed Ibrahem about 9 years ago.

Good article, Thanks
#9 | Written by Mohamed Ibrahem about 9 years ago.

Good article, thank you
#10 | Written by Cris Rodriguez about 9 years ago.

Thanks for the information good to know about how APD and PDL behave.
#11 | Written by David Cain about 9 years ago.

Good article, I hadn’t read about VMCP yet. Many thanks.
- #12 | Written by Andrea Mauro about 9 years ago.
  
  This is a great features
#13 | Written by Paulo Reis about 9 years ago.

I liked it.

Using this new features, we can improve availability.
It is a good post.
#14 | Written by Scott Clark about 9 years ago.

Looking forward to checking out VMCP.
#15 | Written by Nicholas Korte about 9 years ago.

Thanks for the article!
#16 | Written by Andres Rojas about 9 years ago.

Looks like 6 is better now in that aspect, looking forward to testing failure scenarios in our lab
#17 | Written by Looneyduk about 9 years ago.

nice post – thanks
#18 | Written by Syed Fayyaz Hussain Rizvi about 9 years ago.

Good Post
Thanks Andrea
#19 | Written by Alessandro about 9 years ago.

Good news:-)
#20 | Written by Rahim about 9 years ago.

Great Article, haven’t read about VMCP before..
- #21 | Written by Andrea Mauro about 9 years ago.
  
  It’s a great new feature
#22 | Written by vervoort jurgen about 9 years ago.

great features
#23 | Written by Amit about 9 years ago.

Nice Post !!!!
#24 | Written by Ravi Venkatasubbaiah about 9 years ago.

Good analysis of vSphere 6 Availability features
#25 | Written by Mohammed Sadiq about 9 years ago.

Good Feature .
#26 | Written by Srujan about 9 years ago.

Nice article with good information. Thanks.
#27 | Written by Satish Kumar B about 8 years ago.

Faster vmotion and HA.