VMware vSAN 6.6 works only in unicast mode, provided all the disks have been upgraded to the latest on-disk format (v5).
Recently, however, I had a new cluster, built entirely from scratch with the latest version, that switched to multicast mode, leaving all hosts partitioned at the network level.
The issue appeared right after I forced the vCSA to use a proxy for the upgrade and restarted vCenter after the update.
This is a well-known issue affecting vSAN 6.6, and at the time of writing there is no resolution.
The behavior is described in KB 2150523 (vSAN Cluster Incorrectly Displays Network Mode as Multicast), and the only workarounds are:
- Remove the http/https proxy environment variable in vCenter Server.
OR
- Exclude local hosts from the proxy:
  - Open an SSH session to the vCenter Appliance that is managing the vSAN 6.6 cluster.
  - Edit the /etc/sysconfig/proxy file to contain the FQDN of vCenter:
    NO_PROXY="localhost, 127.0.0.1, vcenter.domain.local"
  - Restart the vSAN management service, either:
    - in the vSphere Web Client, by browsing to Administration > Deployment > System Configuration > Services > vSAN Health Service > Actions > Restart; or
    - from the VCSA CLI, by running: /usr/lib/vmware-vmon/vmon-cli -r vsan-health
Unfortunately this was not enough to fix the issue.
The only way to fix the network partition and switch back to unicast mode was to manually register all the nodes, as described in this great post:
http://virtuallysensei.com/troubleshooting-steps-unicastagent/
Basically, on every node of the cluster you have to populate the list of the nodes participating in the cluster:
    [root@esxi-ing2:~] esxcli vsan cluster unicastagent list
    NodeUuid                              IsWitness  Supports Unicast  IP Address      Port   Iface Name
    ------------------------------------  ---------  ----------------  --------------  -----  ----------
    59d22edb-cf3c-a02d-870e-f4e9d4a050a0  0          true              192.168.10.180  12321  vmk2
    59d25308-7f3d-a545-dff8-f4e9d4a0ea00  0          true              192.168.10.181  12321  vmk2
    59d254ae-9962-310c-59c7-f4e9d4a0cd70  0          true              192.168.10.182  12321  vmk2
    59d2559b-ba89-17ab-9e6d-f4e9d4a12dc0  0          true              192.168.10.183  12321  vmk2
    59d25772-d855-512d-11d8-f4e9d4a10f40  0          true              192.168.10.184  12321  vmk2
    59d25a50-af7c-1140-1b2d-f4e9d4a0df60  0          true              192.168.10.185  12321  vmk2
In this example, vmk2 is the VMkernel adapter for the vSAN network (192.168.10.x).
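Typing the add commands by hand on every host is error-prone, so the sketch below only prints them for review. Assumptions: `print_unicast_adds` and its input format (one "NodeUuid IP" pair per line) are hypothetical; the flags are those of `esxcli vsan cluster unicastagent add` as I know them for 6.6 (`-t node -u <uuid> -U true -a <ip> -p 12321`), so double-check them with `--help` on your build; and it skips the host's own UUID, since each host's list is normally expected to contain only the other members.

```shell
#!/bin/sh
# Hypothetical helper: reads "NodeUuid IP" pairs on stdin and prints the esxcli
# command that registers each node as a unicast peer. The host's own UUID,
# passed as $1, is skipped. Nothing is executed; the output is for review.
print_unicast_adds() {
  local_uuid="$1"
  while read -r uuid ip _rest; do
    # Ignore blank lines and comment lines.
    case "$uuid" in ''|'#'*) continue ;; esac
    # A host should not list itself as a unicast peer.
    [ "$uuid" = "$local_uuid" ] && continue
    echo "esxcli vsan cluster unicastagent add -t node -u $uuid -U true -a $ip -p 12321"
  done
}
```

On each host the local UUID is shown as "Local Node UUID" by `esxcli vsan cluster get`, so a call could look like `print_unicast_adds "$(esxcli vsan cluster get | awk '/Local Node UUID/ {print $NF}')" < /tmp/vsan-nodes.txt`, where /tmp/vsan-nodes.txt is a hypothetical file you prepare. Once the output looks right, pipe it to `sh`. Printing instead of executing keeps a human in the loop while the cluster is partitioned.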
That is still not enough: before making the changes, you have to disable vCenter's authority over the vSAN configuration by running this command on all vSAN hosts:
esxcfg-advcfg -s1 /VSAN/IgnoreClusterMemberListUpdates
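Since the flag has to be set on every host, and reset later, a small loop helps. This sketch assumes SSH is enabled on the ESXi hosts and they are reachable as root from an admin machine; `set_ignore_updates`, the `RUNNER` override and the host names are hypothetical. It sets the value and then reads it back with `esxcfg-advcfg -g`.

```shell
#!/bin/sh
# Hypothetical helper: set /VSAN/IgnoreClusterMemberListUpdates to 0 or 1 on
# every host given, then read the value back. The remote runner defaults to
# ssh; RUNNER=echo prints the commands instead (dry run).
set_ignore_updates() {
  value="$1"
  shift
  case "$value" in
    0|1) ;;
    *) echo "value must be 0 or 1" >&2; return 1 ;;
  esac
  runner="${RUNNER:-ssh}"
  for host in "$@"; do
    # $runner is intentionally unquoted so RUNNER may carry extra options.
    $runner "root@$host" "esxcfg-advcfg -s $value /VSAN/IgnoreClusterMemberListUpdates && esxcfg-advcfg -g /VSAN/IgnoreClusterMemberListUpdates"
  done
}
```

For example, `set_ignore_updates 1 esxi-01 esxi-02 esxi-03` before the changes, and the same call with `0` at the end.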
Then register all the nodes and restart the vCenter Server (restarting the service is not enough).
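Before restarting vCenter, it is worth confirming on each host that the list really contains every peer. A minimal sketch (the function name is hypothetical) that parses the `esxcli vsan cluster unicastagent list` output shown earlier, assuming the IP address is the fourth whitespace-separated field, as in that listing:

```shell
#!/bin/sh
# Hypothetical check: reads `esxcli vsan cluster unicastagent list` output on
# stdin and prints every expected vSAN IP (the arguments) that is missing.
# Returns 0 only if all of them are present.
check_unicast_peers() {
  listing=$(cat)
  missing=0
  for ip in "$@"; do
    # Compare the whole 4th field, so 192.168.10.18 does not match .180.
    if ! printf '%s\n' "$listing" |
         awk -v ip="$ip" '$4 == ip { found = 1 } END { exit (found ? 0 : 1) }'; then
      echo "missing: $ip"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage would be `esxcli vsan cluster unicastagent list | check_unicast_peers` followed by the vSAN IPs of the other hosts.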
Then re-enable vCenter's authority over the vSAN configuration with this command (again on all hosts):
esxcfg-advcfg -s0 /VSAN/IgnoreClusterMemberListUpdates
Finally, fix the configuration in vCenter, checking the result in the vSAN Health tab.