Now that the PSOD issue with vSphere 6.5 and 10 Gbps NICs is finally solved, it may seem that all the critical vSphere 6.5 bugs are closed, but that's not entirely true.
During an upgrade from vSphere 6.0, I found a really strange iSCSI storage issue: all the VMs on the iSCSI datastore were so slow that they became unusable. At first I suspected drivers or firmware, either on the hosts and their NICs (1 Gbps) or on the storage.
But even after updating everything the issue was still there, without a clear signal or a clear error, except for these messages in the vmkernel log file (and on the ALT+F12 console of ESXi):
The storage was a Dell EqualLogic with the MEM multipath module. But even after removing the vendor multipath module, there were still periodic messages about paths being removed (note also LUN ID 257, which is related to Virtual Volumes, although this wasn't a Virtual Volumes issue).
Finally, a good hint came from a different host with the same ESXi version that was working perfectly! So the root cause had to be the different NIC type and model!
At this point it was really easy to find the right VMware KB, by looking up the network card driver with the command:
esxcfg-nics -l
The driver was ntg3, a new branch of the tg3 driver, with version 4.1.3.0-1vmw.650.1.36.7388607… so a native driver specific to version 6.5… but with some known issues!
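When several hosts need to be checked, the driver column of that listing can be filtered instead of read by eye. The following is only a minimal sketch: the function name nics_using_driver is hypothetical, and it assumes that the output starts with a single header line and that the driver name is the third whitespace-separated column, which should be verified against the output of your own ESXi build.

```shell
# Print the names of the NICs bound to a given driver.
# Reads `esxcfg-nics -l` style output on stdin.
# Assumption: line 1 is a header and column 3 holds the driver name.
nics_using_driver() {
    awk -v drv="$1" 'NR > 1 && $3 == drv { print $1 }'
}

# Intended use on an ESXi host (not executed here):
#   esxcfg-nics -l | nics_using_driver ntg3
```

An empty result would suggest that no NIC on that host is using the ntg3 driver.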
VMware KB 2150889 (Network becomes unavailable with ntg3 driver on ESXi 6.5) describes this issue, or at least part of it. My case was not exactly the same: the network was fully working, but with very poor performance and high latency!
To resolve this issue, use any one of these options:
Option 1: Run the ntg3 driver in legacy mode
- To run the ntg3 driver in legacy mode, run the following command on the host:
esxcli system module parameters set -m ntg3 -p intrMode=0
- Reboot the host.
Option 2: Run the old tg3 driver
- Use the tg3 vmklinux driver as the default driver, instead of the native ntg3 driver, by running the commands below:
esxcli system module set --enabled=false --module=ntg3
esxcli system module set --enabled=true --module=tg3
- Reboot the host.
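After the reboot it is worth confirming that the change was really applied. The sketch below is not part of the KB: the helper name module_enabled is hypothetical, and it assumes that the module listing has the columns Name, Is Loaded and Is Enabled, in that order.

```shell
# Print the "Is Enabled" value (true/false) for one kernel module.
# Reads `esxcli system module list` style output on stdin.
# Assumption: the columns are Name, Is Loaded, Is Enabled.
module_enabled() {
    awk -v mod="$1" '$1 == mod { print $3 }'
}

# Intended use on an ESXi host after the reboot (not executed here):
#   esxcli system module list | module_enabled ntg3   # Option 2: expect false
#   esxcli system module list | module_enabled tg3    # Option 2: expect true
#   esxcli system module parameters list -m ntg3      # Option 1: check intrMode
#   esxcfg-nics -l                                    # check the driver column
```

With Option 2 the NIC listing should then report tg3 instead of ntg3 for the affected ports.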
I simply chose the second option, to use a more conservative driver (in my case the OEM driver 3.137l.v60.1-1OEM.600.0.0.2494585 from Dell), and the system was finally upgraded without any performance issues.
So beware if you have the ntg3 driver on vSphere 6.5… just avoid it and use the old one.
Of course this driver is for 1 Gbps NICs, but some customers still have this kind of configuration, and for small/medium environments it can work well without the need for 10 Gbps cards (and switches!).