
It seems that there are still some issues with vSphere 6.5: ESXi hosts using 10 Gbps NICs can hit a PSOD (Purple Screen of Death) after an upgrade to 6.5 U1.

VMware KB 2151749 describes this issue and explains that it occurs because the Netqueue commit phase stops abruptly when the hardware activation of an Rx queue fails. As a result, the internal data structures of the Netqueue layer can go out of sync with the device and cause a PSOD.
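To make the failure mode described in the KB a bit more concrete, here is a minimal toy model in Python. It is not VMware's code and says nothing about how Netqueue is really implemented: `ToyNic`, `commit_without_rollback`, `commit_with_rollback` and `out_of_sync` are all hypothetical names invented for this illustration. It only shows the general pattern of a commit phase that stops halfway and leaves the software bookkeeping disagreeing with the device.

```python
class ToyNic:
    """Hypothetical device: remembers which Rx queues are active in 'hardware'."""

    def __init__(self, failing_queues=()):
        self.active = set()
        self.failing_queues = set(failing_queues)

    def activate_rx_queue(self, qid):
        # Simulate a hardware activation failure for selected queues.
        if qid in self.failing_queues:
            raise RuntimeError(f"hardware activation of Rx queue {qid} failed")
        self.active.add(qid)


def commit_without_rollback(nic, software_view, pending):
    """Record the whole pending set first, then stop at the first failure."""
    software_view.update(pending)
    for qid in sorted(pending):
        nic.activate_rx_queue(qid)  # an exception here aborts the commit


def commit_with_rollback(nic, software_view, pending):
    """Record queues only after the device confirms them; undo on failure."""
    done = []
    try:
        for qid in sorted(pending):
            nic.activate_rx_queue(qid)
            done.append(qid)
    except RuntimeError:
        for qid in done:
            nic.active.discard(qid)  # undo the partial activation
        return False
    software_view.update(done)
    return True


def out_of_sync(nic, software_view):
    """Queues on which the software view and the device disagree."""
    return software_view ^ nic.active
```

In this toy model, committing queues 1, 2 and 3 on a `ToyNic` where queue 2 fails leaves the software view believing in queues that the device never activated, while the rollback variant leaves both sides empty and consistent.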

Oddly, it does not explain whether all 10 Gbps cards are affected, whether all types of device drivers are affected, or which upgrade paths are involved. I have upgraded to 6.5 U1 without this issue, including on some 10 Gbps configurations.

It’s also strange that a clean installation does not show this issue: if it is a real problem, why does it not apply to a clean installation as well? In the past, I had a similar problem with a device driver after an upgrade to 6.0, but after a hard reset the issue was gone.

It will be interesting to see more information about the root cause and when it really applies. I’m curious whether it’s related to the two main types of device drivers, native and non-native; if so, the choice of moving to native drivers only in the next version will clearly be the right one.

Currently, there is no resolution to this issue, and the only workaround VMware gives is to downgrade ESXi to 6.0 U2 (not a great option for anyone who has planned to upgrade).

Veeam, one of the first vendors to find this issue (through their customers), reports that it is triggered randomly by network-intensive activities such as backup over NBD or vMotion. But it seems that this bug can be triggered not only by high network load, but also by creating VMkernel ports.

The latest reply from VMware was:

“Performing vMotion or network related activity with vMotion/vmk add and remove tasks causing host PSOD due to a race condition is a known issue reported internally. As per Engineering team’s update this is fixed in the upcoming version 6.5 Patch02 which is tentatively yet to be released in the last week of November or 1st week of December 2017”

Anyway, in my opinion, vSphere 6.5 remains much more solid than vSphere 6.0 and has had fewer issues over the same period (the first year of life).

PS: the bug was finally addressed in December 2017; for more information, see PSOD on vSphere 6.5 and 10 Gbps NICs: solved!
