After upgrading a Dell PowerEdge R710 to VMware vSphere 5.5, I ran into a strange intermittent issue: the host became completely disconnected from vCenter (with the ESXi/ESX host’s status shown as Not Responding or Disconnected in vCenter Server), and there was no way to reconnect it, even after restarting the management services.
Looking at the ESXi console, errors like this one stood out: Bootbank cannot be found at path ‘/bootbank’.
The only temporary workaround was to power off the VMs and restart the host, but the issue could come back at random.
A Google search turned up one similar case: VMware ESXi 4.x and 5.x lose connectivity to Hypervisor – IBM BladeCenter HX5.
Of course, this was not an IBM host, but that case is interesting: it relates to a Permanent Device Loss (PDL) condition that happens on the boot device when it is an embedded USB hypervisor. The issue can be identified by the same error message: the VMware ESXi kernel logs an error message similar to the following: “Bootbank cannot be found at path ‘/bootbank’”.
It also shows up in the VMware vSphere Client or VMware vCenter logs as an alert reporting a ‘Configuration Issue’ due to ‘Lost connectivity to the device mpx.vmhba32:C0:T0:L0’ when ‘backing the boot file system /vmfs/devices/disks/mpx.vmhba32:C0:T0:L0’.
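To check a saved copy of a host’s logs for these two symptoms, a small script can help. The following is only a minimal sketch: it searches for the two message fragments quoted above, and the function name and the sample log lines are hypothetical, made up for illustration. Which log file to feed it (for example a copy of the vmkernel log) is an assumption that depends on your setup.

```python
# Message fragments taken from the error texts quoted in this post.
SYMPTOMS = (
    "Bootbank cannot be found at path",
    "Lost connectivity to the device",
)


def find_symptoms(lines):
    """Return (line_number, symptom, text) for every line matching a symptom.

    `lines` is any iterable of strings, e.g. an open log file.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        for symptom in SYMPTOMS:
            if symptom in line:
                hits.append((number, symptom, line.rstrip()))
                break  # one hit per line is enough
    return hits


# Hypothetical sample lines, only to show the call; real log lines look different.
sample_log = [
    "some unrelated message",
    "Bootbank cannot be found at path '/bootbank'",
    "Lost connectivity to the device mpx.vmhba32:C0:T0:L0",
]

for number, symptom, text in find_symptoms(sample_log):
    print(f"line {number}: {text}")
```

With a real file you would pass the open file object instead of the sample list, for example find_symptoms(open("vmkernel.log")).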
In my case the real root cause was described in KB 1017297 (ESXi/ESX host appears as Not Responding in vCenter Server due to CD/DVD-ROM drive firmware issues), which explains a possible incompatibility with the CD/DVD-ROM drive firmware.
To find out whether your CD/DVD-ROM drive has a problematic firmware:
- run the command esxcfg-scsidevs -l,
- depending on your CD/DVD-ROM drive’s model or revision, you see output similar to:
mpx.vmhba1:C0:T0:L0
Device Type: CD-ROM
Size: 0 MB
Display Name: Local TEAC CD-ROM (mpx.vmhba1:C0:T0:L0)
Plugin: NMP
Console Device: /dev/sr0
Devfs Path: /vmfs/devices/genscsi/mpx.vmhba1:C0:T0:L0
Vendor: TEAC Model: DV-28E-V Revis: C.AB
SCSI Level: 5 Is Pseudo: false Status: on
Is RDM Capable: false Is Removable: true
Is Local: true
Other Names:
vml.0005000000766d686261313a303a30
The KB then proposes three possible fixes:
- Upgrade the CD/DVD-ROM drive firmware to the latest version available
- Replace the CD/DVD-ROM drive with a different model
- Disable the CD/DVD-ROM drive within the BIOS of the ESXi/ESX host
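On a host with many devices, picking the CD/DVD-ROM drive’s model and firmware revision out of the esxcfg-scsidevs -l output by eye is tedious. The following is a minimal sketch of a parser for a saved copy of that output. It assumes that each device record starts with an unindented identifier line and that the attribute lines below it are indented, as in the command’s real output (the indentation is lost in the listing above). The function name find_cdrom_drives is my own and is not part of any VMware tool.

```python
import re

# Matches the "Vendor: ... Model: ... Revis: ..." attribute line.
VENDOR_LINE = re.compile(
    r"Vendor:\s*(?P<vendor>.*?)\s+Model:\s*(?P<model>.*?)\s+Revis:\s*(?P<revision>\S*)"
)


def find_cdrom_drives(output):
    """Return one dict per CD-ROM device found in `esxcfg-scsidevs -l` output.

    Assumption: device identifier lines are unindented, attribute lines are indented.
    """
    drives = []
    current = None
    for line in output.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # Unindented line: a new device record starts here.
            current = {"device": line.strip(), "type": None}
            continue
        if current is None:
            continue
        text = line.strip()
        if text.startswith("Device Type:"):
            current["type"] = text.split(":", 1)[1].strip()
            continue
        match = VENDOR_LINE.search(text)
        if match and current["type"] == "CD-ROM":
            drives.append({"device": current["device"], **match.groupdict()})
    return drives


# Shortened sample based on the listing in this post, with indentation restored.
sample_output = """\
mpx.vmhba1:C0:T0:L0
   Device Type: CD-ROM
   Size: 0 MB
   Display Name: Local TEAC CD-ROM (mpx.vmhba1:C0:T0:L0)
   Vendor: TEAC      Model: DV-28E-V          Revis: C.AB
   Is Local: true
"""

for drive in find_cdrom_drives(sample_output):
    print(drive["device"], drive["vendor"], drive["model"], drive["revision"])
```

The vendor, model and revision it reports can then be compared with the drive models listed in the KB and with the drives in your other hosts.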
In my case all the hardware firmware was already up to date, so the first option was not applicable. The second was possible, but the simplest one was to disable the CD/DVD-ROM drive at BIOS level and, from then on, use the virtual CD feature of the iDRAC.
Funnily enough, the other two (supposedly identical) hosts were not affected, simply because their CD drives were not exactly the same. The lesson learned is clear: the HCL always matters, even for minor devices, and you should not assume that hosts are truly identical, even if they were configured the same way at purchase.