ESXi PSOD Caused by a PCPU Becoming Too Busy

One of our ESXi hosts failed with a “Purple Screen of Death” (PSOD). The analysis below identified the root cause of the failure.

The host was running vSphere 5.5 at an older patch level (build 30xxxxx). We could not identify any hardware failures or hardware-related errors on the server, and I can confirm that it was configured with the correct drivers.

This is part of the error log we found on the failed ESXi host:

2017-09-12T05:25:54.232Z cpu37:66166510)MCE: 1118: cpu37: MCA error detected via CMCI (Gbl status=0x0): Restart IP: invalid, Error IP: invalid, MCE in progress: no.
2017-09-12T05:25:54.433Z cpu37:66166510)MCE: 1118: cpu37: MCA error detected via CMCI (Gbl status=0x0): Restart IP: invalid, Error IP: invalid, MCE in progress: no.
2017-09-12T05:25:54.633Z cpu37:66166510)MCE: 1118: cpu37: MCA error detected via CMCI (Gbl status=0x0): Restart IP: invalid, Error IP: invalid, MCE in progress: no.
2017-09-12T05:25:54.832Z cpu37:66166510)MCE: 1118: cpu37: MCA error detected via CMCI (Gbl status=0x0): Restart IP: invalid, Error IP: invalid, MCE in progress: no.
2017-09-12T05:25:55.034Z cpu37:66166510)MCE: 1118: cpu37: MCA error detected via CMCI (Gbl status=0x0): Restart IP: invalid, Error IP: invalid, MCE in progress: no.
2017-09-12T05:25:55.235Z cpu37:66166510)MCE: 1118: cpu37: MCA error detected via CMCI (Gbl status=0x0): Restart IP: invalid, Error IP: invalid, MCE in progress: no.
2017-09-12T05:25:55.434Z cpu37:66166510)MCE: 1118: cpu37: MCA error detected via CMCI (Gbl status=0x0): Restart IP: invalid, Error IP: invalid, MCE in progress: no.
2017-09-12T05:25:55.634Z cpu37:66166510)MCE: 1118: cpu37: MCA error detected via CMCI (Gbl status=0x0): Restart IP: invalid, Error IP: invalid, MCE in progress: no.
2017-09-12T05:25:55.833Z cpu37:66166510)MCE: 1118: cpu37: MCA error detected via CMCI (Gbl status=0x0): Restart IP: invalid, Error IP: invalid, MCE in progress: no.
2017-09-12T05:25:56.032Z cpu37:66166510)MCE: 1118: cpu37: MCA error detected via CMCI (Gbl status=0x0): Restart IP: invalid, Error IP: invalid, MCE in progress: no.
2017-09-12T05:25:56.231Z cpu37:66166510)MCE: 1118: cpu37: MCA error detected via CMCI (Gbl status=0x0): Restart IP: invalid, Error IP: invalid, MCE in progress: no.
2017-09-12T05:25:56.429Z cpu37:66166510)MCE: 1118: cpu37: MCA error detected via CMCI (Gbl status=0x0): Restart IP: invalid, Error IP: invalid, MCE in progress: no.
2017-09-12T05:25:56.628Z cpu37:66166510)MCE: 1118: cpu37: MCA error detected via CMCI (Gbl status=0x0): Restart IP: invalid, Error IP: invalid, MCE in progress: no.
2017-09-12T05:25:56.629Z cpu37:66166510)MCE: 222: cpu37: bank7: status=0xcc000f4000010091: (VAL=1, OVFLW=1, UC=0, EN=0, PCC=0, S=0, AR=0), ECC=no, Addr:0x526e3600 (valid), Misc:0x390261e840 (valid)
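
The last log line reports the raw machine-check bank status register. As a minimal sketch, the flag bits can be decoded in Python as shown here. It assumes the standard Intel IA32_MCi_STATUS layout (VAL in bit 63 down to AR in bit 55, and a corrected error count in bits 52:38 on CPUs that support CMCI). The function and dictionary names are illustrative and are not part of any VMware tool.

```python
# Sketch: decode the architectural flag bits of an IA32_MCi_STATUS value.
# Bit positions are assumed from the Intel architectural layout; the names
# STATUS_FLAGS and decode_mci_status are hypothetical helpers for this post.

STATUS_FLAGS = {
    "VAL": 63,    # register contains valid information
    "OVFLW": 62,  # an earlier error was overwritten (overflow)
    "UC": 61,     # uncorrected error
    "EN": 60,     # error reporting was enabled
    "MISCV": 59,  # MISC register is valid
    "ADDRV": 58,  # ADDR register is valid
    "PCC": 57,    # processor context corrupt
    "S": 56,      # signaled (only defined with software error recovery)
    "AR": 55,     # action required (only defined with software error recovery)
}


def decode_mci_status(status: int) -> dict:
    """Return the flag bits plus the corrected-error count and MCA error code."""
    decoded = {name: (status >> bit) & 1 for name, bit in STATUS_FLAGS.items()}
    # Bits 52:38 hold the corrected error count on CPUs that support CMCI.
    decoded["corrected_error_count"] = (status >> 38) & 0x7FFF
    # Bits 15:0 hold the architectural MCA error code.
    decoded["mca_error_code"] = status & 0xFFFF
    return decoded


if __name__ == "__main__":
    # Status value taken from the bank7 log line above.
    for name, value in decode_mci_status(0xCC000F4000010091).items():
        print(f"{name}: {value}")
```

For the value in the log, the decoded flags agree with what ESXi printed (VAL=1, OVFLW=1, UC=0, PCC=0), which is consistent with a stream of corrected errors rather than an uncorrected one.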

 

This was identified as the root cause: the PCPU becomes so busy logging correctable error messages that it cannot perform routine background tasks, which leads ESXi to assume that the PCPU is unresponsive.
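
To see how often a PCPU was logging these messages, a small script can count the CMCI lines per CPU and measure the time span they cover. This is only a sketch: the regular expression is written against the log format shown in this post, and the function name `summarize_cmci` is a hypothetical helper, not a VMware utility.

```python
import re
from collections import Counter
from datetime import datetime

# Matches lines such as:
# 2017-09-12T05:25:54.232Z cpu37:66166510)MCE: 1118: cpu37: MCA error detected via CMCI ...
CMCI_LINE = re.compile(
    r"^(?P<ts>\S+Z) cpu(?P<cpu>\d+):\d+\)MCE: \d+: cpu\d+: "
    r"MCA error detected via CMCI"
)


def summarize_cmci(lines):
    """Count CMCI messages per CPU and report the time span they cover."""
    counts = Counter()
    first_seen = {}
    last_seen = {}
    for line in lines:
        match = CMCI_LINE.match(line)
        if match is None:
            continue  # not a CMCI message, ignore it
        cpu = int(match.group("cpu"))
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S.%fZ")
        counts[cpu] += 1
        first_seen.setdefault(cpu, ts)
        last_seen[cpu] = ts
    return {
        cpu: {
            "events": n,
            "span_seconds": (last_seen[cpu] - first_seen[cpu]).total_seconds(),
        }
        for cpu, n in counts.items()
    }


if __name__ == "__main__":
    # The path is an assumption; point it at a copy of the host's vmkernel log.
    with open("vmkernel.log", encoding="utf-8", errors="replace") as fh:
        for cpu, info in sorted(summarize_cmci(fh).items()):
            print(f"cpu{cpu}: {info['events']} CMCI messages "
                  f"over {info['span_seconds']:.3f}s")
```

In the excerpt above, the messages on cpu37 arrive roughly every 200 ms, which is the kind of sustained pattern this summary is meant to surface.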

Resolution: To fix this PSOD we had to update the host to ESXi 5.5 patch build 3568722. Note that the latest patch build available for 5.5 is 5230635.
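
A quick way to check whether a host is still below the fixed build is to compare its build number against 3568722. The sketch below assumes a version string in the form printed by `vmware -v` on the ESXi shell (for example "VMware ESXi 5.5.0 build-5230635"). The helper names are hypothetical.

```python
import re

# Build number cited in this post as resolving the PSOD on ESXi 5.5.
FIXED_BUILD = 3568722


def parse_build(version_string: str) -> int:
    """Extract the numeric build from a string like 'VMware ESXi 5.5.0 build-5230635'."""
    match = re.search(r"build-(\d+)", version_string)
    if match is None:
        raise ValueError(f"no build number found in: {version_string!r}")
    return int(match.group(1))


def needs_patch(version_string: str, fixed_build: int = FIXED_BUILD) -> bool:
    """Return True when the host build is older than the fixed build."""
    return parse_build(version_string) < fixed_build


if __name__ == "__main__":
    # Example string is an assumption about the `vmware -v` output format.
    host_version = "VMware ESXi 5.5.0 build-5230635"
    print(f"{host_version}: needs patch = {needs_patch(host_version)}")
```

This comparison only makes sense within the 5.5 release line, since build numbers from other major versions are not comparable against this threshold.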

You can read more about this in the KB articles below:

 

How to Patch vCenter 6.5 Appliance – From CD-ROM

In this post, I’m going to show you how to patch the VMware vCenter 6.5 Appliance. If you care about bug fixes and improvements for your vCenter, patching the appliance plays an important role.

I believe you are now familiar with vCenter 6.5, which came with lots of new features and improvements. So, less talk, and let’s get started.
First of all, make sure to take the relevant backups and a snapshot of the vCenter before the patch update to avoid any unexpected situations afterwards. That’s the best practice before any sort of patch or version upgrade.
There are three main ways to patch your vCenter Appliance. You can check for updates and patches from the online repository, or you can set up your own web server in your environment and create your own repository to patch the vCenter Server. If you are going to use your own web server to present the patch content, you can download the patch and update bundles from the VMware Download Center. You need your VMware login credentials to download the zip files.
In this article, I’m going to show you the third way: performing the full patch from the CD-ROM. Download the relevant patches from the VMware Patch Download Center and mount them to your vCenter Appliance.
