It has been a while since my last post, but I wanted to write this one because I was able to fix an issue with one of the most critical VMs in one of our main datacenters. First I will give you the background of the issue, and then we'll see how we managed to fix it.

As a disaster recovery option we take VM-level backups, and we were facing lots of issues with this backup solution. I would not say this is a product issue; it may well be an architectural issue in the design of the backup solution. At the time of writing we are still trying to find the root cause of these backup issues.
We take Virtual Machine level backups every day, and there were lots of delta disks on the VMs. We were seeing performance issues at the operating system layer and were also getting consolidation warnings for these VMs. The Virtual Machine in question was one of our most critical file servers, and it showed a “Consolidation needed” warning.
Unfortunately, someone tried to consolidate the VM and it didn’t end well. One of the old backup processes was still holding the VM, and we had to restart the entire ESXi host to bring the VM back up.

When we checked the Virtual Machine's disk files, there were more than 80 delta disks for each hard disk attached to the VM. On top of that, there were four 1 TB disks attached to the VM, and a 1 TB dynamic disk had been created in the OS.
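To get a quick count of delta disks per hard disk, something like the following could help. This is only a minimal sketch: it assumes the usual VMFS naming for snapshot extent files (a six-digit sequence number followed by -delta.vmdk or -sesparse.vmdk), and the file names in the example are made up for illustration.

```python
import re
from collections import Counter

# Snapshot extent files are assumed to look like
# "fileserver_1-000042-delta.vmdk" or "fileserver_1-000042-sesparse.vmdk".
DELTA_RE = re.compile(r"^(?P<base>.+)-(?P<seq>\d{6})-(?:delta|sesparse)\.vmdk$")

def count_delta_disks(filenames):
    """Return {base disk name: number of delta extent files found}."""
    counts = Counter()
    for name in filenames:
        match = DELTA_RE.match(name)
        if match:
            counts[match.group("base")] += 1
    return dict(counts)

# Hypothetical directory listing of the VM folder.
listing = [
    "fileserver.vmdk",
    "fileserver-flat.vmdk",
    "fileserver-000001.vmdk",
    "fileserver-000001-delta.vmdk",
    "fileserver-000002-delta.vmdk",
    "fileserver_1-000001-delta.vmdk",
]
print(count_delta_disks(listing))
```

The listing itself could come from the datastore browser or from an ls of the VM folder; the function only looks at file names and never touches the disks.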

Our higher management and the business owners wanted to take a full file-level backup before we did anything else with this VM, as the files on this server were really important to the business. So we started the file-level backup, but it kept failing because the VM became unresponsive in the middle of the backup job.
That had been our last resort, and everybody was asking about the status of this VM. We needed a solution that would bring this server back up and running with the best performance, as it kept failing and the business impact was growing.

So we decided to clone the VMDKs, from the latest delta disk down to the base disk. We used vmkfstools -i to clone the disks to a separate datastore, and the clone took almost 20 hours to complete.

We used the “vmkfstools -i /vmfs/volumes/” command, which consolidates the delta vmdk files and clones them into a single vmdk as part of the same task.
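As a rough illustration of the shape of that command, the sketch below builds the argument list for one clone. The source is the descriptor of the latest snapshot in the chain and the destination is a new vmdk on another datastore. Both paths and the helper name clone_command are hypothetical; they are not the ones we used.

```python
def clone_command(latest_descriptor, destination):
    """Build the argument list for one `vmkfstools -i <source> <destination>` clone."""
    return ["vmkfstools", "-i", latest_descriptor, destination]

# Hypothetical paths, for illustration only.
src = "/vmfs/volumes/datastore1/fileserver/fileserver-000083.vmdk"
dst = "/vmfs/volumes/datastore2/fileserver-new/fileserver.vmdk"

# Printed rather than executed, so the command can be reviewed first.
print(" ".join(clone_command(src, dst)))
```

One such command would be needed for each hard disk attached to the VM, each pointing at that disk's latest snapshot descriptor.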

Once we had cloned all the disks into single VMDKs, we created a new Virtual Machine with the same specifications and mapped the disks to the same SCSI nodes to avoid any issues with the OS dynamic disk.
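One way to double-check the SCSI node mapping is to compare the disk entries in the old and new .vmx files. The sketch below assumes the disks appear as lines of the form scsiX:Y.fileName = "..." in the .vmx file; the sample contents and file names are invented for the example.

```python
import re

# Matches lines such as: scsi0:1.fileName = "fileserver_1.vmdk"
VMX_DISK_RE = re.compile(r'^(scsi\d+:\d+)\.fileName\s*=\s*"([^"]+)"$')

def scsi_layout(vmx_text):
    """Return {SCSI node: disk file name} parsed from .vmx text."""
    layout = {}
    for line in vmx_text.splitlines():
        match = VMX_DISK_RE.match(line.strip())
        if match:
            layout[match.group(1)] = match.group(2)
    return layout

def same_nodes(old_vmx, new_vmx):
    """True if both VMs use exactly the same set of SCSI nodes."""
    return sorted(scsi_layout(old_vmx)) == sorted(scsi_layout(new_vmx))

# Hypothetical .vmx fragments for the old and the new VM.
old_vmx = '''
scsi0.virtualDev = "lsilogic"
scsi0:0.fileName = "fileserver-000083.vmdk"
scsi0:1.fileName = "fileserver_1-000083.vmdk"
'''
new_vmx = '''
scsi0:0.fileName = "fileserver.vmdk"
scsi0:1.fileName = "fileserver_1.vmdk"
'''
print(scsi_layout(old_vmx))
print(same_nodes(old_vmx, new_vmx))
```

This only confirms that the same nodes are in use. Whether each cloned disk sits on the node of its original still has to be checked by comparing the two layouts by eye.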

After configuring and powering on the VM, we had to set up the network adapter settings in the OS. We then ran performance and service tests, and I’m happy to say that the VM is running with great performance.
