NetBackup+InfiniteLoop+Snapshot=Bad

Update 2014/01/29: Raphaël published a command on the OneLiner depot that lists all VMs with no registered snapshot but with delta VMDK files (hidden snapshots).

We recently came across an interesting issue with an ESXi 5.0 server being disconnected from vCenter.

Although the ESXi server was disconnected from vCenter, the virtual machines hosted on it were not affected. Luckily the issue wasn’t on some Hyper-V parent VM, or the VM state would have been different :p

Anyway, the vCheck report showed a lot of weird errors regarding the troublesome server:

Cannot synchronize host XXX. A general system error occurred: Unexpected exception reading HTTP response body: class Vmacore::Http::TruncatedResponseException(While determining chunk size, truncated HTTP response.) with trace: backtrace[00] rip 000000018013deba (no symbol) backtrace[01] rip 0000000180101518 (no symbol) backtrace[02] rip 0000000180101a5e (no symbol) backtrace[03] rip 000000018008930b (no symbol) backtrace[04] rip 000000018003fd5f (no symbol) backtrace[05] rip 00000001800444ce (no symbol) backtrace[06] rip 00000000005b27e9 (no symbol) backtrace[07] rip 0000000000583b87 (no symbol) backtrace[08]
 ...

Any attempt to reconnect the ESXi server to vCenter failed with the same errors.

We also found a lot of errors like this one in the log files:

column 3836417 while parsing property "parent" of static type VirtualDiskFlatVer2BackingInfo while parsing serialized DataObject of type vim.vm.device.VirtualDisk.FlatVer2BackingInfo at line 7

As this error referred to VirtualDiskFlatVer2BackingInfo, we thought the issue could be related to the VMDKs of one or more virtual machines. So we started looking into some VM folders to take a closer look, and we found some VMs with no less than 230 snapshots (while the vSphere console showed none) …

In the end, the issue seemed to be related, once again, to the NetBackup agent which, following some SAN issue, kept creating snapshot files for backup and didn’t clean them up properly (nice loop) …

In order to reconnect the failed server to vCenter, we had to connect directly to the ESXi server and run a Consolidate task on the VM disks affected by this issue. After 5 very long hours waiting for the consolidate task to complete (we were actually quite surprised the guest wasn’t more impacted), the virtual machine got its regular disks back without any snapshot and we were able to reconnect the ESXi server without further issue.

In the end, Erik Bussink summed it up quite well with his answer on Twitter:

Yes we’re waiting for Veeam migration ;p

Hypervisor.fr has updated the OneLiner depot with a one-liner that lists all VMs with hidden snapshots (#OL26, available here: http://www.hypervisor.fr/?page_id=3637):

Get-View -ViewType VirtualMachine|?{!$_.snapshot}|?{$_|%{$_.layoutex.file}|?{$_.name -match "-delta.vmdk$"}}|select name
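For readability, the same check can be unrolled into a small function. This is only a sketch of the logic in the one-liner above, not part of the OneLiner depot: the name Test-HiddenSnapshot is made up for illustration, and the dot in the pattern is escaped so that it only matches a literal "-delta.vmdk" suffix.

```powershell
# Hypothetical helper (not from the OneLiner depot): returns $true when a
# VirtualMachine view object has no registered snapshot tree but still has
# delta disk files in its file layout, i.e. "hidden" snapshots.
function Test-HiddenSnapshot {
    param(
        [Parameter(Mandatory)]
        $VmView
    )

    # A registered snapshot tree means the snapshots are visible, not hidden.
    if ($VmView.Snapshot) { return $false }

    # Keep only the files whose name ends with "-delta.vmdk".
    $deltaFiles = @($VmView.LayoutEx.File |
        Where-Object { $_.Name -match '-delta\.vmdk$' })

    return ($deltaFiles.Count -gt 0)
}

# Usage against a live vCenter (assumes VMware PowerCLI is loaded and
# Connect-VIServer has already been run):
# Get-View -ViewType VirtualMachine |
#     Where-Object { Test-HiddenSnapshot $_ } |
#     Select-Object Name
```

Keeping the test in a function that only looks at the Snapshot and LayoutEx.File properties makes it easy to try out on hand-built objects before running it against a whole inventory.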
