21 November 2012 / #Powercli #Sdk #Vsphere

vCenter HTTP 503 Service Unavailable error

While we were working on a vCenter plugin that diffs *.vmx and *.vmdk files (only the descriptor, not the full -flat file) so that we could easily track changes made to these files (we wanted to know precisely what happens under the hood of Reconfigure Virtual Machine tasks), we ran into some issues with the vCenter Web Services.

As an aside, and just for fun, here is a small teaser of the unfinished plugin:


The issue started when we ran the script that retrieves the *.vmx and *.vmdk files. At first the script retrieved the files just fine, but at some point (we have a lot of VMs) errors like this one showed up:


At the same time, we had other problems, such as the vCenter MOB (Managed Object Browser) being unavailable, or being unable to open a VM console:



After a quick search of the VMware Knowledge Base, we found KB 2033822: vCenter Server returns 503 Service Unavailable errors.

Here is an excerpt from the KB that explains what was happening:

The vpxd log files contain entries that indicate that a socket connection attempt failed because it timed out. If you run netstat -an on the vCenter Server host machine immediately after the error, you will see many connections where one end is port 8085 on the loopback and the other end is another port on the loopback. Some of these connections will be in the TIME_WAIT state.

vCenter Server uses TCP connections on the loopback (localhost) for Remote Procedure Calls (RPC) to dispatch client requests and to communicate with vCenter Server companion services. As a result, under heavy loads, vCenter Server creates many local TCP connections, then closes them and opens new ones. Some of the closed connections remain open at the server side in the TIME_WAIT state for some time (four minutes with default Windows settings). Because the number of client-side ports is limited, if vCenter Server uses the connections fast enough, at some point the client side tries to reuse a port while the server side still has a connection for this client port in the TIME_WAIT state.

When we checked the vCenter Server for TIME_WAIT connections on port 8085 (with the command netstat -an | findstr 8085.*TIME_WAIT), we saw a lot of them:
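The same count can be scripted. Below is a minimal sketch, assuming the usual Windows `netstat -an` layout of four columns (Proto, Local Address, Foreign Address, State). The function names `count_time_wait` and `netstat_time_wait_on_host` are hypothetical helpers, not part of any VMware tool. The parser is a bit stricter than the findstr filter: it only counts rows whose port is exactly 8085. The findstr filter also matches lines where "8085" appears elsewhere, such as port 18085.

```python
import subprocess


def count_time_wait(netstat_output: str, port: int = 8085) -> int:
    """Count TCP rows in TIME_WAIT where either endpoint uses exactly `port`."""
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Windows rows look like: TCP  127.0.0.1:8085  127.0.0.1:52311  TIME_WAIT
        if len(fields) != 4 or fields[0] != "TCP" or fields[3] != "TIME_WAIT":
            continue
        # rsplit keeps IPv6 endpoints such as [::1]:8085 working
        if any(ep.rsplit(":", 1)[-1] == str(port) for ep in fields[1:3]):
            count += 1
    return count


def netstat_time_wait_on_host(port: int = 8085) -> int:
    """Run `netstat -an` on the local (Windows) host and count TIME_WAIT rows on `port`."""
    result = subprocess.run(["netstat", "-an"], capture_output=True, text=True, check=True)
    return count_time_wait(result.stdout, port)
```

Run on the vCenter Server itself, `netstat_time_wait_on_host()` returns a single number. You can log that number periodically while a large PowerCLI job is running.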


KB 1030246: Port 8085 in VMware vCenter Server gives some explanation about port 8085 in vCenter:

This means that 8085 is the port where all the SDK connections to vCenter Server are being made, which in turn means that vSphere Client and any scripts built on vCenter Server SDK use this port.

It also contains the same warning about heavy load as the previous KB:

If there are a lot of scripts or applications making connections with vCenter Server, it is possible to see a large number of ports in a TIME_WAIT state. This is normal because Windows keeps a socket in TIME_WAIT state for certain period of time (Twice Maximum Segment Lifetime, so this wait could be 4 minutes) before recycling it back for use.
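A back-of-the-envelope calculation shows why a busy script runs into this. The model is deliberately simplified. It assumes that every closed loopback connection keeps its client port unusable for the full 240 seconds (4 minutes) of TIME_WAIT. It also assumes that Windows hands out client ports from 1025 up to the MaxUserPort registry value, which is the Windows Server 2003 behaviour. With the default of 5000, that range gives the 3976 ephemeral ports quoted from the KB below. Under those assumptions, the number of new connections per second that can be sustained is bounded by the port count divided by 240. The helper names are hypothetical.

```python
TIME_WAIT_SECONDS = 240      # 2 x MSL, the 4-minute Windows default quoted in the KB
FIRST_EPHEMERAL_PORT = 1025  # assumed start of the pre-Vista Windows ephemeral range


def ephemeral_ports(max_user_port: int) -> int:
    """Number of client ports available between 1025 and MaxUserPort (inclusive)."""
    return max_user_port - FIRST_EPHEMERAL_PORT + 1


def max_sustained_rate(max_user_port: int, time_wait: int = TIME_WAIT_SECONDS) -> float:
    """Rough upper bound on new connections per second before a port must be reused
    while the other side still holds it in TIME_WAIT (ignores still-open connections)."""
    return ephemeral_ports(max_user_port) / time_wait


for setting in (5000, 65534):
    print(f"MaxUserPort={setting}: {ephemeral_ports(setting)} ports, "
          f"~{max_sustained_rate(setting):.1f} new connections/s")
# MaxUserPort=5000: 3976 ports, ~16.6 new connections/s
# MaxUserPort=65534: 64510 ports, ~268.8 new connections/s
```

Roughly 16 new connections per second is not much. A script that loops over a couple of thousand VMs and issues several SDK requests per VM can exceed it. This assumes each request ends up on a fresh loopback connection, which depends on vCenter internals.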

As we have around 2000 VMs in this vCenter, and as we automate everything we can, we had reached the limit of ports available for SDK connections. The KB gives a workaround to raise that limit:

By default, vCenter Server has 3976 ephemeral ports. If you are running out, you can increase the limit.

To allow more local ports to be available:

  • Open Registry Editor (Regedt32.exe).
  • Locate this key in the registry: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
  • Right-click Parameters and choose New DWORD Value.
  • Enter MaxUserPort in the Name data box, enter 65534 (Decimal) in the Value data box, then click OK.
  • Note: The default setting for the MaxUserPort value is 5000 (Decimal). The maximum value for Windows Server 2003 is 65534 (Decimal).
  • Close Registry Editor.
  • Restart the machine for the new setting to take effect.
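On servers managed by script, the registry steps above can also be applied programmatically. The following is a minimal sketch using Python's standard winreg module, which exists only on Windows, so it is imported inside the function. The function names are hypothetical. The 5000 lower bound comes from Microsoft's documentation of MaxUserPort rather than from the KB excerpt. The code must run from an elevated (administrator) prompt, and the reboot step still applies.

```python
# Registry location and value name taken from the KB steps above.
TCPIP_PARAMS = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
VALUE_NAME = "MaxUserPort"


def validate_max_user_port(value: int) -> int:
    """Reject values outside the 5000-65534 range documented for MaxUserPort."""
    if not 5000 <= value <= 65534:
        raise ValueError(f"MaxUserPort must be between 5000 and 65534, got {value}")
    return value


def set_max_user_port(value: int = 65534) -> None:
    """Create or update the MaxUserPort DWORD. Requires admin rights; reboot afterwards."""
    validate_max_user_port(value)
    import winreg  # Windows-only standard library module

    with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, TCPIP_PARAMS, 0,
                            winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, VALUE_NAME, 0, winreg.REG_DWORD, value)
```

Calling `set_max_user_port()` with no argument writes 65534, the value recommended in the KB. The validation step runs first, so a typo fails before anything touches the registry.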

If you want to get rid of this HTTP 503 error without changing this value, you can wait out the 4-minute delay to let Windows clean up the remaining TIME_WAIT connections. If you are in a hurry, you can restart the Windows service VMware VirtualCenter Management Webservices instead.

In the end, we changed this setting so that we could keep running our massive PowerCLI one-liner scripts ^^

> Frederic MARTIN