vCenter HTTP 503 Service Unavailable error

While working on a small vCenter plugin that keeps a history of *.vmx and *.vmdk files (only the descriptor, not the -flat file) so that changes to these files can be tracked easily (we wanted to know precisely what hides behind the "Reconfigure Virtual Machine" tasks), we ran into an issue with the vCenter Web Services.

As an aside, just for fun, here is a little teaser of the unfinished plugin:

The issue appeared while we were running the script that retrieves the *.vmx and *.vmdk files. The script started retrieving the files correctly, but after a while (given the large number of VMs) errors such as this one showed up:

At the same time, other problems appeared: the vCenter MOB was unreachable and it was impossible to open a VM console:

After a quick search in the VMware KB, we found KB 2033822: vCenter Server returns "503 Service Unavailable" errors

Here is the excerpt of the KB that explains what was happening:

The vpxd log files contain entries that indicate that a socket connection attempt failed because it timed out. If you run netstat -an on the vCenter Server host machine immediately after the error, you will see many connections where one end is port 8085 on the loopback and the other end is another port on the loopback. Some of these connections will be in the TIME_WAIT state.

vCenter Server uses TCP connections on the loopback (localhost) for Remote Procedure Calls (RPC) to dispatch client requests and to communicate with vCenter Server companion services. As a result, under heavy loads, vCenter Server creates many local TCP connections, then closes them and opens new ones. Some of the closed connections remain open at the server side in the TIME_WAIT state for some time (four minutes with default Windows settings). Because the number of client-side ports is limited, if vCenter Server uses the connections fast enough, at some point the client side tries to reuse a port while the server side still has a connection for this client port in the TIME_WAIT state.

We quickly checked the vCenter server for TIME_WAIT connections on port 8085 (with the command netstat -an | findstr "8085.*TIME_WAIT") and saw that there were indeed a lot of them:
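
The same check can be scripted. Here is a minimal Python sketch, assuming the output of netstat -an has been captured as text. The function name count_time_wait and the sample lines are ours and purely illustrative; they are not real output from our vCenter server.

```python
import re


def count_time_wait(netstat_output: str, port: int = 8085) -> int:
    """Count TCP connections in TIME_WAIT that have `port` on either end."""
    # A TCP line of `netstat -an` has four columns:
    #   TCP    127.0.0.1:8085    127.0.0.1:49231    TIME_WAIT
    endpoint = re.compile(r":%d$" % port)
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] != "TCP":
            continue  # header, UDP line or blank line
        local, remote, state = fields[1], fields[2], fields[3]
        if state == "TIME_WAIT" and (endpoint.search(local) or endpoint.search(remote)):
            count += 1
    return count


# Illustrative sample only (hypothetical ports).
SAMPLE = """\
  TCP    127.0.0.1:8085     127.0.0.1:49231    TIME_WAIT
  TCP    127.0.0.1:49232    127.0.0.1:8085     TIME_WAIT
  TCP    127.0.0.1:8085     127.0.0.1:49233    ESTABLISHED
  TCP    0.0.0.0:443        0.0.0.0:0          LISTENING
"""

if __name__ == "__main__":
    print(count_time_wait(SAMPLE))  # prints 2
```

On the server itself, the text could come from subprocess.run(["netstat", "-an"], capture_output=True, text=True).stdout. Unlike the findstr pattern, this sketch matches the port exactly, so a port such as 18085 is not counted.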

KB 1030246: Port 8085 in VMware vCenter Server tells us a bit more about how vCenter actually uses port 8085:

This means that 8085 is the port where all the SDK connections to vCenter Server are being made, which in turn means that vSphere Client and any scripts built on vCenter Server SDK use this port.

And we find the same warning about heavy load as in the previous KB:

If there are a lot of scripts or applications making connections with vCenter Server, it is possible to see a large number of ports in a TIME_WAIT state. This is normal because Windows keeps a socket in TIME_WAIT state for certain period of time (Twice Maximum Segment Lifetime, so this wait could be 4 minutes) before recycling it back for use.

As we have around 2000 VMs in this vCenter, and as we automate everything we can, we quickly reach the limit of ports available for SDK connections. The KB details the procedure to raise that limit:

By default, vCenter Server has 3976 ephemeral ports. If you are running out, you can increase the limit.
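
To get a feel for that limit, here is a back-of-the-envelope calculation in Python. It is a simplified model, not a measurement: it assumes every closed connection holds one ephemeral port for the full four-minute TIME_WAIT, and that the ephemeral range starts at port 1025 (which is what makes a MaxUserPort of 5000 give 3976 ports). The function names are ours.

```python
TIME_WAIT_SECONDS = 4 * 60   # four minutes with default Windows settings, per the KB
FIRST_EPHEMERAL_PORT = 1025  # assumption: 5000 - 1025 + 1 == 3976, the KB's figure


def ephemeral_ports(max_user_port: int) -> int:
    """Number of ephemeral ports available for a given MaxUserPort."""
    return max_user_port - FIRST_EPHEMERAL_PORT + 1


def sustainable_rate(max_user_port: int) -> float:
    """New connections per second before ports run out (simplified model)."""
    return ephemeral_ports(max_user_port) / TIME_WAIT_SECONDS


if __name__ == "__main__":
    for limit in (5000, 65534):
        print(limit, ephemeral_ports(limit), round(sustainable_rate(limit), 1))
```

Under these assumptions, the default allows roughly 16.6 new loopback connections per second, against roughly 268.8 with MaxUserPort set to 65534.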

To allow more local ports to be available:

  • Open Registry Editor (Regedt32.exe).
  • Locate this key in the registry: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
  • Right-click Parameters and choose New DWORD Value.
  • Enter MaxUserPort in the Name data box, enter 65534 (Decimal) in the Value data box, then click OK.
    Note: The default setting for the MaxUserPort value is 5000 (Decimal). The maximum value for Windows Server 2003 is 65534 (Decimal).
  • Close Registry Editor.
  • Restart the machine for the new setting to take effect.
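
If the change has to be made on several servers, the manual steps above could be replaced by importing a .reg file. The Python sketch below only builds the text of such a file. The function name max_user_port_reg is ours, the accepted range is taken from the note above (default 5000, maximum 65534), and it assumes the usual .reg layout, where DWORD values are written as 8 hexadecimal digits. A restart is still needed after the import.

```python
KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"


def max_user_port_reg(value: int = 65534) -> str:
    """Return the text of a .reg file that sets MaxUserPort to `value`."""
    # Range assumed from the KB note: default 5000, maximum 65534.
    if not 5000 <= value <= 65534:
        raise ValueError("MaxUserPort must be between 5000 and 65534")
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        f"[{KEY}]\n"
        f'"MaxUserPort"=dword:{value:08x}\n'  # 65534 -> 0000fffe
    )


if __name__ == "__main__":
    print(max_user_port_reg())
```

The returned text can be saved to a file and imported with regedit, or with reg import from an elevated command prompt.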

If you want to clear this HTTP 503 error without changing this value, you can wait the 4 minutes it takes Windows to purge the remaining TIME_WAIT connections, or restart the VMware VirtualCenter Management Webservices Windows service (for those in a hurry).

In the end, we changed this setting so that we could keep running PowerCLI one-liner scripts en masse ^^
