sshAutoConnect vCenter plugin

Update 2016/05/16: Thanks to @jmauser, a GitHub release has been created that supports vSphere 6.0 Update 2. You can download it here.

Here is our first vCenter plugin! We had wanted to do this for a while, and here it is now :p

sshAutoConnect is a vCenter (or, more precisely, vSphere Client) plugin.

In vSphere Client, you can open the plugin window to view installed, activated and available plugins through “Plug-ins > Manage Plug-ins…”.

The sshAutoConnect plugin lets you automatically open an SSH connection to an ESX/ESXi host directly from vSphere Client.

It lets you manage your SSH connections straight from vSphere Client.

We decided to develop our own plugin to add this feature.

Installation

As with every vSphere Client plugin, you just need to download the archive below, uncompress it in the default vSphere Client plugin folder, and restart the client:

C:\Program Files (x86)\VMware\Infrastructure\Virtual Infrastructure Client\Plugins

Configuration

The plugin setup is as easy as possible. The plugin reads an optional sshAutoConnect.xml file, located in the same folder as the .dll library, which can be used to specify credentials for automatic connection:

<?xml version="1.0" encoding="utf-8" ?>
<credentials>
  <default>
    <login>root</login>
    <password>d3d3LnZtZHVkZS5mcg==</password>
  </default>
  <custom_servers>
    <server name="server-esx-01.vmdude.fr">
      <login>root</login>
      <password>d3d3Lmh5cGVydmlzb3IuZnI=</password>
    </server>
    <server name="server-esx-02.vmdude.fr">
      <login>root</login>
      <password>d3d3LnZtd2FyZS5mcg==</password>
    </server>
  </custom_servers>
</credentials>

The XML file has 2 branches: <default> and <custom_servers>.

The <default> branch is used if the server you’re trying to connect to doesn’t exist in the <custom_servers> branch. This allows you to make exceptions for specific servers.

If the file doesn’t exist, the plugin will not provide any credentials during the SSH connection, so you’ll have to authenticate manually.

Note: The passwords in the configuration file have to be encoded in Base64. You can read our previous post about encoding/decoding Base64 with PowerShell: Base64 managing
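As a quick illustration of the encoding step, here is a minimal shell sketch, assuming a machine with the GNU coreutils base64 tool (the post linked above covers the PowerShell way). The function names encode_password and decode_password are made up for this example and are not part of the plugin. Keep in mind that Base64 is only an encoding, not encryption, so anyone who can read the file can recover the passwords.

```shell
# Hypothetical helpers (not part of the plugin), assuming GNU coreutils base64.

# Encode a clear-text password for sshAutoConnect.xml.
# printf '%s' avoids the trailing newline that echo would add to the input.
encode_password() {
    printf '%s' "$1" | base64
}

# Decode a Base64 value taken from the configuration file.
decode_password() {
    printf '%s' "$1" | base64 -d
}

# The sample <default> password from the file above:
encode_password 'www.vmdude.fr'           # prints d3d3LnZtZHVkZS5mcg==
decode_password 'd3d3LnZtZHVkZS5mcg=='    # prints www.vmdude.fr
echo
```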

Usage

Using the plugin is very easy: you just have to right-click on the ESX/ESXi server you want to connect to and click on sshAutoConnect.

Download

  • Here is the .zip archive with the plugin folder containing the .dll file and a sample configuration file: sshAutoConnect
  • Here are the Visual Studio 2010 sources (they can be built as usual with MSBuild/csc/Visual Studio): sshAutoConnect-sources

Note: a big thanks to R0llB4ck for his help and the trick with Embedded Resource in C#

Kill Them All

During our adventures in the troubleshooting world, we met an interesting case involving some challenge and a lot of laziness (and that’s the kind of stuff we like :)).

To summarize, we had an ESXi server whose hostd process was unresponsive, so its connection state in vCenter was Not Responding (of course, after a few moments, the VMs hosted on it were restarted as usual by HA on other ESX servers in the cluster; so far so good).

After logging in to the ESX server console through HP iLO (SSH connectivity was also down), we realized that there was an issue with the network card driver (a well-known problem with the ELXNET driver on ESX 5.5, related to the Native Driver), resulting in a vmnic connectivity issue (all links were down, which explains the connection state of the ESX server).

Due to this connectivity loss, the ESX server no longer had access to NFS storage (as expected), but it still had dummy VM processes running (since no network/storage was available).

What motivated this post was our wish to put the ESX server into maintenance mode in this state.

We wanted to put the server into maintenance mode, but as VMM instances were still running on the ESX server, this wasn’t possible. So we started to kill these instances with the esxtop ‘k’ option and the world ID. However, the iLO connection did not allow copy/paste, and we were on a VPN connection with 1980-ish latency, so performing this operation on 70 processes would have taken too much time (especially at 3am during an on-call shift…).

Lazy as we are, we wanted to automate this process and minimize the commands to type because of the copy/paste issue (we warned you, the key word here is ‘lazy’).

So here is a small shell script which uses the esxcli vm process kill command. This script will kill all VMM instances on the ESX server (in hard mode, so be careful as the processes are stopped instantly!!!):

# Kill every running VM process on this host (hard kill: VMs stop instantly)
for vmid in $(esxcli vm process list | grep "World ID" | cut -f2 -d:)
do
    esxcli vm process kill --type hard --world-id "$vmid"
done
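Before running a loop like this for real, it can help to preview what would be killed. The following is a minimal dry-run sketch, not the script we ran that night: the function names list_world_ids and dry_run_kill are hypothetical, the sample text only mimics the layout of esxcli vm process list with made-up values, and it assumes grep, cut and tr are available in the shell you use.

```shell
# Extract the World IDs from `esxcli vm process list` output read on stdin.
list_world_ids() {
    grep "World ID" | cut -d: -f2 | tr -d ' '
}

# Print the kill command for each World ID instead of executing it.
dry_run_kill() {
    list_world_ids | while read -r wid; do
        echo "esxcli vm process kill --type hard --world-id $wid"
    done
}

# Made-up sample mimicking the command output for one VM
sample='vm-demo-01
   World ID: 1234
   Process ID: 0
   Display Name: vm-demo-01'

printf '%s\n' "$sample" | dry_run_kill
# prints: esxcli vm process kill --type hard --world-id 1234

# On a host: esxcli vm process list | dry_run_kill
```

The --type option also accepts soft and force, so a gentler variant could try soft first and only fall back to hard for the worlds that survive.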

Once the command was executed, we were able to enter maintenance mode and go on with our troubleshooting (and go to sleep a few hours later) 🙂

Of course you should use it at your own risk, but it could be useful!

vExpert 2016

As every year, VMware has released the list of vExperts for the year 2016 (available here, with a little more than 1,350 entries; this community is growing every year ^^), and it is with honor that I accept this nomination and renew this title for the 5th year!

We want to thank you all for your support! A big thanks also to Corey Romero for this nomination and all the good work he puts into the vExpert program.

For information, there are 27 French vExperts (the full list is available in the vmnerds.fr post), and a list of vExpert benefits (of course we want them ^^) is available in the CloudManiac.net post.

SolidFire vs DelayedAck

As a follow-up to our previous post SolidFire vs ATS heartbeat, about issues we had with SolidFire arrays, we had to disable another setting: DelayedAck.

Basically, this setting applies to TCP communication between hosts: the receiver of TCP segments waits before sending the acknowledgment segment (ACK) so that it can acknowledge several data segments at once, which decreases network load (and in the end this is a good thing ^^).

The VMware KB1002598 also explains this principle:

A central precept of the TCP network protocol is that data sent through TCP be acknowledged by the recipient. According to RFC 813, “Very simply, when data arrives at the recipient, the protocol requires that it send back an acknowledgement of this data. The protocol specifies that the bytes of data are sequentially numbered, so that the recipient can acknowledge data by naming the highest numbered byte of data it has received, which also acknowledges the previous bytes.”. The TCP packet that carries the acknowledgement is known as an ACK.

A host receiving a stream of TCP data segments can increase efficiency in both the network and the hosts by sending less than one ACK acknowledgment segment per data segment received. This is known as a delayed ACK. The common practice is to send an ACK for every other full-sized data segment and not to delay the ACK for a segment by more than a specified threshold. This threshold varies between 100ms and 500ms. ESXi/ESX uses delayed ACK because of its benefits, as do most other servers.

The main issue is that some iSCSI arrays wait to receive an ACK before sending the next segment, which can generate a lot of timeouts (you can easily see this in a Wireshark capture, for instance) and leads to an overall performance decrease.

The affected iSCSI arrays in question take a slightly different approach to handling congestion. Instead of implementing either the slow start algorithm or congestion avoidance algorithm, or both, these arrays take the very conservative approach of retransmitting only one lost data segment at a time and waiting for the host’s ACK before retransmitting the next one. This process continues until all lost data segments have been recovered.

Later in the KB article, you can find some clarification about the impact of DelayedAck on the VMFS heartbeat:

Most notably, the VMFS heartbeat experiences a large volume of timeouts because VMFS uses a short timeout value

Given this array behavior, SolidFire support asked us to disable this setting. It can be changed at any level of the iSCSI chain, i.e. at the adapter, target and/or device level, with inheritance of the value (which can be convenient for setting it globally). As SolidFire support asked us to disable it permanently, we chose to set it at the adapter level:

Configuration ESXi > Storage Adapters > iSCSI Adapter > Properties > General Tab > Advanced

To display this value for all your ESXi servers, here is a quick one-liner:

Get-View -ViewType HostSystem -Property Name, Config.StorageDevice.HostBusAdapter | Select Name, @{Name="DelayedAck"; Expression={($_.Config.StorageDevice.HostBusAdapter.AdvancedOptions | ?{$_.key -eq "DelayedAck"}).value}}

Finally, to disable this setting on all ESXi hosts in your platform, we made another one-liner (which you can run as many times as you want, as it checks whether the value needs to be updated):

Get-View -ViewType HostSystem -Property Config.StorageDevice.HostBusAdapter, ConfigManager.StorageSystem -SearchRoot (Get-View -ViewType ClusterComputeResource -Filter @{"Name" = "cluster name"}).MoRef | ?{ ($_.Config.StorageDevice.HostBusAdapter.AdvancedOptions | ?{$_.key -eq "DelayedAck"}).value} | %{ (Get-View ($_.ConfigManager.StorageSystem) ).UpdateInternetScsiAdvancedOptions( ($_.config.storagedevice.HostBusAdapter | ?{$_.Model -match "iSCSI Software"}).device, $null, (New-Object VMware.Vim.HostInternetScsiHbaParamValue -Property @{Key = "DelayedAck"; Value = $false}))}
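If you only have shell access to a single host, the same value can be checked with esxcli. The sketch below is an assumption-laden illustration rather than what we ran: delayed_ack_value is a hypothetical helper, vmhba33 is a placeholder adapter name, the sample table only mimics the layout of the real output with made-up values, and it assumes awk is available in your shell.

```shell
# Read the "Current" column of the DelayedAck row from
# `esxcli iscsi adapter param get` output supplied on stdin.
delayed_ack_value() {
    awk '$1 == "DelayedAck" { print $2 }'
}

# Made-up, trimmed sample of the parameter table
sample='Name                Current  Default  Min  Max  Settable  Inherit
------------------  -------  -------  ---  ---  --------  -------
DelayedAck          true     true     na   na       true    false'

printf '%s\n' "$sample" | delayed_ack_value   # prints: true

# On a host (vmhba33 is a placeholder, check your adapter name first):
#   esxcli iscsi adapter list
#   esxcli iscsi adapter param get --adapter=vmhba33 | delayed_ack_value
#   esxcli iscsi adapter param set --adapter=vmhba33 --key=DelayedAck --value=false
```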