Viva la HashTable

We always try to fully optimize the scripts and applications we develop, because once the infrastructure is fairly large, the gain is far from negligible (as we saw previously with the vCheck optimizations).

While looking to reduce the execution time of a script (or should I say OneLiner), we found a new way to speed scripts up, in the same spirit as NoSQL: it relies on key-value associations rather than a complex relational schema.

This is done with hash tables that are populated at the beginning of the script and used afterward. The hash table structure is very well suited to search operations, because a lookup costs very little: O(1) on average in Big O notation, meaning the lookup time does not grow with the number of entries.
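To make the difference concrete, here is a minimal sketch that compares a linear search with a hash table lookup on synthetic data. The record names (`item-1`, `item-2`, ...) and the collection size are made up for illustration and have nothing to do with vSphere objects.

```powershell
# Build 20,000 synthetic records (hypothetical data, for illustration only).
$records = foreach ($i in 1..20000) {
    [pscustomobject]@{ Id = "item-$i"; Value = $i }
}

# Index the records once: a single pass over the data, keyed by Id.
$index = @{}
foreach ($r in $records) { $index[$r.Id] = $r.Value }

# Linear search: Where-Object tests every record in the collection.
$linear = ($records | Where-Object { $_.Id -eq 'item-19999' }).Value

# Hash table lookup: goes straight to the entry, whatever the table size.
$hashed = $index['item-19999']

"Linear: $linear, hashed: $hashed"
```

Both approaches return the same value; the difference only shows when the search is repeated many times, since the linear scan pays the full cost on every call while the index is built once.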

For instance, we had a relatively simple OneLiner that listed the virtual machines that were not in a resource pool (i.e. with the ‘Resources’ resource pool as parent; you can find more information on page 43 of the VMware vSphere Resource Management document), along with their cluster:

get-view -ViewType virtualmachine -Property ResourcePool, Name | ?{(Get-View $_.ResourcePool -Property Name).Name -eq "Resources"} | Select @{n="Cluster";e={(get-view (Get-View $_.ResourcePool -Property Parent).Parent -Property Name).Name}}, Name

This OneLiner already filters every Get-View call to minimize execution time, but it still takes a lot of time…

So we looked into hash tables to see what could be done with them. We created several hash tables in order to get rid of the Get-View calls (except, of course, for the first one, which retrieves the virtual machines).

$htabResourcePool = @{}
$htabResourcePoolParent = @{}
$htabCluster = @{}

Once the hash tables were declared, we filled them with the needed data (using the MoRef as key):

get-view -viewtype resourcepool -property name -Filter @{"name"="Resources"} | %{$htabResourcePool.Add($_.MoRef,$_.Name)}
get-view -viewtype resourcepool -property parent -Filter @{"name"="Resources"} | %{$htabResourcePoolParent.Add($_.MoRef,$_.Parent)}
get-view -viewtype clustercomputeresource -property name | %{$htabCluster.Add($_.MoRef,$_.Name)}
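One detail worth keeping in mind when filling tables this way: `.Add()` throws if the key already exists, which can happen if the fill step is re-run in the same session without re-declaring the tables. The sketch below illustrates this with a hypothetical string key (`pool-1`) standing in for a MoRef; index assignment is shown as an alternative that overwrites instead of failing.

```powershell
# 'pool-1' is a made-up key standing in for a MoRef.
$htabDemo = @{}
$htabDemo.Add('pool-1', 'Resources')

# Adding the same key a second time raises an error.
$addFailed = $false
try {
    $htabDemo.Add('pool-1', 'Resources')
} catch {
    $addFailed = $true
}

# Index assignment overwrites the existing entry silently.
$htabDemo['pool-1'] = 'Resources'

# ContainsKey lets the script test for a key before adding it.
if (-not $htabDemo.ContainsKey('pool-2')) {
    $htabDemo.Add('pool-2', 'Resources')
}

"Second Add failed: $addFailed; entries: $($htabDemo.Count)"
```

Re-declaring the tables with `@{}` before each fill, as the script in this post does, avoids the problem altogether.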

Finally, we use these tables in the OneLiner, replacing the Get-View calls with hash table lookups:

get-view -ViewType virtualmachine -Property ResourcePool, Name | ?{$htabResourcePool[$_.ResourcePool] -eq "Resources"} | Select @{n="Cluster";e={$htabCluster[$htabResourcePoolParent[$_.ResourcePool]]}}, Name
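To see how the chained lookup behaves without a vCenter connection, here is a small sketch of the same pipeline on synthetic objects. The identifiers (`resgroup-8`, `domain-c7`, and so on) and the VM and cluster names are hypothetical strings standing in for real MoRefs and inventory names.

```powershell
# Hypothetical identifiers standing in for MoRefs.
$htabResourcePool       = @{ 'resgroup-8' = 'Resources'; 'resgroup-9'  = 'Resources' }
$htabResourcePoolParent = @{ 'resgroup-8' = 'domain-c7'; 'resgroup-9'  = 'domain-c12' }
$htabCluster            = @{ 'domain-c7'  = 'ClusterA';  'domain-c12' = 'ClusterB' }

# Synthetic VMs; vm02 sits in a user-created pool that is absent from the tables.
$vms = @(
    [pscustomobject]@{ Name = 'vm01'; ResourcePool = 'resgroup-8' }
    [pscustomobject]@{ Name = 'vm02'; ResourcePool = 'resgroup-20' }
    [pscustomobject]@{ Name = 'vm03'; ResourcePool = 'resgroup-9' }
)

# A missing key returns $null, so vm02 is simply filtered out.
$result = @($vms |
    Where-Object { $htabResourcePool[$_.ResourcePool] -eq 'Resources' } |
    Select-Object @{ n = 'Cluster'; e = { $htabCluster[$htabResourcePoolParent[$_.ResourcePool]] } }, Name)

$result | Format-Table -AutoSize
```

Because only the ‘Resources’ pools are loaded into the tables, any VM whose pool is not found is treated as being in a user-created resource pool and dropped by the filter.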

At this point we get the same results as the method without hash tables, but now we have to look at the execution time :p

So we created a sample script that runs the process using both methods:

  1. the first method uses Get-View calls only
  2. the second method populates hash tables and uses them afterward (filling the hash tables is part of the timed process, so both methods cover exactly the same scope).
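The comparison described above can be sketched on synthetic data as follows. The workloads here are hypothetical stand-ins (a repeated array scan versus a hash table built inside the timed section), and `System.Diagnostics.Stopwatch` is used as one possible timer; subtracting two `Get-Date` values works as well.

```powershell
# Hypothetical data: 2,000 identifiers, 200 of which we want to find.
$ids    = 1..2000 | ForEach-Object { "vm-$_" }
$wanted = 1..200  | ForEach-Object { "vm-$($_ * 10)" }

# Method 1: scan the whole array for every wanted identifier.
$sw1 = [System.Diagnostics.Stopwatch]::StartNew()
$resultMethod1 = @($wanted | Where-Object { $ids -contains $_ })
$sw1.Stop()

# Method 2: build the hash table inside the timed section, then look up.
$sw2 = [System.Diagnostics.Stopwatch]::StartNew()
$table = @{}
foreach ($id in $ids) { $table[$id] = $true }
$resultMethod2 = @($wanted | Where-Object { $table.ContainsKey($_) })
$sw2.Stop()

# Report counts (to check both methods agree) and durations.
"Method 1: $($resultMethod1.Count) records in $($sw1.Elapsed.TotalSeconds) seconds"
"Method 2: $($resultMethod2.Count) records in $($sw2.Elapsed.TotalSeconds) seconds"
```

Keeping the table construction inside the second timer mirrors the choice made in the benchmark: the cost of filling the tables counts against the hash table method.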

The script displays the number of records returned by each method (to check that both return the same content) along with the duration. In our example, we had a little under 2200 VMs on the platform, and here are the results:

So as not to mess with hypervisor.fr (we can already hear him shouting “Don’t touch my OneLiner!”), we should point out that the chosen example is a particularly telling one: its Get-View calls are nested, so the switch to hash tables was very effective :p

Depending on the script, though, the gain from hash tables may not be as obvious as it was in this example. In any case, we invite you to test and compare the execution times.

Here is the script used to run the benchmark with both methods:

Write-Host -ForegroundColor Yellow "Benchmarking starting..."

Write-Host "`nMethod 1 (with regular filtered Get-View)"
$startMethod1 = Get-Date
$resultMethod1 = get-view -ViewType virtualmachine -Property ResourcePool, Name | ?{(Get-View $_.ResourcePool -Property Name).Name -eq "Resources"} | Select @{n="Cluster";e={(get-view (Get-View $_.ResourcePool -Property Parent).Parent -Property Name).Name}}, Name
$endMethod1 = Get-Date

Write-Host -ForegroundColor Green "Found"(($resultMethod1 | Measure-Object).Count)"records in"(($endMethod1 - $startMethod1).TotalSeconds)"seconds"

Write-Host "`nMethod 2 (with hashtable)"
$startMethod2 = Get-Date
$htabResourcePool = @{}
$htabResourcePoolParent = @{}
$htabCluster = @{}

get-view -viewtype resourcepool -property name -Filter @{"name"="Resources"} | %{$htabResourcePool.Add($_.MoRef,$_.Name)}
get-view -viewtype resourcepool -property parent -Filter @{"name"="Resources"} | %{$htabResourcePoolParent.Add($_.MoRef,$_.Parent)}
get-view -viewtype clustercomputeresource -property name | %{$htabCluster.Add($_.MoRef,$_.Name)}

$resultMethod2 = get-view -ViewType virtualmachine -Property ResourcePool, Name | ?{$htabResourcePool[$_.ResourcePool] -eq "Resources"} | Select @{n="Cluster";e={$htabCluster[$htabResourcePoolParent[$_.ResourcePool]]}}, Name
$endMethod2 = Get-Date

Write-Host -ForegroundColor Green "Found"(($resultMethod2 | Measure-Object).Count)"records in"(($endMethod2 - $startMethod2).TotalSeconds)"seconds"
