
Sunday, August 26, 2018

Curious Case of Network Latency

It all started with a colleague noticing higher-than-usual ping response times on a RHEL VM. A normal value is below 0.5 ms. We were seeing values of up to 8-10 ms.


We were comparing these values with another VM on the same subnet.
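If you want to make that comparison less of an eyeball exercise, here is a minimal Python sketch that averages the round-trip times from saved `ping` output and checks them against the 0.5 ms mark. The sample output, the 192.0.2.x addresses and the function name are made up for illustration, and it assumes Linux-style `time=... ms` reply lines.

```python
import re

# Matches the per-reply round-trip time in Linux-style ping output,
# e.g. "time=8.41 ms" or "time<1 ms".
RTT_PATTERN = re.compile(r"time[=<]([\d.]+)\s*ms")

NORMAL_THRESHOLD_MS = 0.5  # "normal" value quoted in this post


def average_rtt_ms(ping_output):
    """Return the mean round-trip time (ms) found in ping output text."""
    rtts = [float(value) for value in RTT_PATTERN.findall(ping_output)]
    if not rtts:
        raise ValueError("no round-trip times found in ping output")
    return sum(rtts) / len(rtts)


# Hypothetical sample output; the addresses are documentation-range IPs.
PROBLEM_VM = """\
64 bytes from 192.0.2.10: icmp_seq=1 ttl=64 time=8.41 ms
64 bytes from 192.0.2.10: icmp_seq=2 ttl=64 time=9.59 ms
"""
NORMAL_VM = """\
64 bytes from 192.0.2.11: icmp_seq=1 ttl=64 time=0.31 ms
64 bytes from 192.0.2.11: icmp_seq=2 ttl=64 time=0.29 ms
"""

if __name__ == "__main__":
    for name, output in (("problem VM", PROBLEM_VM), ("normal VM", NORMAL_VM)):
        avg = average_rtt_ms(output)
        verdict = "high" if avg > NORMAL_THRESHOLD_MS else "ok"
        print(f"{name}: average {avg:.2f} ms ({verdict})")
```

Feeding it the output captured from the same source machine for both VMs keeps the comparison fair.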

So I tried the basic troubleshooting:

  1. Reboot VM.
  2. Move VM to another Host.
  3. Move VM to same Host as the normal VM.
  4. Remove network adapter and add new adapter and reconfigure the IP.
  5. Use a new IP.
  6. Block port on the Port Group and unblock it.
  7. Involve the network team.
  8. Clear the ARP - this is an internal joke ;-) 
  9. Network team tried to ping from the Nexus 5K. Same response time to this specific VM.
  10. Traced MACs to make sure there are no duplicates, traced vNICs on UCS and vmnics on the ESXi Host.
  11. Decided to contact Cisco, VMware, RHEL, etc.

We often miss the little details and always assume it's a bigger issue :)

So I decided to start from the little details.

First step was to download the .vmx files for the problem VM and the VM that was responding fine.

Using Notepad++, I compared the two VMX files line by line. As expected, there were a lot of differences (virtual hardware version, number of drives, CPU, memory, UUID, etc.), but one stood right out at me: CPU Latency Sensitivity.
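The same comparison can be scripted if you have more than a couple of VMs to check. Below is a rough Python sketch, not the method I used on the day. It assumes the usual `key = "value"` layout of a .vmx file, and the sample contents and function names are invented for illustration.

```python
def parse_vmx(text):
    """Parse .vmx text (key = "value" lines) into a dict with lowercased keys."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, the "#!" header line, and anything that is not key = value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip().lower()] = value.strip().strip('"')
    return settings


def diff_vmx(good_text, bad_text):
    """Return {key: (good_value, bad_value)} for every setting that differs."""
    good, bad = parse_vmx(good_text), parse_vmx(bad_text)
    diffs = {}
    for key in sorted(set(good) | set(bad)):
        if good.get(key) != bad.get(key):
            diffs[key] = (good.get(key), bad.get(key))
    return diffs


# Hypothetical, heavily trimmed .vmx contents for the two VMs.
GOOD_VMX = """\
#!/usr/bin/vmware
.encoding = "UTF-8"
numvcpus = "2"
memSize = "4096"
"""
BAD_VMX = """\
#!/usr/bin/vmware
.encoding = "UTF-8"
numvcpus = "2"
memSize = "4096"
sched.cpu.latencySensitivity = "low"
"""

if __name__ == "__main__":
    for key, (good_value, bad_value) in diff_vmx(GOOD_VMX, BAD_VMX).items():
        print(f"{key}: good={good_value!r} problem={bad_value!r}")
```

With real files you would read each downloaded .vmx into a string first. Expect plenty of harmless differences (UUIDs, disk paths and so on) in the output alongside the interesting one.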


The CPU Latency sensitivity was set to "low" on the problem VM.

Hmmm, why would anyone change the CPU latency sensitivity setting? And to "low", of all things? If anything, it would be set to "high", not "low". Obviously whoever changed this (by accident or deliberately) did not know what they were doing.

CPU latency sensitivity was first introduced in vSphere 5.5. Setting it to "high" comes with a few caveats:
  1. It requires reserving 100% of the VM's allocated memory.
  2. vCPUs are given exclusive access to pCPUs.
  3. Network frames are not coalesced while it is enabled.
Read more about this feature here

So back to our problem, how and where do we change this setting? 

There are two ways to do it: good ol' PowerCLI or the vSphere Web Client.

Note: You can change the setting while the VM is powered on, but it will only take effect on the next reboot.

PowerCLI:

Here is a one-liner to find out whether other VMs in your environment have this advanced setting changed:

Get-VM | Get-AdvancedSetting -Name sched.cpu.latencysensitivity | Where-Object { $_.Value -in 'low', 'medium', 'high' } | Select-Object Entity, Value | Format-Table -AutoSize

And here is how to change it to the desired value:

Get-VM vm_name | Get-AdvancedSetting -Name sched.cpu.latencysensitivity | Set-AdvancedSetting -Value Normal

Note: With PowerCLI, the setting is changed in the VMX file and will not be visible in the GUI until you reboot the VM.

vSphere Web Client: 

This setting can be found under "Edit Settings > VM Options > Advanced".


Once these changes were in place and the VM rebooted, the ping response time was back to below 0.5 ms.

Happy days!!