Monitoring Month - GPU Metrics

Let’s take an extreme close-up look at GPU metrics ;)

Welcome back to Monitoring Month! Hopefully the look at secondary metrics helped you understand how to go deeper than just staring at CPU and RAM consumption. Now, let’s go farther down the rabbit hole…

The more use cases you tackle, the more specific you need to be in order to properly monitor the user experience.

GPU Usage %:

This shows the average GPU utilization, as a percentage, over the selected period.

Sample alerting thresholds:

  • Critical: 75+% consumption for 5 consecutive minutes

  • Warning: 50+% (but less than 75%) for 5 consecutive minutes

A high value indicates that a large amount of graphics processing/rendering has been offloaded from the CPU and onto the GPU.

Note: if you see both zero GPU Usage % and high CPU consumption on a VM, odds are that hardware acceleration is disabled for the VM (or that some other GPU-enabling GPO is turned off).

Resolving High GPU Usage % consumption: First, consider adjusting any time interval function available to change the duration of the data displayed. This can give you a sense of when the issue originally began and if there is a recurring pattern. If the issue is consistent, then consider increasing the size of the VM to one with a larger amount of GPU allocated.
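To make the sample thresholds above concrete, here's a minimal sketch of the alerting logic, assuming one utilization sample per minute. The function name and thresholds are illustrative, not from any particular monitoring product:

```python
from collections import deque

# Classify a stream of per-minute GPU Usage % samples against the sample
# thresholds above: Critical if every sample in the last 5 minutes is 75+%,
# Warning if every sample is at least 50% (but the run never qualifies as
# Critical), otherwise OK.
def classify_gpu_usage(samples, window=5):
    """Return 'critical', 'warning', or 'ok' for the most recent window."""
    recent = deque(samples, maxlen=window)
    if len(recent) < window:
        return "ok"  # not enough consecutive minutes of data yet
    if all(s >= 75 for s in recent):
        return "critical"
    if all(s >= 50 for s in recent):
        return "warning"
    return "ok"
```

The same shape works for the other metrics below; only the threshold numbers change.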

Frame Buffer Usage:

This shows the average frame buffer (dedicated GPU memory) consumption, as a percentage, over the selected period.

Sample alerting thresholds:

  • Critical: 75+% consumption for 5 consecutive minutes

  • Warning: 50+% (but less than 75%) consumption for 5 consecutive minutes

A high value indicates that a large portion of the GPU's frame buffer (its dedicated video memory) is in use. This is more common in scenarios/applications with large data sets, such as CAD applications.

Resolving High Frame Buffer Usage: First, consider adjusting any time interval function available to change the duration of the data displayed. This can give you a sense of when the issue originally began and if there is a recurring pattern. If the issue is consistent, then consider increasing the size of the VM to one with a larger amount of GPU RAM allocated.
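If your monitoring tool only exposes raw used/total frame buffer numbers, you can derive the percentage yourself. A minimal sketch, assuming CSV-style output like `nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits` produces (values in MiB; verify the exact fields against your driver):

```python
# Derive Frame Buffer Usage % from a "used, total" CSV line (MiB values).
def frame_buffer_pct(csv_line):
    """Parse 'used, total' and return frame buffer usage as a percent."""
    used, total = (float(v) for v in csv_line.split(","))
    return round(100.0 * used / total, 1)

# Example: a VM with an 8 GiB frame buffer, 6 GiB in use:
# frame_buffer_pct("6144, 8192")  ->  75.0 (right at the Critical threshold)
```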

GPU Memory %:

This shows the average GPU memory consumption, as a percentage, over the selected period.

Sample alerting thresholds:

  • Critical: 90+% consumption for 5 consecutive minutes

  • Warning: 75+% (but less than 90%) for 5 consecutive minutes

A high value indicates that most of the available GPU memory is in use. Keeping working data in GPU memory is desirable: the GPU can access its own memory much faster than it can fetch data from system memory.

Note: GPU Memory consumption is separate and independent from the VM's Memory consumption, meaning they don't share resources.

Note: when GPU Memory % is high but GPU Usage % is low, it can indicate that the GPU is memory-bound; that is, the GPU can process data faster than the available memory can feed it.

Resolving High GPU Memory %: First, consider adjusting any time interval function available to change the duration of the data displayed. This can give you a sense of when the issue originally began and if there is a recurring pattern. If the issue is consistent, then consider increasing the size of the VM to one with a larger amount of GPU RAM allocated.
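The memory-bound pattern in the note above is easy to flag automatically. A hypothetical check, with illustrative thresholds (not vendor guidance):

```python
# Flag the pattern from the note above: high GPU Memory % paired with low
# GPU Usage % can suggest the GPU is memory-bound (data can't be fed to the
# GPU cores as fast as they can process it). Thresholds are illustrative.
def looks_memory_bound(gpu_mem_pct, gpu_usage_pct,
                       mem_threshold=90, usage_threshold=25):
    return gpu_mem_pct >= mem_threshold and gpu_usage_pct <= usage_threshold
```

You would run this alongside the regular threshold alerts, since neither metric alone tells the whole story.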

Video Decoder %:

NVIDIA GPUs utilize hardware decoding, reducing CPU consumption and reserving it for other purposes. This is independent of graphics performance. The default time interval displayed is the last 3 hours, with data provided for every minute. 

Sample alerting thresholds:

  • Critical: 90+% consumption for 5 consecutive minutes

  • Warning: 75+% (but less than 90%) consumption for 5 consecutive minutes

A high value indicates that the GPU is offloading a substantial amount of processing actions related to video performance/quality from the CPU. If CPU consumption is high and this value is null, it is likely that hardware acceleration has been disabled for this VM.

Resolving High Video Decoder %: First, consider adjusting any time interval function available to change the duration of the data displayed. This can give you a sense of when the issue originally began and if there is a recurring pattern. If the issue is consistent, then consider increasing the size of the VM to one with a larger amount of GPU allocated.

Video Encoder %:

NVIDIA GPUs utilize hardware encoding, reducing CPU consumption and reserving it for other purposes. This is independent of graphics performance. The default time interval displayed is the last 3 hours, with data provided for every minute.

Sample alerting thresholds:

  • Critical: 90+% consumption for 5 consecutive minutes

  • Warning: 75+% (but less than 90%) consumption for 5 consecutive minutes

A high value indicates that the GPU is offloading a substantial amount of processing actions related to video performance/quality from the CPU. If CPU consumption is high and this value is null, it is likely that hardware acceleration has been disabled for this VM.

Resolving High Video Encoder %: First, consider adjusting any time interval function available to change the duration of the data displayed. This can give you a sense of when the issue originally began and if there is a recurring pattern. If the issue is consistent, then consider increasing the size of the VM to one with a larger amount of GPU allocated.
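If your monitoring tool doesn't surface encoder/decoder utilization directly, you can often scrape it from `nvidia-smi -q -d UTILIZATION` output. A sketch of parsing that report (the field names below match that report's format, but verify against your driver version):

```python
import re

# Pull Video Encoder % and Video Decoder % out of an `nvidia-smi -q -d
# UTILIZATION` report, which lists lines like "Encoder : 43 %".
def parse_codec_utilization(report):
    """Return {'encoder': pct, 'decoder': pct} from an nvidia-smi -q report."""
    out = {}
    for name in ("Encoder", "Decoder"):
        m = re.search(rf"{name}\s*:\s*(\d+)\s*%", report)
        if m:
            out[name.lower()] = int(m.group(1))
    return out

sample = """
    Utilization
        Gpu                     : 12 %
        Memory                  : 7 %
        Encoder                 : 43 %
        Decoder                 : 0 %
"""
# parse_codec_utilization(sample) -> {'encoder': 43, 'decoder': 0}
```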

I am running out of time in Monitoring Month! So, next week we'll be back with another dose of metrics to monitor, Windows Processes, and then we'll probably have a BONUS EPISODE on User Login Time metrics and vendors in the ecosystem.

There is a terrible selection of stock images for Bonus or Extra or Extravaganza, so here's a person holding a bunch of extra money from the savings they've realized with all of these tips in the free content.
