Performance GPU Alarms

GPU performance counters indicate the relative health and performance of the GPU. On the Performance: GPU tab, you can configure Yellow and Red level alarms for these counters.

For example, consider the following alarm configuration for GPU temperature. If the GPU temperature is at or above 70° Celsius for 120 seconds, a Yellow level alarm is triggered. If the GPU temperature is at or above 90° Celsius for 240 seconds, a Red level alarm is triggered. If either level of alarm is triggered, a Notification is sent. The configured Action is carried out only when a Red level alarm is triggered.
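The following Python sketch models the temperature example above. It assumes the counter is sampled periodically and that "at or above 70° Celsius for 120 seconds" means an unbroken run of samples at or above the limit spanning at least 120 seconds. SysTrack's actual sampling and evaluation logic is not described here, and the names `Threshold`, `sustained_at_or_above`, and `evaluate` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical model of the example temperature alarm; not SysTrack's code.
Sample = Tuple[float, float]  # (timestamp in seconds, temperature in °C)


@dataclass(frozen=True)
class Threshold:
    limit: float     # value that must be reached or exceeded
    duration: float  # seconds the value must stay at or above the limit


YELLOW = Threshold(limit=70.0, duration=120)
RED = Threshold(limit=90.0, duration=240)


def sustained_at_or_above(samples: List[Sample], threshold: Threshold) -> bool:
    """True if the latest unbroken run of samples at or above the limit
    spans at least threshold.duration seconds. Samples are oldest first."""
    run_start = None
    for timestamp, value in samples:
        if value >= threshold.limit:
            if run_start is None:
                run_start = timestamp  # a new breach run begins here
        else:
            run_start = None  # value back within limits: reset the run
    if run_start is None:
        return False
    return samples[-1][0] - run_start >= threshold.duration


def evaluate(samples: List[Sample]) -> Dict[str, bool]:
    yellow = sustained_at_or_above(samples, YELLOW)
    red = sustained_at_or_above(samples, RED)
    return {
        "yellow": yellow,
        "red": red,
        "send_notification": yellow or red,  # either level sends a Notification
        "run_action": red,                   # the Action runs only on Red
    }


# 75 °C sampled every 30 s for 150 s: Yellow only.
print(evaluate([(t, 75.0) for t in range(0, 151, 30)]))
# {'yellow': True, 'red': False, 'send_notification': True, 'run_action': False}
```

A reading held at 95 °C for 240 seconds returns True for both levels, which matches the NOTE that an alarm can be in Yellow and Red status at the same time.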

NOTE: An alarm that passes through the Yellow state and reaches the Red state is considered to be in both Yellow and Red alarm status until the condition value returns within the limits.

The following table defines each of the GPU performance alarms.

| GPU Performance Alarm | Description |
|---|---|
| Fan Level | Monitors the operational health level of the GPU fan. |
| Fan RPM | Monitors the rotational speed of the GPU fan, in revolutions per minute (RPM). |
| Memory Used Bytes | Monitors the amount of memory the GPU card is using, in bytes. |
| Memory Used Percentage | Monitors the percentage of available memory the GPU card is using. |
| Number of Applications | Monitors the number of applications currently using the GPU. |
| Power State | Monitors the power state of the graphics processing unit (GPU). A GPU can be in one of 16 power states, numbered 0 to 15, where 0 uses the most power and 15 uses the least. Not all cards support all 16 states. |
| Temperature Board (°C) | Monitors the temperature of the GPU board, in degrees Celsius. |
| Temperature GPU (°C) | Monitors the temperature of the GPU, in degrees Celsius. |
| Temperature Memory (°C) | Monitors the temperature of the GPU memory chips, in degrees Celsius. |
| Temperature Power Supply (°C) | Monitors the temperature of the GPU power supply, in degrees Celsius. |
| Thermal Level | Monitors the health level of the GPU temperature: 0 = unknown, 1 = normal, 2 = warning, 3 = critical. |
| Usage Bus | Monitors GPU bus usage as a percentage (0 to 100%). |
| Usage Frame Buffer | Monitors GPU frame buffer usage as a percentage (0 to 100%). The frame buffer is an area of memory that holds the frame data continuously being sent to the screen. |
| Usage GPU | Monitors GPU usage as a percentage (0 to 100%). |
| Usage Video | Monitors GPU video usage as a percentage (0 to 100%). |
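Some of these counters report coded or derived values. The sketch below shows one way to interpret them, assuming the Thermal Level codes and the 0 to 15 Power State range described in the table, and assuming Memory Used Percentage is used memory divided by total GPU memory. The helper names are hypothetical and are not part of SysTrack.

```python
# Hypothetical helpers for interpreting raw GPU counter values.
THERMAL_LEVELS = {0: "unknown", 1: "normal", 2: "warning", 3: "critical"}


def describe_thermal_level(code: int) -> str:
    # Codes outside 0-3 are not defined in the table; treating them as
    # "unknown" is an assumption of this sketch.
    return THERMAL_LEVELS.get(code, "unknown")


def is_valid_power_state(state: int) -> bool:
    # Power states run from 0 (most power) to 15 (least power).
    return 0 <= state <= 15


def memory_used_percentage(used_bytes: int, total_bytes: int) -> float:
    # Assumes the percentage is used memory divided by total GPU memory.
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes * 100.0


print(describe_thermal_level(2))                         # warning
print(memory_used_percentage(2 * 1024**3, 8 * 1024**3))  # 25.0
```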

Configure Performance GPU Alarms

  1. Use the Performance: Alarms check box to apply the same configuration to each of the individual alarms.
  2. To change a default Limit or Time value, click the value, then enter the updated value.
  3. Select the check box to the left of an alarm to enable it. An enabled alarm is triggered when the value stays past the configured Limit for the specified time period; the direction depends on the alarm. For some alarms, such as Temperature GPU, the alarm is triggered if the value is at or above the Limit. For others, such as Fan RPM, the alarm is triggered if the value is at or below the Limit. If the check box is cleared, the SysTrack Agent ignores the Limit thresholds and generates no alarm, although it continues to track data for this counter.
  4. Select the Notification profile whose settings are followed when the alarm is triggered.
  5. Select a Time Window for the alarm.
  6. Select an Action profile to use when the alarm is triggered.
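The settings in the steps above can be pictured as a small configuration record. The Python sketch below is a hypothetical model, not SysTrack's data format: `GpuAlarmConfig` and `Direction` are illustrative names, the 500 RPM limit is an example value rather than a product default, and the Time Window setting is omitted because its behavior is not described here. It shows the two points from step 3: the breach direction depends on the alarm, and a disabled alarm still records data but never reports a breach.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Direction(Enum):
    AT_OR_ABOVE = "at_or_above"  # e.g. Temperature GPU
    AT_OR_BELOW = "at_or_below"  # e.g. Fan RPM


@dataclass
class GpuAlarmConfig:
    name: str
    limit: float
    time_seconds: int  # stored only; duration checking is not modeled here
    direction: Direction
    enabled: bool = True  # the check box to the left of the alarm
    notification_profile: Optional[str] = None
    action_profile: Optional[str] = None
    history: List[float] = field(default_factory=list)

    def breaches(self, value: float) -> bool:
        if self.direction is Direction.AT_OR_ABOVE:
            return value >= self.limit
        return value <= self.limit

    def record(self, value: float) -> bool:
        """Store the sample and report whether it breaches the Limit.
        Data is kept even when the alarm is disabled; a disabled alarm
        never reports a breach."""
        self.history.append(value)
        return self.enabled and self.breaches(value)


fan = GpuAlarmConfig("Fan RPM", limit=500, time_seconds=60,
                     direction=Direction.AT_OR_BELOW)
print(fan.record(400))  # True: 400 RPM is at or below the 500 RPM limit

temp = GpuAlarmConfig("Temperature GPU", limit=70, time_seconds=120,
                      direction=Direction.AT_OR_ABOVE, enabled=False)
print(temp.record(85), temp.history)  # False [85]
```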