VSXi monitoring top KPI list and recommended thresholds
This document lists the top key performance indicators (KPIs) to monitor on Sansay VSXi systems. VSXi statistics and events can be acquired through the following protocols or methods:
- SNMP Polling and Traps
- SOAP or REST API
- Syslog
- Probe
SNMP
SNMP servers can poll Sansay VSXis and receive traps from them. Before the VSXi will process polls, the SNMP server must be defined on the VSXi and white-listed as a trusted host.
Steps:
- Define SNMP server and community string
- Declare SNMP server as a trusted host. Note: On HA systems, use the Push to Stand-by button when you are done adding trusted hosts.
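Once the SNMP server is defined and trusted, a poll can be scripted from it. The sketch below is only an illustration, not a Sansay-documented procedure. It assumes the Net-SNMP `snmpget` command is installed on the monitoring host, that the VSXi answers SNMP v2c, and that it exposes the standard UCD-SNMP-MIB 5-minute load average object (`laLoad.2`). The host name and community string in the usage comment are placeholders.

```python
import subprocess

# Assumption: the VSXi exposes the standard UCD-SNMP-MIB 5-minute load average.
LOAD_5MIN_OID = ".1.3.6.1.4.1.2021.10.1.3.2"


def parse_load(raw: str) -> float:
    """Convert snmpget -Oqv output such as '0.52' or '"0.52"' to a float."""
    return float(raw.strip().strip('"'))


def poll_load(host: str, community: str, timeout_s: int = 5) -> float:
    """Poll the 5-minute load average using the Net-SNMP snmpget CLI."""
    cmd = ["snmpget", "-v2c", "-c", community, "-Oqv", host, LOAD_5MIN_OID]
    result = subprocess.run(
        cmd, capture_output=True, text=True, timeout=timeout_s, check=True
    )
    return parse_load(result.stdout)


# Usage (hypothetical host and community string):
#   load = poll_load("vsxi.example.net", "monitoring")
```

A failed poll raises an exception (`CalledProcessError` or `TimeoutExpired`), which an NMS wrapper can treat as an alert in its own right.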
API
The VSXi supports SOAP and REST API interfaces. You can use the API of your choice. For queries to be processed, the system sending them must authenticate as a web user with read-write or greater UI privilege and must be declared as a trusted host.
Steps:
- Define web user
- Declare the web user's system as a trusted host. For GUI and API use, Web access is sufficient (the combined SSH/SNMP/SFTP and Web access is not required). Note: On HA systems, use the Push to Stand-by button when you are done adding trusted hosts.
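After the monitoring system is a trusted host, a basic web reachability check can be scripted from it. The following is a minimal sketch using only the Python standard library. It does not call any specific VSXi API operation: the URL is a placeholder you supply, and the function only reports whether a single HTTP(S) GET returned a 2xx status and how long it took.

```python
import time
import urllib.error
import urllib.request


def probe_web(url: str, timeout_s: float = 5.0) -> tuple[bool, float]:
    """Return (ok, latency_seconds) for a single HTTP GET."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            ok = 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, refused connections, and timeouts.
        ok = False
    return ok, time.monotonic() - start


# Usage (hypothetical URL):
#   ok, latency = probe_web("https://vsxi.example.net/")
```

If the system presents a certificate that the monitoring host does not trust, the request fails and is reported as not ok, so certificate trust may need to be arranged before relying on this check.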
Syslog
Standard syslog can be streamed from the VSXi to your system for select logs. Syslog streaming is currently enabled by Sansay Support, not through the UI.
Probe
A probe typically consists of your existing NMS exercising a protocol against the VSXi. For example, your NMS can send SIP OPTIONS to the VSXi and expect a response, or it can send a web query (e.g. curl) and expect a successful response. Probes are a good way to check system uptime and latency for specific protocols.
Requirements:
- For registrations, the SIP REGISTER must arrive at an existing Access Service Port and match the domain/host of an existing Access Resource.
- For OPTIONS polling, the probe must:
  - Be defined as a Peering Resource
  - Match the domain of an existing Access Resource
- For INVITEs or other SIP methods, the probe must:
  - Be defined as a Peering Resource
  - Match the domain of an existing Access Resource
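An OPTIONS probe of this kind can be sketched with the Python standard library. This is an illustrative example under stated assumptions, not a Sansay-provided tool: it uses UDP on port 5060, builds a minimal RFC 3261 request, and the `probe` user name and the `domain` parameter are placeholders. The VSXi only answers if the requirements above are met.

```python
import socket
import time
import uuid


def build_options(domain: str, local_ip: str, local_port: int) -> bytes:
    """Build a minimal SIP OPTIONS request (RFC 3261) for UDP transport."""
    branch = "z9hG4bK" + uuid.uuid4().hex  # RFC 3261 magic-cookie prefix
    lines = [
        f"OPTIONS sip:{domain} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};branch={branch}",
        "Max-Forwards: 70",
        f"From: <sip:probe@{local_ip}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <sip:{domain}>",
        f"Call-ID: {uuid.uuid4().hex}@{local_ip}",
        "CSeq: 1 OPTIONS",
        f"Contact: <sip:probe@{local_ip}:{local_port}>",
        "Content-Length: 0",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status(response: bytes) -> int:
    """Extract the status code from a SIP response status line."""
    first_line = response.split(b"\r\n", 1)[0].decode("ascii", "replace")
    version, code, _reason = first_line.split(" ", 2)
    if version != "SIP/2.0":
        raise ValueError(f"not a SIP response: {first_line!r}")
    return int(code)


def probe_sip(host, domain=None, port=5060, timeout_s=2.0):
    """Send one OPTIONS; return (status_code, latency_s) or None if no reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)
        sock.connect((host, port))  # fixes the local address used in Via
        local_ip, local_port = sock.getsockname()
        start = time.monotonic()
        sock.send(build_options(domain or host, local_ip, local_port))
        try:
            data = sock.recv(4096)
        except (socket.timeout, ConnectionRefusedError):
            return None
        return parse_status(data), time.monotonic() - start
```

A return value of `None`, or a status code other than the one you expect, corresponds to the "No or unexpected response" condition for the SIP latency/response KPI.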
VSXi KPI list definition
The following list does not cover every metric and event that can or should be monitored on the VSXi. It contains only the essential metrics and events, how each can be acquired, and the recommended threshold and severity level.
Severity reflects the nature of the metric and how quickly an alert should be acted on. For example, a sudden increase in disk storage usage is rare: it would normally take days or weeks for usage to reach unsafe levels and then keep growing long enough to cause problems. In contrast, system CPU load can rise quickly.
ID | Key Performance Indicator | Acquisition Protocol | Recommended Threshold | Severity |
---|---|---|---|---|
1 | Available memory | SNMP poll | 90% memory utilization, or less than 2 GB available. Cached memory is also considered as part of this metric. | High |
2 | Disk storage usage | SNMP poll | Under 75% | Normal |
3 | Average CPU load | SNMP poll | 8.0 on 12 CPU systems (5-min Average Load is recommended) | High |
4 | Network interface throughput | SNMP poll | 75% network interface speed | High |
5 | Authenticated endpoints | API | 20% or greater deviation | High |
6 | HA redundancy status | API / SNMP trap / Syslog | Any change in value | High |
7 | Switch-over / restart indicator | SNMP trap / Syslog | Any change in value | Normal |
8 | SIP latency/response | Probe | No or unexpected response | High |
9 | Concurrent calls | API | Network specific | Normal |
10 | Average registers per second | API | Network specific | Normal |
11 | RAID redundancy status | API / SNMP trap / Syslog | Any change in value | Normal |
12 | Power supply redundancy status | API / SNMP trap / Syslog | Any change in value | Normal |
13 | Media server status | API | Any change in value | Normal |
14 | Web / API response | API / Probe | No or unexpected response | High |
15 | System uptime (days) | SNMP poll | Counter reset | Normal |
16 | Total running processes | SNMP poll | 20% increase/deviation | Normal |
17 | System license usage | API | 90% utilization | High |
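Several of the thresholds in the table can be expressed as simple checks in whatever scripting layer your NMS offers. The sketch below is one possible encoding, with assumptions marked in the comments. In particular, it scales the CPU load threshold linearly with CPU count (the table only states 8.0 for 12-CPU systems), and it reads the memory threshold as "alert at 90% utilization or when less than 2 GB is available". The function names are illustrative.

```python
def memory_alert(used_pct: float, available_gb: float) -> bool:
    """KPI 1: alert at >= 90% utilization or < 2 GB available (interpretation)."""
    return used_pct >= 90.0 or available_gb < 2.0


def cpu_load_alert(load_5min: float, cpu_count: int = 12) -> bool:
    """KPI 3: alert at a 5-minute load of 8.0 on 12 CPUs.

    Assumption: the threshold scales linearly for other CPU counts.
    """
    return load_5min >= 8.0 * cpu_count / 12


def utilization_alert(value: float, capacity: float, limit_pct: float) -> bool:
    """KPIs 2, 4, 17: alert when value reaches limit_pct of capacity."""
    return value * 100 >= limit_pct * capacity


def deviation_alert(current: float, baseline: float, pct: float = 20.0) -> bool:
    """KPIs 5, 16: alert when current deviates from baseline by >= pct percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return abs(current - baseline) * 100 >= pct * baseline


# Example: 790 authenticated endpoints against a baseline of 1000 is a 21% drop.
#   deviation_alert(790, 1000)  -> True
```

How the baseline for deviation checks is chosen (for example, the same hour on the previous day) is network specific and left to the operator.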