VSXi monitoring top KPI list and recommended thresholds

updated 7 yrs ago

This document lists the top key performance indicators (KPIs) to monitor on Sansay VSXi systems. VSXi statistics and events can be acquired through the following protocols or methods:

SNMP Polling and Traps
SOAP or REST API
Syslog
Probe

SNMP

SNMP servers can poll Sansay VSXis and receive traps from them. Before the VSXi will process polls, the SNMP server must be defined and white-listed as a trusted host.

Steps:

Define SNMP server and community string
Declare SNMP server as a trusted host. Note: On HA systems, use the Push to Stand-by button when you are done adding trusted hosts.
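Once the two steps above are done, a poll from the trusted host can be sketched as follows. This is a minimal illustration, not a Sansay-provided tool: it assumes the net-snmp `snmpget` command is installed on the polling host, that SNMPv2c is in use, and that the VSXi agent exposes the standard UCD-SNMP-MIB load table (`laLoad.2`, the 5-minute load average). The helper names and the example address and community string are hypothetical.

```python
import subprocess

# Assumption: the VSXi agent exposes the standard UCD-SNMP-MIB load table.
# laLoad.2 is the 5-minute load average in that MIB.
LOAD_5MIN_OID = "1.3.6.1.4.1.2021.10.1.3.2"


def build_snmpget(host, community, oid, timeout_s=2, retries=1):
    """Return the argument list for a net-snmp SNMPv2c GET of one OID."""
    return [
        "snmpget", "-v2c", "-c", community,
        "-t", str(timeout_s), "-r", str(retries),
        "-Oqv",  # print only the value, without the OID or type prefix
        host, oid,
    ]


def parse_value(raw):
    """Strip whitespace and optional surrounding quotes from snmpget output."""
    return raw.strip().strip('"')


def poll(host, community, oid):
    """Run snmpget and return the value as text.

    Raises subprocess.CalledProcessError when the poll fails, for example
    when the polling host is not yet declared as a trusted host.
    """
    result = subprocess.run(
        build_snmpget(host, community, oid),
        capture_output=True, text=True, check=True,
    )
    return parse_value(result.stdout)
```

A call such as `float(poll("192.0.2.10", "monitoring", LOAD_5MIN_OID))` would then return the load average as a number; the address and community string here are placeholders for the values defined in step one.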

API

The VSXi supports SOAP and REST API interfaces. You can use the API of your choice. For queries to be processed, the system sending them must authenticate as a web user with read-write or greater UI privilege and must be declared as a trusted host.

Steps:

Define web user
Add web user to trusted host. For GUI (and API) use, the trusted host entry only needs to provide Web access (not SSH/SNMP/SFTP and Web). Note: On HA systems, use the Push to Stand-by button when you are done adding trusted hosts.

Syslog

Standard syslog can be streamed from the VSXi to your system for select logs. Syslog streaming is currently enabled by Sansay Support, not through the UI.

Probe

A probe typically consists of your existing NMS actively testing the VSXi over a specific protocol. For example, your NMS can send SIP OPTIONS to the VSXi and expect a response, or it can send a web query (e.g. curl) and expect a successful response. Probes are a good way to check system availability and latency for specific protocols.
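The web-query probe described above can be sketched in a few lines of Python using only the standard library. This is an illustration under stated assumptions: the URL passed in is whatever page your NMS already checks (no VSXi path is implied here), a plain HTTP 200 is treated as the "successful response", and the 2-second latency limit is an arbitrary placeholder. The function names are hypothetical.

```python
import time
import urllib.error
import urllib.request


def probe_web(url, timeout_s=5.0):
    """Fetch url once; return (status_code or None, latency_seconds)."""
    started = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # the server answered, but with a 4xx/5xx code
    except OSError:
        status = None  # refused, timed out, DNS failure or TLS failure
    return status, time.monotonic() - started


def is_healthy(status, latency_s, max_latency_s=2.0):
    """Assumption: healthy means HTTP 200 within the latency limit."""
    return status == 200 and latency_s <= max_latency_s
```

The probing host still has to be declared as a trusted host for web access, otherwise the probe will report a failure even when the VSXi is healthy.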

Requirements:

For registrations, the SIP REGISTER must arrive at an existing Access Service Port and match the domain/host of an existing Access Resource.
For OPTIONS polling, the probe must:
- Be defined as a Peering Resource
- Match the domain of an existing Access Resource.
For INVITEs or other SIP methods, the probe must:
- Be defined as a Peering Resource
- Match the domain of an existing Access Resource.
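An OPTIONS probe that meets the requirements above could look roughly like the sketch below. It builds a minimal RFC 3261 OPTIONS request, sends it over UDP, and reports the status code and round-trip time. The function names, the `probe` user part, the default port 5060 and the timeout are all assumptions made for illustration; which status code counts as the expected response depends on how the probe is provisioned on the VSXi.

```python
import socket
import time
import uuid


def build_options(target_host, target_port, local_ip, local_port, user="probe"):
    """Build a minimal RFC 3261 SIP OPTIONS request as bytes."""
    branch = "z9hG4bK" + uuid.uuid4().hex  # RFC 3261 magic cookie + unique suffix
    lines = [
        f"OPTIONS sip:{target_host}:{target_port} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};branch={branch};rport",
        "Max-Forwards: 70",
        f"From: <sip:{user}@{local_ip}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <sip:{target_host}:{target_port}>",
        f"Call-ID: {uuid.uuid4().hex}@{local_ip}",
        "CSeq: 1 OPTIONS",
        f"Contact: <sip:{user}@{local_ip}:{local_port}>",
        "Content-Length: 0",
        "", "",  # blank line terminates the header section
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status(response):
    """Return the numeric status code of a SIP response, or None."""
    first_line = response.split(b"\r\n", 1)[0].decode("ascii", "replace")
    parts = first_line.split(" ", 2)
    if len(parts) >= 2 and parts[0] == "SIP/2.0" and parts[1].isdigit():
        return int(parts[1])
    return None


def probe(target_host, target_port=5060, timeout_s=2.0):
    """Send one OPTIONS; return (status, latency_s) or (None, None) on failure."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)
        sock.connect((target_host, target_port))  # also fixes the local address
        local_ip, local_port = sock.getsockname()
        started = time.monotonic()
        try:
            sock.send(build_options(target_host, target_port, local_ip, local_port))
            data = sock.recv(4096)
        except OSError:  # covers timeouts and ICMP port-unreachable errors
            return None, None
        return parse_status(data), time.monotonic() - started
```

A result of `(None, None)` corresponds to "no response", and any status code other than the one you expect corresponds to "unexpected response".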

VSXi KPI list definition

The following list does not constitute all metrics and events that can or need to be monitored on the VSXi. This list only contains essential metrics and events, how they can be acquired, recommended thresholds and severity level.

The severity definition follows the nature of the metric and the suggested turnaround for acting on an alert. For example, disk storage usage rarely increases suddenly; it typically takes days or weeks to reach unsafe levels and cause issues. In contrast, system CPU load may rise quickly.

ID	Key Performance Indicator	Acquisition Protocol	Recommended Threshold	Severity
1	Available memory	SNMP poll	90% memory utilization, or no less than 2 GB available. Cached memory is also considered as part of this metric.	High
2	Disk storage usage	SNMP poll	Under 75%	Normal
3	Average CPU load	SNMP poll	8.0 on 12 CPU systems (5-min Average Load is recommended)	High
4	Network interface throughput	SNMP poll	75% network interface speed	High
5	Authenticated endpoints	API	20% or greater deviation	High
6	HA redundancy status	API / SNMP trap / Syslog	Any change in value	High
7	Switch-over / restart indicator	SNMP trap / Syslog	Any change in value	Normal
8	SIP latency/response	Probe	No or unexpected response	High
9	Concurrent calls	API	Network specific	Normal
10	Average registers per second	API	Network specific	Normal
11	RAID redundancy status	API / SNMP trap / Syslog	Any change in value	Normal
12	Power supply redundancy status	API / SNMP trap / Syslog	Any change in value	Normal
13	Media server status	API	Any change in value	Normal
14	Web / API response	API / Probe	No or unexpected response	High
15	System uptime (days)	SNMP	Counter reset	Normal
16	Total running processes	SNMP	20% increase/deviation	Normal
17	System license usage	API	90% utilization	High
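
The numeric thresholds in the table can be expressed as simple checks, for example in an NMS script. The sketch below is one possible reading of the table, not Sansay-provided logic: the function names are hypothetical, the memory rule is read as "alert at 90% utilization or when less than 2 GB is available", scaling the CPU threshold linearly for systems without 12 CPUs is an assumption, and baselines for the deviation checks must come from your own network history.

```python
def memory_alert(total_gb, available_gb):
    """KPI 1: alert at 90% utilization or when under 2 GB is available."""
    utilization = 1 - available_gb / total_gb
    return utilization >= 0.90 or available_gb < 2.0


def percent_alert(value_pct, limit_pct):
    """KPIs 2, 4, 17: alert once usage reaches the limit (75%, 75%, 90%)."""
    return value_pct >= limit_pct


def cpu_load_alert(load_5min, cpu_count=12):
    """KPI 3: 8.0 on a 12-CPU system.

    Assumption: the threshold scales linearly with the CPU count.
    """
    return load_5min >= 8.0 * cpu_count / 12


def deviation_alert(current, baseline, limit=0.20):
    """KPIs 5 and 16: alert on a 20% or greater deviation from a baseline."""
    if baseline == 0:
        return current != 0
    return abs(current - baseline) / baseline >= limit
```

For instance, `deviation_alert(790, 1000)` flags a 21% drop in authenticated endpoints, while `cpu_load_alert(6.5)` stays quiet on a 12-CPU system.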