SIP Stats troubleshooting


The SIP Stats page, located under the main Monitoring tab, provides valuable information about possible network events, temporary failures, and configuration or interoperability issues with specific endpoints. It is good practice to visit this page regularly to identify potential issues. The goal of this article is to help you interpret the values on this page and use them as a troubleshooting resource.

 

The SIP Stats page reloads data every 15 minutes. The very first line, “Statistics data updated on”, serves as a time reference for the data being displayed at a given moment. One-hour and 24-hour data can also be displayed. Data older than 24 hours can be accessed by choosing a date from the “Specific Date” field and clicking on “Display available options”.

 

Three types of statistics are provided:

 

  • Global or system wide statistics.
  • Error statistics for specific endpoints (per IP address).
  • SIP DoS statistics. This topic was covered in the September Monthly Update.

 

 

  • Number of registered subscribers and average registers per second: Under normal circumstances these values stay steady unless the customer base has changed significantly or there is an issue. A drastic drop or increase in either value points to a problem (e.g. network outage, failure to reach the Registrar or Feature Server). A sudden spike beyond the normal value may indicate a DoS/DDoS register attack.

 

  • SIP buffer allocations: This value changes with the current call volume: number of call attempts, answer-seizure ratio (ASR) and average call duration (ACD). A healthy system will display the highest buffer allocation at peak time and smaller values during the slow hours.
  • TCP buffers in use: This indicates how many SIP/TCP connections are in use. This value is expected to be steady at all times unless the total number of TCP endpoints changes.
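
As an illustration of the "steady unless something is wrong" reasoning above, the following is a minimal sketch that compares the latest registered-subscriber count with a baseline built from earlier samples. The function name, the input format and the 20% tolerance are assumptions made for this example; they are not part of the VSXi or the SIP Stats page, and the values would have to be collected by hand or by your own tooling.

```python
from statistics import mean


def classify_registration_change(history, current, tolerance=0.2):
    """Compare the current registered-subscriber count with a recent baseline.

    history   -- earlier samples, e.g. one per 15-minute refresh (hypothetical input)
    current   -- the latest sample
    tolerance -- fractional deviation treated as normal (an assumed threshold)
    """
    if not history:
        raise ValueError("need at least one earlier sample")
    baseline = mean(history)
    if baseline == 0:
        # No registrations before: any registration now counts as a spike.
        return "steady" if current == 0 else "spike"
    deviation = (current - baseline) / baseline
    if deviation < -tolerance:
        return "drop"    # e.g. outage, Registrar or Feature Server unreachable
    if deviation > tolerance:
        return "spike"   # e.g. possible register flood
    return "steady"


if __name__ == "__main__":
    samples = [1000, 1010, 990]
    print(classify_registration_change(samples, 1005))
    print(classify_registration_change(samples, 400))
    print(classify_registration_change(samples, 2500))
```

The tolerance should be tuned to how much your registration count normally fluctuates over a day; a value that is too tight will flag ordinary peak-hour changes.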

 

In the per-endpoint error statistics, the listed IPs will ideally show very low error counts, in which case the errors can be ignored.

 

  • Options Poll: tracks the status of SIP OPTIONS polling per IP, if enabled. (If polling is not enabled, the status is displayed as off, and the IP is listed only when other error stats are > 0.)
    • When enabled and the VSXi is receiving OPTIONS responses, status will be Active.
    • When enabled and the VSXi is not receiving responses, status will be Down.
  • Resends and Timeouts: Many INVITE resends result in post dial delay (PDD) and are typically associated with a network problem (e.g. packet loss or connectivity issue) or a particular endpoint being unreachable. A high count of timeouts indicates a complete network failure or a far-end device failure.
  • Source Verify: This type of event can potentially lead to a Denial of Service (DoS) block for that specific IP, so a legitimate endpoint should not have error counts in this column. Source Verify failures are normally triggered by misconfiguration. Common examples are:
    • The wrong Service Port has been provisioned on a peering resource.
    • The peering device is sending traffic to the wrong Service Port.
    • The peering device is sending traffic with a wrong or missing Tech Prefix, and the Peering Resource has been configured with a Tech Prefix other than “default”.
    • The peering resource starts to send traffic before it has been provisioned on the VSXi.
  • Dialog Match Error: These are triggered when requests or responses do not match existing SIP dialogs. This type of error can happen on heavily delayed systems (not the VSXi) that send requests or responses for calls that have already been torn down. It may also be a sign of an interoperability problem, most likely an RFC-compliance issue.
  • Protocol Error: If an IP address consistently shows these errors, the far-end device may be misconfigured or have an interoperability issue. The type of messages that can trigger protocol errors include parse errors and other formatting problems.
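
To make the link between INVITE resends and post dial delay concrete, here is a small sketch of the retransmission schedule that RFC 3261 defines for an unanswered INVITE over UDP: Timer A starts at T1 (500 ms by default) and doubles after each resend, and Timer B (64 × T1) ends the transaction with a timeout. Whether the VSXi or a given far-end device uses these default timer values is an assumption here; the function name is made up for this example.

```python
def invite_retransmit_schedule(t1=0.5):
    """Return (retransmit_times, timeout) in seconds for an unanswered INVITE over UDP.

    Uses the RFC 3261 defaults: Timer A starts at T1 and doubles after each
    retransmission; Timer B (64 * T1) ends the transaction with a timeout.
    Actual timer settings on a given device may differ (assumption).
    """
    timer_b = 64 * t1
    times = []
    elapsed = 0.0
    interval = t1
    # Keep retransmitting until the next resend would fall at or after Timer B.
    while elapsed + interval < timer_b:
        elapsed += interval
        times.append(elapsed)
        interval *= 2
    return times, timer_b


if __name__ == "__main__":
    resends, timeout = invite_retransmit_schedule()
    print(resends)   # [0.5, 1.5, 3.5, 7.5, 15.5, 31.5]
    print(timeout)   # 32.0
```

With the default timers, a call that is only answered after the third resend has already accumulated at least 3.5 seconds of delay, and an endpoint that never answers holds the transaction for 32 seconds before it is counted as a timeout.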

A combination of resends, timeouts, dialog match errors and protocol errors is the worst possible outcome. If your deployment’s call flow includes SIP Application Servers (e.g. LNP lookup, External Routing Servers, CNAM), any delay or failure on an application server can severely degrade calls, either as post dial delay (PDD) or as a reduced call completion rate. Lastly, if many devices are listed with many errors across different columns, it typically means that a global issue has occurred (e.g. Internet Service Provider (ISP), public or private network issues).
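
The interpretation rules above can be sketched as a small triage helper. Everything about the input is hypothetical: the SIP Stats page is read by a person, so the dictionary layout, the column keys, the `min_count` threshold and the `global_share` ratio are assumptions chosen for illustration, and a single threshold is a simplification (for Source Verify, any non-zero count on a legitimate endpoint deserves attention).

```python
# Likely causes per error column, paraphrased from the descriptions in this article.
COLUMN_HINTS = {
    "resends": "possible packet loss or connectivity issue (expect PDD)",
    "timeouts": "network failure or far-end device failure",
    "source_verify": "likely misconfiguration (service port, tech prefix, provisioning)",
    "dialog_match": "delayed far-end system or interop/RFC-compliance issue",
    "protocol": "far-end misconfiguration or interoperability issue",
}


def triage(rows, min_count=10, global_share=0.5):
    """Flag noisy endpoints and guess whether the problem looks global.

    rows         -- {ip: {column: count}}, a hypothetical transcription of the page
    min_count    -- counts below this are treated as ignorable (assumed threshold)
    global_share -- share of IPs with errors in 2+ columns that suggests a
                    network-wide issue (assumed threshold)
    """
    findings = {}
    for ip, counts in rows.items():
        flagged = [col for col in COLUMN_HINTS if counts.get(col, 0) >= min_count]
        if flagged:
            findings[ip] = flagged
    multi_column = [ip for ip, cols in findings.items() if len(cols) >= 2]
    looks_global = bool(rows) and len(multi_column) / len(rows) >= global_share
    return findings, looks_global


if __name__ == "__main__":
    rows = {
        "192.0.2.10": {"resends": 50, "timeouts": 20},
        "192.0.2.11": {"protocol": 3},
        "192.0.2.12": {"source_verify": 40},
    }
    findings, looks_global = triage(rows)
    for ip, cols in findings.items():
        for col in cols:
            print(f"{ip}: {col}: {COLUMN_HINTS[col]}")
    print("global issue suspected:", looks_global)
```

In this sample only one of three endpoints shows errors in more than one column, so the helper treats the problems as endpoint-specific rather than global.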
