CDR troubleshooting guide
Overview
Call Detail Records (CDRs) provide comprehensive information about the outcome (success or failure) of a call. CDRs are a great source of information when troubleshooting any of the following issues:
- Signaling issues
  - Interop problems (e.g. Session Timer issues)
  - Hung calls
  - Call setup
  - Numbering format
- Media issues
  - No audio
  - One-way audio
  - Quality issues (packet loss, jitter, delay, MOS)
  - Codec mismatch
- Routing issues
  - Route selection
  - Route depth
  - No route found
  - LRN
- Configuration issues
  - Call admission control
  - Tech prefix
  - Insufficient capacity or CPS (calls per second)
  - Trunk direction
- Endpoint status issues
  - OPTIONS polling
  - Registration state
  - NAT’ed?
- System issues
  - License usage
  - Media capacity
  - System delays / performance
  - Network problems
  - Insufficient system resources
You can view CDRs in three different formats:
- Parsed CDR (GUI): a user-friendly format for sorting through the most important CDR fields to facilitate troubleshooting. Best when searching for a specific call or a few calls.
- Tabular CDR (GUI): shows several CDRs at once, one CDR per row. Best when searching for a group of calls to track a pattern. The tabular view also lets you pick the CDR columns of interest to refine your search.
- Raw CDR: the original CDR in CSV format. You can combine one or more raw CDRs and view them in a spreadsheet application, or apply custom parsing with a script or program of your own to achieve specific goals. This is the most advanced mode. Raw CDRs can be downloaded via SFTP or SCP.
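As an example of custom parsing, the sketch below counts raw CDR rows per value of one column, such as the release cause, using only the Python standard library. The column index `RELEASE_CAUSE_COL` and the sample rows are hypothetical. Check the real field positions in the CDR specification for your VSXi software version.

```python
import csv
import io
from collections import Counter

# HYPOTHETICAL column position of the release cause field. Look up the real
# index in the CDR specification for your VSXi software version.
RELEASE_CAUSE_COL = 2


def count_by_column(csv_text: str, col: int = RELEASE_CAUSE_COL) -> Counter:
    """Count raw CDR rows per value found in one CSV column."""
    counts: Counter = Counter()
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) > col:  # skip blank or truncated lines
            counts[row[col].strip()] += 1
    return counts


# Made-up rows for illustration only; real CDRs have many more fields.
sample = """id1,8587542200,001,60
id2,8587542200,504,0
id3,6195550100,001,12
"""

print(count_by_column(sample).most_common())
```

Grouping a day's worth of downloaded CDRs this way gives a quick picture of which release causes dominate before you drill into individual calls.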
How to find the CDR of a call
The CDR trace utility is accessed from GUI > Trace > CDR Trace.
There are two ways to search for CDRs, based on the time reference:
- Recent CDRs: near real-time or the last few minutes.
- Older CDRs or a specific time.
Once you have picked a time context, fill in two additional fields:
- Primary search string: the information you want to search for, such as:
  - ANI or DNIS. Enter the number in its most generic format, e.g. 8587542200 rather than 18587542200 or +18587542200, unless you are only interested in a specific format. The search performs a "contains" (grep-like) match, so the generic form matches all of these variants.
  - A keyword, e.g. Timeout, Normal BYE, Origination CANCEL, Max Duration.
  - An IP address.
  - A Trunk ID: the VSXi numeric ID. The alias (not the company name) can also be used, as it is part of the CDR spec.
- Other: an optional field that is combined with the Primary search string using AND. For example, if you are only interested in calls from a given ANI to a given DNIS, enter the ANI in the Primary search string and the DNIS in Other.
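The search semantics described above can be modeled as a substring match, with the optional Other string ANDed in. The sketch below is an assumption-based model, not the GUI's actual implementation. The CDR fragment is made up.

```python
def cdr_matches(raw_cdr: str, primary: str, other: str = "") -> bool:
    """Model of the search as described: substring ("contains") matching,
    with the optional Other string combined using AND."""
    if primary not in raw_cdr:
        return False
    return other == "" or other in raw_cdr


# Made-up CDR fragment: ANI stored as 18587542200, DNIS as +16195550100.
cdr = "id1,18587542200,+16195550100,001,60"

print(cdr_matches(cdr, "8587542200"))                # True: generic form matches
print(cdr_matches(cdr, "+18587542200"))              # False: this format is not in the CDR
print(cdr_matches(cdr, "8587542200", "6195550100"))  # True: both strings present
```

In this model the match runs against the whole record. A string entered in Other can therefore match any field, not only the DNIS.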
Once you run your search, any matching CDRs are displayed.
CDR fields at a glance
This section does not cover every CDR field, only those most commonly used for troubleshooting.
- Internal release cause: an indicator of whether the call succeeded. The possible internal release cause codes are summarized below. Do not confuse them with SIP/H.323 protocol disconnect codes.
  - 001: Normal answered call.
  - 002: No answer, torn down by origination.
  - 003: No answer, torn down by termination.
  - 004: No answer, torn down by VSXi.
  - 4XX: Termination tear down.
  - 5XX: Origination tear down.
  - 6XX: System tear down.
- Protocol stack cause: indicates who released the call (origination, termination, or internal) and the protocol release code (e.g. 200, 503, 486).
- Duration.
- Post dial delay: a hint when the problem you are investigating is dead air.
- Termination post dial delay.
- Ring time: an indicator of alternate routing issues (e.g. an 18X followed by a 503).
- Media IP:port: useful when you need to capture a media packet trace or identify the media gateway being used.
- Route selection: an index (e.g. 1, 2, 3) when multiple route choices are available. If route selection is 1, the call took the first option without alternate routing.
- Media packet stats: packet counts and MOS help in no audio, one-way audio, and poor call quality scenarios.
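The sketch below turns some of these fields into quick triage hints. It maps internal release cause codes to the summaries listed above. It also flags alternate routing (route selection greater than 1) and one-way or no audio (zero packet counts). The `CdrSummary` field names are hypothetical: populate them from the corresponding CDR columns. Packet counts are only meaningful for answered calls whose media passes through the VSXi.

```python
from dataclasses import dataclass

# Internal release cause codes as summarized in this guide.
_EXACT = {
    "001": "Normal answered call",
    "002": "No answer, torn down by origination",
    "003": "No answer, torn down by termination",
    "004": "No answer, torn down by VSXi",
}
_FAMILIES = {"4": "Termination tear down", "5": "Origination tear down", "6": "System tear down"}


def classify_internal_cause(code: str) -> str:
    """Map an internal release cause code (e.g. "001", "503") to its summary."""
    if code in _EXACT:
        return _EXACT[code]
    if len(code) == 3 and code.isdigit() and code[0] in _FAMILIES:
        return _FAMILIES[code[0]]
    return "Unknown"


@dataclass
class CdrSummary:
    # HYPOTHETICAL field names: populate them from the matching CDR columns.
    route_selection: int    # 1 = first route choice, no alternate routing
    packets_from_orig: int  # media packets received from the origination
    packets_from_term: int  # media packets received from the termination


def triage_hints(cdr: CdrSummary) -> list[str]:
    """Return human-readable hints derived from route and media fields."""
    hints = []
    if cdr.route_selection > 1:
        hints.append(f"alternate routing: route choice {cdr.route_selection} was used")
    if cdr.packets_from_orig == 0 and cdr.packets_from_term == 0:
        hints.append("no audio: no media packets in either direction")
    elif cdr.packets_from_orig == 0 or cdr.packets_from_term == 0:
        hints.append("one-way audio: media packets in one direction only")
    return hints


print(classify_internal_cause("503"))  # Origination tear down
print(triage_hints(CdrSummary(route_selection=2, packets_from_orig=1500, packets_from_term=0)))
```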
Common Release Causes
This section describes common release causes and provides pointers on how to fix each problem.
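Each release cause below appears in the CDR as a string of the form "<Leg> Stack Cause = <code>, <reason>". When post-processing CDRs, it can help to split that string into its parts. The sketch assumes that exact textual format, as it appears in this guide. Adjust the pattern if your CDRs differ.

```python
import re
from typing import NamedTuple, Optional


class StackCause(NamedTuple):
    leg: str     # "Origination", "Termination" or "Internal"
    code: int    # e.g. 504, 408, 999
    reason: str  # e.g. "INVITE Response Timeout"


# Matches strings of the form "<Leg> Stack Cause = <code>, <reason>".
_STACK_CAUSE = re.compile(
    r"^\s*(Origination|Termination|Internal)\s+Stack Cause\s*=\s*(\d{3})\s*,\s*(.+?)\s*$"
)


def parse_stack_cause(text: str) -> Optional[StackCause]:
    """Split a stack cause string into leg, code and reason; None if no match."""
    m = _STACK_CAUSE.match(text)
    if m is None:
        return None
    return StackCause(m.group(1), int(m.group(2)), m.group(3))


print(parse_stack_cause("Termination Stack Cause = 504, INVITE Response Timeout"))
print(parse_stack_cause("Origination Stack Cause = 504 , SIP Resend Timeout"))
```

The leg tells you which side to investigate first. For example, a Termination 504 points at the far end of the termination trunk.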
504 INVITE Response Timeout
Termination Stack Cause = 504, INVITE Response Timeout
An INVITE Response Timeout is always related to the device on the termination leg (i.e. the leg on which the VSXi initiates the session with an INVITE). It happens when the termination provides no response at all to the VSXi's INVITE: no 100 Trying, 18X, 200 OK, or any other final response (3XX-6XX).
The timeout depends on the termination's SIP Profile. The default is 4 seconds (one second of waiting after the initial INVITE, then three resends one second apart).
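The 4-second default can be checked with a small timeline model. It assumes the initial INVITE at t=0, resends at 1, 2 and 3 seconds, and one more second of waiting before the timeout is declared. This is an interpretation of the default described above. The parameter names are hypothetical, not SIP Profile field names.

```python
def invite_timeline(initial_wait_s: float = 1.0, resends: int = 3, interval_s: float = 1.0):
    """Return (send_times, timeout_time) for a fixed-interval resend schedule.

    Parameter names are HYPOTHETICAL, not SIP Profile field names.
    """
    send_times = [0.0]   # the initial INVITE
    t = initial_wait_s   # the first resend happens after the initial wait
    for _ in range(resends):
        send_times.append(t)
        t += interval_s  # wait before the next resend (or the timeout)
    return send_times, t


sends, timeout = invite_timeline()
print(sends, timeout)  # [0.0, 1.0, 2.0, 3.0] 4.0
```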
By default, the VSXi returns a 504 to the origination. If needed, the 504 can be mapped to another cause code, such as 503, using the SIP-to-SIP mapping in the Cause Code Profile.
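The effect of a SIP-to-SIP mapping can be pictured as a lookup with a pass-through default. The sketch is a hypothetical illustration only. On the VSXi the mapping is configured in the Cause Code Profile, not in code. Passing unmapped codes through unchanged is an assumption of this model.

```python
# HYPOTHETICAL illustration of a SIP-to-SIP cause code mapping. On the VSXi the
# mapping is configured in the Cause Code Profile; this dict only models its effect.
SIP_TO_SIP_MAP = {504: 503}


def code_returned_to_origination(cause: int, mapping: dict = SIP_TO_SIP_MAP) -> int:
    """Return the mapped code, or the original code when no mapping exists."""
    return mapping.get(cause, cause)


print(code_returned_to_origination(504))  # 503: remapped
print(code_returned_to_origination(486))  # 486: no mapping, passed through
```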
Action:
If the problem is reproducible, it is recommended to take a packet capture (.PCAP) to verify the behavior. This problem is typically related to one of the following conditions:
Initial setup: If this is a new setup and you are getting INVITE Response Timeouts, the cause is likely a configuration issue. The following may apply:
- You are sending traffic to the wrong port.
- You are using the wrong Service Port (by mistake).
- The far-end is not yet expecting traffic from the sending Service Port.
Production setup: If INVITE Response Timeouts start happening suddenly, they are typically related to:
- Network connectivity issue. The far end can't receive our INVITE.
- Performance issue. The far-end device can't respond to the VSXi's INVITEs in a timely manner.
- TCP endpoints: a TCP connection problem.
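To tell a sudden production problem from a long-standing one, you can compute the share of INVITE Response Timeouts per termination trunk. Comparing time windows then shows when the rate jumped. The sketch assumes you have already extracted (termination trunk ID, stack cause text) pairs from the CDRs. The records shown are made up.

```python
from collections import defaultdict


def invite_timeout_rate_by_trunk(records):
    """records: iterable of (termination_trunk_id, stack_cause_text) pairs.

    Returns the fraction of attempts per trunk that ended with an
    INVITE Response Timeout.
    """
    totals = defaultdict(int)
    timeouts = defaultdict(int)
    for trunk_id, cause in records:
        totals[trunk_id] += 1
        if "INVITE Response Timeout" in cause:
            timeouts[trunk_id] += 1
    return {trunk_id: timeouts[trunk_id] / totals[trunk_id] for trunk_id in totals}


# Made-up records for illustration.
records = [
    ("101", "Termination Stack Cause = 504, INVITE Response Timeout"),
    ("101", "Termination Stack Cause = 486, Busy"),
    ("202", "Termination Stack Cause = 486, Busy"),
]
print(invite_timeout_rate_by_trunk(records))  # {'101': 0.5, '202': 0.0}
```

Running this separately over, say, each hour of CDRs shows whether one trunk's rate rose abruptly. That pattern suggests a network, performance, or TCP problem on that far end rather than a configuration error.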
504 SIP Resend Timeout
Origination Stack Cause = 504, SIP Resend Timeout
A SIP Resend Timeout happens when the origination fails to send an ACK after the session is answered with a 200 OK. This is almost always an interop problem on the far-end device.
If no valid ACK arrives, the VSXi tears down the call within 15 minutes.
Action:
It is recommended to take a packet capture (.PCAP) to verify the behavior.
408 Refresh Failure
- Origination Stack Cause = 408, Refresh Failure
- Termination Stack Cause = 408, Refresh Failure
A "Refresh Failure" results from a device not refreshing, or not responding to, SIP Session Timer refreshes (RFC 4028). The CDR reads one of the two cause strings listed above.
When a Refresh Failure disconnect happens, the VSXi sends a BYE towards both legs of the session in progress. The leg responsible for the disconnect is identified by "Origination" or "Termination". For example, Origination Stack Cause = 408, Refresh Failure means that the Origination Trunk ID failed to respond to mid-call Session Timer refreshes.
Action:
There are two common behaviors behind refresh failures:
- Legitimate hung calls: specific sessions hung due to an underlying problem. Tearing these sessions down is the desired outcome, since it prevents them from staying up any longer. No corrective action is needed.
- Interop problem: with devices that do not support Session Timers, all calls disconnect before the Session Timer interval elapses, so no call is able to exceed the configured interval. The corrective action is to disable Session Timers at the SIP Profile level.
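The two behaviors can be told apart from a trunk's answered calls, using the pattern described above. An interop problem produces refresh failures while no call outlasts the Session Timer interval. The sketch below is a heuristic under that assumption. The 1800-second interval and the call data are hypothetical. The result is only trustworthy when the sample contains calls long enough to have reached the interval.

```python
def looks_like_session_timer_interop(calls, session_timer_s: float) -> bool:
    """calls: (duration_s, ended_with_refresh_failure) for answered calls on one trunk.

    Heuristic: refresh failures occur AND no call outlasts the Session Timer interval.
    """
    if not any(failed for _, failed in calls):
        return False
    return all(duration <= session_timer_s for duration, _ in calls)


interval = 1800  # HYPOTHETICAL Session Timer interval, in seconds
print(looks_like_session_timer_interop([(1795, True), (300, False), (1799, True)], interval))  # True
print(looks_like_session_timer_interop([(4000, False), (2500, True)], interval))  # False
```

A `True` result points towards disabling Session Timers on that trunk's SIP Profile. A `False` result with occasional refresh failures is consistent with legitimate hung calls.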
200 OK Timeout
- Internal Stack Cause = 999, 200 OK Timeout
A 200 OK Timeout may occur during setup of a call with media. The CDR release cause reads as shown above.
This release happens when there are problems allocating media on any of the available Media Servers. When it happens, calls are disconnected fairly quickly (within 5 seconds). The cause is often a connectivity or performance issue. If the call is being transcoded, a transcoding exception (e.g. an unsupported call flow) may trigger this release.
Action: Please contact Sansay Support with an example CDR trace for assistance.
999 EAM Routing Timeout
- Internal Stack Cause = 999, EAM Routing Timeout
EAM Routing Timeouts and EAM Busy are performance issues indicating that the VSXi is delayed in processing calls. EAM events can be caused by a variety of factors:
- Sub-optimal system configuration. The system is running a configuration that is not optimal for its performance goals.
- Extra large routing tables and high calls per second traffic. When relying on extra large routing tables it is recommended to pair a VSXi INX with Sansay's external route server ROME HRS; this pairing provides the best performance and routing flexibility.
- Frequent GUI/API changes to large tables. If large tables are being modified frequently the VSXi may temporarily slow down.
- Slave re-sync during peak time. When using large tables (DMT / Routes) a slave re-sync may take longer than usual and can overload the active/master system.
- Backups during peak time. When using large tables (DMT / Routes) the system may slow down and enter overload state.
Action: If your system is experiencing any EAM symptoms, please contact Sansay Support.
Other Release Causes
Cause Code | Description
415 “No valid codec” | No valid codec could be agreed between the orig and term call legs. This could be related to a misconfigured enforced codec policy or to device capabilities.
987 “Termination Capacity Exceeded” | The term TID capacity has been reached. This could be the Aggregate Capacity or a specific IP/FQDN Capacity.
987 “Origination Capacity Exceeded” | The orig TID capacity has been reached. This could be the Aggregate Capacity or a specific IP/FQDN Capacity.
987 “Orig CPS Capacity Exceeded” | The orig TID CPS capacity has been reached. This could be the Aggregate CPS or a specific IP/FQDN CPS.
999 “Could not allocate media” | Media rejection. This can be caused by:
999 “VSXi Capacity Exceeded” | The VSXi system license limit has been reached. Calls with this cause code receive a 503.
999 “Maximum Duration Exceeded” | The configured Maximum Call Duration between the involved TIDs has been reached. The lower of the Orig and Term TIDs' Maximum Call Durations is enforced.
999 “Ring No Answer Timeout” | The call is canceled after INVITE/18X with no 200 response.
Additional resources
If you are looking for more information on CDR troubleshooting, our SCO troubleshooting and operations class covers this topic in more detail.