Tekelec.com

SIP Sessions

SIP-I and SIP-T Challenge: Early Media

This post continues the series on SIP-I and SIP-T deployment challenges. You may wish to read the Introduction to SIP-I and SIP-T post for some general background on these two protocols before continuing.

This post deals with the issues surrounding the establishment of an audio path before a call is completely set up.

These problems stem from the fact that SIP and ISUP have rather different models for the way the media is set up. These differences are rooted in a philosophical difference about where call progress information is generated. In the PSTN, it is typically generated by the called party's end office. So, if a media path isn't set up before the call is completed, the call progress tones can't be sent. By contrast, in a SIP network, call progress information is usually generated by the calling party's device -- so the media path doesn't matter until the called party answers.

Because it does not require a media path to convey call progress information, SIP’s design expects that the session will be completely established before media begins to flow. There is one minor exception: as a means to avoid clipping off the initial media travelling from the called party to the calling party, SIP does specify that clients are supposed to play any media received prior to the session being established. However, this provision was designed to avoid a very specific corner case, not to carry long-lived media sessions.

To further understand this behavior, keep in mind that SIP uses an offer/answer model for establishing session parameters. One endpoint sends a proposed session description – an “offer” – containing the IP address to which media destined for the offerer is to be sent, along with a set of acceptable session parameters. The other endpoint responds with an “answer” session description; this “answer” selects final values for the session parameters that are to be used for the media, and also includes the IP address to which media destined for the answerer is to be sent.
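
As a rough illustration of the offer/answer exchange, the sketch below pulls the media destination and payload types out of a pair of SDP bodies. The addresses, ports, and the `parse_audio_target` helper are made up for this example, and the parser only handles the simple case of one session-level `c=` line and one audio `m=` line.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MediaTarget:
    address: str              # taken from the c= line
    port: int                 # taken from the m= line
    payload_types: List[int]  # RTP payload types offered or selected


def parse_audio_target(sdp: str) -> MediaTarget:
    """Extract where audio should be sent, per a (very simple) SDP body."""
    address = None
    port = None
    payloads: List[int] = []
    for raw in sdp.strip().splitlines():
        line = raw.strip()
        if line.startswith("c=IN IP4 "):
            address = line.split()[2]
        elif line.startswith("m=audio "):
            fields = line.split()
            port = int(fields[1])
            payloads = [int(pt) for pt in fields[3:]]
    if address is None or port is None:
        raise ValueError("SDP has no usable audio description")
    return MediaTarget(address, port, payloads)


# Hypothetical offer from the caller: "send my audio to 192.0.2.10:49170".
OFFER = """v=0
o=caller 2890844526 2890844526 IN IP4 192.0.2.10
s=-
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 0 8
"""

# Hypothetical answer from the called side: picks payload type 0 and
# says "send my audio to 198.51.100.20:3456".
ANSWER = """v=0
o=callee 2890844730 2890844730 IN IP4 198.51.100.20
s=-
c=IN IP4 198.51.100.20
t=0 0
m=audio 3456 RTP/AVP 0
"""

offer_target = parse_audio_target(OFFER)
answer_target = parse_audio_target(ANSWER)
```

Here `offer_target` is where the answerer sends media, and `answer_target` is where the offerer sends media; the answer narrows the offered payload types down to the one actually used.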

Since the media session negotiation does not typically complete until the called party answers, media sent towards the caller before the session is completely established may or may not work properly; and, since the IP address to which media for the called party must be sent only shows up in the answer, it is actually impossible to send media towards the called party before then. (Keep in mind that an RTP session is actually composed of two streams: one flowing towards the caller, and one flowing towards the called party. Once the offer is sent, the stream towards the caller can begin. However, the stream towards the called party cannot start until after the answer is received.)
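
The asymmetry between the two RTP streams can be sketched as a tiny state object. The `EarlySession` class and its method names are hypothetical, invented for this illustration; it only tracks which direction has a known destination, not whether the media will be usable.

```python
from typing import Optional, Tuple

Destination = Tuple[str, int]  # (IP address, RTP port)


class EarlySession:
    """Tracks which direction of an RTP session has a known destination."""

    def __init__(self) -> None:
        self.caller_dest: Optional[Destination] = None  # learned from the offer
        self.callee_dest: Optional[Destination] = None  # learned from the answer

    def offer_sent(self, caller_dest: Destination) -> None:
        # The offer tells the far end where the caller wants to receive media.
        self.caller_dest = caller_dest

    def answer_received(self, callee_dest: Destination) -> None:
        if self.caller_dest is None:
            raise RuntimeError("an answer cannot arrive before an offer")
        # Only now does the caller know where to send its own media.
        self.callee_dest = callee_dest

    def can_send_toward_caller(self) -> bool:
        return self.caller_dest is not None

    def can_send_toward_called_party(self) -> bool:
        return self.callee_dest is not None


session = EarlySession()
session.offer_sent(("192.0.2.10", 49170))
after_offer = (session.can_send_toward_caller(),
               session.can_send_toward_called_party())

session.answer_received(("198.51.100.20", 3456))
after_answer = (session.can_send_toward_caller(),
                session.can_send_toward_called_party())
```

After the offer alone, only the stream towards the caller has somewhere to go; the stream towards the called party becomes possible only once an answer arrives.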

By contrast, ISUP expects the ability to send media on a circuit as soon as it is seized, which happens as soon as the call attempt begins, so that it can send call progress and call error tones. To further complicate matters, many deployed IVRs take advantage of this behavior by not triggering an ANM (thus establishing the session and, typically, marking the start of charging) until a human answers the call.

In other words, SIP provides only a “best effort” attempt at passing media before the call is established, while ISUP has an absolute requirement for sending media before the call is completely established.

The IETF began considering this problem well before the current round of SIP specifications was published, with significant impact on the final documents. In June of 1999, a proposed “183 Session Progress” response code was described [1] for instructing an ingress gateway to suppress ringback, and to use the in-band media instead. Ultimately, this solution was put aside due to a number of shortcomings, including the inability to ensure that the 183 response is actually received by the calling party. (A “183 Session Progress” response code was later added to the core SIP specification, but with very different semantics than originally proposed.)

As a result of the work on early media and PSTN interoperation, by the time the core set of SIP specifications was published as RFCs 3261 through 3265, it contained a limited set of tools that allowed the establishment of “early” media sessions. However, exact procedures for combining these tools were not finally published until December of 2004, in the form of RFC 3960.

At a high level, here’s how PSTN gateways can set up media sessions prior to the final establishment of a call:


This is pretty similar to the diagram for a basic call setup that we looked at in the introduction post. The key difference is that this diagram shows exactly when the media path is set up between each component in the system. In the PSTN, these audio paths are set up as soon as the network can – messages 2, 5, 12, and 21 all happen as soon as possible.

Because the SIP network doesn’t treat media quite the same way as the PSTN, we need some extra signaling to set up the session. That’s where messages 7 through 9 come in. Message 7 contains a provisional session description “answer” for the session “offer” that was present in message 6. At this point, both gateways have enough information to exchange audio. (Messages 8 and 9 simply acknowledge the receipt of message 7; this is necessary to ensure that message 7 is delivered reliably.)

In this scenario, because there is an audio path all the way to the called party’s end office, the ringback tone can be generated remotely (message 15) instead of being generated by the caller’s end office. However, even with these procedures in place, the ingress gateway cannot rely on the remote end generating ringback tones. It must monitor the media stream, and locally generate ringback if there is none present in the audio. This can lead to jarring transitions where a calling party hears ringback generated by the ingress gateway, followed by an abrupt change to ringback generated by the called party’s end office.
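
To make the acknowledgement step a little more concrete, here is a minimal sketch that assumes message 7 is a reliable “183 Session Progress” response in the style of RFC 3262 and that message 8 is the corresponding PRACK. The sample response, the CSeq and RSeq values, and the helper names are all invented for illustration. The sketch derives the `RAck` header value that a PRACK would carry: the RSeq number, followed by the CSeq number and method of the response being acknowledged.

```python
from typing import Dict


def parse_headers(message: str) -> Dict[str, str]:
    """Parse the header block of a SIP message into a lower-cased dict."""
    headers: Dict[str, str] = {}
    lines = message.strip().splitlines()
    for line in lines[1:]:          # skip the status line
        if not line.strip():
            break                   # a blank line ends the headers
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def needs_prack(headers: Dict[str, str]) -> bool:
    """A provisional response is reliable if it requires 100rel and has RSeq."""
    required = [tag.strip().lower()
                for tag in headers.get("require", "").split(",")]
    return "100rel" in required and "rseq" in headers


def build_rack(headers: Dict[str, str]) -> str:
    """Build the RAck value: RSeq number, then CSeq number and method."""
    rseq = int(headers["rseq"])
    cseq_number, method = headers["cseq"].split()
    return f"{rseq} {int(cseq_number)} {method}"


# Hypothetical provisional response carrying the SDP answer (body trimmed).
SESSION_PROGRESS = """SIP/2.0 183 Session Progress
CSeq: 314159 INVITE
Require: 100rel
RSeq: 1
Content-Type: application/sdp

v=0
"""

headers = parse_headers(SESSION_PROGRESS)
rack_value = build_rack(headers) if needs_prack(headers) else None
```

With this sample response, the PRACK would carry `RAck: 1 314159 INVITE`, which lets the egress gateway match the acknowledgement to the provisional response it sent.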

Further complicating matters: even with the procedures defined in RFC 3960, the exact behavior defined for local versus remote generation of call progress and error tones remains a matter of local policy at the PSTN gateways.
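
As a purely illustrative example of what such a local policy might look like at an ingress gateway, the sketch below chooses between locally generated ringback and in-band audio from the far end. Nothing here comes from RFC 3960 itself: the `choose_ringback_source` function, the `prefer_local` switch, and the 5-second silence timeout are assumptions made for this example. A real gateway would need a timeout longer than the silent gap in the local ringback cadence to avoid flapping between the two sources.

```python
from enum import Enum
from typing import Optional


class RingbackSource(Enum):
    LOCAL = "local"    # ingress gateway generates ringback itself
    REMOTE = "remote"  # caller hears in-band audio from the far end


def choose_ringback_source(now: float,
                           last_audio_time: Optional[float],
                           silence_timeout: float = 5.0,
                           prefer_local: bool = False) -> RingbackSource:
    """Decide what the caller should hear while the call is being set up.

    now             -- current time in seconds
    last_audio_time -- when non-silent early media was last detected,
                       or None if none has been detected yet
    silence_timeout -- hypothetical threshold for treating the early
                       media stream as carrying no tones
    prefer_local    -- hypothetical policy switch forcing local tones
    """
    if prefer_local:
        return RingbackSource.LOCAL
    if last_audio_time is None:
        return RingbackSource.LOCAL
    if now - last_audio_time > silence_timeout:
        return RingbackSource.LOCAL
    return RingbackSource.REMOTE


no_media_yet = choose_ringback_source(now=10.0, last_audio_time=None)
recent_media = choose_ringback_source(now=10.0, last_audio_time=9.0)
stale_media = choose_ringback_source(now=20.0, last_audio_time=9.0)
```

The switch from `LOCAL` to `REMOTE` as soon as far-end audio is detected is exactly the kind of transition that a caller may hear as an abrupt change in ringback.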

However, the key problem with this approach is that the additional procedures used to set up an early session are not necessarily supported by native SIP terminals – which means that early media tends not to work properly when calling from a native SIP device. This is particularly troublesome in the case of IVRs that expect to play information to a user before establishing the call. Similarly, when a call is made to a SIP device, the gateway and end office must assume responsibility for generating call progress and error tones that would typically come from the remote end office.

There are some other early-media complications that arise when more than one egress gateway is involved, but we’ll save those for next time.

[1] http://tools.ietf.org/html/draft-donovan-mmusic-183-00
