Posted by Ben Campbell on Tue, Jul 20, 2010 @ 11:26 AM
In my role as co-chair of the IETF SIMPLE working group, I've heard a lot of questions about the scalability of SIMPLE-based presence services. This is the first in a series of posts on various subjects related to SIMPLE scalability.
One of the hard problems in scaling presence services in general is created by the very idea of contact lists. When the presence state of a given resource changes, a presence service must notify every subscriber to that resource. You can think about this in terms of relationships, and simplistically estimate the total number of relationships in the system as the number of users times the average contact list size. As both numbers grow, you get a combinatorial explosion.
Of course, in the real world, you get a lot of variation in the number of subscribers between resources. Scalability is further impacted by "celebrity" resources that have huge numbers of subscribers.
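The simplistic estimate above is easy to compute directly. Here is a minimal sketch; the user counts and list sizes are made-up values chosen purely for illustration, and the function name is hypothetical.

```python
def estimated_relationships(users: int, avg_contact_list_size: int) -> int:
    """Simplistic estimate: one relationship per (user, contact) pair."""
    return users * avg_contact_list_size


# Hypothetical deployment sizes, for illustration only.
for users, avg_list in [(1_000, 20), (100_000, 50), (10_000_000, 100)]:
    total = estimated_relationships(users, avg_list)
    print(f"{users:,} users x {avg_list} contacts = {total:,} relationships")
```

The estimate ignores the variation between resources mentioned above; a few "celebrity" resources can dominate the real notification load.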
In "base" SIMPLE, a client forms a separate SIP-events subscription for the presence state of each resource in the user's contact list. So if your contact list has 50 entries, that's 50 separate SIP subscription dialogs to maintain, refresh soft-state for, etc. That's not really a problem for most modern client devices, but it can put quite a load on presence servers that need to handle that many subscriptions for each user. And if a lot of those clients share a limited bandwidth access network (such as with a mobile data network), they can put a substantial load on that network.
Resource List Servers (RLS) can help in certain circumstances. [Full Disclosure: I was a co-author for that specification] With an RLS, a SIMPLE client creates a single subscription to an entire contact list, rather than a separate subscription to each contact on the list. The RLS will then send presence notifications that contain presence state for multiple contacts at a time, along with metadata describing the current entries in the contact list. That is, an RLS subscription provides presence state for list entries, and also provides the state of the list itself.
The RLS extension seems like an obvious optimization for SIMPLE. But in reality, it only helps for some very specific situations, or in combinations with some other optimizations.
First, the RLS still has to get presence state for each resource on a list from somewhere. RFC 4662 describes the idea of a "Back-End" subscription, where the RLS subscribes to a resource on behalf of the end-user. This can be mitigated by co-locating the RLS with a presence server (PS). But practically, a contact list is likely to contain resources at many different PSs, in many different domains. So the use of RLSs doesn't have much of an impact on the total number of subscriptions that the PSs have to manage.
At this point, many readers will think "But doesn't a PS only need to handle one subscription to a particular resource from each RLS, instead of one from each client?" Unfortunately, this really doesn't work. Imagine Alice and Bob both subscribe to Carol's presence. Can we assume Carol wants to allow both Alice and Bob to subscribe? And even if she does, can we assume she wants to give the exact same information to both of them? Unless the RLS has knowledge of Carol's presence rules, it must maintain separate back-end subscriptions for Alice and for Bob. In practice, it's very difficult to create "shared" back-end subscriptions without creating privacy problems.
So, without some mechanism (standardized or proprietary) for an RLS to learn the presence rules for each resource it subscribes to, and a way for each PS to know it can trust the RLS to properly enforce the rules, the RLS extension really does not help much with presence server subscription load. Nor does it reduce the required bandwidth between the RLS and PS.
On the other hand, it can reduce the bandwidth required over the client access network. This is extremely useful if the access network is bandwidth constrained--for example in the aforementioned mobile data network. It is less useful if access network bandwidth is plentiful, such as with broadband networks, and possibly even 4G mobile data networks (although in those cases it may still help with things like client battery life.) In fact, in a broadband network, an RLS may create new scalability problems, by creating a new bottleneck in what was previously a highly distributed subscription model.
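To make the trade-off concrete, here is a rough sketch that counts subscriptions with and without an RLS. It assumes every contact lives at a remote presence server (no co-location) and that the RLS cannot share back-end subscriptions between users. The class and function names are hypothetical, and the numbers are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class SubscriptionLoad:
    access_network: int   # subscriptions clients maintain over the access network
    presence_server: int  # subscriptions the presence servers must manage


def base_simple(clients: int, list_size: int) -> SubscriptionLoad:
    # Each client subscribes directly to every contact on its list.
    total = clients * list_size
    return SubscriptionLoad(access_network=total, presence_server=total)


def with_rls(clients: int, list_size: int) -> SubscriptionLoad:
    # One list subscription per client over the access network. The RLS still
    # needs one back-end subscription per (client, contact) pair, because it
    # cannot share them without knowing each resource's presence rules.
    return SubscriptionLoad(access_network=clients,
                            presence_server=clients * list_size)


print(base_simple(1_000, 50))
print(with_rls(1_000, 50))
```

Under these assumptions the access network load drops by a factor of the list size, while the presence server load is unchanged.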
Even with scarce bandwidth, the utility of an RLS goes down if the network traffic created by bona-fide presence changes is very large compared to the traffic created by the overhead of subscription maintenance. This can be mitigated somewhat by applying some rate-limiting optimizations. I will discuss SIMPLE presence rate-limiting in a future post.
So in summary, the RLS extension is an effective scale optimization for some very specific real-world circumstances, such as in mobile presence services. But it's not for everyone, and it takes some additional optimizations to get the most out of it.
Posted by Adam Roach on Tue, Jul 13, 2010 @ 12:01 PM
SIP trunking, broadly defined, is a service in which an Internet Telephony Service Provider (ITSP) provides service to a customer-operated Private Branch Exchange (PBX). There has been considerable work on defining parameters around commercial SIP Trunk offerings over the past few years, including the SIPconnect effort within the SIP Forum and the Business Trunking specification developed by ETSI.
One of the problems that has remained most pervasive, however, is the means by which an ITSP knows where to send messages destined for a particular customer. Early offerings frequently required manual provisioning of customer IP addresses – calls addressed to one of a customer’s phone numbers would be routed to the address that they gave the ITSP when the service was set up. Unfortunately, this approach suffers from a large number of shortcomings. For example, the additional provisioning step of gathering IP address information from customers leads to less efficient provisioning and higher operational costs. Also, this kind of setup requires customers to contact their ITSP if they ever need to change the IP address of their PBX. And, since such provisioning changes often take hours or days, this approach can leave customers without phone service for very long periods of time.
The first serious attempts to solve this problem came from the IMS network, and were modeled on the way IMS handles single users with multiple AORs. Basically, the PBX would register a single identity – a lead number, for example – and the ITSP would presume that calls for all the identities associated with that PBX should be routed to the same destination. It was a very simple solution to the problem, and it worked passably for the kinds of environments that IMS can assume (i.e., tightly controlled walled garden networks, where non-standard behavior can be provisioned into SIP servers by bilateral agreement between the ITSP and the PBX owner).
This naïve solution to the problem, however, suffered from a number of drawbacks. Significant details about processing of inbound INVITE requests were left unspecified, leading to very real deployment issues in the field. Further, this very real change to the semantics of REGISTER – that is, its nature of registering many disparate AORs instead of a single AOR – was not signaled between the PBX and the ITSP. Outside of tightly-controlled walled garden networks, this led to situations in which the ITSP or the PBX thought the IMS mechanism was in use while the other end did not. The resulting call failures – which often would involve signaling loops – were difficult to diagnose, and even more difficult to solve. The solution also suffered from being designed without significant input from SIP protocol experts, making mistakes such as defining a wildcarding syntax that is fundamentally incompatible with SIP syntax in general.
However, the key problems were far more structural than these, which could be solved by minor tweaks to the specification. In particular, while these attempts did manage to make basic calls work under the right circumstances, they were designed without regard for key registration-based mechanisms developed within the IETF. Interaction with the registration event package was added as an afterthought, and in a way that assumed everyone in the network would be aware of the new REGISTER semantics. No provisions were made for allowing the use of temporary GRUUs, which are a critical part of the ability to make and receive calls in an anonymous fashion.
To address this situation, the IETF took on work near the end of last year to specify a mechanism for registering multiple AORs with a single SIP message. This work was spurred predominantly by the SIP Forum’s SIPconnect work. Within the SIPconnect effort, it became apparent that the existing solutions weren’t sufficient for the more general architectures they wanted to enable. The resulting working group – called MARTINI – has been working at a feverish pitch over the past six months to produce a mechanism that solves the registration problem, while addressing the shortcomings of the previous mechanisms.
The proposed solution [1] has largely stabilized, and is now entering a final comment period within the MARTINI working group before being passed off to the IETF leadership for publication. At a high level, this solution sidesteps a large number of the problems that existed in prior solutions by closely simulating what would happen if the PBX sent a separate REGISTER message for each of its phone numbers. In other words, it uses REGISTER to update a registration database, in contrast to earlier solutions that were effectively updating a broader domain routing database.
The solution also includes significant provisions to ensure that previously-defined registration-related mechanisms in SIP remain viable for PBXes that choose to use it.
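The idea of "simulating a separate REGISTER for each phone number" can be pictured as the registrar expanding one bulk registration into one binding per AOR. The sketch below is only a conceptual illustration: the function names, the example domain and numbers, and the dictionary used as a registration database are all hypothetical, and it does not reproduce the syntax or procedures of the MARTINI draft.

```python
def bulk_register(registration_db: dict, domain: str,
                  numbers: list, pbx_contact: str) -> None:
    """Add one binding per phone number, as if each had been registered separately."""
    for number in numbers:
        aor = f"sip:{number}@{domain}"
        registration_db[aor] = pbx_contact


def lookup(registration_db: dict, aor: str):
    """Ordinary per-AOR lookup, unchanged by how the binding was created."""
    return registration_db.get(aor)


# Hypothetical PBX with three provisioned numbers.
db = {}
bulk_register(db, "itsp.example.com",
              ["+12145550100", "+12145550101", "+12145550102"],
              "sip:pbx.customer.example.net")
print(lookup(db, "sip:+12145550101@itsp.example.com"))
```

Because each AOR ends up with an ordinary binding, mechanisms that operate on individual registrations can keep treating them as such, which is the property the text above emphasizes.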
With any luck, then, we should finally have a general-purpose solution to the problem of how to route requests over a SIP trunk to a PBX finished and stabilized within the year. Combined with the other work being done in the SIP Forum SIPconnect group, this should lead to a well-defined, unified specification that allows ITSPs to quickly and confidently deploy SIP trunking services. And that can only be a good thing for SIP.
__
[1] Full Disclosure: I am the editor of the solution developed by the working group, and have been deeply involved in its design.
Posted by Dorgham Sisalem on Thu, Jul 08, 2010 @ 11:47 AM
The first gateways that translated between VoIP and PSTN were rather simple in design. They consisted of a single box that translated SS7 into VoIP (H.323 or SIP) and TDM into RTP. In the late nineties a number of protocols were specified, starting with SGCP followed by IPDC, MGCP, and ending with MEGACO. These protocols changed the design of gateways into a distributed architecture that consists of a signaling gateway, media gateway and media gateway controller.
- The signaling gateway is responsible for translating between different protocols.
- The media gateway is responsible for all media-related tasks, such as translating between TDM and RTP or transcoding media.
- The media gateway controller controls the media gateways and uses the MEGACO protocol to provide the instructions needed for fulfilling their tasks.
This architecture has the following advantages:
- Centralized signaling and distributed media processing: For various reasons, such as billing, management and provisioning, it is beneficial to have all the signaling information processed at a centralized location. However, the media gateways should be placed at the points where they are most needed. So, for example, in a peering scenario in which one operator is peering with multiple operators, it is desirable to have all the signaling information processed at a central location that might reside in a large data center of the operator. The media processing, however, should be placed as close to the interconnection point as possible.
- Scaling: Processing of media and signaling requires different kinds of resources. Hence, with a distributed architecture, it is possible to add new media gateways to handle more data flows or more complex scenarios independently of the media gateway controller and signaling gateways.
- Evolution: With a distributed architecture it would be possible to enhance the scope of the controlling components without having to touch the media gateways. So, if a signaling unit supports translation only between H.323 and SS7 at the beginning, it can be upgraded to support SIP in a second stage without any effects on the media gateways.
- Vendor independence: As the media gateways and the media gateway controllers communicate with each other using a standardized protocol, it should be possible to buy them from different vendors. This would enable some vendors to concentrate, for example, on optimizing media handling, whereas others might dedicate more work to signaling.
While this distributed architecture surely has its advantages, one should not confuse it with MEGACO. MEGACO is one protocol that can enable this architecture, but it is not the only way of getting there.
Using SIP, one can achieve the same distribution and benefits without having to introduce another complex protocol into the network.

This figure presents a distributed peering scenario using SIP only. A central gateway controller acts as the contact point for exchanging calls with the peering partners. However, the servers responsible for the media are located near the peering points because it would be disadvantageous to route all the media traffic to the central point and redistribute it from there to the final destinations.
When a call comes in at the controller, the call is processed (step 1) and the INVITE request is forwarded to the NNI (Network-Network Interface) closest to the peering point (step 2) – in a MEGACO architecture the controller would be sending MEGACO commands. The NNI is a component that has media processing capabilities and a SIP stack (in a MEGACO architecture this would be a MEGACO stack). Based on the content of the SIP INVITE request, the media processing capabilities of the NNI are instructed to do the required tasks.
So basically the usage of MEGACO is replaced by SIP. This provides for a simpler central controller, as this server only needs to support SIP. The same advantages of scalability, distribution and vendor independence are still maintained.
In a future blog I will try to demonstrate how the various packages and commands of MEGACO are replaced in a SIP architecture.
Posted by Robert Sparks on Tue, Jun 29, 2010 @ 01:23 PM
SIP proxies play a key role in realizing SIP's rendezvous service - helping entities that want to communicate find each other by forwarding SIP requests to the places they can best be served.
A proxy can try the request at more than one place, either one at a time, a few at a time, or all at once. This behavior is called forking.
A request may be forked by more than one proxy on the way to its destination.
Each of the places receiving the request is supposed to generate a response. The proxy is responsible for choosing the "best" response to forward back to the requester. Except for one special case, the proxy will only return one final response to the request it receives. The final responses from the other branches of the fork are dropped at the proxy. RFC 3261 section 16.7 discusses how the proxy chooses the best response.

SIP proxies were originally designed this way to allow the endpoints to have the same behavior whether there were proxies in the path of a request or not. (Other design decisions forced different behavior at the endpoints anyhow.) Dropping the other responses was a tradeoff, and it introduced a problem known in the specifications as HERFP (the Heterogeneous Error Return Forking Problem). When there is a mix of error responses from the various fork branches, only one is returned to the requester, but that requester might have been able to do something useful with the other responses. In the example above, the rightmost phone's response could have included information about when to try the request again. The original requester would have learned that a future call there had a reasonable probability of being accepted.
Though the condition is known as HERFP, it applies to non-error responses for non-INVITE requests like REGISTER or SUBSCRIBE. For any non-INVITE request, a proxy will only return one final response, whether it's a success or error response. This is why the SIP Events extension to SIP requires that elements accepting a subscription MUST send an immediate NOTIFY.
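The selection rule that gives rise to HERFP can be sketched in a few lines. This is a simplified reading of RFC 3261 section 16.7 for the case where no branch returned a 2xx: prefer a 6xx if any exists, otherwise pick from the lowest response class, giving preference within 4xx to codes that affect resubmission. The function name is hypothetical, and the sketch ignores details such as aggregating authentication challenges.

```python
# Codes RFC 3261 section 16.7 singles out as affecting resubmission
# when the 4xx class is chosen.
RESUBMISSION_HINTS = (401, 407, 415, 420, 484)


def choose_best_response(final_status_codes):
    """Pick the single non-2xx final response a forking proxy forwards upstream."""
    candidates = [code for code in final_status_codes if code >= 300]
    if not candidates:
        return None
    # A 6xx response, if present, must be chosen.
    global_failures = [code for code in candidates if code >= 600]
    if global_failures:
        return global_failures[0]
    # Otherwise choose from the lowest response class that was received.
    lowest_class = min(code // 100 for code in candidates)
    in_class = [code for code in candidates if code // 100 == lowest_class]
    if lowest_class == 4:
        for code in in_class:
            if code in RESUBMISSION_HINTS:
                return code
    # Any response within the chosen class is acceptable; take the first.
    return in_class[0]


print(choose_best_response([486, 404, 503]))
print(choose_best_response([486, 603]))
```

Whatever the function returns, every other collected response is discarded, which is exactly the information loss HERFP describes.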



There is one special case where a proxy might return more than one final response to a request. When a proxy sees a 200 OK to an INVITE, it is required to forward that to the requester, even if it has already forwarded a 200 OK from another branch. This exception was added to the protocol to allow the calling user to choose which person to talk to if more than one endpoint answers a call. Other protocol rules try to make this condition unlikely. When the proxy sees the first 200 OK to the INVITE, it will send a CANCEL request to all the other branches. A second 200 OK could only be received from one of those branches if it crossed that CANCEL on the wire (or in the processing at the endpoint). Unfortunately, it's not hard to encounter that race condition in practice. It's up to the endpoint to decide what to do if it finds itself in multiple calls after sending an INVITE. Many deployed endpoints today send an immediate BYE to peers beyond the first accepting their call.
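The "keep the first, hang up on the rest" policy described above can be sketched as follows. This is an illustration of one common endpoint policy, not a required behavior. The function name and the dialog identifiers are hypothetical, and the sketch only returns the requests the endpoint would send rather than sending anything.

```python
def handle_200_ok(active_dialog, dialog_id):
    """Return (active_dialog, actions) after a 200 OK to our INVITE arrives.

    Every 200 OK is acknowledged. Dialogs beyond the first are then
    terminated with a BYE.
    """
    actions = [("ACK", dialog_id)]
    if active_dialog is None:
        # First answer: this becomes the call we keep.
        return dialog_id, actions
    if dialog_id != active_dialog:
        # A second endpoint answered before the CANCEL reached it.
        actions.append(("BYE", dialog_id))
    return active_dialog, actions


active = None
active, first_actions = handle_200_ok(active, "dialog-A")
active, second_actions = handle_200_ok(active, "dialog-B")
print(active, first_actions, second_actions)
```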
There have been a few proposals in the past for changing or extending the protocol to avoid HERFP, allowing the endpoint to learn about all the final responses that currently get absorbed at proxies. None of them achieved consensus. So far, it's an open problem.
Posted by Jiri Kuthan on Thu, Jun 17, 2010 @ 08:17 AM
Last week, we had a distinguished professor, Andrew Odlyzko, as an invited guest to our “Brilliance in Innovation” lecture series. The critique Andrew articulated about state-of-the-art security systems was that raising the security level to perfection is expensive and hard to achieve. At the same time, establishing secure communication over multiple, though simple, channels can achieve reasonable robustness without the curse of perfection. An example of this is confirming online banking transactions by sending a text message from your cell phone.
What I find interesting about this example is the role of a cell phone – an extremely useful and at the same time underutilized instrument to establish one’s identity.
Clearly identity is a key notion to our society. We know many forms of it, and life without them would be less convenient. Think of a credit card number for payments, a car license plate to identify traffic offenders, an online auction history to establish the reputation of buyers and sellers. None of these identity types are as waterproof as DNA, but still how comfortable would you feel communicating, socializing, trading and living without them?
Nowadays thinking and sharing loudly over the Internet, as manifested by Facebook, seems to prevail in users’ desires. I believe, though, that within a few years the desire for a privacy refuge will strongly emerge. In other words, users (not just businesses but consumers too) will want to be in control of what they share with whom. The technical prerequisite is establishing who is who, and the question to busy ourselves with is HOW? The answer must be an instrument that is workable on a worldwide basis and must have a price tag; other forms of identity would be effectively worthless. Could cell phones do this for us?
My quick answer is yes. I’m betting on cell phones to be THE identity vehicle within the next five years. Cell phone companies have established immense coverage of the global population, have close relationships to their users, and are therefore in a fairly good position to assert their identity. Mobile Internet’s share is increasing rapidly and cell phone-based identity can be used on a global basis for any application. The applications may include not just online services, such as online banking, but also confidentiality by encryption or payment.
This all still seems certain to me – stronger identity, confidentiality, payments are all tied to cell phones. What remains to be seen though is who is going to prevail in putting this “new non-anonymous world” together. Apparently, mobile providers would be the obvious candidates. However, most widely deployed applications, like SMS-verified online banking, didn’t want to wait and have carefully avoided dependence on mobile operators and phones. They work simply and efficiently in-band.
There are however scenarios with higher demands for security: think of confidentiality and the inconvenience it would take to run an encryption handshake over text messages. Applications on smartphones could ask third parties for identity assertion. Conversely, the applications could ask service providers for the same assertion.
So the final question to myself is: will it be the Verizons or VeriSigns of this world who will add the notion of identity to the sharing culture?
Posted by Ben Campbell on Tue, Jun 08, 2010 @ 09:43 PM
MSRP offers a highly configurable reporting model. The model offers two mechanisms for an endpoint to learn the status of the messages it sends.
The first, and simplest, mechanism is the transaction response to a SEND request. Here's an example from RFC 4975:
MSRP a786hjs2 SEND
To-Path: msrp://biloxi.example.com:12763/kjhd37s2s20w2a;tcp
From-Path: msrp://atlanta.example.com:7654/jshA7weztas;tcp
Message-ID: 87652491
Byte-Range: 1-25/25
Content-Type: text/plain

Hey Bob, are you there?
-------a786hjs2$

MSRP a786hjs2 200 OK
To-Path: msrp://atlanta.example.com:7654/jshA7weztas;tcp
From-Path: msrp://biloxi.example.com:12763/kjhd37s2s20w2a;tcp
-------a786hjs2$
A SEND transaction response contains a status code similar to the status in a SIP response. Keep in mind, though, the semantics of each status code are not quite identical to those for SIP. In this case, the "200" status indicates successful delivery. Since the To-Path and From-Path header fields each contain a single URI, we know this transaction was sent peer-to-peer. That is, there are no MSRP relays involved in this transaction. Also, notice that the To-Path and From-Path values are reversed in the response. Unlike SIP, which copies the From and To values from the request into the response without switching them, the To-Path and From-Path header fields in an MSRP response indicate the actual routing of the response.
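The path reversal is easy to see in code. Here is a minimal sketch that builds the 200 OK for the peer-to-peer SEND request shown above by swapping To-Path and From-Path. The helper names are hypothetical. The sketch assumes single-URI paths (no relays) and uses plain newlines for readability, whereas MSRP on the wire uses CRLF line endings.

```python
def parse_request(message: str):
    """Split an MSRP request into (transaction_id, method, headers)."""
    lines = message.strip().splitlines()
    _, transaction_id, method = lines[0].split()
    headers = {}
    for line in lines[1:]:
        # Headers end at the blank line before the body, or at the end-line.
        if not line or line.startswith("-------"):
            break
        name, _, value = line.partition(": ")
        headers[name] = value
    return transaction_id, method, headers


def build_200_ok(request: str) -> str:
    """Build a peer-to-peer 200 OK: the response travels the reverse path."""
    transaction_id, _, headers = parse_request(request)
    return "\n".join([
        f"MSRP {transaction_id} 200 OK",
        f"To-Path: {headers['From-Path']}",
        f"From-Path: {headers['To-Path']}",
        f"-------{transaction_id}$",
    ])


REQUEST = """\
MSRP a786hjs2 SEND
To-Path: msrp://biloxi.example.com:12763/kjhd37s2s20w2a;tcp
From-Path: msrp://atlanta.example.com:7654/jshA7weztas;tcp
Message-ID: 87652491
Byte-Range: 1-25/25
Content-Type: text/plain

Hey Bob, are you there?
-------a786hjs2$
"""

print(build_200_ok(REQUEST))
```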
Responses to SEND requests are sent hop-by-hop, rather than end-to-end. This is due, among other things, to the way that MSRP relays can re-chunk messages, as I discussed in a previous post. Responses for non-SEND methods are sent end-to-end.
If the sender doesn't get the response within 30 seconds, it assumes the request failed.
The hop-by-hop nature of SEND responses can be insufficient if MSRP relays are in the session path. How can the sender learn about the success or failure of a message once a relay has forwarded it downstream? For example, what if the relay gets a failure response from its next hop? For this scenario, the relay can send an MSRP REPORT request. Here's an example:
MSRP dkei38sd REPORT
To-Path: msrp://alicepc.example.com:7777/iau39soe2843z;tcp
From-Path: msrp://bob.example.com:8888/9di4eae923wzd;tcp
Message-ID: 12339sdqwer
Byte-Range: 1-106/106
Status: 000 200 OK
-------dkei38sd$
A REPORT request carries very similar information to the transaction response for a SEND request. Notice the "Status" header field, which carries the same sort of status code that you might see in a transaction response. The "000" prefix indicates the status code is one of the ones defined in the base specification. This is an extension hook, where other specifications could create status code "name spaces" with different prefixes. Remember from my last post that a REPORT request reports status for a range of bytes, which may or may not line up with the range in any particular SEND request that the sending endpoint actually sent.
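The layout of the Status header value (namespace prefix, status code, optional comment) can be pulled apart with a few lines of code. This is a minimal sketch with hypothetical function names. It treats any code other than 200 in the "000" namespace as a failure, which is a simplification for illustration.

```python
def parse_status(value: str):
    """Split a Status header value into (namespace, code, comment)."""
    namespace, code, *comment = value.split(" ", 2)
    return namespace, int(code), comment[0] if comment else ""


def is_failure_report(value: str) -> bool:
    """True if the status is a base-specification code other than 200."""
    namespace, code, _ = parse_status(value)
    return namespace == "000" and code != 200


print(parse_status("000 200 OK"))
print(is_failure_report("000 200 OK"))
```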
A REPORT request is a bona-fide MSRP request. However, MSRP devices do not send transaction responses for REPORT requests. They should never send REPORT requests in response to other REPORT requests. REPORT requests that include a failure code, aka "Failure Reports", are sent from the reporting element all the way back along the session path to the sender of the original request. On the other hand, "Success Reports" are always sent end-to-end, since only the endpoint can know for sure that a message was delivered successfully.
This is all complicated by the fact that MSRP allows a great deal of configuration in the reporting model. When the sender creates a SEND request, it can independently select whether it wants to receive success reports and failure reports, on a per message basis. It does this by including the optional Success-Report and Failure-Report header fields.
Success-Report can take a value of "yes" or "no." Failure-Report can take those same values, plus the value of "partial." The defaults are "no" for Success-Report and "yes" for Failure-Report.
If the sender wants delivery confirmation, it sets Success-Report to "yes". The default is "no", so if the header is not inserted, success reports won't be sent.
Along the same line, if the sender wants to suppress failure reports, it can set Failure-Report to "no." But there's a catch here--if Failure-Report is "no," that also suppresses transaction responses to the request. That means there's really no failure detection at all, other than what might be detected by TCP. That may be counter-intuitive, but it makes sense for some applications. Examples include system messages sent by an administrator or broadcast messages sent by an emergency services agency. It also may make sense for high volume applications where it would be too heavy weight to send all those responses, and the TCP layer provides sufficient reliability. Keep in mind that Failure-Report and Success-Report can be set independently, so you can suppress failure reports and transaction responses, but still request success reports.
Then there's the really strange sounding mode, where Failure-Report is set to "partial." This mode suppresses transaction responses just as if Failure-Report were set to "no." But it still allows failure reports. This lets MSRP elements opportunistically report any errors they learn about through other means, for example, transport layer errors detected by downstream devices.
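The combinations described above can be summarized as a small lookup. This sketch encodes only what this post says about each setting; RFC 4975 remains the authoritative source for the detailed behavior. The function name and the dictionary keys are hypothetical.

```python
def reporting_behavior(success_report: str = "no", failure_report: str = "yes") -> dict:
    """Summarize what the sender can expect for one SEND request.

    Defaults match the header defaults: Success-Report "no",
    Failure-Report "yes".
    """
    if success_report not in ("yes", "no"):
        raise ValueError("Success-Report must be 'yes' or 'no'")
    if failure_report not in ("yes", "no", "partial"):
        raise ValueError("Failure-Report must be 'yes', 'no', or 'partial'")
    return {
        # "no" and "partial" both suppress transaction responses.
        "transaction_responses": failure_report == "yes",
        # "partial" still allows failure reports.
        "failure_reports": failure_report in ("yes", "partial"),
        # Success reports are controlled independently.
        "success_reports": success_report == "yes",
    }


for failure in ("yes", "no", "partial"):
    print(failure, reporting_behavior("yes", failure))
```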
I think this will be the last of my MSRP related posts for a while, unless readers have specific questions. Please feel free to ask questions in the comments section. Otherwise, I will move onto some new and different topic in my next post.
Posted by Adam Roach on Tue, Jun 01, 2010 @ 05:30 PM
One of the recurring topics in the discussion of SIP security is how you give users the information they need to make informed decisions. In most of these conversations, a parallel is drawn between web browser security and SIP security – usually, in terms of “why can’t SIP terminals have a simple lock icon that tells the user the call is secure?” And all major web browsers do have a simple visual indicator, like these two from Internet Explorer and Firefox:

Unfortunately, the issue with SIP is significantly more difficult than that. With web browsers, you really need to ensure only two things: that the website you’re connecting to is the website you think you’re connecting to (authentication), and that no one other than you and the website can see the information you’re sending and receiving (confidentiality). For the web, this is easy to do because TLS (used by https) provides both of these properties.
With SIP, you have at least five different major problems to solve – and possibly more, depending on how you account for them: Caller-ID, Called Party Identity, Media Privacy, Media Authentication, and Signaling Confidentiality.
Caller ID and Called Party Identity
First, when a call arrives, the user is going to want to know who is calling, similar to Caller-ID on today’s PSTN. Jiri did a series of posts (1, 2, 3) detailing the need for identity in the SIP network. (While this is a good treatment of the need for identity, I think its conclusion – that we should use the same spam-prevention mechanisms as email – is a bit naïve; as Ben later points out, 94% of all email is spam, and I think we need to do better than that.)
While some techniques can be employed to “spoof” caller ID information on the PSTN, it’s difficult to do, so people generally can and do trust what their phone says when it rings. On the other hand, since SIP signaling flows all the way out to the edge of the network, this kind of identity is much easier to fake in a SIP network. Some deployment architectures have developed specialized “transitive trust” models that get you pretty close to what the PSTN provides today, but they don’t work across the general Internet, or when you transition from one architecture to another.
A more bulletproof means of conveying identity can be performed with RFC 4474, which uses cryptography to let a proxy on the call path make an assertion about the calling party’s identity. Unfortunately, RFC 4474 does suffer from some deployment difficulties, such as perceived deficiencies in key distribution, the difficulty in asserting ownership of phone numbers, and bad interactions with SBCs. And while there are good answers to each of those issues, they still have slowed down acceptance of RFC 4474 as a solution.
A related issue is validation that the person you’re trying to reach is the person you’ve actually reached. For example, if Alice is trying to reach Bob but really reaches Charlie, she needs to know this to make an informed decision. This is even more important when Alice is trying to reach, for example, her bank. There are fairly benign reasons that the called party might not be who the caller was trying to reach – a call-forwarding service, for example – but it also may indicate something more nefarious. To fill this niche, RFC 4916 defines a mechanism for conveying called party identity back to a calling party. It shares RFC 4474’s strengths (cryptographic assertions, leveraging the web’s public key infrastructure), but suffers from the same drawbacks as well.
One interesting twist to the behavior of RFCs 4474 and 4916 is that they only protect the caller and called parties’ addresses, not their names. To protect things like caller names, it becomes necessary to use a mechanism like cryptographic certificates with S/MIME.
Media Privacy and Authentication
Another user expectation of “secure calls” is a guarantee that third parties cannot intercept their call. This is especially important when users make calls on a shared network, such as a public WiFi network, a hotel network, or certain types of cable networks. Unless the media itself is encrypted, anyone on the same network can use any one of a variety of easy-to-use call interception tools, including some very sophisticated, free ones, and record any call or calls they want to.
The other issue with media is ensuring that the media you receive is coming from the person you think it is. The ability to insert new media into a call can be highly damaging for certain types of calls.
Unfortunately, this area has historically suffered from too many solutions, as opposed to not enough. Luckily, the IETF finally winnowed the solution space down to a single approach for SIP media encryption: RFC 5763. There is also a competing solution in ZRTP. This approach has some interesting properties that Jiri discussed in a previous posting – but it also suffers some non-technical drawbacks (see my response at the end of that article) that are likely to limit its deployment outside of the open source and hobbyist communities. And, while ZRTP provides encryption, it requires an onerous manual step to ensure that you’re talking to the person you think you’re talking to (and, without this protection, your call can be listened to by a sophisticated attacker in the middle of the network).
Hopefully, with the recent publication of RFC 5763, we’ll start seeing more vendor support for media privacy and authentication.
Signaling Confidentiality
A final aspect of SIP security that needs to be addressed is confidentiality of the signaling information itself. For voice calls, access to the signaling allows you to figure out who called whom and when. And, while the privacy implications of exposing that kind of information are evident enough, things get much worse once you start mixing in features like instant messaging and presence: eavesdroppers on this information can learn highly sensitive information, such as the contents of instant message conversations.
Support for TLS to protect information as it passes between network entities (say, from a phone to its proxy) is required by the baseline SIP protocol and is fairly widely implemented (on average, approximately 50% of the implementations at the SIPit interop event have had TLS support over the past few years). That’s a really good way to ensure that arbitrary third parties can’t eavesdrop on the information being sent.
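As a rough illustration of that hop-by-hop protection, here is a minimal Python sketch of a client opening a TLS connection to its proxy. The function names and the choice to refuse anything older than TLS 1.2 are my own assumptions for the example, not something the SIP specification dictates; port 5061 is the conventional port for SIP over TLS.

```python
import socket
import ssl

SIPS_DEFAULT_PORT = 5061  # conventional port for SIP over TLS


def make_sip_tls_context() -> ssl.SSLContext:
    """Build a client-side TLS context that verifies the server certificate."""
    # Loads the system CA certificates and turns on hostname checking.
    ctx = ssl.create_default_context()
    # Assumption for this sketch: refuse protocol versions older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def open_sip_tls_connection(proxy_host: str,
                            port: int = SIPS_DEFAULT_PORT) -> ssl.SSLSocket:
    """Connect to the next hop; SIP messages written afterwards are encrypted."""
    raw = socket.create_connection((proxy_host, port), timeout=5)
    return make_sip_tls_context().wrap_socket(raw, server_hostname=proxy_host)
```

Note that this only covers a single hop: the proxy at the other end of the connection still sees the SIP message in the clear.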
But TLS doesn’t protect information from being intercepted by servers on the call path.
And while I might be happy to get my SIP service from bobs-discount-voip.com, I may be a bit more reluctant to trust them with things I send and receive via instant messages, such as my banking information. And that brings us back to the use of S/MIME certificates, which can be used to hide this kind of information from proxies on the path (while still providing them enough information to route messages correctly).
Summary
So, back to the original question: if you wanted to have a simple, visual indicator to indicate that a call is secure… what would it mean? Is it a promise that the phone number on the caller ID is correct? How about the name? Does it mean that the media is encrypted? And, if it is, can you be sure it’s coming from where you think it’s coming from? Is the signaling protected? And, if so, is it protected from everyone, or can proxies along the call path read it? There are so many degrees of freedom here that there’s no good way to render them all to the user in a sensible fashion. And an all-or-nothing indicator (like a single lock icon) is completely nonsensical – as you’ve seen, SIP security is just about as far from “all-or-nothing” as you can get.
At this point, sadly, it’s mostly a moot point anyway – just about all SIP service providers employ exactly none of these techniques. But as user expectations around identity and privacy start colliding with the reality of service providers’ carelessness, we’re going to run into a few challenges making sure that users can be given the information they need to make informed decisions.
Posted by Dorgham Sisalem on Tue, May 25, 2010 @ 04:11 PM
Continuing from my last posting, The Problem with SIP Congestion Control, I would like to briefly discuss some proposed solutions for solving this issue.
As mentioned in the last posting, the only mechanism provided in the SIP specifications is for the server to reply with a 503 (Service Unavailable) response. However, the 503 response is essentially a binary mechanism: the server either receives traffic or it does not. This can easily lead to oscillations and might cause the entire SIP infrastructure to become unstable under overload situations.
In general, one can distinguish between standalone and feedback based overload mechanisms.
In the standalone approach, an overload control mechanism is implemented at the SIP server. The server monitors its own resources, e.g., memory, CPU and bandwidth. Based on the monitored resources, the server recognizes when it starts to become overloaded and deals with the incoming traffic by either rejecting new calls or dropping them. Rejecting is preferred, because dropping a request causes the caller to retransmit it, so the server ends up having to deal with the same call a number of times.
When dropping or rejecting requests, the server has to ensure that running calls are not interrupted. For example, it would be a bad idea to accept a new call but reject a BYE request, as losing the BYE request might cause irregularities in the charging process. An example of a standalone approach can be found in the paper Protecting VoIP Services Against DoS Using Overload Control.
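To make the standalone idea concrete, here is a minimal Python sketch of such an admission decision. It is not taken from the paper above; the thresholds, the `LoadSample` type, and the `admit` function are hypothetical names chosen for illustration. The only points it tries to capture are that new calls are rejected explicitly with a 503 under overload, and that requests belonging to running calls (such as BYE) are always processed.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds; a real server would tune these to its hardware.
CPU_LIMIT = 0.85
MEMORY_LIMIT = 0.90


@dataclass
class LoadSample:
    cpu: float     # fraction of CPU in use, 0.0 to 1.0
    memory: float  # fraction of memory in use, 0.0 to 1.0


def is_overloaded(sample: LoadSample) -> bool:
    """The server judges overload purely from its own monitored resources."""
    return sample.cpu > CPU_LIMIT or sample.memory > MEMORY_LIMIT


def admit(method: str, in_dialog: bool, sample: LoadSample) -> Optional[int]:
    """Return None to process the request, or a SIP status code to reject it."""
    starts_new_call = method == "INVITE" and not in_dialog
    if starts_new_call and is_overloaded(sample):
        # Reject rather than silently drop, so the caller does not retransmit.
        return 503
    # Everything that belongs to a running call (BYE, ACK, re-INVITE, ...)
    # is processed even under overload.
    return None
```

With a sample showing 95% CPU usage, `admit("INVITE", False, sample)` returns 503 while `admit("BYE", True, sample)` returns `None`, so the BYE is still handled.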
The other approach for overload control is more of a cooperative process. In this scenario the overloaded server regulates the amount of traffic it is receiving from its neighbors by informing them about its current load. The load information can be sent to the neighboring servers by either adding a header in the SIP response or by using the SUBSCRIBE/NOTIFY mechanisms of SIP. The neighboring servers will then adapt the amount of traffic that they are sending to the overloaded server. If they have to reduce the amount of traffic they send to the overloaded server, they will in turn ask their own neighbors to send less traffic to them. This way, the congestion is pushed to the border of the network and calls are not forwarded to the core components only to be dropped there. A survey of different control mechanisms can be found in the article Session Initiation Protocol (SIP) Server Overload Control: Design and Evaluation.
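The neighbor's side of this feedback loop might look something like the following Python sketch. It is only an illustration: the `UpstreamThrottle` class, the target load of 0.8, and the recovery step of 0.05 are assumptions of mine, and the sketch deliberately says nothing about how the load value is carried (response header or SUBSCRIBE/NOTIFY).

```python
import random


class UpstreamThrottle:
    """Tracks how much traffic a neighbor forwards to one downstream server."""

    def __init__(self, target_load: float = 0.8) -> None:
        self.target_load = target_load  # hypothetical load the server aims for
        self.accept_fraction = 1.0      # share of new calls currently forwarded

    def on_load_report(self, reported_load: float) -> None:
        """Adapt the forwarded share to the load the downstream server reported."""
        if reported_load > self.target_load:
            # Scale back in proportion to how far the server is over target.
            self.accept_fraction *= self.target_load / reported_load
        else:
            # Recover gradually to avoid swinging between all and nothing.
            self.accept_fraction = min(1.0, self.accept_fraction + 0.05)

    def should_forward(self, rng: random.Random) -> bool:
        """Decide for one new call whether to forward it or reject it locally."""
        return rng.random() < self.accept_fraction
```

Because the adjustment is gradual in both directions, this avoids the on/off behavior of a plain 503. A neighbor whose `accept_fraction` drops below 1.0 would in turn report a higher load to its own neighbors, which is how the congestion moves toward the border of the network.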
Both approaches have their pros and cons. The standalone approach can be deployed without having to rely on other SIP components in the network to also support overload handling. It also does not require any standardization with regard to how to exchange status information. This makes this approach the ideal one for now. However, this mechanism does not cause the overall load of the SIP network to go down.
The feedback-based approach can adapt the number of calls in the network to the actual available resources and would push the overload to the borders of the network. In this way, excess calls will be prevented from even reaching the overloaded servers and access points can consider using non-overloaded paths for establishing the calls. On the downside, a server that ignores the feedback information would still cause overload and packet drops. Hence, to be on the safe side, a SIP server will have to implement a combination of both approaches.
Posted by Robert Sparks on Wed, May 19, 2010 @ 03:03 AM
In short, early media is any media related to an attempt to initiate a session that arrives before the session is fully established.
SIP negotiates the setup of media streams using the offer/answer technique defined in RFC 3264. The simplest call establishment might look something like the flow shown in Figure 1.
Figure 1
The offer contains a description (using the Session Description Protocol) of how and where Alice is willing to receive media. This description specifies the address and port at which Alice will receive packets, the protocol used to send those packets (typically RTP using the AVP profile), and information about how the media will be encoded in those packets.
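To show where those three pieces of information live, here is a small Python sketch that pulls them out of an SDP offer. The offer text is a made-up example (the address comes from a documentation range), and `summarize_offer` is a hypothetical helper that handles only the handful of line types needed here, not a full SDP parser.

```python
# A made-up audio-only offer from Alice.
OFFER = """\
v=0
o=alice 2890844526 2890844526 IN IP4 192.0.2.10
s=-
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 0 8
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
"""


def summarize_offer(sdp: str) -> dict:
    """Extract where and how the offerer is willing to receive media."""
    summary = {"address": None, "port": None, "transport": None, "codecs": {}}
    for line in sdp.splitlines():
        key, _, value = line.partition("=")
        if key == "c":
            # c=<nettype> <addrtype> <address>
            summary["address"] = value.split()[2]
        elif key == "m":
            # m=<media> <port> <transport> <payload types...>
            _media, port, transport, *_formats = value.split()
            summary["port"] = int(port)
            summary["transport"] = transport
        elif key == "a" and value.startswith("rtpmap:"):
            # a=rtpmap:<payload type> <encoding>/<clock rate>
            payload_type, encoding = value[len("rtpmap:"):].split(maxsplit=1)
            summary["codecs"][int(payload_type)] = encoding
    return summary
```

For the offer above, the summary says that Alice will receive RTP/AVP packets at 192.0.2.10 port 49170, encoded as either PCMU or PCMA.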
Once Bob receives Alice's offer, he can start sending media based on that description right away. In many deployments, that's a normal behavior for Bob's endpoint, especially if Bob doesn't answer immediately. His endpoint may send a ringing sound or an announcement as shown in Figure 2.
Figure 2
This frequently happens when Alice's SIP INVITE reaches a PSTN gateway. When the gateway receives the INVITE it tries to set up a call on the PSTN side and will send any media it receives before that call completes (such as ringback) back to Alice. See Adam's SIP-I and SIP-T Challenges blog entry for more detail.
The core SIP specification did not include any mechanism to ensure that provisional responses (like the 183 Session Progress in the above example) are reliably delivered. One could be lost in transit and neither end would know.
RFC 3262 defines an extension to SIP to add that reliability mechanism in the form of a Provisional Response ACKnowledgement (PRACK) request. When using this extension, the responder will retransmit the provisional response, following a prescribed retransmission time algorithm, until it receives a corresponding PRACK, as shown in Figure 3. This extension is especially important if the answer carried in the provisional response contains information that Alice would need to be able to make sense of what was in the media packets. For instance, in some uses of SRTP, that answer will contain data that Alice's endpoint must receive before it can decrypt the media streams.
Figure 3
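For a feel of what that retransmission schedule looks like, here is a short Python sketch. It assumes my reading of the RFC 3262 timing: the first retransmission comes T1 seconds after the original response, the interval doubles each time, and the responder gives up once 64*T1 has elapsed without a PRACK. The function name is hypothetical, and T1 is taken as the usual 500 ms default.

```python
T1 = 0.5  # seconds; the usual default for SIP timer T1


def retransmit_offsets(t1: float = T1) -> list:
    """Seconds after the first transmission at which an unacknowledged
    reliable provisional response is sent again."""
    offsets = []
    interval = t1
    elapsed = t1
    # Stop once the next retransmission would fall outside the 64*T1 window.
    while elapsed <= 64 * t1:
        offsets.append(elapsed)
        interval *= 2
        elapsed += interval
    return offsets
```

With the default T1, this gives retransmissions at 0.5, 1.5, 3.5, 7.5, 15.5 and 31.5 seconds, after which the responder stops waiting for the PRACK.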
SIP proxies can "fork" an INVITE request as they forward it, delivering the request to multiple locations. In Figure 4, Alice's call to Bob is delivered in parallel to Bob's cell phone, his desk phone, and his home phone. All three phones ring simultaneously. The protocol has mechanisms that will stop the ringing at Bob's cell phone and home phone as soon as he answers his desk phone (making it unlikely that Alice will have to deal with more than one of Bob's phones being answered). However, if more than one of Bob's devices sends early media, Alice's phone will have to do something reasonable while receiving multiple streams. Many existing endpoints play only one stream to Alice (chosen arbitrarily) and quietly discard the information from the other streams. Adam explored some of the consequences of this for PSTN interworking in an earlier article.
Figure 4
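The "play one stream, discard the rest" behavior can be sketched in a few lines of Python. This is an assumption about how an endpoint might do it, not a description of any particular phone: the sketch tells streams apart by the SSRC field in the RTP header and simply locks onto whichever stream arrives first. The `EarlyMediaSelector` class is a hypothetical name.

```python
import struct
from typing import Optional


def rtp_ssrc(packet: bytes) -> int:
    """Extract the 32-bit SSRC from the fixed 12-byte RTP header."""
    if len(packet) < 12:
        raise ValueError("too short to be an RTP packet")
    # The SSRC occupies bytes 8 to 11, in network byte order.
    return struct.unpack("!I", packet[8:12])[0]


class EarlyMediaSelector:
    """Plays the first early media stream seen and drops all others."""

    def __init__(self) -> None:
        self.chosen_ssrc: Optional[int] = None

    def accept(self, packet: bytes) -> bool:
        """Return True if the packet should be played, False if discarded."""
        ssrc = rtp_ssrc(packet)
        if self.chosen_ssrc is None:
            # Arbitrary choice: the first stream to arrive wins.
            self.chosen_ssrc = ssrc
        return ssrc == self.chosen_ssrc
```

The obvious weakness is the one described above: if the stream that arrived first comes from a phone Bob never answers, Alice hears that device's early media and nothing from the others.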
It's worth remembering that SIP does not put a limit on how far apart in time the INVITE and 200 OK occur. It's possible to place an INVITE and wait many minutes, or even hours, before the call is answered with a 200 OK. During that entire time, the calling endpoint may be receiving (and sending) early media. This happens, for example, with some calls that cross into the PSTN and terminate on IVRs. Some of those systems are configured to leave the call in the ringing state, playing announcements and collecting keypresses, potentially until the end of the call. The call might only be "answered" (resulting in a 200 OK from the SIP side of the gateway) if the interaction with the IVR caused a connection with a human agent.
Posted by Vince Lesch on Wed, May 12, 2010 @ 08:48 PM
Over the last several weeks Ben Campbell has made several posts on MSRP (Message Session Relay Protocol) and the work of the IETF SIMPLE working group that he currently chairs. The work of this group is becoming increasingly important to wireless carriers as they consider how to evolve SMS/Text messaging.
The GSMA has been actively promoting RCS (Rich Communications Suite) as an evolution path from SMS to an all-IP messaging environment that supports additional features such as IM, chat, presence and a network-based address book. RCS makes use of protocols such as SIP and MSRP.
The GSMA-led RCS work is interesting from many perspectives, but perhaps the most important aspect is its focus on inter-carrier interoperability. If one looks at the history of SMS/text messaging, it didn't become popular with end users until it was possible to send messages from one carrier to another. The same will probably be true of any evolution of SMS: interoperability between carriers will be essential for wide-scale adoption.
GSMA currently has 3 defined releases of RCS.
- RCS Release 1 Features
- Enhanced Address Book (EAB)
- Social presence
- Service capability information
- Hyper-availability
- Blacklist
- Enriched content support
- Backup/synchronization with network address book
- Content sharing while on CS voice call (note: this requires a two-radio hybrid handset)
- File transfer while on voice call or in message conversation
- Enhanced Messaging
- Conversational messaging on the handset
- Unified composer for SMS/MMS
- Threaded view of SMS/MMS messages
- Chat service
- 1 to 1
- 1 to many
- Based on session mode messaging as defined by OMA IM SIMPLE 1.0
- RCS Release 2 Features
- Broadband Access (BA) to RCS Features
- Voice call initiation and reception
- Content sharing during voice calls
- File transfer during voice or message sessions
- Session-based chat initiation and reception
- Ability to send (but not receive) SMS
- Multi-device environment
- One primary and up to two secondary devices
- Primary client must be capable of cellular access
- EAB interaction from either device
- Network Address Book (NAB)
- Operator administered and maintained
- Include all devices owned by user
- Control of synchronization with EAB
- Provisioning and configuration of RCS clients/devices
- Customer transparent configuration of RCS devices
- RCS Release 3 Features
- Broadband access device as a primary device
- Content sharing enhancements
- Content sharing without a voice call
- Deferred content sharing to legacy terminals
- Presence enhancements
- Geo-location information in text and/or map
- Personalized invitations (using names from EAB rather than MSISDN)
- URL labels
- Ability to display service capability information for all contacts in EAB, even if social presence relationship is not established
- Messaging enhancements
- Ability for broadband devices to both send and receive SMS and MMS
- List of invited participants/receivers for group communications
- Network Value Added Services (NVAS)
- Enrichment of content and chat with media processing (e.g. language translation)
- Transparent provisioning and configuration of RCS devices/clients
RCS Release 4 specifications are under development and primarily deal with how to implement APIs to allow interaction between RCS capable elements and social networks, creating an RCS community of sorts.
To date, carriers in France, Spain and Italy have trialed RCS or have RCS trials in the works. While it is not clear that all aspects of RCS will be adopted, two things are likely. First, carriers will move messaging to a converged IP infrastructure. Second, the success of these services will depend on providing interoperability between different generations of technologies as well as interoperability between carriers.