[asterisk-dev] SIP, NAT, security concerns, oh my!

Fri Oct 21 17:52:01 CDT 2011

Sorry in advance for the length of this message... I've intentionally 
included quite a lot of background information so that we can hopefully 
discuss this issue quickly and reach consensus.

Recently, a potential security issue was brought to our attention: it 
has existed in (probably) every version of Asterisk that included 
chan_sip. It's not something we'd classify as 'critical' or even 
'major', but it is a concern we want to address. Terry Wilson has spent 
some time investigating it, and then he and I spent some time over the 
last couple of days thinking about how (or if) it can be addressed.

The essence of the problem is this: it is possible under some 
circumstances for an attacker to be able to enumerate (discover) the 
names of SIP peers (and possibly users) defined on an Asterisk system. 
If they can discover the valid peer names, they can then focus 
password-guessing attacks on those names, which increases their chances 
of being able to gain access to the system. Generally speaking we've 
tried to make changes to Asterisk to remove this type of 'information 
disclosure' vulnerability, in order to help Asterisk users keep their 
systems as secure as possible.

Unfortunately in this case, we can't solve the problem without removing 
a useful feature of chan_sip. Here's why:

According to RFC 3261, when Asterisk (acting as UAS) receives a SIP 
request from another SIP entity (acting as UAC), any responses that 
Asterisk generates must be sent back to the IP address that the request 
was received from, but to the *port number* specified in the top-most 
Via header included in the request (this is important: they are *NOT* 
sent back to the port number the request was received from).
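
To make that rule concrete, here is a minimal sketch in Python. It is not Asterisk code: the function name and the example addresses and ports are made up for illustration, and it only models the case where the Via header carries no extra parameters.

```python
from typing import Optional, Tuple

DEFAULT_SIP_PORT = 5060  # used when the Via sent-by omits a port (UDP/TCP)


def rfc3261_response_target(src_ip: str, src_port: int,
                            via_port: Optional[int]) -> Tuple[str, int]:
    """Return (ip, port) a UAS responds to under the plain RFC 3261 rule.

    src_ip / src_port: where the request was actually received from.
    via_port: port in the top-most Via header, or None if it was omitted.
    """
    # The IP comes from the packet source, the port from the Via header.
    # Note that src_port is deliberately ignored.
    port = via_port if via_port is not None else DEFAULT_SIP_PORT
    return src_ip, port


# Hypothetical request: received from port 40000, Via header says 5070.
target = rfc3261_response_target("203.0.113.7", 40000, 5070)
print(target)
```

The only point of the sketch is that `src_port` never influences the result.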

For most SIP clients, who are not sending their SIP requests across NAT 
devices, this is not a problem; they'll send their requests from port 
5060, and expect responses on port 5060 (and they'll either explicitly 
specify port 5060 in that top-most Via header, or they'll leave it out 
and let the RFC-specified default take effect). There could be some SIP 
clients out there who send their requests from one port (port A), but 
want to receive responses on a different port (port B). In this case, 
the top-most Via header will include port B, not port A. In our 
experience, this is extremely uncommon, but it is RFC compliant behavior.

When NAT devices enter the picture, though, things get more complicated 
(as they always do). Going back to the 'normal' SIP clients, that send 
requests from, and expect responses on, the same port... now they have a 
problem. If they send their request from port 5060, and expect responses 
on 5060, their top-most Via header will reflect that. However, when the 
request crosses the NAT on its way to the UAS, it will appear to have 
been sent from a different IP address and port number than the UAC sent 
it from (by definition... network address translation). Asterisk (the 
UAS) will respond back to the IP address that the NAT used to send the 
request, but it will *NOT* respond to the port number the NAT assigned; 
it will respond to the port number in the top-most Via header. Unless 
the NAT device just happened to assign the same port number (and some 
NAT devices will attempt to do that), or the NAT administrator has 
set up a static mapping for that port number, the response will be 
lost... it will arrive at the NAT device, it won't match the port 
mapping established when the request passed through, and the NAT device 
won't forward it.
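
The failure can be illustrated with a toy model of the NAT device. This is a deliberately simplistic sketch: the function name and the mapped port value are hypothetical, and a real NAT tracks much more state than a single port number.

```python
def nat_forwards_response(mapped_port: int, response_port: int) -> bool:
    """Toy NAT: an inbound packet is forwarded only if it arrives on the
    public port that was mapped when the request went out."""
    return response_port == mapped_port


# The UAC sends from port 5060 and lists 5060 in its top-most Via header.
via_port = 5060

# The NAT rewrites the source port to some public port (made-up value).
nat_mapped_port = 31842

# The UAS replies to the Via port, not to the port the request came from,
# so the response does not match the mapping and is dropped.
delivered = nat_forwards_response(nat_mapped_port, via_port)

# If the NAT happened to keep the same port number, the response gets through.
delivered_when_port_preserved = nat_forwards_response(5060, via_port)

print(delivered, delivered_when_port_preserved)
```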

This dilemma was identified long ago (over 8 years), and an additional 
RFC was published: RFC 3581. This RFC allows the 'rport' parameter to be 
added to Via headers; this allows UACs who are aware that their requests 
may be crossing a NAT device (or even if they aren't aware, but just 
want to be as safe as possible) to indicate to the UAS that receives 
their request that the UAS should respond to the port that the request 
was received from, *NOT* the port listed in the top-most Via header. 
Some people would say that this is how SIP should have worked from the 
beginning, and that the extraction of *only* the port from the top-most 
Via header never made any sense at all (I would personally agree with 
those people), but history is what it is. In addition to this behavior 
change, the 'rport' parameter also indicates that the UAS should report 
back to the UAC the 'perceived' port number; this is useful, but is not 
part of the problem being discussed here.

Asterisk supports RFC 3581, and if a UAC includes 'rport' in its 
top-most Via header, Asterisk will indeed respond to both the IP address 
*and* port number that the request was received from.
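
As a rough sketch of the difference 'rport' makes, the Python below parses a top-most Via header value and picks the response destination. It is an illustration, not chan_sip's actual logic: the function names and sample header values are invented, and the parser assumes an IPv4 address or hostname in the sent-by field.

```python
from typing import Optional, Tuple


def parse_via(via_value: str) -> Tuple[Optional[int], bool]:
    """Return (sent-by port or None, True if an 'rport' parameter is present).

    Assumes a value like "SIP/2.0/UDP 192.0.2.10:5070;branch=z9hG4bK74bf9;rport"
    with an IPv4 address or hostname (no IPv6 literals).
    """
    _protocol, rest = via_value.strip().split(None, 1)
    sent_by, *params = [part.strip() for part in rest.split(";")]
    _host, sep, port = sent_by.rpartition(":")
    via_port = int(port) if sep else None
    has_rport = any(p.split("=", 1)[0].strip().lower() == "rport" for p in params)
    return via_port, has_rport


def response_target(src_ip: str, src_port: int, via_value: str) -> Tuple[str, int]:
    """Pick the response destination, honoring 'rport' when the UAC sent it."""
    via_port, has_rport = parse_via(via_value)
    if has_rport:
        # RFC 3581: reply to the port the request was received from.
        return src_ip, src_port
    # RFC 3261: reply to the Via port, defaulting to 5060.
    return src_ip, (via_port if via_port is not None else 5060)


without_rport = response_target(
    "198.51.100.4", 31842, "SIP/2.0/UDP 192.0.2.10:5070;branch=z9hG4bK74bf9")
with_rport = response_target(
    "198.51.100.4", 31842, "SIP/2.0/UDP 192.0.2.10:5070;branch=z9hG4bK74bf9;rport")
print(without_rport, with_rport)
```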

However (and here's where we run into trouble), there are some (maybe 
many) SIP UAs out there that cannot (or just do not) include 'rport' in 
the top-most Via headers of their requests. Because of this, these UAs 
can be difficult to use behind a NAT device (unless the NAT is 
configured specially as outlined above), because Asterisk can't deliver 
responses to the UA (port number mismatch). In order to make these 
devices 'work', Asterisk gained a 'nat' configuration option many, many 
years ago that, if set to 'yes' (it defaults to 'no') tells chan_sip to 
act *AS IF* the UAC had included 'rport' in its top-most Via header, 
even if it didn't. This does in fact solve the problem; responses can 
now be delivered to the UAC, and the fact that Asterisk adds 'rport' to 
the Via header in the response is not harmful... the UAC just ignores it.
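
In other words, the 'nat' option simply overrides what the UAC said (or didn't say) about 'rport'. A simplified model in Python, assuming only the option values mentioned in this message ('no', 'yes', 'force_rport'); the function name is made up and this is not how chan_sip is actually structured:

```python
from typing import Optional


def asterisk_response_port(src_port: int, via_port: Optional[int],
                           via_has_rport: bool, nat_setting: str = "no") -> int:
    """Port the response goes to, given the effective 'nat' setting.

    Simplified: 'yes' and 'force_rport' both mean "act as if the UAC had
    included rport"; anything else follows the Via header.
    """
    force_rport = nat_setting in ("yes", "force_rport")
    if via_has_rport or force_rport:
        return src_port  # reply to where the request actually came from
    return via_port if via_port is not None else 5060


# UAC behind NAT that does not send 'rport' (port numbers are made up).
print(asterisk_response_port(31842, 5060, False, "no"))   # follows the Via port
print(asterisk_response_port(31842, 5060, False, "yes"))  # follows the source port
```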

In later versions of Asterisk this configuration option gained some 
additional behavior (enabling 'connection-oriented media' mode), but 
again, that's not part of this issue. In recent versions, the value of 
this option can even be specified as 'force_rport' to more clearly 
indicate the desired behavior.

Now we get to the crux of the problem: this 'nat' option can be set both 
at the '[general]' level in sip.conf, and also for peers. This means, of 
course, that the values can differ (and frequently they will, but not 
because the administrator intended for them to differ). This means there 
are four possible combinations, with possibly different behaviors. For 
the purposes of this discussion, let's assume that a UAC is sending a 
request (the request type does not matter) that does *NOT* include 
'rport' in its top-most Via header. In addition, the UAC is purposely 
sending its request from port A, but specifying port B in the top-most 
Via header. Let's also assume there is a peer called 'alice' defined in 
sip.conf.

Scenario 1: [general] nat=no, [alice] nat=no

No problem here; the behavior of Asterisk will be the same regardless of 
whether the request matches the 'alice' peer or not.

Scenario 2: [general] nat=yes, [alice] nat=yes

No problem here; the behavior of Asterisk will be the same regardless of 
whether the request matches the 'alice' peer or not.

Scenario 3: [general] nat=no, [alice] nat=yes

Problem here; if the request matches 'alice', Asterisk will respond to 
port A. If the request does not match 'alice', Asterisk will respond to 
port B.

Scenario 4: [general] nat=yes, [alice] nat=no

Problem here; if the request matches 'alice', Asterisk will respond to 
port B. If the request does not match 'alice', Asterisk will respond to 
port A.

As you can see, in both scenarios 3 and 4, if the attacker happens 
upon a peer name (or source IP address) that matches a peer defined 
in sip.conf, and that peer's 'nat' setting differs from the general 
'nat' setting, the attacker will be able to notice the difference 
in response pattern from when the request did not match any peer.
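
The four scenarios can be checked mechanically. The Python below is only a model of the behavior described above, under the same assumptions (no 'rport' in the request, sent from port A with port B in the Via header); the port values and function names are arbitrary.

```python
from itertools import product

PORT_A = 40000  # port the request is actually sent from (arbitrary value)
PORT_B = 5070   # port listed in the top-most Via header (arbitrary value)


def response_port(nat: str) -> int:
    """With nat=yes the reply goes to the source port, otherwise to the Via port."""
    return PORT_A if nat == "yes" else PORT_B


def distinguishable(general_nat: str, peer_nat: str) -> bool:
    """True if a matched request and an unmatched one get replies on different ports."""
    matched = response_port(peer_nat)       # request matched 'alice'
    unmatched = response_port(general_nat)  # request matched no peer
    return matched != unmatched


results = {
    (general, peer): distinguishable(general, peer)
    for general, peer in product(("no", "yes"), repeat=2)
}

for (general, peer), leak in results.items():
    verdict = "distinguishable" if leak else "identical"
    print(f"[general] nat={general}, [alice] nat={peer}: {verdict}")
```

Only the two mixed combinations come out as distinguishable, which is the information leak.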

So now we know what the problem is, and what causes it. I'll offer up 
some possible solutions below, and ask for you all to think about this 
situation and help us decide on the right course of action. The eventual 
goal here is to produce a security advisory document telling users about 
the situation and what they can/should do to try to mitigate it; we 
don't expect that there is any solution that will solve the problem for 
all users.

Option 1:

Recommend that all users switch to TCP (or TLS) for SIP 
communications. If they have a version of Asterisk that supports TCP, 
and devices that also support it, they can disable UDP support and avoid 
this problem entirely (since TCP does not have these NAT traversal 
issues to deal with).

Option 2:

Change Asterisk to always act in 'force_rport' mode, period 
(non-configurable). It is possible that there may be some SIP UAs that 
would break if Asterisk did not respond to the port listed in the 
top-most Via header, but it seems rather unlikely at this point. Such 
UAs would almost never be able to be used successfully behind a NAT 
device. In addition, it is remotely possible that there are some SIP UAs 
that will break if they see 'rport=<xxxx>' in the Via header returned by 
Asterisk in its responses.

Option 3:

Allow 'force_rport' mode to be disabled, but change the default to 
enable it, in all currently maintained branches (1.4, 1.6.2, 1.8, 10 and 
trunk). This has the same caveats as Option 2, but at least if a user 
*does* have one of these bizarre SIP UAs to deal with, they can set 
'nat=no' for that device after carefully considering the ramifications 
of the change.

There may be other options to consider, but Terry and I were unable to 
come up with any.

So, questions for the assembled audience here:

Do you have any other options for us to consider?

Are you aware of any SIP UAs that actually *REQUIRE* "nat=no" to 
interoperate with Asterisk?

Do you *always* set "nat=yes" on your SIP peer definitions? Do you also 
set "nat=yes" in the '[general]' section? If not, why not?

-- 
Kevin P. Fleming
Digium, Inc. | Director of Software Technologies
Jabber: kfleming at digium.com | SIP: kpfleming at digium.com | Skype: kpfleming
445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
Check us out at www.digium.com & www.asterisk.org