[asterisk-users] Issue with PJSIP contacts being "unavailable"
asterisk at phreaknet.org
Tue Jun 27 06:21:52 CDT 2023
I've been having a serious issue for the past couple of weeks where many
users' devices show up as "Unavailable" according to PJSIP. The
underlying issue is that res_pjsip thinks there are no available
contacts for the device, and in the normal course of operation, even as
it's exchanging messages with the device, it keeps the contact marked
"Unavailable". This is particularly problematic because I rely on
accurate device state, and consequently users are not receiving calls.
The issue seems to arise from the combination of chan_pjsip and
particular devices. Users with a line on a system using chan_sip do not
have any issues, and everything works fine (i.e. one line is working
fine, the other is broken in this manner). Likewise, there are other
devices that seem to work fine and don't have this issue. Consequently,
it seems like this should be fixable by changing either something on the
ATAs *or* something on the Asterisk side (and maybe both, for good
measure). All units are provisioned more or less identically, and I've
ruled out firmware version as a factor in this particular case (here,
I'm focused on Grandstream HT 802s in particular, since they make up the
majority of both the devices in the field and the devices with problems
right now).
Capturing some debug logs, I thought this might be related to not
receiving OPTIONS responses from the endpoint, though that doesn't seem
to be consistent. At times I see that I'm receiving a response, though
in the example below, I'm not. Either way, the contact is consistently
unavailable. I looked into this about a year ago when I first noticed
the issue (though at the time, it wasn't impacting many devices) and
came to roughly the same conclusion. Only now, though, has it had a wide
impact for a prolonged period of time.
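
One way to check whether a device answers OPTIONS at all, independently of Asterisk, is to send it a hand-built request and wait for a reply. The following is only a minimal sketch under stated assumptions: the function names `build_options` and `probe`, the "probe" user part, and the addresses are hypothetical, and a device behind NAT may only answer packets coming from the address and port it registered to, so silence here is not conclusive.

```python
import socket
import uuid


def build_options(target_host, target_port, local_host, local_port, user="probe"):
    """Build a minimal SIP OPTIONS request (per RFC 3261) as bytes."""
    branch = "z9hG4bK" + uuid.uuid4().hex  # branch must start with this magic cookie
    tag = uuid.uuid4().hex[:8]
    call_id = uuid.uuid4().hex
    lines = [
        f"OPTIONS sip:{target_host}:{target_port} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_host}:{local_port};branch={branch};rport",
        "Max-Forwards: 70",
        f"From: <sip:{user}@{local_host}>;tag={tag}",
        f"To: <sip:{target_host}:{target_port}>",
        f"Call-ID: {call_id}@{local_host}",
        "CSeq: 1 OPTIONS",
        f"Contact: <sip:{user}@{local_host}:{local_port}>",
        "Content-Length: 0",
        "",
        "",  # the two empty entries produce the blank line that ends the headers
    ]
    return "\r\n".join(lines).encode("ascii")


def probe(target_host, target_port=5060, timeout=3.0):
    """Send one OPTIONS over UDP; return the reply's status line, or None."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        # connect() on a UDP socket only fixes the peer and picks a local address
        sock.connect((target_host, target_port))
        local_host, local_port = sock.getsockname()
        sock.send(build_options(target_host, target_port, local_host, local_port))
        try:
            data = sock.recv(4096)
        except OSError:  # covers timeouts and ICMP "port unreachable"
            return None
    return data.split(b"\r\n", 1)[0].decode("ascii", "replace")
```

Calling `probe("192.0.2.10")` (a placeholder address) would return the first line of whatever the device sends back, or `None` if nothing arrives within the timeout.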
I will say that this is an issue that seems to crop up now and then with
chan_pjsip, and I've been seeing this type of thing occasionally for
years now: users won't receive calls, and when I run "pjsip show
endpoint XXX", it says "Unavailable" with no available contacts, even
though the device is registered. I know that chan_pjsip doesn't use
registrations at all to determine device state availability. Maybe
chan_sip does, and that's why it works more reliably; I'm not entirely
sure. I'd think that a REGISTER alone ought to be sufficient to toggle
the device state from "unavailable" to "not in use" or something else.
At the very least, it would be a failsafe that would make device state
less buggy, since if a device just registered, it's clearly not
unavailable. Currently there seem to be OPTIONS every 30 seconds, and we
have quite a low REGISTER interval as well. Changing the
OPTIONS/keepalive settings on the ATAs brought no improvement, though.
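
To make the behavior described above concrete, here is a rough model of how availability appears to follow from OPTIONS qualification rather than from registration. It is a simplified sketch and not Asterisk's code: the `Contact` class, the function names, and the status labels are illustrative assumptions, and it assumes that a contact which is never qualified (frequency 0) counts as usable.

```python
from dataclasses import dataclass

# Illustrative status labels for this sketch only.
AVAILABLE = "Avail"
UNAVAILABLE = "Unavail"
NON_QUALIFIED = "NonQual"


@dataclass
class Contact:
    uri: str
    qualify_frequency: int       # seconds between OPTIONS; 0 means never qualify
    last_options_answered: bool  # did the most recent OPTIONS get a response?


def contact_status(contact):
    """Status of a single contact in this simplified model."""
    if contact.qualify_frequency == 0:
        return NON_QUALIFIED  # never probed, so never marked unreachable
    return AVAILABLE if contact.last_options_answered else UNAVAILABLE


def endpoint_state(contacts):
    """An endpoint is usable if at least one contact is usable."""
    usable = (AVAILABLE, NON_QUALIFIED)
    if any(contact_status(c) in usable for c in contacts):
        return "Not in use"
    return "Unavailable"


# A registered device whose OPTIONS go unanswered still ends up unavailable:
registered = [Contact("sip:100@192.0.2.10:5060", 30, False)]
print(endpoint_state(registered))
```

In this model a fresh REGISTER creates the contact but does not change its status, which matches the symptom: the device is registered, yet the endpoint stays "Unavailable" until an OPTIONS response is seen.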
Apart from this being disruptive right now, it's also been a blocker for
other chan_sip migrations due to the severity of the issue, and I'd
really like to figure out how it could be resolved or mitigated, so any
insight would be much appreciated - thanks!
Trace from an "unavailable" ATA (not working correctly):
https://paste.interlinked.us/iz07sapwrb.txt
Trace from an "available" ATA (working correctly):
https://paste.interlinked.us/ocutyjslmg.txt