[asterisk-dev] $1000 reward for patch to cure call pickup crashes

Alistair Cunningham acunningham at integrics.com
Thu Jul 28 16:57:32 CDT 2011


We've just had another case of this, and this time the customer called me
before restarting Asterisk. I attached strace to the Asterisk process and
didn't see much happening, just lots of lines like this:

2503  14:36:24.187918 nanosleep({0, 1000}, NULL) = 0
2503  14:36:24.188055 nanosleep({0, 1000}, NULL) = 0
2503  14:36:24.188189 nanosleep({0, 1000}, NULL) = 0
2503  14:36:24.188324 nanosleep({0, 1000}, NULL) = 0
2503  14:36:24.188459 nanosleep({0, 1000}, NULL) = 0

with occasional:

2503  14:36:24.187288 nanosleep({0, 1000},  <unfinished ...>
1124  14:36:24.187393 <... poll resumed> ) = 1 ([{fd=16, revents=POLLIN}])
2503  14:36:24.187493 <... nanosleep resumed> NULL) = 0
1124  14:36:24.187532 gettimeofday( <unfinished ...>
2503  14:36:24.187565 nanosleep({0, 1000},  <unfinished ...>
1124  14:36:24.187604 <... gettimeofday resumed> {1311888984, 187555}, NULL) = 0
1124  14:36:24.187656 read(16,  <unfinished ...>
2503  14:36:24.187689 <... nanosleep resumed> NULL) = 0
1124  14:36:24.187719 <... read resumed> "\1\0\0\0\0\0\0\0", 8) = 8
2503  14:36:24.187765 nanosleep({0, 1000},  <unfinished ...>
1124  14:36:24.187805 poll([{fd=17, events=POLLIN}, {fd=16, events=POLLIN|POLLPRI}], 2, -1 <unfinished ...>
2503  14:36:24.187870 <... nanosleep resumed> NULL) = 0
2503  14:36:24.187918 nanosleep({0, 1000}, NULL) = 0

so it looks like Asterisk was sitting idle.
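
For a longer capture, a per-thread tally of system calls makes it easier to see which threads are only sleeping and which are doing I/O. The following is a minimal sketch, assuming the trace was saved in the same "strace -f -tt" layout as above (thread ID, timestamp, call). The function name tally_syscalls is made up for this example and is not part of strace or Asterisk.

```python
import re
from collections import Counter

# Start of an "strace -f -tt" line: thread ID, wall-clock timestamp, rest.
LINE_RE = re.compile(r"^(\d+)\s+\d{2}:\d{2}:\d{2}\.\d+\s+(.*)$")
# A syscall name at the start of a call, e.g. "nanosleep(".
CALL_RE = re.compile(r"^(\w+)\(")


def tally_syscalls(lines):
    """Count syscall starts per (thread ID, syscall name).

    "<... name resumed>" lines and wrapped continuation lines are skipped,
    so a call split into "unfinished"/"resumed" halves is counted once.
    """
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # wrapped continuation line, or not strace output
        tid, rest = m.group(1), m.group(2)
        call = CALL_RE.match(rest)
        if call:  # "<... x resumed>" does not match and is ignored
            counts[(tid, call.group(1))] += 1
    return counts
```

Calling tally_syscalls(open("asterisk.strace")).most_common() (the file name is hypothetical) lists the busiest thread/syscall pairs first. In the excerpt above, thread 2503 only ever calls nanosleep with a 1000 ns timeout, while thread 1124 wakes from poll, reads 8 bytes from fd 16 and goes back to poll.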

I was able to connect to the Asterisk console with no problems, and that 
was fully responsive. However, Asterisk was not responding at all to SIP 
packets, and a "sipsak -vv -s sip:<local IP address>:5060 -d" on the 
same machine timed out. Doing a "/etc/init.d/asterisk restart" did not 
kill the Asterisk process, but a kill -9 did. Nothing useful was logged 
to /var/log/asterisk/*.
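
The SIP check could also be scripted so that a monitoring job notices the hang. The following is a rough sketch only, assuming the server listens on UDP and that a single OPTIONS request is a fair liveness test (as far as I understand, that is what sipsak sends by default). The helper names build_options and probe, the "probe" user part and the 127.0.0.1 default for the local address are all assumptions made for this example.

```python
import socket
import uuid


def build_options(host, port, local_ip="127.0.0.1", local_port=5070):
    """Build a minimal SIP OPTIONS request (RFC 3261) as bytes."""
    # RFC 3261 requires the Via branch to start with the "z9hG4bK" cookie.
    branch = "z9hG4bK" + uuid.uuid4().hex
    lines = [
        f"OPTIONS sip:{host}:{port} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};branch={branch}",
        "Max-Forwards: 70",
        f"From: <sip:probe@{local_ip}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <sip:{host}:{port}>",
        f"Call-ID: {uuid.uuid4().hex}@{local_ip}",
        "CSeq: 1 OPTIONS",
        "Content-Length: 0",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def probe(host, port=5060, timeout=2.0):
    """Send one OPTIONS over UDP; return the reply's status line, or None."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.bind(("", 0))  # let the OS pick a free source port
        local_port = s.getsockname()[1]
        request = build_options(host, port, local_port=local_port)
        s.sendto(request, (host, port))
        try:
            data, _ = s.recvfrom(4096)
        except (socket.timeout, ConnectionError):
            return None  # no reply within the timeout, as in the hang above
        return data.split(b"\r\n", 1)[0].decode("ascii", "replace")
```

A None result from probe("<local IP address>") while the console still responds would match the state described above. Any status line at all, even an error response, would show that the SIP stack is still processing packets.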

Does anyone know any debugging that could be added to Asterisk for the 
next time this happens?

Alistair Cunningham
+1 888 468 3111
+44 20 799 39 799
http://integrics.com/

On 26/07/11 13:49, Alistair Cunningham wrote:
> On 20/07/11 09:23, Alistair Cunningham wrote:
>> The customer running 1.8.5.0 has just decided to disable call pickup
>> rather than suffer more crashes, and they'll only be willing to
>> re-enable it once we're confident the problem is cured, so there won't
>> be any more debugging from them. If we find another customer affected
>> who's willing to debug, I'll let this list know.
>>
>> In the meantime, if anyone is confident that they can produce a patch
>> based on the reported symptoms and attached backtrace, please let me
>> know. I'm willing to offer a reward of USD 1000 via PayPal for a patch
>> that comprehensively solves the problem.
>
> After further testing, it appears that the problem only occurs when
> using the M() option to Dial() that ends up calling ChanPickup(). I
> don't know if this helps anyone narrow down the problem?
>
> Alistair Cunningham
> +1 888 468 3111
> +44 20 799 39 799
> http://integrics.com/


