[asterisk-ss7] Error in t1_timeout (arg=0x8180078) at
chan_ss7.c:765 makes asterisk crash
Anders Baekgaard
ab at sifira.com
Tue Mar 21 01:15:40 MST 2006
This change was one of a number of changes to fix propagation of hangup
causes. But this particular change is wrong. It will be changed in the next
version.
-Anders
On Monday 20 March 2006 21:36, you wrote:
> > I just made some load testing and came across a problem. In some cases
> > with a high volume of calls ( > 40) all trying to start a connection
> > asterisk sometimes crashed without any output. I only got a coredump
> > when running it with asterisk -gvvvvvc (thanks to the hint from RoyK in
> > IRC), but this shows that the problem seems to be somewhere in
> > t1_timeout. This is what gdb says:
> >
> > (gdb) bt
> > #0 0x40861cf6 in t1_timeout (arg=0x8180078) at chan_ss7.c:765
> > #1 0x08056528 in ast_sched_runq (con=0x8148990) at sched.c:373
> > #2 0x40869643 in monitor_main (data=0x0) at chan_ss7.c:3402
> > #3 0x40027e51 in pthread_start_thread () from /lib/libpthread.so.0
> > #4 0x401ef92a in clone () from /lib/libc.so.6
>
> The crash is in this line, which was changed in version 0.8.3 of
> chan_ss7:
>
> @@ -737,7 +762,7 @@
>      struct ss7_chan *pvt = arg;
>
>      ast_log(LOG_NOTICE, "T1 timeout (waiting for RLC) CIC=%d.\n", pvt->cic);
> -    isup_send_rel(pvt, pvt->hangupcause);
> +    isup_send_rel(pvt, pvt->owner->hangupcause);
>      return 1; /* Run us again the next period */
>  }
>
> That change seems to me to be just plain wrong? There is no guarantee
> that pvt->owner will be non-null (and I'll bet that the crash is caused
> by it being NULL here).
>
> For example, if ss7_hangup() is called on an active circuit, it sets
> pvt->owner = NULL, then calls initiate_release_circuit() which starts
> timer T1. If that timer triggers for some reason, Asterisk will crash on
> a NULL pointer reference.
>
> I would try to revert that change and see if it does not cure the
> crashes. Anders should be able to tell what the purpose of the change
> was and if anything else is needed.
>
> The next question is why the timer T1 triggered (this happens when the
> other end does not reply with "release confirmed" in a timely
> manner). But to answer that more information is needed, like a protocol
> dump (ss7 start dump ...).
>
> - Kristian.
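
For illustration only, here is a minimal sketch of the NULL-owner case described above. The structures are hypothetical stand-ins that model just the fields mentioned in the diff, and release_cause() is an invented helper, not part of chan_ss7. It is not the actual fix, which per the thread is to revert to pvt->hangupcause; it only shows why dereferencing pvt->owner unconditionally is unsafe once ss7_hangup() has cleared it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real Asterisk/chan_ss7 structures;
 * only the fields needed for this illustration are modelled. */
struct ast_channel {
    int hangupcause;
};

struct ss7_chan {
    int cic;
    int hangupcause;            /* cause stored on the circuit itself */
    struct ast_channel *owner;  /* NULL once the channel has been detached */
};

/* Hypothetical helper: pick the cause to send in REL without
 * dereferencing a NULL owner. Falls back to the circuit's own cause. */
static int release_cause(const struct ss7_chan *pvt)
{
    assert(pvt != NULL);
    if (pvt->owner != NULL)
        return pvt->owner->hangupcause;
    return pvt->hangupcause;
}
```

With such a helper, a timer callback could call isup_send_rel(pvt, release_cause(pvt)) and stay safe whether or not the owner is still attached. Whether the owner's cause should ever take precedence there is a design question for the chan_ss7 maintainers.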