[asterisk-ss7] Error in t1_timeout (arg=0x8180078) at chan_ss7.c:765 makes asterisk crash

Anders Baekgaard ab at sifira.com
Tue Mar 21 01:15:40 MST 2006


This change was one of several changes made to fix the propagation of hangup
causes, but this particular change is wrong. It will be changed in the next version.

-Anders

On Monday 20 March 2006 21:36, you wrote:
> > I just did some load testing and came across a problem. In some cases,
> > with a high volume of calls (> 40) all trying to set up a connection,
> > Asterisk sometimes crashed without any output. I only got a core dump
> > when running it with asterisk -gvvvvvc (thanks to the hint from RoyK on
> > IRC), but it shows that the problem seems to be somewhere in
> > t1_timeout. This is what gdb says:
> >
> > (gdb) bt
> > #0  0x40861cf6 in t1_timeout (arg=0x8180078) at chan_ss7.c:765
> > #1  0x08056528 in ast_sched_runq (con=0x8148990) at sched.c:373
> > #2  0x40869643 in monitor_main (data=0x0) at chan_ss7.c:3402
> > #3  0x40027e51 in pthread_start_thread () from /lib/libpthread.so.0
> > #4  0x401ef92a in clone () from /lib/libc.so.6
>
> The crash is in this line, which was changed in version 0.8.3 of
> chan_ss7:
>
> @@ -737,7 +762,7 @@
>    struct ss7_chan *pvt = arg;
>
>    ast_log(LOG_NOTICE, "T1 timeout (waiting for RLC) CIC=%d.\n", pvt->cic);
> -  isup_send_rel(pvt, pvt->hangupcause);
> +  isup_send_rel(pvt, pvt->owner->hangupcause);
>    return 1;                     /* Run us again the next period */
>  }
>
> That change seems to me to be just plain wrong. There is no guarantee
> that pvt->owner will be non-NULL (and I'll bet that the crash is caused
> by it being NULL here).
>
> For example, if ss7_hangup() is called on an active circuit, it sets
> pvt->owner = NULL, then calls initiate_release_circuit(), which starts
> timer T1. If that timer triggers for some reason, Asterisk will crash on
> a NULL pointer dereference.
>
> I would try reverting that change and see whether it cures the
> crashes. Anders should be able to tell what the purpose of the change
> was and whether anything else is needed.
>
> The next question is why timer T1 triggered (this happens when the
> other end does not reply with "release complete" (RLC) in a timely
> manner). But answering that requires more information, such as a
> protocol dump (ss7 start dump ...).
>
>  - Kristian.

