[asterisk-users] Asterisk freezes with Fixup failed on channel SIP/...<MASQ>

Grygoriy Dobrovolskyy megahohol at gmail.com
Sat Jan 24 08:20:17 CST 2009


Copied and pasted from freeswitch.org:

Asterisk uses a modular design in which a central core loads shared objects,
known as "modules", to extend its functionality. Modules are used to
implement specific protocols such as SIP, add applications such as custom
IVRs, and tie in other external interfaces such as the Manager Interface.
The core of Asterisk uses a threading model, but a very conservative one:
only origination channels and channels executing an application have
threads. The B leg of any call operates only within the same thread as the A
leg, and when something like a call transfer happens, the channel must first
be moved to a threaded mode. This often involves a practice called channel
masquerade, a process in which all the internals of a channel are torn from
one dynamic memory object and placed into another, and which was once
described in the code comments as being "nasty". The same went for the
opposite operation, in which the thread was discarded by cloning the channel
and letting the original hang up, which also required hacking the CDR
structure to avoid the clone being seen as a new call. One will often see 3
or 4 channels up for a single call during a call transfer because of this.

/* XXX This is a seriously wacked out operation. We're essentially putting
   the guts of the clone channel into the original channel. Start by killing
   off the original channel's backend. I'm not sure we're going to keep this
   function, because while the features are nice, the cost is very high in
   terms of pure nastiness. XXX */

This became the de facto way to pull a channel out of the grips of another
thread and the source of many headaches for application developers. This
uncertain threading scheme was one of the motivating factors for a rewrite.
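
To make the idea concrete, here is a minimal sketch in C of what "putting the
guts of the clone channel into the original channel" amounts to. Everything in
it is hypothetical and made up for illustration: struct toy_channel,
toy_masquerade() and the "<ZOMBIE>" marker are not Asterisk's real
struct ast_channel or its real masquerade code, which has far more state to
fix up.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy type for illustration; NOT Asterisk's struct ast_channel. */
struct toy_channel {
    char name[64];
    const char *tech;   /* which channel driver backs this channel */
    void *tech_pvt;     /* driver-private state, the "guts" */
    int has_thread;     /* 1 if some thread is servicing this object */
};

/*
 * Put the guts of the clone into the original. The original keeps its own
 * thread; the clone is left holding the original's old backend so that it
 * can be hung up afterwards.
 */
static void toy_masquerade(struct toy_channel *original,
                           struct toy_channel *clone)
{
    struct toy_channel old;

    assert(original != NULL && clone != NULL && original != clone);
    old = *original;    /* remember the backend we are about to replace */

    /* The original takes over the clone's identity and backend. */
    snprintf(original->name, sizeof(original->name), "%s", clone->name);
    original->tech = clone->tech;
    original->tech_pvt = clone->tech_pvt;
    /* original->has_thread is deliberately left untouched. */

    /* The clone inherits the old backend and is marked for hangup. */
    snprintf(clone->name, sizeof(clone->name), "%.48s<ZOMBIE>", old.name);
    clone->tech = old.tech;
    clone->tech_pvt = old.tech_pvt;
}
```

Even in this toy version, any other thread that still holds a pointer to
either object while the swap runs will see a half-updated channel, which is
why the real operation needs careful locking.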

Asterisk uses linked lists to manage its open channels. A linked list is a
series of dynamically allocated objects chained together by a structure that
has a pointer to its own type as one of its members, allowing you to chain
objects endlessly and keep track of them.
Linked lists are indeed a useful programming practice, but in a threaded
application they become very difficult to manage. One must use mutexes, a
kind of traffic light for threads, to make sure only one thread at a time
has write access to the list; otherwise you risk one thread tearing a link
out of the list while another is traversing it. This also leads to horrible
situations where one thread may be destroying or masquerading a channel
while another is accessing it, which results in a segmentation fault: a
fatal error that halts the program instantly and, in most cases, means all
your calls are lost. We've all seen the infamous "Avoiding initial deadlock"
message, which is essentially an attempt to lock a channel 10 times and, if
it still won't lock, to just go ahead and forget about the lock.
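
The two ideas above, a mutex-guarded linked list and a "try the lock a fixed
number of times" loop, can be sketched with POSIX threads roughly as follows.
All names here (toy_chan, toy_chan_list, toy_lock_with_retries,
TOY_LOCK_ATTEMPTS) are hypothetical and chosen for this example; this is not
Asterisk's channel list or its locking code, only an illustration of the
pattern being described.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>

/* Hypothetical toy types for illustration; NOT Asterisk's channel list. */
struct toy_chan {
    int id;
    struct toy_chan *next;  /* pointer to its own type: this forms the chain */
};

struct toy_chan_list {
    pthread_mutex_t lock;   /* guards head and every next pointer */
    struct toy_chan *head;
};

static void toy_list_init(struct toy_chan_list *l)
{
    assert(l != NULL);
    pthread_mutex_init(&l->lock, NULL);
    l->head = NULL;
}

/* Insert at the head; only one thread at a time may relink the list. */
static int toy_list_add(struct toy_chan_list *l, int id)
{
    struct toy_chan *c = malloc(sizeof(*c));

    if (c == NULL)
        return -1;
    c->id = id;
    pthread_mutex_lock(&l->lock);
    c->next = l->head;
    l->head = c;
    pthread_mutex_unlock(&l->lock);
    return 0;
}

/* Unlink and free one entry. Without the lock, a thread traversing the
 * list at the same moment could follow a pointer into freed memory. */
static int toy_list_remove(struct toy_chan_list *l, int id)
{
    struct toy_chan **pp, *victim = NULL;
    int found;

    pthread_mutex_lock(&l->lock);
    for (pp = &l->head; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->id == id) {
            victim = *pp;
            *pp = victim->next;
            break;
        }
    }
    pthread_mutex_unlock(&l->lock);

    found = (victim != NULL);
    free(victim);
    return found ? 0 : -1;
}

#define TOY_LOCK_ATTEMPTS 10

/* Try the lock a fixed number of times. Returns 0 with the lock held, or
 * -1 if every attempt failed. Carrying on without the lock after a -1 is
 * exactly the risky behaviour the text describes. */
static int toy_lock_with_retries(pthread_mutex_t *m)
{
    int i;

    for (i = 0; i < TOY_LOCK_ATTEMPTS; i++) {
        if (pthread_mutex_trylock(m) == 0)
            return 0;
        sched_yield();  /* give the current holder a chance to finish */
    }
    return -1;
}
```

A caller that gets -1 back from toy_lock_with_retries() has only two honest
options: give up on the operation, or proceed unprotected and accept the race.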


2009/1/24 Udo Schacht-Wiegand <asterisk at wiegand.name>

> On a production system, running 1.4.17 (compiled from
> bristuff-0.4.0-test6-xr1) we had this strange issue two times in the last
> weeks:
>
> [2009-01-13 13:58:30] WARNING[1213] channel.c: Fixup failed on channel
> SIP/2332-081d0108<MASQ>, strange things may happen.
> [2009-01-13 13:58:30] WARNING[1213] channel.c: Hangup failed!  Strange
> things may happen!
> [2009-01-13 13:58:30] WARNING[1213] channel.c: Failed to perform masquerade
> [2009-01-13 13:58:30] WARNING[1213] channel.c: Channel 'SIP/2332-081d0108'
> may not have been hung up properly
>
> and:
>
> [2009-01-23 14:27:17] WARNING[21528] channel.c: Fixup failed on channel
> SIP/2332-083c3778<MASQ>, strange things may happen.
> [2009-01-23 14:27:17] WARNING[21528] channel.c: Hangup failed!  Strange
> things may happen!
> [2009-01-23 14:27:17] WARNING[21528] channel.c: Failed to perform
> masquerade
> [2009-01-23 14:27:17] WARNING[21528] channel.c: Channel 'SIP/2332-083c3778'
> may not have been hung up properly
>
> Both times all SIP channels got stuck and the CLI became unresponsive.
> Calls continued for a while, but new SIP calls could not be
> established.
>
> The second time this happened, the SIP phones could no longer subscribe to
> Asterisk, and a few minutes later the log filled with:
>
> [2009-01-23 14:43:21] ERROR[22319] chan_sip.c: Call to peer '2333' rejected
> due to usage limit of 10
>
> On the CLI one could see that there were hundreds of (rejected) calls to
> these SIP phones.
>
> The phones that show up in the ERROR messages are in a group call made by a
> Dial(Local/...&Local.../&Local/...) construct. But other SIP phones were
> affected as well. It seemed like the whole chan_sip module
> became stuck. I also could not "unload chan_sip.so", but can't remember the
> exact error message it gave.
>
> The only thing that was left was to restart Asterisk.
>
> Can someone give me some clue what the 'Fixup failed ...' and 'masquerade'
> warnings actually mean?
>
> Any help appreciated.
> Udo