[Asterisk-Dev] Deadlock?Thread create error.

tiszaii at t-online.hu tiszaii at t-online.hu
Wed Jan 11 03:21:58 MST 2006


 Hello,

on a Asterisk SVN-branch-1.2-r7406M production box we
noticed twice
system death (asterisk processes were running, so no core dump
available) during 3 days.

The call center was used at time of malfunction only for
outbound calls
(IAX2 softphones to PRI). Calls were recorded with monitor
application.

In the 1st failure I was able to log onto console (asterisk
-rv) but the
console stucked many times - this means, did not react for
anything
(running from screen).
In the 2nd case I wasn't able to log into console.
In both cases the asterisk process ate 98-99% of CPU and did
no any
usable work (we was not able to place calls). 'restart now'
command
didn't work. kill itself didn't work - just kill -9 at the end.

Some interesting things I found:
Dec 17 13:58:26 DEBUG[21182] channel.c: Avoiding initial
deadlock for
'Zap/32-1'
Dec 17 13:58:26 VERBOSE[7937] logger.c: -- Zap/32-1 answered
IAX2/3225-35
Dec 17 13:58:26 WARNING[21182] app_queue.c: Failed to create
update thread!
Dec 17 13:58:26 DEBUG[21182] channel.c: Avoiding initial
deadlock for
'IAX2/3225-35'
Dec 17 13:58:26 WARNING[21182] app_queue.c: Failed to create
update thread!
Dec 17 13:58:26 WARNING[21182] app_queue.c: Failed to create
update thread!

Last lines in messages log were:
Dec 17 13:58:30 WARNING[21182] app_queue.c: Failed to create
update thread!
Dec 17 13:58:30 DEBUG[7984] chan_iax2.c: We're hanging up
IAX2/3202-47
now...
Dec 17 13:58:30 DEBUG[7984] chan_iax2.c: Really destroying
IAX2/3202-47
now...
Dec 17 13:58:30 VERBOSE[7984] logger.c: -- Hungup 'IAX2/3202-47'
Dec 17 13:58:30 WARNING[21182] app_queue.c: Failed to create
update thread!
Dec 17 13:58:30 DEBUG[21188] chan_zap.c: Echo cancellation
already on
Dec 17 13:58:30 DEBUG[21182] channel.c: Avoiding initial
deadlock for
'Zap/35-1'
Dec 17 13:58:30 VERBOSE[7947] logger.c: -- Zap/35-1 answered
IAX2/3224-43
Dec 17 13:58:30 WARNING[21182] app_queue.c: Failed to create
update thread!
Dec 17 13:58:30 WARNING[21182] app_queue.c: Failed to create
update thread!
Dec 17 13:58:30 DEBUG[21185] chan_iax2.c: chan_iax2:
ast_sched_runq ran
772 scheduled tasks all at once
Dec 17 13:58:35 DEBUG[21188] chan_zap.c: Echo cancellation
already on
Dec 17 13:58:35 DEBUG[21182] channel.c: Avoiding initial
deadlock for
'Zap/46-1'
Dec 17 13:58:35 VERBOSE[7967] logger.c: -- Zap/46-1 answered
IAX2/3235-44
Dec 17 13:58:35 WARNING[21182] app_queue.c: Failed to create
update thread!
Dec 17 13:58:37 DEBUG[7458] channel.c: Didn't get a frame
from channel:
Zap/43-1
Dec 17 13:58:37 DEBUG[7458] channel.c: Bridge stops bridging
channels
IAX2/3267-31 and Zap/43-1

In logs I noticed many times:
DEBUG[21182] channel.c: Avoiding initial deadlock for 'Zap/36-1'

What can this mean?
I checked app_queue and saw that the given message came from
a method
which intended to handle device state changes and it had to
issue
another thread to make the work (in app_queue). I have three
things in
my mind:
1. there is something wrong with device state handling - not
related to
app_queue
2. there is something wrong in system not allowing to start
new thread -
not enough resources? I doubt.pl. the system-imposed limit
on the total number of threads in a process
PTHREAD_THREADS_MAX would be exceeded, because the more
"changethread()" thread functions locked, and there are not
exit.
3. something wrong with app_queue

System specs:
- Asterisk 1.2 SVN branch with some small modifications,
patches.
- Linux-2.6.13.2, Ubuntu 5.04
- SuperMicro Dual Xeon 2.8GHz (HT disabled), 2GB RAM, HW
SATA RAID5
- 2xDigium TE110P

What can I do in such case? That's a production box, we have
a very
similar box (just the motherboard differs a bit) with
exactly the same
asterisk version, kernel, configuration and didn't notice
the problem at
all. We are restarting asterisk after midnight every day.

Any idea what do I have to check in configuration? What to
try if the
problem occurs again, how to help you, the developers to
catch the
source of the issue (maybe a bug?)?
May be,similary bug:
http://bugs.digium.com/file_download.php?file_id=8789&type=bug
[^]
"Thread 189 (Thread -1222902864 (LWP 19916)):
0 0xb7f8c436 in __lll_mutex_lock_wait () from
/lib/tls/libpthread.so.0
1 0xb7f8989f in _L_mutex_lock_73 () from
/lib/tls/libpthread.so.0
2 0xb71b9050 in ?? ()
3 0xb71b8a50 in ?? ()
4 0xb71b8758 in ?? ()
0000005 0xb7607d6d in hangupcalls (outgoing=0xb7616238,
exception=0xb7616738) at app_queue.c:1249
0000006 0xb7611208 in ast_mutex_lock (pmutex=0xb7616238) at
lock.h:584"

I think the this is the first deadlock, and after bad the
system.
Interesting thing is:
outgoing=0xb7616238 pointer = (pmutex=0xb7616238) !!!?

best regards
Istvan



More information about the asterisk-dev mailing list