[asterisk-users] 1.4.20.1 hang -- three times in 1.5 days (TC400B at fault ?)

Ex Vito ex.vitorino at gmail.com
Fri Jun 6 07:01:33 CDT 2008


  Hi list,

  Looking to share info and obtain peer feedback.
  Current possibilities: bad config, bad hw or asterisk/zaptel bug.

  System: HP Proliant DL380 G5
  Installed HW: TE220B to PSTN, TE122 to ChannelBank and TC400B.
  OS: CentOS 5.1 kernel 2.6.18-53.1.21.el5
  Asterisk: 1.4.20.1
  Zaptel: 1.4.11

  Events / History
  ---------------------
  May 29th
  - started production in the evening
  - TC400B was not yet in the system as it was not available at the time
  June 4th
  - installed TC400B at the end of the day
  - test IAX/G.729 calls ok
  June 5th
  - 10:40: hang
  - 16:40: hang
  - 19:00: rebuilt asterisk with DEBUG_LOCK + THREAD
  June 6th (today)
  - 11:35: hang

  So, in short, since we installed the TC400B the system has been hanging
  repeatedly (which is really bad because we have already had to RMA
  a TC400B twice for this system).

  Detail
  ---------
  When it hung about an hour ago we tested:
  - FXS @ channelbank @ TE122 => voicemail FAILS
  - FXS @ channelbank @ TE122 => PSTN @ TE220B WORKED ONCE
  - FXS @ channelbank @ TE122 => SIP phone FAILS
  - SIP phone => anywhere (voicemail, SIP, FXS @ channelbank, PSTN @ PRI) FAILS
  - PSTN @ TE220B => anywhere FAILS
  - IAX => anywhere FAILS

  asterisk log has 12 of:
ERROR [14733] chan_sip.c: We could NOT get the channel lock for SIP/000e08dfdc72-0a107670!
ERROR[14733] chan_sip.c: SIP transaction failed: 43e5ad5f6dc5b58c46c597cd2af0c31e at 192.168.161.40

  ...followed by thousands of:
NOTICE[30599] chan_iax2.c: Avoiding IAX destroy deadlock

  (the log contains similar messages for yesterday's hangs)
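
  To compare the three hangs, it may help to tally the log entries by
  level, source file and message prefix. The following is only a minimal
  sketch: the function name tally and the regular expression are
  assumptions made for illustration, based on the line shape seen in the
  excerpts above, not on any Asterisk tool. It needs a current Python 3.

```python
import re
from collections import Counter

# Matches the "LEVEL[thread] file.c: message" part of a log line, as in the
# excerpts above. Anything before LEVEL (e.g. a timestamp) is ignored.
# The set of levels and the overall shape are assumptions from the excerpts.
LINE_RE = re.compile(r"\b(ERROR|WARNING|NOTICE)\s*\[(\d+)\]\s+(\S+\.c):\s*(.*)")


def tally(lines):
    """Count log entries per (level, source file, first words of message)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a line of the expected shape
        level, _thread, source, message = match.groups()
        # Keep only the first four words so that per-call details
        # (channel names, Call-IDs) do not split the counts.
        prefix = " ".join(message.split()[:4])
        counts[(level, source, prefix)] += 1
    return counts
```

  Feeding it an open log file (for example open("/var/log/asterisk/messages"),
  assuming that is where the log lives on the system) and printing
  tally(fh).most_common(10) should show at a glance whether each hang
  produced the same mix of chan_sip and chan_iax2 messages.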

  asterisk CPU usage is apparently zero
  system load average is about 3

  network access to the system is ok
  dmesg kernel message buffer looks ok

  CLI "core show locks" shows lots of info which we're not able to
  decode (attached)

  CLI stop now has no effect
  kill <pid> has no effect
  kill -9 <pid> leads to <zombie> process
  shutdown -r now leads to kernel panic probably while stopping zaptel because
  the TE122 and TE220B drivers were not unloaded (attached)
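
  A load of about 3 with no CPU usage, together with a process that does
  not go away after kill -9, would be consistent with threads blocked
  inside the kernel in uninterruptible sleep (state D), since Linux counts
  such tasks in the load average. This is only one possible reading. A
  minimal sketch for checking it during the next hang follows; the function
  names are made up for illustration, and it needs a current Python 3 on
  Linux.

```python
import glob
from collections import Counter


def state_from_stat(stat_text):
    """Return the one-letter state field from /proc/<pid>/stat content."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so the state is the first field after the last ')'.
    after_comm = stat_text[stat_text.rindex(")") + 1:]
    return after_comm.split()[0]


def thread_states(pid):
    """Count the states of all threads of one process (Linux /proc only)."""
    states = Counter()
    for path in glob.glob(f"/proc/{pid}/task/*/stat"):
        try:
            with open(path) as fh:
                states[state_from_stat(fh.read())] += 1
        except OSError:
            # The thread exited between listing and reading; skip it.
            pass
    return states
```

  Calling thread_states() with the asterisk pid returns a count per state
  letter. Several threads in state D would point towards a driver or
  kernel-level wait rather than a deadlock purely inside asterisk.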

  In Our Heads
  ------------------
  - we suspect that the presence of the TC400B makes asterisk behave
    differently, leading to what we're now calling a hang (it is the only
    apparent change in the system since it started misbehaving)
  - as such, we're considering removing the TC400B to see if the system
    stabilizes
  - however, removing it may remove the possibility of further diagnosing
    this issue and trying fixes
  - of course, we're trying to manage customer expectations and
    satisfaction at the same time

  Extra Context Info
  ------------------------
  - system serves ~100 SIP extensions
  - system peers with a dozen other systems within the VPN (dundi+iax)


  Thanks in advance for any feedback or pointers that can help us identify,
  work around and, ideally, fix this behaviour.

  Cheers,
--
 exvito
-------------- next part --------------
A non-text attachment was scrubbed...
Name: summary-log.txt.gz
Type: application/x-gzip
Size: 7202 bytes
Desc: not available
Url : http://lists.digium.com/pipermail/asterisk-users/attachments/20080606/30aa0ef0/attachment.bin 

