[Asterisk-Users] Dual T400P, SMP, performance issues

The Traveller traveler at xs4all.nl
Fri Jun 27 11:08:01 MST 2003


Heya all,

I ran some more tests with different kernel-options and my preliminary
conclusion is that the problem goes away when you disable SMP in your
kernel.  I even put the Eicon-card, which I suspected was causing the
problem, back into the machine and loaded its drivers, making calls
through it during my stress-test, but the machine is still stable.
Hopefully, it stays that way.  These panics sometimes take a bit
longer to show up.

Be sure to keep the CPU-option to use the local APIC checked in your
kernel-configuration, or you'll only have the 16 lower IRQs available,
with a high probability that your devices end up sharing IRQs.  In my
case, both SCSI host-adapters, the ethernet, the Eicon-card and the
Zaptel-drivers suddenly all decided they wanted IRQ 5, even though there
were others available.  :-)
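
For anyone who wants to check for this kind of IRQ pile-up quickly, here is
a minimal sketch that scans text in the style of /proc/interrupts and lists
the IRQs claimed by more than one device.  It assumes the 2.4-era layout
(IRQ number, one counter per CPU, controller type, then a comma-separated
device list).  The function name shared_irqs and the sample data, including
the driver names in it, are made up for illustration.

```python
import re


def shared_irqs(text):
    """Return {irq: [devices]} for IRQ lines that list more than one device."""
    shared = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\d+):\s+(.*)$", line)
        if not m:
            continue  # header line, or non-numeric rows such as NMI/LOC/ERR
        irq = int(m.group(1))
        fields = m.group(2).split()
        # Skip the per-CPU interrupt counters (one numeric column per CPU).
        while fields and fields[0].isdigit():
            fields.pop(0)
        if len(fields) < 2:
            continue  # no controller type and device list on this line
        # fields[0] is the controller type (e.g. XT-PIC); the rest is the
        # comma-separated list of devices registered on this IRQ.
        devices = [d.strip() for d in " ".join(fields[1:]).split(",") if d.strip()]
        if len(devices) > 1:
            shared[irq] = devices
    return shared


# Hypothetical sample, NOT taken from the machine discussed in this thread.
SAMPLE = """\
           CPU0
  0:     123456          XT-PIC  timer
  1:        842          XT-PIC  keyboard
  5:      90210          XT-PIC  aic7xxx, aic7xxx, eth0, tor2
 14:       4411          XT-PIC  ide0
NMI:          0
"""

if __name__ == "__main__":
    for irq, devs in sorted(shared_irqs(SAMPLE).items()):
        print(f"IRQ {irq}: {', '.join(devs)}")
```

On a live box you would pass in the contents of /proc/interrupts instead of
SAMPLE.  Newer kernels format that file differently, so the parsing would
probably need adjusting there.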

PS: I had uncommented the #define for SMP in the Zaptel Makefile while
testing with SMP and have now commented it out again for testing with a
non-SMP kernel, in case anyone wonders.


Alex: Could you try a non-SMP kernel on your machine as well and report
if it fixes your problem?  I'm using the standard GCC shipped with RH9,
which identifies itself as
"gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5)", for all my tests.
The kernel I'm currently using is the v2.4.21 I started with, but with
SMP disabled.
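
Since the compiler version keeps coming up in this thread, here is a rough
sketch of how one might compare the GCC version recorded in the kernel
banner (the string found in /proc/version) with the banner printed by the
compiler used for the modules.  The helper names gcc_version and
same_compiler are made up, and the kernel banner below is a hypothetical
example; only the GCC banner is the one quoted above.

```python
import re

# Matches the dotted version number that follows "gcc version" in a banner.
GCC_RE = re.compile(r"gcc version (\d+(?:\.\d+)+)")


def gcc_version(banner):
    """Extract the dotted GCC version (e.g. '3.2.2') from a banner, or None."""
    m = GCC_RE.search(banner)
    return m.group(1) if m else None


def same_compiler(kernel_banner, gcc_banner):
    """True only if both banners name a GCC version and the versions match."""
    kernel = gcc_version(kernel_banner)
    compiler = gcc_version(gcc_banner)
    return kernel is not None and kernel == compiler


# Hypothetical kernel banner in the style of /proc/version.
KERNEL_BANNER = (
    "Linux version 2.4.21 (root@buildhost) "
    "(gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5)) #1 Fri Jun 27 2003"
)
# Compiler banner as quoted in this mail.
GCC_BANNER = "gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5)"

if __name__ == "__main__":
    print("kernel built with gcc", gcc_version(KERNEL_BANNER))
    print("compiler matches kernel:", same_compiler(KERNEL_BANNER, GCC_BANNER))
```

This only compares the version numbers, so it would not notice two
different vendor builds of the same GCC release.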



   Grtz,

     Oliver

On Thu, Jun 26, 2003 at 12:27:12 -0500, Matthias Granberry wrote:

> Also, make sure that the kernel and all the modules are compiled with
> the same gcc version.  I had to manually change some hardcoded zaptel
> makefile targets to use gcc-2.95 instead of ${CC} or gcc hardcoded in.
> The entire asterisk build system is somewhat weak, but it seems to
> work if you tweak it all just right.  It's obvious what things the
> developers are interested in, though.  The boring parts are all
> half-done, and the interesting parts are all fairly high-quality.
> 
> Matthias
> 
> The Traveller <traveler at xs4all.nl> writes:
> 
> > Hi Alex,
> >
> > The problem is most likely to occur with high volumes of call-setups and
> > disconnects.  This could be reproduced by putting 2 of your T-1 ports
> > back to back and then using the auto-dialer to generate a large amount of
> > very short calls between the ports.
> >
> > I'm currently attempting to figure out what's causing the problem,
> > by trying different kernels with different options.  Trying a different
> > version of GCC is a good idea.  I hadn't thought of that yet.
> >
> > So far, I've had limited success.  The panics popped up in all the kernels
> > I tested with, although some things, like certain other hardware / drivers,
> > seem to make them more likely to appear.  See the other thread I started
> > about this problem.
> >
> >     Grtz,
> >
> >       Oliver
> >
> > On Tue, Jun 24, 2003 at 19:10:08 -0500, Alex Zarubin wrote:
> >
> >> Mark & Oliver,
> >> 
> >> It is too early to say, but the picture is different now. Our dual CPU,
> >> dual T400P box has been up for 4 days, under a load of 10 - 100
> >> simultaneous PRI -> SIP calls. We installed 2.4.21 #2 SMP (it was still
> >> freezing after that) and, what I think made the difference, recompiled
> >> zaptel-libpri-asterisk with gcc 3.3.
> >> 
> >> The problem, along the way, was that asterisk wouldn't start after that.
> >> It was crashing while loading the mp3 and lpc10 codecs. We put 'noload'
> >> for these two into modules.conf - a temporary solution, of course.
> >> 
> >> There are still problems with multiple connections at the same time.
> >> Windows to the box freeze for a second, and there are D-channel error
> >> messages. The following messages are dumped into /var/log/messages.
> >> What do you think?
> >> 
> >> Jun 24 18:23:25 mspgate03 kernel:
> >> Jun 24 18:23:25 mspgate03 kernel: wait_on_irq, CPU 1:
> >> Jun 24 18:23:25 mspgate03 kernel: irq:  1 [ 0 0 1 0 ]
> >> Jun 24 18:23:25 mspgate03 kernel: bh:   0 [ 0 0 0 0 ]
> >> Jun 24 18:23:25 mspgate03 kernel: Stack dumps:
> >> Jun 24 18:23:25 mspgate03 kernel: CPU 0:02000000 0000036f 00e14603
> >> 18020000 03000010 00006647 008e0200 48030000
> >> Jun 24 18:23:25 mspgate03 kernel:        00000078 001ffa02 5b490300
> >> 06000000 000001c7 074e0308 00001afe 01c74d03
> >> Jun 24 18:23:25 mspgate03 kernel:        23020000 d7080000 e1000001
> >> 09000000 000001d7 f5030001 04000023 09300207
> >> Jun 24 18:23:25 mspgate03 kernel: Call Trace:    [<f89bd281>]
> >> [<f89bb132>] [<f89bbb47>] [<f89bd281>] [<f89bd281>]
> >> Jun 24 18:23:25 mspgate03 kernel:   [<f89bb132>] [<f89bd281>]
> >> [<f89bd281>] [<f89bb132>] [<f89bbb47>] [<f89e7737>]
> >> Jun 24 18:23:25 mspgate03 kernel:   [<f89aa80a>] [<f89aa80a>]
> >> [<c01feee4>] [<f89e7737>] [<c01f4eae>] [<c010a98e>]
> >> Jun 24 18:23:25 mspgate03 kernel:   [<c020d122>] [<c010abe3>]
> >> [<c020d122>] [<c020d550>] [<c010a98e>] [<c020d550>]
> >> Jun 24 18:23:25 mspgate03 kernel:   [<c010abfe>] [<c01f0919>]
> >> [<c01f0919>] [<c022a1ef>] [<c022a1ef>] [<c022a5f5>]
> >> Jun 24 18:23:25 mspgate03 kernel:   [<f89bd281>] [<f89bd281>]
> >> [<f89bd281>] [<f89bb132>] [<f89bd510>] [<f89e7737>]
> >> Jun 24 18:23:25 mspgate03 kernel:   [<c022a5f5>] [<c01f0ffd>]
> >> [<c01f112e>] [<c01f53c2>] [<c012005b>] [<c010abfe>]
> >> Jun 24 18:23:25 mspgate03 kernel:   [<c015147a>] [<c01509dc>]
> >> [<c0147460>] [<c0147fb8>] [<f89e7737>] [<f89e7737>]
> >> Jun 24 18:23:25 mspgate03 kernel:   [<c01f0998>] [<c01f0fac>]
> >> [<c01f112e>] [<c01f53c2>] [<c0117fce>] [<c0117ef0>]
> >> Jun 24 18:23:25 mspgate03 kernel:   [<c0144a64>] [<c01246db>]
> >> [<c0109023>]
> >> Jun 24 18:23:25 mspgate03 kernel:
> >> Jun 24 18:23:25 mspgate03 kernel: CPU 2:00000000 00000000 00000000
> >> 00000000 00000000 00000000 00000000 00000000
> >> Jun 24 18:23:25 mspgate03 kernel:        00000000 00000000 00000000
> >> 00000000 00000000 00000000 00000000 00000000
> >> Jun 24 18:23:25 mspgate03 kernel:        00000000 00000000 00000000
> >> 00000000 00000000 00000000 00000000 00000000
> >> Jun 24 18:23:25 mspgate03 kernel: Call Trace:
> >> Jun 24 18:23:25 mspgate03 kernel:
> >> Jun 24 18:23:25 mspgate03 kernel: CPU 3:00000070 cce30002 0cd80000
> >> 08fa0000 69530000 656c706d 6c616e41 73697379
> >> Jun 24 18:23:25 mspgate03 kernel:        0009a700 46534c00 65746e69
> >> 6c6f7072 32657461 6e655f61 0a810063 69530000
> >> Jun 24 18:23:25 mspgate03 kernel:        656c706d 65746e49 6c6f7072
> >> 4c657461 39004653 5300000b 6c706d69 66736c65
> >> Jun 24 18:23:25 mspgate03 kernel: Call Trace:
> >> Jun 24 18:23:25 mspgate03 kernel:
> >> Jun 24 18:23:25 mspgate03 kernel: CPU 1:e14d5eac c025c896 00000001
> >> 00000001 ffffffff 00000001 c010a7c2 c025c8ab
> >> Jun 24 18:23:25 mspgate03 kernel:        00000000 f2d92124 e14d5f00
> >> c0191104 00000500 00001805 000000bf 00008a01
> >> Jun 24 18:23:25 mspgate03 kernel:        7f1c0300 01000415 1a131100
> >> 170f1200 00000000 e14d4000 00000000 00000000
> >> Jun 24 18:23:25 mspgate03 kernel: Call Trace:    [<c010a7c2>]
> >> [<c0191104>] [<c01913d4>] [<c018e1e2>] [<c014c2c7>]
> >> Jun 24 18:23:25 mspgate03 kernel:   [<c0109023>]
> >> Jun 24 18:23:25 mspgate03 kernel:
> >> 
> >> Thank you.
> >> Alex Zarubin
> >> 
> >> -----Original Message-----
> >> From: The Traveller [mailto:traveler at xs4all.nl]
> >> Sent: Tuesday, June 17, 2003 3:10 PM
> >> To: asterisk-users at lists.digium.com
> >> Subject: Re: [Asterisk-Users] Dual T400P, SMP, performance issues
> >> 
> >> 
> >> On Tue, Jun 17, 2003 at 20:54:39 +0200, The Traveller wrote:
> >> > 
> >> > BTW: As I reported in my previous mail to the list, I've now installed
> >> > kernel 2.4.21-rc2 with ACPI-patch on the box with the E100P.  I've been
> >> > trying very hard to reproduce a freeze with this kernel, but haven't
> >> > succeeded yet.
> >> [...]
> >> 
> >> Ok, it crashed again, so that wasn't it either.  What I did to trigger
> >> it was using the auto-dialer to loop as many calls to app_datetime out
> >> and then back over the same E-1 as it would take, queueing the calls
> >> to "/var/spool/asterisk/outgoing/" 14 at a time.  It froze at the first
> >> attempt.  The "good" news is that it produced a visible kernel-panic
> >> this time.  My guess is that you only miss it if the console
> >> screensaver has already come on when it happens.
> >> 
> >> It read something like "Unable to handle kernel paging request" and
> >> happened in the swapper-task.  As usual, it dumped a lot of numbers on the
> >> screen, which I didn't want to write down.
> >> 
> >> Mark: If you want my help in debugging this, I'll hook it up to a
> >> serial console, trigger the crash and provide you with the exact
> >> panic, together with the ksyms and modules-info to trace it.
> >> 
> >> 
> >> 
> >>     Grtz,
> >> 
> >>        Oliver
> >> _______________________________________________
> >> Asterisk-Users mailing list
> >> Asterisk-Users at lists.digium.com
> >> http://lists.digium.com/mailman/listinfo/asterisk-users
> 
> -- 
> Matthias Granberry
> matthias at utdallas.edu
> (469) 371-0596
> 


