[Asterisk-Users] Centos 4.3 Issues
Greg Boehnlein
damin at nacs.net
Mon May 22 09:16:56 MST 2006
Hello,
I was wondering if anyone out there is successfully running
Asterisk 1.2 svn w/ Centos 4.3. I had an experience over the last two
weeks that has me scratching my head and muttering strange things in the
wee hours of the morning. I am going to try and be as descriptive as my
brain will allow right now, but if there is something that I do not cover,
please do not hesitate to ask and I'll be happy to answer.
For the last 2 years, I have been running a mixture of Tao Linux
and Centos (both RHEL derivatives) on our production boxes. Asterisk has
run flawlessly on all installations. Last week, I updated one of our
gateway boxes from Centos 4.2 (under which it ran for 6 months without
issue) to the new 4.3 code. Almost immediately, we began to experience
problems. Asterisk would core w/ the following:
#0  0x004878ab in test_err () from /usr/lib/asterisk/modules/codec_g729a.so
The segfaults would happen under very light loads, in some cases
with just a single call. Kevin was able to log in to the box, and put a
debugging version of codec_g729 on the box. He determined that the problem
was that the values that were being returned in that routine were
incorrect; i.e., something in the system was returning a non-zero value
when multiplying a number by 0. Barring any other explanations, we
assumed that there was a hardware issue somewhere, either in the memory,
or the FPU on the CPU.
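As a rough illustration of the kind of sanity check this points to, here is a minimal sketch. It is not the debugging codec Kevin put on the box; it is a hypothetical Python stand-in (the function name count_bad_zero_products is made up for this example, and it assumes a modern Python 3) that multiplies random integers and floats by zero and counts any non-zero results. A clean run does not prove the hardware or toolchain is good; it only fails to reproduce the symptom.

```python
import random


def count_bad_zero_products(iterations=100_000, seed=0):
    """Multiply random ints and floats by zero and count non-zero results.

    Hypothetical helper for illustration only; not part of Asterisk or
    of the g729 codec.
    """
    rng = random.Random(seed)
    bad = 0
    for _ in range(iterations):
        i = rng.randint(-2**31, 2**31 - 1)  # 32-bit-sized integer operand
        f = rng.uniform(-1e6, 1e6)          # finite floating-point operand
        if i * 0 != 0:
            bad += 1
        if f * 0.0 != 0.0:                  # -0.0 compares equal to 0.0
            bad += 1
    return bad


if __name__ == "__main__":
    print("non-zero products:", count_bad_zero_products())
```

On a healthy system the count should be zero. Anything else would point at the same class of problem described above, though it would not say whether memory, the FPU, or the software stack is to blame.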
So, we replaced the box w/ a brand new system running a dual-core
Pentium D 920. We loaded the 32 bit version of Centos 4.3 onto
the box and proceeded to start testing. BAM.. same problem.. the backtrace
showed the failure in the same routine.
We scratched our heads, and after many hours of trying various
things (backing off the kernel to 2.6.9-22, and even moving to the new
development kernel 2.6.9-34.19 from the testing tree), we could do nothing
to solve the issue.
Mind you, this is the exact same behavior on two different
hardware platforms running the exact same distribution. We even loaded up
a third box and could reproduce the behavior on it as well. Three
different boxes, one common distribution.
As a test, we installed Fedora Core 5 x86_64 on the new Dual Core
box and ran extensive tests overnight, simulating 96 channels doing G729
to Ulaw transcoding. The box ran completely stable. No hiccups.
So, this morning, we put it back into the cluster, and it's now
taking about 200 concurrent calls, doing an insane amount of transcoding
and it is working just fine. Before, it would have cored in the first
couple of minutes.
I'm scratching my head here, because I generally have had excellent
experiences with Centos. However, I have NO idea what might be the issue
here. Could it be the kernel? (We tried three different ones!) Could it
be the libc? Maybe it is the compiler?
In any case, if anyone is having success with Centos 4.3 (32 bit), please
speak up. I'd like to get to the bottom of it. I generally do not like to
run Fedora on production equipment, as it tends to be bleeding edge. In
this case, FC5 is running 2.6.16-something.
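For anyone willing to compare notes, the three suspects raised above (kernel, libc, compiler) can be captured in one place and diffed between a working box and a failing one. The following is only a sketch under stated assumptions: the function name describe_build_environment is hypothetical, it assumes a modern Python 3 is available on the box, and it assumes gcc is the compiler that built Asterisk and the codec.

```python
import platform
import shutil
import subprocess


def describe_build_environment():
    """Collect kernel, libc and compiler versions for side-by-side comparison.

    Hypothetical helper for illustration only.
    """
    # libc_ver() can return empty strings on systems where it cannot tell.
    libc_name, libc_version = platform.libc_ver()
    info = {
        "kernel": platform.release(),
        "arch": platform.machine(),
        "libc": f"{libc_name} {libc_version}".strip() or "unknown",
        "compiler": "unknown",
    }
    gcc = shutil.which("gcc")  # assumption: gcc is the build compiler
    if gcc:
        result = subprocess.run([gcc, "--version"],
                                capture_output=True, text=True)
        if result.returncode == 0 and result.stdout:
            info["compiler"] = result.stdout.splitlines()[0]
    return info


if __name__ == "__main__":
    for key, value in describe_build_environment().items():
        print(f"{key}: {value}")
```

Running this on each machine and posting the output alongside a report would make it easier to see which component differs between the stable and the crashing installs.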
--
Vice President of N2Net, a New Age Consulting Service, Inc. Company
http://www.n2net.net Where everything clicks into place!
KP-216-121-ST