[Asterisk-Users] Centos 4.3 Issues

Greg Boehnlein damin at nacs.net
Mon May 22 09:16:56 MST 2006


Hello,
	I was wondering if anyone out there is successfully running 
Asterisk 1.2 svn w/ Centos 4.3. I had an experience over the last two 
weeks that has me scratching my head and muttering strange things in the 
wee hours of the morning. I am going to try to be as descriptive as my 
brain will allow right now, but if there is something that I do not cover, 
please do not hesitate to ask and I'll be happy to answer.

	For the last 2 years, I have been running a mixture of Tao Linux 
and Centos (both RHEL derivatives) on our production boxes. Asterisk has 
run flawlessly on all installations. Last week, I updated one of our 
gateway boxes from Centos 4.2 (under which it ran for 6 months without 
issue) to the new 4.3 code. Almost immediately, we began to experience 
problems. Asterisk would dump core with the following backtrace:

#0  0x004878ab in test_err () from /usr/lib/asterisk/modules/codec_g729a.so

	The segfaults would happen under very light loads, in some cases 
with just a single call. Kevin was able to log in to the box and install 
a debugging version of codec_g729. He determined that the values being 
returned in that routine were incorrect, i.e. something in the system was 
returning a non-zero value when multiplying a number by 0. Barring any 
other explanation, we assumed that there was a hardware issue somewhere, 
either in the memory or in the FPU on the CPU.
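
For anyone who wants to rule out basic arithmetic on their own box, here is a minimal sketch of a multiply-by-zero check. It is not the debugging build of codec_g729 mentioned above. The function name and the sample range are made up for illustration, and it only exercises the Python interpreter's arithmetic, so a clean run does not prove the hardware is healthy.

```python
import random


def multiply_by_zero_failures(iterations=1000000, seed=0):
    """Return every operand whose product with zero was not zero.

    Hypothetical helper for illustration only; it is unrelated to the
    test_err () routine in codec_g729a.so.
    """
    rng = random.Random(seed)  # fixed seed so a failing run can be repeated
    failures = []
    for _ in range(iterations):
        # Assumed operand range: roughly that of 16-bit audio samples.
        x = rng.uniform(-32768.0, 32767.0)
        n = rng.randint(-32768, 32767)
        if x * 0.0 != 0.0:  # floating-point path (-0.0 compares equal to 0.0)
            failures.append(x)
        if n * 0 != 0:  # integer path
            failures.append(n)
    return failures


if __name__ == "__main__":
    bad = multiply_by_zero_failures()
    if bad:
        print("non-zero products for %d operands, e.g. %r" % (len(bad), bad[0]))
    else:
        print("all products were zero")
```

On a healthy system the returned list should be empty; anything else would point at the hardware or the toolchain rather than at Asterisk itself.
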
	So, we replaced the box with a brand new system running a dual-core 
Pentium D 920. We loaded the 32-bit version of Centos 4.3 onto the box and 
proceeded to start testing. BAM... same problem. The backtrace showed the 
failure in the same routine.
	We scratched our heads and spent many hours trying various things, 
including backing off the kernel to 2.6.9-22 and even moving to the new 
development kernel 2.6.9-34.19 (from the testing tree), but nothing 
solved the issue.
	Mind you, this is the exact same behavior on two different 
hardware platforms running the exact same distribution. We even loaded up 
a third box and could reproduce the behavior on it as well. Three 
different boxes, one common distribution.

	As a test, we installed Fedora Core 5 x86_64 on the new Dual Core 
box and ran extensive tests overnight, simulating 96 channels doing G729 
to Ulaw transcoding. The box ran completely stable. No hiccups.

	So, this morning, we put it back into the cluster, and it's now 
taking about 200 concurrent calls, doing an insane amount of transcoding 
and it is working just fine. Before, it would have cored in the first 
couple of minutes.

I'm scratching my head here, because I have generally had excellent 
experiences with Centos. However, I have NO idea what might be the issue 
here. Could it be the kernel? (We tried three different ones!) Could it 
be the libc? Maybe it is the compiler?
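
To compare the kernel, libc, and compiler between a failing box and a working one, something like the sketch below could be run on each and the output diffed. It assumes Python 3.7 or later and a gcc on the PATH (it reports "not found" otherwise); toolchain_report is a made-up name, not part of any existing tool.

```python
import platform
import shutil
import subprocess


def toolchain_report():
    """Collect the kernel, C library, and compiler versions on this host."""
    libc_name, libc_version = platform.libc_ver()
    report = {
        "kernel": platform.release(),
        "machine": platform.machine(),
        # libc_ver() returns empty strings when it cannot identify the libc.
        "libc": (libc_name + " " + libc_version).strip() or "unknown",
        "gcc": "not found",
    }
    gcc = shutil.which("gcc")
    if gcc is not None:
        result = subprocess.run([gcc, "--version"],
                                capture_output=True, text=True)
        lines = result.stdout.splitlines()
        if result.returncode == 0 and lines:
            report["gcc"] = lines[0]  # first line carries the version string
    return report


if __name__ == "__main__":
    for key, value in sorted(toolchain_report().items()):
        print("%-8s %s" % (key, value))
```

This only records which versions are in play; it does not say which of the three is at fault.
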

In any case, if anyone is having success with Centos 4.3 (32-bit), please 
speak up. I'd like to get to the bottom of this. I generally do not like to 
run Fedora on production equipment, as it tends to be bleeding edge. In 
this case, FC5 is running a 2.6.16-something kernel.

-- 
    Vice President of N2Net, a New Age Consulting Service, Inc. Company
         http://www.n2net.net Where everything clicks into place!
                             KP-216-121-ST




