[asterisk-bugs] [JIRA] (DAHLIN-339) Upgrade from 2.3.0.1 to 2.6.1 causes red alarms

Russ Meyerriecks (JIRA) noreply at issues.asterisk.org
Mon Jul 28 15:24:56 CDT 2014


Russ Meyerriecks created DAHLIN-339:
---------------------------------------

             Summary: Upgrade from 2.3.0.1 to 2.6.1 causes red alarms
                 Key: DAHLIN-339
                 URL: https://issues.asterisk.org/jira/browse/DAHLIN-339
             Project: DAHDI-Linux
          Issue Type: Bug
      Security Level: None
            Reporter: Russ Meyerriecks
            Assignee: Russ Meyerriecks


Created for spimental for the customer Unify:

(02:36:00 PM) spimental at digium.com: Hello hwdev, I have a customer issue that support can't help with and that needs to be escalated. Can someone take a look at the problem Unify is experiencing?
(02:36:11 PM) spimental at digium.com: We had a system running DAHDI 2.3.0.1 with a specific BT (British Telecom) CO link configured as CAS-T1 with D4 framing.
We saw some LOF errors after enabling DAHDI debug, but the link never went down.

After upgrading the system to DAHDI 2.6.1, the LOF errors continued, but the links started bouncing, that is, alternating between the red and green states.
It seems that alarm generation in DAHDI 2.6.1 is more sensitive to these errors. We want to release it to a very important and critical customer, but this will cause us problems because the new driver behaves differently from the older version.

We saw in the code that the alarm generation mechanism differs between the two versions. It seems 2.3.0.1 counts consecutive detections, while 2.6.1 uses a timer, as shown below:
2.3.0.1 
function static void t4_check_alarms(struct t4 *wc, int span) 
    if (c & 0x20) {
        if (ts->alarmcount >= alarmdebounce) {

            /* Disable Slip Interrupts */
            e = __t4_framer_in(wc, span, 0x17);
            __t4_framer_out(wc, span, 0x17, (e|0x03));

            alarms |= DAHDI_ALARM_RED;
        } else {
            if (unlikely(debug && !ts->alarmcount)) {
                /* starting to debounce LOF/LFA */
                printk(KERN_INFO "wct%dxxp: LOF/LFA detected "
                    "on span %d but debouncing for %d ms\n",
                    wc->numspans, span + 1, alarmdebounce);
            }
            ts->alarmcount++;
        }
    } else
        ts->alarmcount = 0;


2.6.1 
    /* Loss of Frame Alignment */
    if (c & 0x20) {
        if (!ts->alarm_time) {
            if (unlikely(debug)) {
                /* starting to debounce LOF/LFA */
                dev_info(&wc->dev->dev, "wct%dxxp: LOF/LFA "
                    "detected on span %d but debouncing "
                    "for %d ms\n", wc->numspans, span + 1,
                    alarmdebounce);
            }
            ts->alarm_time = jiffies +
                msecs_to_jiffies(alarmdebounce);
        } else if (time_after(jiffies, ts->alarm_time)) {
            /* Disable Slip Interrupts */
            e = __t4_framer_in(wc, span, 0x17);
            __t4_framer_out(wc, span, 0x17, (e|0x03));

            alarms |= DAHDI_ALARM_RED;
        }
    } else {
        ts->alarm_time = 0;
    }

We need to avoid this difference in behavior before releasing it to the customer.
So, is it caused by the code above, or by other changes in the new driver code?
Was there a specific reason for modifying the mechanism? Is there a safe way to get the old driver behavior for this issue?
Note that Asterisk does not need to be running. This is a driver-level problem.
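
For comparison, the two debounce styles quoted above can be modeled in user space. This is only a rough sketch, not driver code: struct span_state, red_by_count(), red_by_deadline() and first_red_tick() are hypothetical names, and it assumes the check runs once per millisecond tick with one tick equal to one jiffy, which the excerpts do not confirm.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-span state used by the excerpts. */
struct span_state {
    unsigned int alarmcount;   /* counter, as in the 2.3.0.1 excerpt */
    unsigned long alarm_time;  /* deadline, as in the 2.6.1 excerpt; 0 = not armed */
};

/* Count-based debounce modeled on the 2.3.0.1 excerpt. */
static bool red_by_count(struct span_state *ts, bool lof, unsigned int debounce)
{
    if (!lof) {
        ts->alarmcount = 0;        /* LOF cleared: start over */
        return false;
    }
    if (ts->alarmcount >= debounce)
        return true;               /* enough consecutive LOF checks */
    ts->alarmcount++;
    return false;
}

/* Deadline-based debounce modeled on the 2.6.1 excerpt. */
static bool red_by_deadline(struct span_state *ts, bool lof,
                            unsigned long now, unsigned long debounce)
{
    if (!lof) {
        ts->alarm_time = 0;        /* LOF cleared: disarm the deadline */
        return false;
    }
    if (!ts->alarm_time) {
        ts->alarm_time = now + debounce;   /* first LOF: arm the deadline */
        return false;
    }
    /* Same comparison as the kernel's time_after(now, alarm_time). */
    return (long)(ts->alarm_time - now) < 0;
}

/*
 * Assert LOF for ticks 1..lof_ticks, run one check per tick, and return
 * the first tick at which red is raised, or -1 if it never is.
 */
static long first_red_tick(bool use_deadline, unsigned long lof_ticks,
                           unsigned long debounce, unsigned long total_ticks)
{
    struct span_state ts = { 0, 0 };
    unsigned long now;

    for (now = 1; now <= total_ticks; now++) {
        bool lof = now <= lof_ticks;
        bool red = use_deadline
            ? red_by_deadline(&ts, lof, now, debounce)
            : red_by_count(&ts, lof, (unsigned int)debounce);
        if (red)
            return (long)now;
    }
    return -1;
}
```

Under these assumptions, calling first_red_tick() with a debounce of 2500 and a sustained LOF makes both variants raise red within one tick of each other, and a 100-tick LOF burst trips neither. If the check were called at some other rate, the count-based variant would scale with the call rate while the deadline-based one would not.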

(02:43:20 PM) malcolmd: …for reference, Unify buys an obscene amount of product from us.  so this cannot get dropped. :D
(02:44:35 PM) rmeyerriecks: I'm aware, I've been getting quite the squeeze from them lately
(02:44:57 PM) malcolmd: unfun :(
(02:45:03 PM) sruffell: How have they been squeezing you? This is the first I've heard of any issues for them.
(02:45:55 PM) rmeyerriecks: They're the ones pressuring Channing to pressure me to have released 436 5 years ago
(02:46:11 PM) malcolmd: hah
(02:46:16 PM) sruffell: Ahh…
(02:46:41 PM) rmeyerriecks: spimental at digium.com: This is a te12x card?
(02:47:33 PM) rmeyerriecks: nope, must be a dual or quad
(02:54:47 PM) spimental at digium.com: rmeyerriecks: If you need any more details, logs, etc. Let me know and I will get them
(02:57:31 PM) rmeyerriecks: I never noticed before but that debounce time is 2.5 seconds
(02:58:47 PM) rmeyerriecks: In both versions
(03:03:57 PM) rmeyerriecks: Here is where it changed
(03:03:58 PM) rmeyerriecks: http://git.asterisk.org/gitweb/?p=dahdi/linux.git;a=commitdiff;h=2c341b481ebb7f5f0ed45da5a4ca23aba6780d6b
(03:18:06 PM) rmeyerriecks: spimental at digium.com: So to answer your questions. I don't think it's the code above that is affecting their behavior. They both keep the same timeout period.
(03:18:51 PM) rmeyerriecks: The specific reason for modifying the code they are pointing to was to improve the driver's performance and reduce CPU overhead. (It did quite a good job of it, as well)
(03:20:07 PM) rmeyerriecks: This was changed in 2.6, so 2.5 could possibly work for them





