[asterisk-dev] Time for a bug fix phase?

Steve Murphy murf at digium.com
Tue May 27 23:35:24 CDT 2008

On Tue, 2008-05-27 at 17:39 -0500, John Lange wrote:
> I just want to put my thoughts out there regarding the current state of
> the code base specifically related to a couple of bugs I've been having
> problems with.
> They are:
> Call Logging:
> http://bugs.digium.com/view.php?id=10052 (June 07)
> BLF/Line status:
> http://bugs.digium.com/view.php?id=0011093 (Oct 07)


You have the categories and bug numbers reversed. 10052 is the BLF
problem with SNOM 360s; 11093 is the CDR logging problem.

> It seems to me that these are both pretty significant bugs that have
> been hanging around for a long time. I personally didn't start to
> encounter them until recently when I was finally convinced to upgrade
> some clients to 1.4.x.
> From the perspective of our clients, the inability of Asterisk to show
> the (BLF) status of a phone, or the inability to know what calls came
> into the system due to broken logging makes the system seem
> "immature" (as one client put it).
> I realize that the urgency of any given bug is highly dependent on how
> it directly affects any given person and I'm not trying to impose my
> pain points on others. I simply want to express my concern that there is
> about to be another new major version (1.6) that contains significant
> known bugs that have been around for a while. I listed the two which are
> bothering me the most at the moment but I'm sure there are others.
(I can only speak for 11093; 10052 is assigned to OEJ.)

Well, I'm not happy you are having problems, but I am happy that
11093 is a high priority for you. That means that, maybe, I might
be able to interest you in helping me out when I dive back into CDRs
(which is now; I'm trying to satisfy the ForkCDR bug reports right now,
then I'll be hacking at the xfer CDRs again).

Let me tell you what I'm up against. 11093 was the first intimation
that ANYONE ever gave of what they really wanted from CDRs in xfer
situations. Really. Most other reports just said, hey, I did an xfer,
and the second half isn't covered!

Well, the CDR system was implemented before some other nifty features
of Asterisk came into being. Since that time, whole capabilities were
born into Asterisk, but CDRs continued on undisturbed.

Some interesting things happen when channels are masqueraded,
Local channels are created, and bridged. The CDR code was written
with simple concepts and simple requirements, and worked great in 
those simple situations. But as the new capabilities were added,
the situation was no longer anything near simple in the lower
layers of Asterisk. 

For instance, the concept of "channel-peer" in many of the routines 
gets fuzzy when channel and peer trade places. In some cases, 
CDR information couldn't even be stored, because traditionally, 
the peer channel didn't even get a CDR assigned for storage.
Swapping CDRs didn't work either; some events happened
on one channel, but not on the other.

To keep whole legs from being lost, I switched to a system
whereby I tracked the bridges (connections between channels).
At least then, I could ensure every bridge had a CDR generated.
But there are bugs in that too. The work I did in the CDRfix5
branch proved to me that you have to go down into the channel
drivers and play with CDR fields if you are going to track
some xfers. Fun.
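To make the bridge-tracking idea concrete, here is a minimal C sketch, for illustration only. The names `struct bridge_rec`, `struct cdr_rec`, `cdr_from_bridge()` and `cdrs_from_bridges()` are hypothetical; they are not Asterisk's `struct ast_cdr` or its bridging code. What it shows is that the record is keyed to the bridge rather than to whichever channel happens to "own" a CDR, so every bridge that ends yields exactly one CDR, even if the two channels traded places along the way.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical types for illustration only; NOT Asterisk's real
 * struct ast_cdr or bridge structures. */
struct bridge_rec {
    char chan_a[32];   /* one side of the bridge */
    char chan_b[32];   /* other side; no "channel" vs "peer" ownership */
    time_t start;      /* when the two channels were bridged */
    time_t end;        /* when the bridge was torn down */
};

struct cdr_rec {
    char channel[32];
    char dstchannel[32];
    long billsec;      /* seconds the bridge was up */
};

/* Build one CDR from one finished bridge. Because the bridge, not a
 * channel, is the unit of accounting, a masquerade that swaps the
 * channels' roles cannot leave this leg without a record. */
struct cdr_rec cdr_from_bridge(const struct bridge_rec *b)
{
    struct cdr_rec c;
    memset(&c, 0, sizeof c);
    snprintf(c.channel, sizeof c.channel, "%s", b->chan_a);
    snprintf(c.dstchannel, sizeof c.dstchannel, "%s", b->chan_b);
    c.billsec = (b->end > b->start) ? (long)(b->end - b->start) : 0;
    return c;
}

/* Emit one CDR per bridge into `out` (caller provides room for n). */
size_t cdrs_from_bridges(const struct bridge_rec *b, size_t n,
                         struct cdr_rec *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = cdr_from_bridge(&b[i]);
    return n;  /* one record per bridge, never fewer */
}
```

For an attended transfer that produces two bridges (100 talking to 200, then 300 talking to 200), this yields two records, one per leg, rather than trying to fold both legs into a single CDR.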

The whole thing, and the stupendous amounts of time it was taking me,
and the large number of possible xfer scenarios, have led me
to the conclusion that the current CDR system is the wrong way
to handle it. But, it's here, and I'll try in the coming days to
get some of my fixes committed to trunk, mayhaps. No one will ever
be completely satisfied with my approach. I can guarantee it.

Now, joining call legs in some predictable way may or may not be 
entirely possible. But, we can try. I've got a whole notebook section
filled with timing diagrams exactly specifying the individual events
that occur during several different xfer combinations.

Another thing I might be able to guarantee is that, should you go thru
all those scenarios in my notebook, and exactly specify how you want the
CDRs to be generated for each scenario, and what things should appear
in what fields for each CDR, and if I could exactly fulfil your every
wish, some other segment of the user community would be very
unhappy with the results, and claim the whole thing is entirely 

I'm starting to get a sense for the true majesty of the size of this 
problem, and some general principles. I really feel that CEL will 
be a better solution for customizing CDR output than the 
current system.

CEL allows you to track the events that contribute to the formation
of CDRs, kinda like the Manager events. It uses Russell's nifty
event system, which will allow event tracking over multiple machines.
Brian Degenhardt is playing with it now, and it appears that soon he
will write a CEL to CDR converter that will track things the way
he needs them tracked. (It is hoped that his code will suffice for
a large number of users, and serve as a platform for customization
for the rest.) (He liked the "linkedID" idea I introduced in
the CDRfix5 branch, and is elaborating and extending it now.)
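Here is a rough C sketch of the linkedID idea, for illustration only. `struct cel_event`, `struct call_summary`, `summarize_linkedid()` and the event type strings are all hypothetical, not the actual CEL API. The assumption is that every channel involved in one logical call carries the same linkedID, so a converter can gather all the events for that call, whichever channel they happened on, and build its own record from them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified CEL event; real CEL records would carry many
 * more fields. Type strings are illustrative only. */
struct cel_event {
    const char *type;      /* e.g. "CHAN_START", "ANSWER", "HANGUP" */
    const char *channel;   /* channel the event happened on */
    const char *linkedid;  /* shared by every channel in one logical call */
    long ts;               /* event time, in seconds */
};

/* What a converter might derive for one logical call. */
struct call_summary {
    size_t nevents;
    long first_ts;
    long last_ts;
};

/* Gather every event tagged with `linkedid`, regardless of channel, so
 * legs created by an xfer on other channels land in the same summary. */
struct call_summary summarize_linkedid(const struct cel_event *ev, size_t n,
                                       const char *linkedid)
{
    struct call_summary s = {0, 0, 0};
    assert(ev != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(ev[i].linkedid, linkedid) != 0)
            continue;                      /* belongs to another call */
        if (s.nevents == 0 || ev[i].ts < s.first_ts)
            s.first_ts = ev[i].ts;
        if (s.nevents == 0 || ev[i].ts > s.last_ts)
            s.last_ts = ev[i].ts;
        s.nevents++;
    }
    return s;
}
```

Since the grouping happens after the raw events are captured, the decisions about how to turn them into CDRs live in the converter instead of being hardwired into the core.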

The real magic of capturing call events using CEL stands in contrast to the
real pain of CDRs in complicated situations. It is very possible that
you want certain xfer situations stated in a certain way. But CDRs
will probably never fully satisfy you. They are meant to be only
minorly subservient to user and dialplan demands. They basically have all
policy and decisions hardwired in. You have to suffer with it, however
it is. Sure, you can play with the available apps, like NoCDR, ForkCDR,
ResetCDR, etc., and try to force it to do your will. But in the end,
when two similar event types have to be compressed into one, you will lose
one of them (and, most likely, the wrong one). I split the
legs into separate pieces/CDRs for that reason, to eliminate data
loss. Yes, it will break/has broken your implementations. But your
implementations are inherently broken already. Xfers in 1.2 and
early 1.4 were full of huge problems. There are still some holes to fill,
and CDRs to straighten out, which I will try to do in the coming days
as I try to merge in the fixes I made to the CDRfix5 branch, so the
fixes aren't lost entirely.
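A toy C model of the data-loss problem described above, assuming a hypothetical `struct leg_cdr` with room for only one answer time (real CDRs have more fields, but the effect is the same for any single-valued field). Folding a second ANSWER into the same record overwrites the first; starting a new record per leg keeps both. The helpers `merge_answer()` and `split_answer()` are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-record model: one CDR holds one answer time. */
struct leg_cdr {
    long answer_ts;
};

/* Compressing: a second ANSWER overwrites the first, so one is lost. */
void merge_answer(struct leg_cdr *cdr, long ts)
{
    cdr->answer_ts = ts;
}

/* Splitting: each ANSWER gets its own CDR, so both survive.
 * Returns the new record count, or `count` unchanged if `cap` is full. */
size_t split_answer(struct leg_cdr *cdrs, size_t count, size_t cap, long ts)
{
    assert(cdrs != NULL);
    if (count >= cap)
        return count;
    cdrs[count].answer_ts = ts;
    return count + 1;
}
```

With two answers at t=100 and t=200, the merged record ends up holding only 200, while the split version holds two records, 100 and 200.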

But I think I'm going to have to draw a line at where to stop, and how
much of Digium's time is going to be invested into the whole setup.
Actually, I'm willing to spend as much time as Digium will allow,
and that can be determined by the user community, to a degree.

So, I encourage you to exhort, shame, embarrass, etc., us into
making fixes, but please, along the way, help us where you can. You'll
be happier with the results.

> I expressed similar concern with 1.4 before it was "forced" out the door
> and the resulting instability caused the uptake of 1.4 to be very slow.
> As I understand it this is a problem which we don't want repeated in
> 1.6.

Hey, we tried and are very much still trying to make Asterisk as
stable as possible. The number of open and critical bugs is way down from
last year. (Just the really nasty, sticky, venomous bugs are taunting us,
and we are trying to attack them.) Over the last few years, Digium has hired
a team to help out in fixing bugs... But things go faster if the
community participates!

> Unfortunately I'm not much of a C programmer and I don't have much time
> for development, so the patches I've contributed so far are pretty basic,
> but nevertheless I'm going to try and dig into the logging issue to
> see if I can't get something working.

Excellent. I look forward to working with you in any which way you are


Steve Murphy
Software Developer