[asterisk-dev] [Code Review] A CDR Specification for Asterisk 12

Tue Mar 12 15:23:49 CDT 2013

On 03/12/2013 01:11 PM, David Kerr wrote:
> Matt,
>   If you are messing with CDRs please take a look at bug id 20747 and
> the corresponding fix I posted to reviewboard (which has received no
> comments).  CDRs are not accurately recorded for outbound calls that go
> through SLA.  I understand the reasons why it is broken and partly it is
> to do with the implementation of SLA and partly it is just the nature of
> what SLA is in the first place.  I would very much like some review of
> this bug id and the fix on reviewboard.

No one has commented because the patch is scary: SLA and
CDRs!

Yikes. :-)

I'll try to take a look at it on Thursday (our typical code review day),
but I have a feeling that it's going to have a lot of side effects that
are dangerous. Modifying CDRs in the existing code base tends to
have that effect, no matter how hard anyone tries. This is part of the
reason why we have the "policy" to never alter CDR behavior in release
branches.

In reality, in the existing code base, the behavior of CDRs is not
defined for SLA - or really, for any multi-party scenario. While I'm
sure that feels like a bug, we've arrived there after many failed bug
fixes that only exacerbated problems. CEL was born of those efforts. The
only reason we are able to partially deal with multi-party scenarios in
Asterisk 12 is due to the Bridging Framework and Stasis-Core, as I'll
get to later.

> But for your CDR spec for Asterisk 12, you should at least document how
> it should work for extension(s) and trunk configured as SLA.  There are
> multiple scenarios...

So, thanks for reading the specification and providing feedback.
Unfortunately, most of this e-mail is going to be bad news.

Let's just start at the root of it: I didn't forget SLA. I explicitly
chose not to define the behavior for the current implementation of SLA.

Why not?

First, understand that the only reason we are looking at CDRs is because
we are forcing ourselves to. The migration of bridging to the Bridging
Framework necessitates updating of CDRs, as a lot of CDR behavior is
defined in masquerade callbacks and the bridging loops. Those
masquerades are now not going to occur as often, and the bridging loops
are going to get deleted wholesale. That means we either have to discard
CDRs completely, re-implement the logic in the Bridging Framework, or
find another way to disseminate Channel/Bridge information to the CDR
engine.

We're choosing the last option. Right now, however, the Bridging
Framework's threading model is undergoing changes by Richard (see the
bridge_construction branch) and Stasis-Core is getting implemented in
trunk (as well as a variety of other team branches). It's all still very
much a work in progress.

The idea, however, goes as follows:

The Bridging Framework changes the nature of bridges in Asterisk by
promoting them to a first class object. You no longer have a collection
of channels in some shared state; you have a collection of channels
owned and managed by a well defined object. The difference is subtle,
but important: by having an object with its own state, we can naturally
track the lifetime of communication paths between channels. The fact
that transfers will (with very few exceptions) exist solely between
channels in bridges helps us to define the behavior of CDRs for those
scenarios. More importantly, Stasis-Core provides a means to disseminate
this state information through Asterisk as well as cache it. This lets
the CDR engine be built independently of the objects it tracks. Where
CDRs previously had to live on a specific channel, they can now simply
be their own entity tracking channels in bridges.
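
As a rough illustration only (this is not Asterisk code, and it does not
use the real Stasis-Core API), the idea of a CDR engine that lives
outside the channels it tracks might be sketched in Python as below.
BridgeEvent, CdrRecord and CdrEngine are hypothetical names, and opening
one record per pair of channels in a bridge is an assumption made for
the sketch, not something the specification is being quoted on.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class BridgeEvent:
    """Hypothetical message: a channel entered or left a bridge."""
    kind: str          # "enter" or "leave"
    bridge_id: str
    channel: str
    timestamp: float


@dataclass
class CdrRecord:
    bridge_id: str
    party_a: str
    party_b: str
    start: float
    end: Optional[float] = None


class CdrEngine:
    """Builds records purely from events; it never touches a channel."""

    def __init__(self) -> None:
        self.members: Dict[str, List[str]] = {}
        self.open: Dict[Tuple[str, str, str], CdrRecord] = {}
        self.finished: List[CdrRecord] = []

    def handle(self, ev: BridgeEvent) -> None:
        members = self.members.setdefault(ev.bridge_id, [])
        if ev.kind == "enter":
            # Assumption: one record per pair, earlier channel is Party A.
            for other in members:
                key = (ev.bridge_id, other, ev.channel)
                self.open[key] = CdrRecord(
                    ev.bridge_id, other, ev.channel, ev.timestamp)
            members.append(ev.channel)
        elif ev.kind == "leave" and ev.channel in members:
            members.remove(ev.channel)
            # Close every open record that involves the departing channel.
            closing = [k for k in self.open
                       if k[0] == ev.bridge_id and ev.channel in k[1:]]
            for key in closing:
                record = self.open.pop(key)
                record.end = ev.timestamp
                self.finished.append(record)
```

The point of the sketch is only the shape: the engine consumes bridge
state changes and owns its records, so nothing here depends on a
masquerade or on data hung off a particular channel.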

As such, without both the Bridging Framework and Stasis-Core together, I
would not touch CDRs. Even with both these components, this project is
going to be challenging.

So where does SLA fall short?

1) It is built on MeetMe, not the Bridging Framework. As such, it does
not have the same cohesive view of a Bridge and its Channels that
Stasis-Core is relying on.
2) Its view of Bridging is unique. Unlike most other bridging concepts,
it relies upon channel roles extensively (station/trunk). This
complicates CDRs, as you allude to later.
3) We (as in the Asterisk developers here at Digium) are not migrating
SLA under the Bridging Framework for Asterisk 12. While this is doable -
and certainly a project to be encouraged - it is outside of the scope of
work that we have engineering resources for. Obviously folks in the
developer community could take that project on, in which case my next
point wouldn't be valid.
4) I'm not defining the behavior for things we aren't attempting to
tackle for Asterisk 12. Doing so would be disingenuous to Asterisk Users.

I understand that this is frustrating: it isn't fun to have a feature in
Asterisk that you use but doesn't receive a lot of attention. More on
possible avenues below.

> Party A connects to Asterisk SLA (hears a dialtone) and dials Party B
> (which connects or does not answer). Dialtone comes from either analog
> trunk or DISA() Asterisk application.
> Party A dials Party B (no intermediate dialtone from Asterisk, just
> "direct" through SLA "conference" to Party B).
> Party A dials Party B (either of above two scenarios).  Party C joins in
> by pressing SLA trunk button on phone. Three way conference takes place.
>  Party C hangs up.  Party A later hangs up.
> Party A dials Party B.  Party C joins in by pressing SLA trunk button on
> phone. Three way conference takes place.  Party A hangs up.  Party C
> later hangs up.
> Last two scenarios with a Party D, E, F etc. joining and leaving.

Unfortunately, this is most likely already too complex. I'd take a
slightly different approach.

If I were to attempt to define the behavior (which I'm not), I would
attempt to boil it down to the simplest way of "viewing" the SLA use
case. For SLA, that's periods of communication between trunks on one
hand, and stations on the other. While that requires the CDR code to
have some concept of channel role and bridge type, that's not terrible
(we're going to have to do something similar with Parking).

This means that if you have a call come in from a trunk, and station A
answers it, you have one CDR. If station B also picks up the line, you
have two CDRs. Each CDR represents a path of communication between a
trunk and a station - so one for trunk -> A and one for trunk -> B.

Directionality shouldn't matter here: if A places a call out a trunk and
B barges in, there are still two CDRs (although Party A may differ,
depending on the configuration). Whether or not trunks are always Party
A, stations are Party A, or the oldest is Party A should probably be
left up to the dialplan.
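
A minimal Python sketch of that view, under stated assumptions: Member,
PathCdr, pick_party_a and open_paths are hypothetical names, and the
three policy strings simply mirror the three Party A options mentioned
above. Nothing like this exists in Asterisk today.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Member:
    name: str
    role: str       # "trunk" or "station"
    joined: float   # time the channel joined the SLA conference


@dataclass
class PathCdr:
    party_a: str
    party_b: str
    start: float
    end: Optional[float] = None


def pick_party_a(x: Member, y: Member, policy: str) -> Tuple[Member, Member]:
    """Order a trunk/station pair as (Party A, Party B)."""
    if policy in ("trunk", "station"):
        return (x, y) if x.role == policy else (y, x)
    if policy == "oldest":
        return (x, y) if x.joined <= y.joined else (y, x)
    raise ValueError(f"unknown Party A policy: {policy}")


def open_paths(members: List[Member], policy: str = "trunk") -> List[PathCdr]:
    """One CDR per trunk/station path, regardless of call direction."""
    trunks = [m for m in members if m.role == "trunk"]
    stations = [m for m in members if m.role == "station"]
    cdrs = []
    for trunk in trunks:
        for station in stations:
            a, b = pick_party_a(trunk, station, policy)
            # The path exists only once both ends are present.
            start = max(trunk.joined, station.joined)
            cdrs.append(PathCdr(a.name, b.name, start))
    return cdrs
```

With one trunk and stations A and B, open_paths returns two records
whichever side originated the call; only the Party A assignment changes
with the policy.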

Alternatively, you could simplify this further and simply treat SLA as
one big multi-party bridge, and use the existing multi-party approach in
the CDR specification. This would be the easiest way to handle SLA, and
would defer the majority of billing logic to where it belongs: billing
engines.
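
To illustrate the division of labor (and only that; the sketch does not
restate the specification's multi-party rules), suppose Asterisk
reported nothing more than when each channel was in the bridge. A
billing engine could then derive whatever figure it cares about. The
interval format and the billable_seconds function below are hypothetical
and stand in for logic that would live outside Asterisk.

```python
from typing import List, Tuple

# Hypothetical report format: (channel name, joined, left), in seconds.
Interval = Tuple[str, float, float]


def billable_seconds(trunk: Interval, stations: List[Interval]) -> float:
    """Seconds the trunk shared the bridge with at least one station."""
    _, t_start, t_end = trunk
    # Clip each station interval to the trunk's lifetime, then sort.
    clipped = sorted(
        (max(start, t_start), min(end, t_end))
        for _, start, end in stations
    )
    total = 0.0
    cur_start = cur_end = None
    for start, end in clipped:
        if end <= start:
            continue  # no overlap with the trunk at all
        if cur_end is None or start > cur_end:
            # Gap: close the previous merged interval, start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping stations count once, not once per station.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total
```

A different billing engine might instead charge per station, or only
for the first station to pick up; that choice stays out of Asterisk.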

> And there may be other scenarios as well.  In Asterisk 1.8 and 11 the
> CDR for Party A is frequently wrong... For example, in the first
> scenario Party A CDR is "answered" when receiving the dialtone (entering
> the SLA conference) not when Party B answers.  And in many cases when
> Party B answers a bridge/masquerade takes place that "ends" the CDR.
>
> The Asterisk 12 CDR spec should document expected behavior for SLA.  For
> example, should even the simplest of SLA calls generate two CDR
> records... one from Party A to SLA "conference" the other from the SLA
> conference to Party B.  This may sound strange, but when you have a
> Party C, D, E potentially joining and leaving the conference it may be
> the only way to accurately keep track of billing seconds.... and in the
> dialplan you can make use of NoCDR() etc. to control which records you
> actually want.
>

Well, I'm sure CDRs for SLA don't make much sense in the release
branches. I'm fairly confident that they were never intended to make
much sense.

However, I think you might be taking a slightly dangerous approach to
this problem. Attempting to make SLA a 'channel' in a CDR breaks the
general view CDRs have of the world. I would approach SLA as if it were
a bridge, not an application or a channel state in and of itself. The
endpoints involved should be the entities represented in the CDR - that
means the trunks and stations.

Finally, I would avoid trying to make Asterisk be in the business of
making billing decisions. I'd go so far as to say that NoCDR is an evil
that I'd prefer to delete but, unfortunately, probably can't. (If
nothing else, there's only so many times I can apologize for breaking
people's dialplans in a single day.) Asterisk should report the details
of communication between endpoints, but should defer billing problems to
a billing analysis engine. Billing engines should make the billing
business decisions, and they will be far better at it than Asterisk will be.

So, all of this being said, there are ways in which SLA can have CDR
behavior defined for it in Asterisk 12.
1) Ideally, someone would move SLA under the Bridging Framework. This is
a non-trivial project, but by doing so they will automatically gain
whatever Stasis-Core events exist under the framework. The updates to
the CDR engine to accommodate the bridge type and the channel roles won't
be simple, but would not be terribly difficult.
2) Less ideally, the Stasis-Core events could be replicated in the
existing SLA code. This is a temporary solution as it does not actually
move SLA under the common framework - thereby duplicating functionality
and creating complexity in the existing SLA code - but may be easier for
someone to approach.

However, I don't want to update the specification with behavior for SLA
unless someone is actually going to commit to doing the work for
Asterisk 12. Without someone taking on that project, there's no way to
commit to CDR behavior for it, and unfortunately, we aren't going to
have the bandwidth to tackle SLA ourselves.

Matt

-- 
Matthew Jordan
Digium, Inc. | Engineering Manager
445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
Check us out at: http://digium.com & http://asterisk.org