[asterisk-users] Simple CDRs

Fri Jan 9 09:08:52 CST 2009

On Fri, 2009-01-09 at 04:24 +0000, Grey Man wrote:
> On Fri, Jan 9, 2009 at 3:48 AM, Steve Murphy <murf at digium.com> wrote:
> >
> > But, since it is timestamp based, and unique in that the final part was
> > incremented per request in the same sec, it made a great item to sort
> > on, and allowed me to implement linkedID's.
> 
> Again that's mixing fields that shouldn't be. The calldate or
> starttime can be used to sort the CDRs on creation time. If you're
> going to call a field uniqueid surely a good effort should be made to
> make it live up to its name. If it's intended to be a sequenceid then
> that's what it should be called.

Sorry, I apologize for the 'uniqueID' field; I didn't invent it, or name
it, and there is little definition for it. I think it's accidental that
a transfer could yield two CDRs with the same uniqueID. I'm all for just
simply dropping it. Maybe I will.

LinkedID, on the other hand, is a way to associate 'linked' channels.
A linkedID is guaranteed (on the same Asterisk server) to be unique
from any other linkedID that has been issued by that server.

By appending another string, you can guarantee it is unique across
systems. So, if you use a system name, or Asterisk server name, that you
yourself guarantee to be unique among all the Asterisk servers that would
contribute CDRs to a database, you achieve complete, guaranteed
uniqueness.
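
To make that concrete, here is a minimal Python sketch of the idea. The
function name global_linkedid, the '-' separator, and the sample server
names and IDs are all made up for illustration; nothing here is defined
by Asterisk.

```python
def global_linkedid(system_name: str, linkedid: str) -> str:
    """Prefix a per-server linkedID with an operator-chosen system name.

    Uniqueness across servers holds only if the operator keeps the
    system names themselves unique (an assumption of this sketch).
    """
    if not system_name:
        raise ValueError("system name must be non-empty")
    return f"{system_name}-{linkedid}"


# Two servers that happen to issue the same per-server linkedID.
a = global_linkedid("pbx-east", "1231513732.42")
b = global_linkedid("pbx-west", "1231513732.42")
print(a, b)
```

The two results differ only in the prefix, which is the whole point: the
per-server part can collide freely as long as the names do not.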

If you use instead a 36-byte UUID, you avoid having to invent the unique
system names, but the complete uniqueness guarantee is off. The chances
of a collision are pretty small, so UUIDs are 'practically' useful.
But still, to be mathematically precise about it, simple server names
are infinitely more unique than UUIDs. A UUID has 128 bits, and 2^128
(about 3.4 x 10^38) is a big number, but infinity is even bigger, and
the ratio between the two is... infinity.

Now, as to splitting the currently per-server unique label from the
per-network part of the label, well, I'm dense, I know, but I still
don't see any overriding reason to do so. Yeah, it might be cool to call
the uuid part of the label a uuid, but really, it's silly. Now you have
to use two fields to get uniqueness. Not cool. Just because a DB
understands a UUID doesn't mean you are obligated to separate it. Just
because a language has a feature, you are not obligated to rewrite
perfectly working code to use it, especially if it obfuscates what is
going on, opens the door to bugs, etc. etc.

I don't see any reason, either (and I could be missing something
really obvious here, I admit), for providing a field that *is* unique
to every CDR generated. When the CDR is entered into a table in the
db, it occupies a certain row in that database. That row number will
be guaranteed to be unique. Since nothing in the Asterisk world
references that data once it's written, there would be little purpose
in generating a separate ID. If you need a unique ID that spans all CDRs
ever written to that db, across time and space, even if after a year or
so you prune all the old CDRs from the DB and allow the row numbers to
be reused, then *that* might justify such an ID. (You could search
backups or something for that entry, if some customer brings up a
problem after a year.)
But, if your DB understands uuids, then can't you automatically
generate one for it via an entry in the table definition?
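
Something along these lines is what I have in mind, sketched here with
SQLite through Python's standard sqlite3 module purely as an example.
The table and column names are hypothetical, and the row_uuid default is
just 128 random bits rendered as hex, not a formally laid-out UUID; a
database with a native uuid type would use its own generator instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE cdr (
        -- Row number assigned by the database; AUTOINCREMENT keeps
        -- SQLite from reusing values after old rows are deleted.
        id       INTEGER PRIMARY KEY AUTOINCREMENT,
        -- Random 128-bit value as 32 hex digits, filled in on insert.
        row_uuid TEXT NOT NULL DEFAULT (lower(hex(randomblob(16)))),
        linkedid TEXT NOT NULL,
        src      TEXT,
        dst      TEXT
    )
    """
)

# The inserting side supplies no per-row identifier at all.
rows = [("1231513732.42", "100", "200"), ("1231513732.42", "200", "300")]
conn.executemany("INSERT INTO cdr (linkedid, src, dst) VALUES (?, ?, ?)", rows)

ids = conn.execute("SELECT id, row_uuid FROM cdr ORDER BY id").fetchall()
for row_id, row_uuid in ids:
    print(row_id, row_uuid)
```

Both rows share a linkedid, yet each gets its own id and row_uuid without
Asterisk having to generate anything.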

And, the linkedid already provided is timestamp based,
so across time, it will never be repeated.

So, I'm still not seeing any reason why you need to dissect the linkedid
field. How it was built doesn't matter, as long as in total, it has the 
necessary uniqueness property.

And here's another thought. I could imagine that, if we appended a
system name to the linkedID to guarantee its uniqueness, we might also
like to have the system name in the CDR, so we could make queries and
see if any of the Asterisk servers are busier than the others, etc. I
wouldn't dissect the linkedID to get it; I'd add the system name as
another column.
But that's just me...

If I'm missing something, educate me. I need to understand this sort of
stuff to be able to write a useful CDR system.

> 
> > I guess I could simply have
> > used a simple integer for a linkedID, and had a routine somewhat like
> > the one that coughs up the uniqueID's, just use a lock to provide
> > a number that is incremented safely, (atomic_fetchadd_whatever)
> > and hand it out. Then I could have
> > used a numeric comparison. Might be faster. Maybe Someday...
> >
> > But, it is NOT a number. It's got a period in it but uuids usually
> > have dashes. It's a string. Why you would rip off a system name, I
> > cannot guess, unless you really want or need to deal with it as a
> > number.
> >
> > My advice is not to. I have no prob with uuids, except that they are
> > 36 bytes, and overkill for uniqueness. linkedID + system name would be
> > totally sufficient; One glance at the linkedID will tell you immediately
> > what sys it came from, if you did that.
> >
> > But it's quite legitimate to want to use UUID's. I have no idea how much
> > processing power they take to be generated, probably not much. There's
> > pros and cons...
> 
> I don't deal with UUIDs as a number or string I deal with them as a
> UUID (a UUID is a type in OO languages). That's the point I'm trying
> to get across. Every modern high level language already has constructs
> to deal with UUIDs AND it's such a dead easy way to generate a unique
> id, it's one line of code. By making it a string or integer or
> something else you're making it harder for people to deal with in
> their billing engines and it's not actually making the field any more
> unique than it would be as just a UUID.
> 
> I don't understand why you want to combine server name and/or a
> timestamp with the unique id? If a user wants to know or sort on
> server name then why not provide a dedicated field in the CDR for
> that?

Well, the overriding reason here is that I need the linkedIDs to express
'age', or 'order', which a uniqueID alone will not do. I need to be able
to compare two linkedIDs and say one is less than the other.
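
Here is a small Python sketch of that kind of comparison. It assumes the
linkedID has the plain layout 'seconds.counter' with no system-name
prefix; the helper name linkedid_key and the sample values are invented
for the example. Note that it compares the two parts as integers, since
comparing the raw strings can go wrong once the counter grows another
digit.

```python
def linkedid_key(linkedid: str):
    """Split 'seconds.counter' into two integers for numeric ordering."""
    seconds, counter = linkedid.split(".", 1)
    return int(seconds), int(counter)


older = "1231513732.9"   # ninth ID handed out in that second
newer = "1231513732.10"  # tenth ID handed out in the same second

# Numeric comparison gives the intended order.
numeric_ok = linkedid_key(older) < linkedid_key(newer)

# Plain string comparison gets this pair backwards ('9' sorts after '1').
string_ok = older < newer

print(numeric_ok, string_ok)
```

Sorting a list of linkedIDs with key=linkedid_key would order them by
age in the same way.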

> I'm not being anal here. I'm not the only one that has raised this and
> time and time again it's cropped up on the mailing list where users
> have been burnt because they relied on uniqueid being unique. Perhaps
> it would be worth soliciting views from the user list on what people's
> preference would be for the uniqueid field? Maybe there is a super
> efficient unique id generation approach that uses less than 36 bytes
> but what's going to cost the most: people's time in accommodating the
> super cleverness  in their many and varied systems (instead of a UUID
> that's well understood with existing constructs) or storing 36 bytes
> per CDR?

You raise an extremely valid point here. Yes, more than once, folks have
been burned by the fact that uniqueID isn't so unique.

So, I'm going to remove it from the spec. It's old baggage. LinkedID
will be the only such field. And it won't be unique per CDR. But it
will serve as a search key for groups of related CDRs.
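
As a rough picture of that use, the Python sketch below groups a handful
of made-up CDR records by linkedID. The record fields and values are
hypothetical and stand in for whatever the CDR backend actually stores.

```python
from collections import defaultdict

# Hypothetical CDR records; the first two belong to the same call.
cdrs = [
    {"linkedid": "1231513732.42", "src": "100", "dst": "200"},
    {"linkedid": "1231513732.42", "src": "200", "dst": "300"},
    {"linkedid": "1231513790.7", "src": "101", "dst": "555"},
]

# linkedid is not unique per CDR; it is the key that ties related CDRs
# together, so grouping on it recovers each set of linked records.
by_call = defaultdict(list)
for cdr in cdrs:
    by_call[cdr["linkedid"]].append(cdr)

for linkedid, group in by_call.items():
    print(linkedid, len(group))
```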

Is there a need for a per-CDR unique identifier that cannot be
automatically generated by the database upon entry? (And the row number
won't suffice?)

murf

> 
> Regards,
> 
> Greyman.
-- 
Steve Murphy <murf at digium.com>
Digium