[asterisk-users] CEL entries over ODBC several hours late (Matthew Jordan)

Matthew Jordan mjordan at digium.com
Fri Dec 11 09:26:39 CST 2015


On Thu, Dec 10, 2015 at 8:57 AM, Stefan Viljoen <viljoens at verishare.co.za>
wrote:

> Hi Matthew
>
> Thank you very much for the reply.
>
> I must have something seriously wrong somewhere else then - I retested now
> and the "apparent" effect is as I describe but your info definitely
> contradicts that. But you're obviously correct.
>
> One more question - I've noted that if I run a combination of queries in the
> CEL backing DB (MariaDB) and the CEL table is locked, this severely affects
> the Asterisk instance - thousands of occurrences of
>
> chan_sip.c:4057 __sip_autodestruct: Autodestruct on dialog
> '6a9f5d3543b619655e07c81437373a32@172.17.12.3:5060' with owner
> SIP/3034-000207c8 in place (Method: BYE). Rescheduling destruction for
> 10000 ms
>
> appear in the CLI and users complain that if they hang up then they cannot
> make another call on the same SIP handset for several minutes.
>
> This is obviously because the dialplan gets delayed in the 'h' extension and
> cannot write to the CEL table, waiting for the MariaDB instance to clear the
> locks so it can write again. The above apparently comes from a watchdog
> process that watches how "fast" the 'h' extension is and if it takes "too
> long" it forces the channels closed.
>
> Is this assumption correct?
>
> Additionally, it seems that the writing of CEL in Asterisk is NOT async? E.g.
> it appears the thread that was running the conversation ALSO does the CEL
> writing / pushing to the CEL core as you describe in a synchronous manner.
>
> For 1.8, is it correct that CELs are synchronous, and do newer Asterisk
> versions do it async?
>
>
That's actually a bit surprising. Assuming you are using CELGenUserEvent in
the 'h' extension, I would not expect that to block the channel when
writing out to the database. The act of creating the CEL event will lock
the channel briefly, but the actual CEL event is queued up onto a message
bus and dispatched to another thread. I'd have to see the output of a gdb
backtrace or 'core show locks' to know why that is impacting the channels.

It should be asynchronous everywhere - in 1.8+. While the implementation of
the message bus changed in Asterisk 12, that doesn't change the nature of
how it is dispatched. As I said, a gdb backtrace or 'core show locks' would
show who the culprit is.
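
To illustrate the queue-and-dispatch pattern described above, here is a
minimal Python sketch. It is not Asterisk code and does not reproduce the
actual CEL implementation; SlowBackend, dispatcher and run_demo are
hypothetical names, and the sleep merely stands in for a stalled INSERT.
It only shows that a producer which queues events is not held up by a slow
consumer on another thread.

```python
import queue
import threading
import time


class SlowBackend:
    """Hypothetical stand-in for a database backend whose writes can stall."""

    def __init__(self, delay):
        self.delay = delay
        self.written = []

    def write(self, event):
        time.sleep(self.delay)  # simulates a locked table or a slow INSERT
        self.written.append(event)


def dispatcher(bus, backend):
    """Drain the queue on its own thread; None is the shutdown sentinel."""
    while True:
        event = bus.get()
        if event is None:
            break
        backend.write(event)


def run_demo(n_events=5, delay=0.05):
    """Queue n_events, then report enqueue time, total time and written events."""
    bus = queue.Queue()
    backend = SlowBackend(delay)
    worker = threading.Thread(target=dispatcher, args=(bus, backend))
    worker.start()

    start = time.monotonic()
    for i in range(n_events):
        # put() on an unbounded queue returns without waiting for the backend
        bus.put({"eventtype": "USER_DEFINED", "seq": i})
    enqueue_time = time.monotonic() - start

    bus.put(None)   # ask the dispatcher to stop once the queue is drained
    worker.join()
    total_time = time.monotonic() - start
    return enqueue_time, total_time, backend.written


if __name__ == "__main__":
    enq, total, written = run_demo()
    print(f"enqueued in {enq:.4f}s, all {len(written)} written after {total:.4f}s")
```

If the real dispatch behaves like this, a slow database would delay the CEL
rows without delaying the thread that raised the event, which is why a
blocked channel would point to a lock being held somewhere else.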


> E.g. my point being if there are major DB issues, it is quite a bit of
> kryptonite to have that "spill back" into Asterisk and start blocking off
> users from calling out - wouldn't it be much better to simply have failed
> CEL writes just die in a distal thread instead of the main call thread for
> the channels running on that handset?
>
> Or am I completely misunderstanding things?
>
> Anyway, thanks for the reply. :)
>
> Kind regards,
>
> > Hi guys
> >
> > I'm running 1.8.32.3 with CEL logging over ODBC to MariaDB 5.5.41 on the
> > same Centos 7 machine.
> >
> > I've noticed that the CDR entries made are all in-time, e.g. the call will
> > take place and the CDR entry is immediately written into the CDR table in
> > the MariaDB database.
> >
> > However, CEL events for that CDR (which I need to process for a realtime
> > display feature in my dialer software) are always several hours after the
> > fact. E.g. I will make a call at 09:30, see the call immediately pop up in
> > the MariaDB CDR table, but only at about 15:15 that afternoon will I see
> > that call's CEL events come into the CEL table, from Asterisk.... I have
> > examined the `show processlist` in MariaDB exhaustively to establish this
> > fact.
> >
> > The system doesn't appear loaded, load average is about 1.1 - it's a
> > quad-core HT Intel Xeon E3-1225 with 8GB of DRAM running on an SSD for main
> > storage.
> >
> > The system processes about 30 000 calls every 8 hour day, and services 90
> > SIP phones.
> >
> > I can stop and restart the MariaDB instance for several minutes, when I
> > restart it it immediately picks up on the "slow" CELs from where it was
> > interrupted - more evidence that Asterisk is very slowly streaming the CELs
> > through. I thought MariaDB was the bottleneck, but apparently not?
> >
> > If I make test inserts from a script into the CEL table, all of them
> > complete so quickly a time indication does not even register for the query
> > in MariaDB. Simple test queries on the CEL table are also instant, not even
> > counting in the internal MariaDB query duration timer.
> >
> > Can anybody explain why the CELs Asterisk emits over ODBC are so delayed?
> > Are CELs intended NOT to be realtime?
> >
> > So, logically, Asterisk appears to be caching CELs to the tune of hundreds
> > of thousands of them at any given time - meaning if it is stopped (either
> > killed, or core stop gracefully'ed, or just "core stop now") potentially
> > hundreds of thousands of CELs will just evaporate irretrievably.
> >
> > What can I do to mitigate this extremely slow populating of CELs over ODBC?
> >
> >
> Asterisk does not buffer CEL entries. If anything, it pushes the entries
> out to ODBC much more aggressively than what you would get with CDRs.
>
> An event is generated in Asterisk that corresponds to the CEL entry. That
> entry is pushed over a message bus (the 'event' message bus in 1.8 - 11;
> 'stasis' in 12+) and is picked up by the CEL core. The events are
> immediately sent to the registered backends, which also immediately write
> them out to the storage they support. In the case of ODBC, this immediately
> does an INSERT into the appropriate table.
>
> In Asterisk 1.8, you can look for a verbose level 11 message that will show
> when this occurs:
>
>         ast_verb(11, "[%s]\n", ast_str_buffer(sql));
>
> In later versions, this was turned into a debug level 3 message (as
> anything over a verbose 5/debug 5 was cleaned up).
>
> If you see that message, then that will tell you when Asterisk *believes*
> it has written the CEL entry. If that doesn't show up in the database, then
> the problem is either in the ODBC driver or in the MariaDB database.
>
> If you don't see that message, then something is preventing those events
> from getting delivered inside of Asterisk, which would only occur if you
> had some other serious call related issues occurring.
>
> Matt
>



-- 
Matthew Jordan
Digium, Inc. | Director of Technology
445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
Check us out at: http://digium.com & http://asterisk.org