[asterisk-dev] func_odbc auto commit at answer time

Jaco Kroon jaco at uls.co.za
Tue Feb 23 17:01:58 CST 2021


Hi,

I still think the changes for forcecommit should be done for func_odbc;
however, after various discussions we've decided to give the daemon
route a shot.

Just looking at the raw AMI event emitted via UserEvent, there seems to
be a problem with quoting.  For example:

Set(ODBC_trace()=CallerID: ${CALLERID(all)})
UserEvent(trace,Message: CallerID: ${CALLERID(all)})

In the DB, the former ends up as:  CallerID: "name" <number>

The quotes are missing in the AMI event:

Message: CallerID name <number>

Is this a bug, or am I missing some escaping that needs to happen prior
to passing to UserEvent?

Kind Regards,
Jaco

On 2021/02/22 05:09, Jaco Kroon wrote:
>
> Hi,
>
> On 2021/02/21 13:46, Dennis Buteyn wrote:
>> On 2/19/21 1:03 PM, Jaco Kroon wrote:
>>
>> <snip>
>>
>>> The specific ODBC connection here is merely used for inserting "traces"
>>> into the database, so exactly this:
>>>
>>> [trace]
>>> dsn=ulsdb
>>> synopsis=Add a trace onto an existing call.
>>> syntax=[<level>]=<message>
>>> writesql=INSERT INTO calltrace(uniqueid, message, level)
>>> VALUES('${SQL_ESC(${UNIQUEID})}', '${SQL_ESC(${VALUE})}',
>>> '${IF($[${ARGC}>0]?${SQL_ESC(${ARG1})}:user)}')
>>>
>>> (This relies on the patch mentioned in my point 1 for ARG1 to work
>>> correctly when it is not supplied.)
>>>
>>> The table itself:
>>>
>>> CREATE TABLE `calltrace` (
>>>   `uniqueid` varchar(64) NOT NULL,
>>>   `level` enum('user','admin','system') NOT NULL DEFAULT 'user',
>>>   `tracetime` datetime(3) NOT NULL DEFAULT current_timestamp(3),
>>>   `message` varchar(1024) NOT NULL,
>>>   KEY `uniqueid` (`uniqueid`),
>>>   KEY `tracetime` (`tracetime`)
>>> ) ENGINE=InnoDB DEFAULT CHARSET=latin1
>>>
>>> So we'll do stuff like Set(ODBC_trace(user)=Received call from
>>> ${CALLERID(all)} to ${EXTEN}) in the dialplan at critical decision
>>> points.  The other option is to log this to a file and post-load it
>>> into the database.  It really doesn't matter that much, and if data
>>> gets lost it's not the end of the world, but the above just makes
>>> things simpler and more real-time, and allows us to use concurrent
>>> inserts.
>>>
>>> For any given uniqueid it's seldom that more than 30 entries are
>>> generated, around 20 or so on average.  We've so far dealt with this
>>> by reducing the amount of logging, so whilst not all of it is
>>> strictly required, it's useful: it enables support personnel to avoid
>>> bugging the system administrators, and even lets users avoid logging
>>> tickets with support in the first place.
>>>
>>> The other func_odbc queries are almost all SELECT queries, and are
>>> very well cached, in critical cases into astdb too.  Those never
>>> cause issues, and continue to operate correctly even when the INSERTs
>>> are problematic.  The above basically just starts slowing down
>>> because of the implicit COMMIT per statement; with the above grouped
>>> into a transaction it works very well overall.  The transaction
>>> connection just gets held up for the duration of the call, but it
>>> lowers the commit rate from 50-100/s (and higher in extreme cases) to
>>> 5-10/s on average.  We start seeing the effects of the forced disk IO
>>> around 75-odd COMMITs/s, and annoying slowdown beyond that.
>>>
>>> One does need to be aware that READ queries are issued on handles
>>> different from the WRITE queries, since the _read function doesn't
>>> take res_odbc_transaction into consideration (which arguably it
>>> should, but that has zero effect on my use case; in fact, as you
>>> point out, it may actually be bad for my use case).
>>>
>>>> Without diving deep into transactions, it sounds like your dialplan is
>>>> doing lots of small modifications during the call. Databases like
>>>> doing things in bulk, so perhaps collecting changes in channel
>>>> variables before submitting them with a hangup handler will be more
>>>> effective?
>>> That's possible, or one could just group them into a transaction; it
>>> won't reduce the overall number of queries much.  Since this is
>>> MariaDB one could use the multi-row "bulk insert" syntax if we really
>>> wanted to (see the sketch after this paragraph), but that's not
>>> portable SQL and may well fail if for some reason someone decides
>>> some years from now that we need to swap MariaDB for postgres, and it
>>> also complicates the overall process significantly.  Our dialplan
>>> processing is usually well under a second in total, with average ring
>>> time on the Dial() around 12 seconds *usually* in ANSWER cases, and
>>> significantly shorter in NO ANSWER cases.  This means that at a call
>>> setup rate of 10 calls/second we need a concurrency of 150 to have a
>>> slight safety factor; but if the COMMIT only happens when the channel
>>> gets destroyed (average call duration of 73 seconds, or 85 seconds
>>> overall from channel creation to channel destruction), that goes up
>>> to 850 with little to no safety margin, or 900-1000 with one, and in
>>> both cases that's cutting it finer than we like.
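>>>
>>> For illustration (made-up values, using the calltrace table above),
>>> the multi-row form is a single statement, hence a single implicit
>>> COMMIT, covering several traces:
>>>
>>> INSERT INTO calltrace(uniqueid, message, level) VALUES
>>>   ('1614120000.123', 'Received call from "Alice" <100> to 200', 'user'),
>>>   ('1614120000.123', 'Routing to queue support', 'user'),
>>>   ('1614120000.123', 'Answered by agent 42', 'user');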
>>>
>>> I don't think the discussion here is about how to make the database
>>> better/faster; rather, the focus is on how asterisk can make more
>>> convenient mechanisms available in order to enable more effective and
>>> efficient use of the database engine, and by implication the hardware.
>>>
>>> You raise a valid point though:  what if the commit could take a
>>> significant (>100ms) amount of time?  Should we then have an option
>>> in/for res_odbc_transaction to fork a separate thread for the COMMIT
>>> before returning the ODBC handle to the pool?  That actually makes
>>> some sense.  So for the various 3.X options that involve
>>> res_odbc_transaction, there could be some way to "parallel commit",
>>> which would be fine for my use case but may be terrible for others,
>>> e.g. a sequence like:
>>>
>>> START TRANSACTION;
>>> INSERT INTO foo(a,b,c,d) VALUES(1,2,3,4);
>>> COMMIT; -- implicitly releases handle back to pool.
>>>
>>> SELECT * FROM foo WHERE a=1;
>>>
>>> If the COMMIT is parallel this opens a race with the SELECT (which
>>> may end up using a different handle, and with default and sane
>>> isolation levels this would be a race condition even if both
>>> "transactions" are handled from the same dialplan "thread").  See
>>> above about func_odbc using different handles for transactional
>>> queries and read queries anyway.
>>
>> There's no way to really do bulk inserts without resorting to tricks
>> like "insert (a, b, c) values ($ARG1), ($ARG2), ($ARG3), ...", which
>> in practice is not very useful.  I don't think func_odbc is a good fit
>> for what you are trying to do.  As you've experienced, there is a
>> limit to how many individual insert queries one can make before a
>> database gives up (technically, the drives are the bottleneck here).
>> Transactions allow bundling many inserts together, but due to the
>> length of most calls, keeping transactions open for the entire call
>> duration is not a good idea because of the issues I mentioned
>> previously.
>>
> Exactly why I'm proposing COMMIT AT ANSWER instead of COMMIT AT HANGUP.
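>
> (A rough illustration only, not an actual patch:  the manual dialplan
> equivalent of commit-at-answer, using the ODBC(transaction) function
> and ODBC_Commit() application from res_odbc_transaction together with
> the [trace] function above.  The extension and messages are made up,
> and this assumes the writesql runs on the transaction handle, as
> discussed earlier.)
>
> [inbound]
> exten => _X.,1,Set(ODBC(transaction)=ulsdb)  ; open a transaction on the dsn
>  same => n,Set(ODBC_trace(user)=Received call from ${CALLERID(all)} to ${EXTEN})
>  same => n,Set(ODBC_trace(user)=Routing decision made)  ; still uncommitted
>  same => n,Answer()
>  same => n,ODBC_Commit()  ; one COMMIT flushes all the pre-answer traces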
>>
>> Most dialplan applications work synchronously.  They have to, and
>> func_odbc is no exception.  Adding asynchronous capabilities to
>> func_odbc is interesting, but will add lots of complexity.
>> Asynchronous reads simply won't work because the dialplan needs those
>> values /now/, not later.  Asynchronous writes, maybe.  But who are you
>> going to notify when writes fail?  The originating channel may already
>> be gone by then.  And what if somewhere later on the dialplan tries to
>> read the values that were written asynchronously?  Now you need some
>> kind of lock or barrier to make sure old values are not being read.
>> Things go downhill really fast.
>>
> I agree re the complexity.  Looking at concepts from cdr.c I believe
> this can be somewhat simplified, and I don't think we need the
> thread-pool complexity; as far as I know spawning a thread is a fairly
> cheap operation, so I was thinking of just spawning a thread per
> commit and being done with it.
>
> No, this isn't intended to "solve all problems with func_odbc"; it's
> aimed just at the problem of keeping a transaction open for the
> duration of a call when we could commit at answer time.
>
> Regarding notifications, one could call back into the dialplan on
> async failure, but for now I'd simply document that if it fails it
> will be silent to the call, so use with care.  Having said that, I
> don't think I've ever seen COMMIT fail except on out-of-storage ...
> updates and inserts already generate row locks, so usually it's the
> actual INSERT or UPDATE statement that would fail (sometimes with
> something vague like a deadlock ... and this can take many seconds,
> but the same applies even now).
>
> Async reads may actually be possible (and interesting, actually): in
> many cases I may know that I will need data *later*, so I can initiate
> the lookup and later retrieve the data via a handle (similar to the
> multirow lookup; in fact, if one were to use a cursor there instead of
> loading the entire resultset, you'd effectively have exactly that
> already, except that you still need to wait for the first row of data,
> which means the server has already done most of the slow work
> anyway).  So in theory func_odbc could spawn the query in a separate
> thread and use a cursor to retrieve the data once it starts coming.
>
> I don't personally care about the read case currently and will not be
> implementing that.
>
>> Instead of making Asterisk perform all the heavy lifting, what about
>> offloading to a dedicated logging process/server?  You could issue an
>> AMI UserEvent() whenever you want to log something and have something
>> else pick up these events and write them to the database.  AMI events
>> are always asynchronous, so the dialplan will never stall.  The
>> logging process/server can take as long as it needs to write
>> everything to the database without affecting call performance.
>> Implementing bulk writes would be trivial (i.e. write every 1000
>> events or every second, whichever comes first, as a single
>> transaction).
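>>
>> (For reference, a minimal sketch of such a consumer, not from this
>> thread: it logs in to AMI over a plain socket, collects
>> UserEvent(trace) payloads, and flushes them in one transaction per
>> batch.  Host, credentials and event fields are illustrative, and `db`
>> is any DB-API connection, e.g. pymysql, supplied by the caller.)
>>
>> #!/usr/bin/env python3
>> import socket
>> import time
>>
>> FLUSH_COUNT = 1000  # flush after this many events ...
>> FLUSH_SECS = 1.0    # ... or after this long, whichever comes first
>>
>> def ami_events(host="localhost", port=5038, user="logger", secret="secret"):
>>     """Log in to AMI and yield each event as a dict of header: value."""
>>     sock = socket.create_connection((host, port))
>>     sock.sendall("Action: Login\r\nUsername: {}\r\nSecret: {}\r\n\r\n"
>>                  .format(user, secret).encode())
>>     buf = b""
>>     while True:
>>         data = sock.recv(4096)
>>         if not data:
>>             return
>>         buf += data
>>         while b"\r\n\r\n" in buf:
>>             block, buf = buf.split(b"\r\n\r\n", 1)
>>             event = dict(line.split(": ", 1)
>>                          for line in block.decode(errors="replace").splitlines()
>>                          if ": " in line)
>>             if event:
>>                 yield event
>>
>> def run(db):
>>     batch, last_flush = [], time.monotonic()
>>     for ev in ami_events():
>>         if ev.get("Event") == "UserEvent" and ev.get("UserEvent") == "trace":
>>             batch.append((ev.get("Uniqueid", ""), ev.get("Message", "")))
>>         # Blocking recv() means a time-based flush only fires when the
>>         # next event arrives; good enough for a sketch.
>>         if batch and (len(batch) >= FLUSH_COUNT
>>                       or time.monotonic() - last_flush >= FLUSH_SECS):
>>             cur = db.cursor()
>>             cur.executemany("INSERT INTO calltrace(uniqueid, message) "
>>                             "VALUES (%s, %s)", batch)
>>             db.commit()  # one COMMIT for the whole batch
>>             batch, last_flush = [], time.monotonic()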
>>
> Interesting and very workable concept.  I like this, and will
> definitely take a look at it too.  We do have alternative use cases
> where we'd then need to juggle a hundred or so AMI connections in
> another daemon, along with matching each of those connections to its
> own database, so that brings other complications (or we'd run 100 of
> these extra logger daemons).
>
> I still think a "COMMIT AT ANSWER" patch may be a good idea,
> especially in conjunction with an async commit.
>
> Kind Regards,
> Jaco
>
>>
>> -- 
>> Dennis Buteyn
>> Xorcom Ltd
>>
>