[asterisk-dev] func_odbc auto commit at answer time

Jaco Kroon jaco at uls.co.za
Sun Feb 21 21:09:24 CST 2021


Hi,

On 2021/02/21 13:46, Dennis Buteyn wrote:
> On 2/19/21 1:03 PM, Jaco Kroon wrote:
>
> <snip>
>
>> The specific ODBC connection here is merely used for inserting "traces"
>> into the database, so exactly this:
>>
>> [trace]
>> dsn=ulsdb
>> synopsis=Add a trace onto an existing call.
>> syntax=[<level>]=<message>
>> writesql=INSERT INTO calltrace(uniqueid, message, level)
>> VALUES('${SQL_ESC(${UNIQUEID})}', '${SQL_ESC(${VALUE})}',
>> '${IF($[${ARGC}>0]?${SQL_ESC(${ARG1})}:user)}')
>>
>> (Relies on patch mentioned in my point 1 for ARG1 to work correctly when
>> not supplied)
>>
>> The table itself:
>>
>> CREATE TABLE `calltrace` (
>>   `uniqueid` varchar(64) NOT NULL,
>>   `level` enum('user','admin','system') NOT NULL DEFAULT 'user',
>>   `tracetime` datetime(3) NOT NULL DEFAULT current_timestamp(3),
>>   `message` varchar(1024) NOT NULL,
>>   KEY `uniqueid` (`uniqueid`),
>>   KEY `tracetime` (`tracetime`)
>> ) ENGINE=InnoDB DEFAULT CHARSET=latin1
>>
>> So we'll do stuff like Set(ODBC_trace(user)=Received call from
>> ${CALLERID(all)} to ${EXTEN}) in the dialplan for critical decision
>> points.  The other option is to log this to a file and to post-load it
>> into the database.  It really doesn't matter that much, and if it gets
>> lost it's not the end of the world, but the above just makes it simpler
>> and more real-time, and allows us to use concurrent inserts.
>>
>> For any given uniqueid it's seldom that more than 30 entries are
>> generated, around 20 or so on average.  We currently deal with this by
>> reducing the amount of logging, so whilst not all of it is strictly
>> required, it's useful information that enables support personnel to
>> avoid bugging the system administrators, and even enables users to
>> avoid logging tickets with support to begin with.
>>
>> The other func_odbc queries are almost all SELECT queries, and are very
>> well cached, in critical cases into astdb too.  Those never cause
>> issues, and even when the INSERTs are problematic they continue to
>> operate correctly.  The above basically just starts slowing down
>> because of the implicit COMMIT; with the above as a transaction it
>> works very well overall.  The transaction connection just gets held up
>> for the duration of the call, but it does lower the commit rate from
>> 50-100 (and higher in extreme cases) to 5-10 on average.  We can start
>> seeing the effects of the forced disk IO around 75-odd COMMITs/s, and
>> annoying slowdown beyond that.
>>
>> One does need to be aware that READ queries are issued on handles
>> different from the WRITE queries, since the _read function doesn't
>> take res_odbc_transaction into consideration (which arguably it should,
>> but that has zero effect for my use case; in fact, as you point out, it
>> may actually be bad for my use case).
>>
>>> Without diving deep into transactions, it sounds like your dialplan is
>>> doing lots of small modifications during the call. Databases like
>>> doing things in bulk, so perhaps collecting changes in channel
>>> variables before submitting them with a hangup handler will be more
>>> effective?
>> That's possible, or we could just group them into a transaction; it's
>> not going to reduce the overall number of queries much.  But yes, since
>> this is MariaDB one could use the "bulk insert syntax" if we really
>> wanted to (though that's not standard SQL and may well fail if for some
>> reason someone some years from now decides we need to swap out MariaDB
>> for postgres), but that also complicates the overall process
>> significantly.  Our dialplan processing is usually well under a second
>> in total, with average ring time on the Dial() around 12 seconds
>> *usually* in ANSWER cases, and significantly shorter in NO ANSWER
>> cases.  This means that at a call setup rate of 10 calls/second we need
>> a concurrency of 150 to have a slight safety factor, but if the COMMIT
>> only happens when the channel gets destroyed (with an average call
>> duration of 73 seconds, or 85 seconds overall from channel creation to
>> channel destruction), that goes up to 850 with little to no safety
>> margin, 900-1000 with one, and in both cases that's cutting it finer
>> than we'd like.
>>
>> I don't think the discussion here is how to make the database
>> better/faster, but rather how Asterisk can make more convenient
>> mechanisms available in order to enable more effective and efficient
>> use of the database engine, and by implication the hardware.
>>
>> You raise a valid point though:  what if the COMMIT could take a
>> significant (>100ms) amount of time?  Should we then have an option
>> in/for res_odbc_transaction to fork a separate thread for COMMIT before
>> returning the ODBC handle to the pool?  That actually makes some sense,
>> so for the various 3.X options that involve res_odbc_transaction there
>> could then be some way to "parallel commit", which would be fine for my
>> use case but may be terrible for others, e.g. a sequence like:
>>
>> START TRANSACTION;
>> INSERT INTO foo(a,b,c,d) VALUES(1,2,3,4);
>> COMMIT; -- implicitly releases handle back to pool.
>>
>> SELECT * FROM foo WHERE a=1;
>>
>> If the COMMIT happens in parallel this opens a race against the SELECT
>> (which may end up using a different handle, and with default and sane
>> isolation levels this would result in a race condition even if both
>> "transactions" are handled from the same dialplan "thread").  See above
>> about func_odbc using different handles for transactional queries and
>> read queries anyway.
>
> There's no way to really do bulk inserts without resorting to tricks
> like "INSERT (a, b, c) VALUES ($ARG1), ($ARG2), ($ARG3), ..." which in
> practice is not very useful.  I don't think func_odbc is a good fit for
> what you are trying to do.  As you've experienced, there is a limit to
> how many individual INSERT queries one can make before a database
> gives up (technically, the drives are the bottleneck here).
> Transactions allow bundling many inserts together but due to the
> length of most calls, keeping transactions open for the entire call
> duration is not a good idea because of issues I mentioned previously.
>
That's exactly why I'm proposing COMMIT AT ANSWER instead of COMMIT AT
HANGUP.
>
> Most dialplan applications work synchronously.  They have to, and
> func_odbc is no exception.  Adding asynchronous capabilities to
> func_odbc is interesting, but will add lots of complexity.
> Asynchronous reads simply won't work because the dialplan needs those
> values /now/, not later.  Asynchronous writes, maybe.  But who are you
> going to notify when writes fail?  The originating channel may already
> be gone by then.  And what if somewhere later on the dialplan tries to
> read the values that were written asynchronously?  Now you need some
> kind of lock or barrier to make sure old values are not being read.
> Things go downhill really fast.
>
I agree re the complexity.  Looking at concepts from cdr.c I believe
this can be somewhat simplified, and I don't think we need the thread
pool complexity; as far as I know spawning a thread is a fairly cheap
operation, so I was thinking we just spawn a thread per commit and be
done with it.

No, this isn't intended to solve all problems with func_odbc, just the
problem of keeping a transaction open for the duration of a call when we
could commit at answer time instead.
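
To make that concrete, here's a rough sketch of how this can already be
done by hand with the existing res_odbc_transaction pieces (syntax from
memory, so treat it as illustrative; it assumes the [trace] function
from earlier and a dialplan that answers the channel itself):

exten => s,1,Set(ODBC(transaction,ulsdb)=tracetx)  ; open transaction "tracetx" on the ulsdb DSN
 same => n,Set(ODBC_trace(user)=Received call from ${CALLERID(all)} to ${EXTEN})
 same => n,Set(ODBC_trace(system)=Routing decision made)
 same => n,Answer()
 same => n,ODBC_Commit(tracetx)                    ; flush all pre-answer traces in a single COMMIT
 same => n,Dial(PJSIP/someone)

The proposal is essentially to have that COMMIT happen implicitly (and
ideally asynchronously) when the channel is answered, instead of
scattering ODBC_Commit() calls through the dialplan.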

Regarding notifications, one could call back into the dialplan on async
failure, but for now I'd simply document that if it fails it will be
silent to the call, so use with care.  Having said that, I don't think
I've ever seen COMMIT fail except when out of storage ... UPDATEs and
INSERTs already generate row locks, so usually the actual INSERT or
UPDATE statement would fail first (sometimes with something vague like a
deadlock ... and this can take many seconds, but the same applies even
now).

Async reads may actually be possible (and interesting): in many cases I
may know that I will need data *later*, so I could initiate the lookup
and later retrieve the data via a handle.  This is similar to the
multirow lookup; in fact, if one were to use a cursor there instead of
loading the entire result set you would effectively already have exactly
that, except that you still need to wait for the first row of data,
which means the server has already done most of the slow work anyway.
So in theory func_odbc could spawn the query in a separate thread and
use a cursor to retrieve the data as it starts coming in.
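
For reference, the existing multirow pattern I'm comparing against looks
roughly like this (a sketch; [gettraces] is a hypothetical readsql
function defined with mode=multirow in func_odbc.conf):

exten => s,1,Set(id=${ODBC_gettraces(${UNIQUEID})})  ; returns a result set ID, not the data itself
 same => n(fetch),Set(row=${ODBC_FETCH(${id})})      ; pull the next row from the stored result set
 same => n,GotoIf($["${ODBC_FETCH_STATUS}" = "FAILURE"]?done)
 same => n,NoOp(trace row: ${row})
 same => n,Goto(fetch)
 same => n(done),ODBCFinish(${id})                   ; release the result set

An async read would behave much the same from the dialplan's point of
view, except that the initial Set() would return immediately and the
first ODBC_FETCH() would block until the first row arrives.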

I don't personally care about the read case currently and will not be
implementing that.

> Instead of making Asterisk perform all the heavy lifting, what about
> offloading to a dedicated logging process/server?  You could issue an
> AMI UserEvent() whenever you want to log something and have something
> else pick up these events and write them to the database.  AMI events
> are always asynchronous, so the dialplan will never stall.  The
> logging process/server can take as long as it needs to write
> everything to the database without affecting call performance.
> Implementing bulk writes would be trivial (i.e. write every 1000 events
> or every second, whichever comes first, as a single transaction).
>
Interesting and very workable concept.  I like this, and will definitely
take a look at it too.  We do have alternative use cases where we'd then
need to juggle a hundred or so AMI connections in another daemon, along
with matching each of those connections to its own database, so that
brings other complications (or we'd have to run 100 of these extra
logger daemons).
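
Something along these lines, presumably (UserEvent() is standard; the
event and header names here are just made up for illustration):

 same => n,UserEvent(CallTrace,UniqueID: ${UNIQUEID},Level: user,Message: Received call from ${CALLERID(all)} to ${EXTEN})

with a small AMI consumer on the other side batching the events into a
single transaction per second (or per 1000 events) as you suggest.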

I still think a "COMMIT AT ANSWER" patch may be a good idea, especially
in conjunction with an async commit.
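
In func_odbc.conf terms I'd imagine something like the following (purely
hypothetical option names; nothing like this exists today):

[trace]
dsn=ulsdb
synopsis=Add a trace onto an existing call.
writesql=INSERT INTO calltrace(uniqueid, message, level) VALUES(...)
; hypothetical options, for illustration only:
transaction=yes        ; buffer writes in a per-channel transaction
commit_at=answer       ; COMMIT when the channel is answered
async_commit=yes       ; run the COMMIT from a separate thread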

Kind Regards,
Jaco

>
> -- 
> Dennis Buteyn
> Xorcom Ltd
>