[asterisk-dev] scalability issue with realtime lastms update on qualify state change

Mon Nov 18 07:34:48 CST 2013

On Mon, Nov 18, 2013 at 12:58 AM, Tilghman Lesher <tilghman at meg.abyt.es> wrote:

> On Sun, Nov 17, 2013 at 3:31 PM, Damon Estep <damon at soho-systems.com>
> wrote:
> >> >
> >> >> If you really are averse to the effect that saving lastms in the
> >> >> database causes, why don't you just remove that column from your
> >> >> realtime tables?  We've had this ability to remove columns that you
> >> >> prefer not to save in dynamic realtime since 1.6.2, and it sounds
> >> >> like this is, in effect, precisely what you'd prefer.
> >> >>
> >> >> -Tilghman
> >> >>
> >> >
> >> > Can you help me understand what removing the column would do? I did
> >> > not see anything in the code that would stop the database update
> >> > attempt if the column did not exist. Would it not try, or would it
> >> > just fail gracefully?
> >>
> >> The meta-code would simply drop that column from the UPDATE, if it
> >> doesn't exist in the target table.
> >
> > Are we talking about something implemented in realtime_odbc, or also in
> > realtime_mysql?
> > And if it is the only column in the update, as in the case of a peer
> > state change update?
> > The generated update query on peer state change is 'update sipusers set
> > lastms = [value] where name = [peername]'
>
> It's implemented in both the ODBC and the MySQL realtime drivers.  In
> both cases, however, the first key/value pair should always exist and
> if it doesn't, both will fail:  the MySQL driver will emit an ERROR,
> and the ODBC driver will attempt to execute what will be invalid SQL.
> This is clearly a bug in the ODBC driver.  This situation probably
> could be fixed in the realtime drivers to simply short-circuit and
> cancel out the operation when no columns can be updated.
>
> > BTW, thanks for taking the time to discuss this.
>
> I still think the better option would be to control the rate at which
> probes for peers are sent out, so their responses can be received and
> processed on time.  Both of these present solutions merely attempt to
> code to the symptom, rather than fixing the underlying problem.
> However, moving the probes from the individual peer threads
> to a single background thread is not an easy or simple change, which
> is probably why you haven't attempted it.
>

I agree with Tilghman. The correct way of handling this would be to either
offload the database queries to a separate thread, or multi-thread the
entire system such that a blocking call to the database does not impact
other request handling. This is exceptionally non-trivial to accomplish in
chan_sip, as it has no concept of asynchronous callbacks and resuming
operations.
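
To make the "offload the database queries to a separate thread" idea
concrete, here is a minimal sketch in Python rather than in chan_sip's C.
It is not Asterisk code: the LastmsWriter class and the write_fn callback
are hypothetical names, and write_fn stands in for whatever blocking
realtime update call is in use. The point is only that the thread handling
the qualify response enqueues the update and returns immediately.

```python
import queue
import threading


class LastmsWriter:
    """Hypothetical helper: queues lastms updates and writes them off-thread."""

    def __init__(self, write_fn):
        # write_fn(peer, lastms) is assumed to be the blocking database call.
        self._write_fn = write_fn
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, peer, lastms):
        # Returns immediately; the caller never waits on the database.
        self._queue.put((peer, lastms))

    def close(self):
        # The sentinel is queued behind pending updates, so they are
        # all written before the worker thread exits.
        self._queue.put(None)
        self._thread.join()

    def _run(self):
        while True:
            item = self._queue.get()
            if item is None:
                break
            peer, lastms = item
            self._write_fn(peer, lastms)
```

A single worker keeps the updates in order but still serializes them, so a
slow database makes the queue grow instead of stalling request handling.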

This is a large part of why we went with a new architecture in the PJSIP
stack, which doesn't suffer from this particular limitation (a thread pool
is used for request/response handling, so responses that take significant
time to process do not impact the handling of other responses). I
understand that waiting for Asterisk 12/13 may not be an option and may
require you to attempt to solve this in chan_sip; however, I have a feeling
that such an effort would be an exceptionally difficult project.
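
On the narrower point in the quoted text (dropping columns that are missing
from the table, and short-circuiting when nothing is left to update), the
logic could look roughly like the sketch below. This is an illustration in
Python, not the code of the ODBC or MySQL realtime drivers; build_update
and its parameters are hypothetical, and the table and column names are
assumed to come from trusted configuration since they are interpolated
directly into the SQL text.

```python
def build_update(table, table_columns, key_column, key_value, fields):
    """Return (sql, params), or None when no requested column exists.

    table_columns: set of column names known to exist in the table.
    fields: list of (column, value) pairs the caller wants to store.
    """
    # Drop any column the target table does not have.
    usable = [(col, val) for col, val in fields if col in table_columns]
    if not usable:
        # Short-circuit: nothing to update, so no query is built at all.
        return None
    assignments = ", ".join(f"{col} = ?" for col, _ in usable)
    sql = f"UPDATE {table} SET {assignments} WHERE {key_column} = ?"
    params = [val for _, val in usable] + [key_value]
    return sql, params
```

With lastms removed from the table, a peer state change update that
carries only lastms yields None, and the caller can skip the database
round trip entirely.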

Matt

-- 
Matthew Jordan
Digium, Inc. | Engineering Manager
445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
Check us out at: http://digium.com & http://asterisk.org