[asterisk-dev] Clustering the Asterisk Database (for fun and profit)

Sun Mar 15 22:42:05 CDT 2015

On Sun, Mar 15, 2015 at 3:21 AM, Nir Simionovich
<nir.simionovich at gmail.com> wrote:
> Hey Matt,
>
>   One of the subjects I normally tackle first, when building a federated
> system is the data store.
> Normally, I would use something like Redis and utilize its PUB/SUB bus to
> propagate information
> across nodes.
>
>   Having a similar mechanism with Asterisk seems like a good idea, would
> surely reduce much
> pain with large scale systems. Having said that, here are a couple of things
> that I would love to
> see in such a mechanism:

Using Redis would be interesting, although we don't have a native
backend for it right now. David Lee has a patch up (that I *still*
need to write tests for, I'm way behind) that adds support for
RabbitMQ. The way I've implemented the distribution of the AstDB
families, it could write out to RabbitMQ in case you don't want
Asterisk to directly share with other Asterisk instances.

Using PJSIP is nice just because it is available (is anyone not using SIP? :-)

> 1. Data Sharding - While propagating information across the bus to the other
> Asterisk servers is
>     one thing, it will also be useful to make the information shardable. For
> example, information like
>     settings I'd love to share across nodes, however, information like SIP
> registrations I'd love to shard.
>     Why shard SIP registrations? simple, I'd love to be able to track the
> bouncing of registrations
>     across the network, and maybe in the future, enable a Kamailio server to
> bounce your sessions
>     from one server to the other, without the need to fully re-register or
> something like that (just a thought).

You could do this with what I've implemented, at least after a fashion. For example, if
you share the 'registrar' family in a 'unique' fashion, you'll have on
each server something that looks like this:

/registrar/ {your local registrations}
/{server eid 1}/registrar/ {server eid 1's registrations}
/{server eid 2}/registrar/ {server eid 2's registrations}

That is, each server can look at the registrations on other servers,
presumably to know where those registrations have occurred. Since the
messages are being sent in SIP, they presumably could be sent to
Kamailio as well.
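
To make the layout above concrete, here is a rough sketch in Python. It is
only an illustration: ToyAstDB, apply_remote, locate and the EID strings are
hypothetical names made up for this example, not part of Asterisk. It assumes
that entries received from another server are stored under that server's EID,
so they never collide with local entries.

```python
class ToyAstDB:
    """Toy in-memory stand-in for the AstDB (hypothetical, not Asterisk code)."""

    def __init__(self, eid):
        self.eid = eid   # this server's entity ID
        self.data = {}   # maps "/family/key" style paths to values

    def put(self, family, key, value):
        # Local writes land under the plain family name.
        self.data[f"/{family}/{key}"] = value

    def apply_remote(self, origin_eid, family, key, value):
        # Entries from another server are namespaced by that server's EID.
        self.data[f"/{origin_eid}/{family}/{key}"] = value

    def locate(self, family, key):
        """Return the EIDs of every server known to hold family/key."""
        suffix = f"/{family}/{key}"
        owners = []
        for path in self.data:
            if path == suffix:
                owners.append(self.eid)
            elif path.endswith(suffix):
                owners.append(path[1:-len(suffix)])
        return sorted(owners)


db = ToyAstDB("eid-local")
db.put("registrar", "alice", "sip:alice@10.0.0.1")
db.apply_remote("eid-2", "registrar", "alice", "sip:alice@10.0.0.2")
owners = db.locate("registrar", "alice")
```

Here locate() answers the "where did this registration occur?" question by
scanning both the local family and the per-server copies.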

> 2. Presets - Thinking from an integrator/developer point of view, I think it
> would be wise to add a
>     astdb_cluster.conf file. The idea is to declare specific settings and
> astdb families and automatically
>     shared. For example, imagine the following file:
>
> [general]
> cluster_name = cluster.voip    ; must be identical across all nodes
> cluster_node = 1
> cluster_secret = some.super.secret.key
>
> [family1]
> sharing_policy = multicast
> sharing_nodes = 2,3,4,5
> sync_interval = 500 ; in ms
> sync_mode = ro
>
> [family2]
> sharing_policy = broadcast
> sync_interval = 5000 ; in ms
> sync_mode = rw
>
>   The idea is the following: some information requires rapid sync, for
> example, state information within a
> queue. Other information may require slow sync - replication of
> registrations between nodes, or even propagation
> of extension settings between nodes. The idea is "define it and forget about
> it" - the minute the sharing is
> defined, the astdb clustering tool will take care of the rest. The
> "sync_mode" will set if the node is allowed to
> update values in the astdb family. For example, some nodes may serve as
> hot-backups, others may serve as
> active-active clusters.

The 'sharing' of the information to other Asterisk instances is
actually all handled by the mechanism doing the sharing. So, for
example, if PJSIP is used, then you configure outbound and inbound
PUBLISH handlers. If XMPP were to be used (although I haven't written
that), then you'd use res_xmpp. The synchronization interval isn't
really controllable either, as we simply reflect any writes that have
occurred to a piece of data. If that ends up being limiting, then some
sort of 'dirty' flag and mechanism could be added - but I'd probably
hold off on that until knowing that it is needed.

About the only thing you really do need to persist is what families
are shared. Currently, I'm just storing that in its own family in the
AstDB, and restoring shared families on startup.
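
As a minimal sketch of that idea (in Python, with the database modelled as a
plain dict keyed by (family, key)): the bookkeeping family name
"__shared_families" and both function names are assumptions invented for this
example, not the names used in the actual patch.

```python
SHARED_META_FAMILY = "__shared_families"  # hypothetical bookkeeping family
VALID_MODES = ("global", "unique")


def share_family(db, family, mode):
    """Record that `family` is shared, inside the database itself."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown sharing mode: {mode!r}")
    db[(SHARED_META_FAMILY, family)] = mode


def restore_shared_families(db):
    """On startup, rebuild the in-memory map of shared families."""
    return {
        key: mode
        for (family, key), mode in db.items()
        if family == SHARED_META_FAMILY
    }


db = {}
share_family(db, "registrar", "unique")
share_family(db, "settings", "global")
shared = restore_shared_families(db)
```

Because the sharing state lives in the same store as the data, nothing extra
has to be read from disk at startup.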

I considered using a .conf file, but a few things gave me pause:
(1) There just isn't much to store in there now.
(2) We don't have a db.conf currently, and I thought it might be a bit
much to add another .conf file to the core.
(3) Most of the DB manipulation today happens through functions, which
are accessible through the APIs. Since adding functions for the
sharing of families would achieve the same thing and match the
existing manipulation mechanisms, that felt appropriate.

Do you still think a .conf file is necessary? Or is the existing way
of doing it potentially sufficient?

>   One thing that I see a potential issue with is astdb locking. Mainly, how
> do we maintain astdb persistence
> across all nodes. AstDB is one of those things that if it fails or becomes
> corrupted, your entire Asterisk becomes
> fairly unusable (ala FreePBX style).

Locking already occurs in the AstDB, and this didn't really change it.
The last issue with corruption I've seen occurred on abrupt shutdowns
and was fixed recently, so I don't think this would impact that.

As in all software, there is no free lunch, and knowing that a family
is shared and publishing an update does cause an increase in overhead.
In one of the unit tests, 100k AstDB updates are made. Without
sharing, the additional code caused a 10% increase in the run time of
that test due to additional checks that have to be made ("is this
family that I just updated shared? No - okay, no need to inform
anyone"). With sharing of a family, the overhead is higher - although
most of that work is done on threads other than the one updating the AstDB.
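
The shape of that trade-off can be sketched as follows. This is an
illustrative Python model, not the actual C implementation: SharingDB, its
methods and the publish callback are hypothetical names. Every write pays for
one "is this family shared?" check, and the publish itself is handed to a
worker thread so the writer does not wait on it.

```python
import queue
import threading


class SharingDB:
    """Toy model: cheap shared-family check on write, publish on a worker."""

    def __init__(self, publish):
        self._data = {}
        self._shared = set()
        self._lock = threading.Lock()
        self._outbox = queue.Queue()
        self._publish = publish  # callback standing in for the real transport
        threading.Thread(target=self._drain, daemon=True).start()

    def share(self, family):
        with self._lock:
            self._shared.add(family)

    def put(self, family, key, value):
        with self._lock:
            self._data[(family, key)] = value
            is_shared = family in self._shared  # the extra check on every write
        if is_shared:
            # Hand off to the worker; the writing thread does not publish.
            self._outbox.put((family, key, value))

    def _drain(self):
        while True:
            item = self._outbox.get()
            try:
                self._publish(*item)
            finally:
                self._outbox.task_done()

    def flush(self):
        """Block until every queued update has been published."""
        self._outbox.join()


sent = []
db = SharingDB(lambda family, key, value: sent.append((family, key, value)))
db.put("private", "a", "1")          # not shared: nothing is published
db.share("registrar")
db.put("registrar", "alice", "sip:alice@10.0.0.1")
db.flush()
```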

So, a small hit in performance with the feature, a bit more if you use
it. Clearly, sharing every family would probably be a "bad idea" -
although that wouldn't cause corruption.

>   Another issue that needs tackling is split-brain. What happens when a
> cluster becomes split and then converges
> back. What information is considered consistent and true? how do we go about
> and make sure that whatever is
> converged is the right thing?

This is determined based on how things are shared.

In a global shared family, the last server that writes to the key "wins."

In a unique shared family, each server's entries cannot be overwritten
by other servers. If you need to aggregate the values, then that has to be done
by either something in the dialplan or by an external application.
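
A small Python sketch of the two behaviours, under stated assumptions:
updates are modelled as (origin EID, key, value) tuples and are applied in
the order given, and all function names are made up for this example. The
aggregate() helper stands in for the "external application" that would
combine per-server values.

```python
def apply_global(store, updates):
    """Global family: whichever update is applied last wins."""
    for _origin_eid, key, value in updates:
        store[key] = value
    return store


def apply_unique(store, updates):
    """Unique family: each server's values are kept under its own EID."""
    for origin_eid, key, value in updates:
        store.setdefault(origin_eid, {})[key] = value
    return store


def aggregate(store, key):
    """Combine one numeric key across every server's copy."""
    return sum(int(values[key]) for values in store.values() if key in values)


updates = [("eid-1", "calls", "3"), ("eid-2", "calls", "5")]
global_view = apply_global({}, updates)   # only the last write survives
unique_view = apply_unique({}, updates)   # both servers' values survive
total_calls = aggregate(unique_view, "calls")
```

After a split heals, the global family simply converges on whichever write
was applied last, while the unique family still holds every server's own
view and leaves the reconciliation policy to whoever aggregates it.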

> Just my 2c on the subject at hand.

Thanks for the feedback!

> Nir S
>

-- 
Matthew Jordan
Digium, Inc. | Director of Technology
445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
Check us out at: http://digium.com & http://asterisk.org