[asterisk-dev] Pain Points For Large Scale Instance Provisioning

Wed Oct 21 03:35:29 CDT 2020

Hi,

On 2020/10/20 23:32, Michael Cargile wrote:

> Towards the end of DevCon, Matt asked if there were any pain points
> for provisioning large numbers of Asterisk instances and I mentioned I
> would talk to my colleague who handles such things. He provided this
> list:
>
> * Sanity checks within Asterisk at start up and module reload. This
> includes:
>      -- Asterisk making sure it has the proper file permission for all
> directories that it is configured to write / read to

If this is implemented, it would need to be configurable.  We had such a
check in our init script on Gentoo, and it ended up switched off.  For most
deploys it was not an issue ... but I think the record we clocked was ~6
hours of startup time just checking /var/spool/asterisk and its sub paths.
Yes, the script could have been improved to not check recursively and
not descend into sub folders for recordings ...

Still, very good and valid suggestion.
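
As a rough illustration of the non-recursive variant mentioned above, here
is a minimal Python sketch.  It is not our Gentoo init script and not
anything Asterisk does itself; the function name check_dirs and the idea of
passing in an explicit list of directories are assumptions made for the
example.

```python
import os


def check_dirs(paths, need_write=True):
    """Return a list of (path, problem) tuples for directories that fail.

    Only the listed directories themselves are tested.  Nothing below them
    is walked, so the cost does not grow with the number of recordings.
    """
    mode = os.R_OK | os.X_OK
    if need_write:
        mode |= os.W_OK
    problems = []
    for path in paths:
        if not os.path.isdir(path):
            problems.append((path, "missing or not a directory"))
        elif not os.access(path, mode):
            problems.append((path, "insufficient permissions"))
    return problems
```

Note that os.access() tests the real uid/gid of the calling process, so a
check like this only means something when run as the user Asterisk will
run as.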

>      -- verification that things like audio files called from
> Background / Playback are actually there
>   If these checks fail throw an error at start up / reload rather than
> when something is attempted to be accessed so these problems can be
> addressed sooner

Some of these names are determined dynamically, especially when multiple
formats are involved.  One thing that would be nice is at least a syntax
validation, e.g. for missing or extraneous brackets and the like.  For
example ... Set(foo=${bar) <-- obviously invalid, missing }.  That's only a
nice-to-have in my opinion though.
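
To make the bracket idea concrete, here is a small Python sketch of such a
check.  It is plain bracket balancing over (), {} and [], not Asterisk's
dialplan parser; the name check_brackets is made up, and the sketch assumes
there is no quoting or escaping of brackets, which a real validator would
have to handle.

```python
PAIRS = {"(": ")", "{": "}", "[": "]"}
CLOSERS = set(PAIRS.values())


def check_brackets(text):
    """Return None if all brackets nest correctly, else an error string."""
    stack = []  # entries are (expected closer, column of the opener)
    for pos, ch in enumerate(text):
        if ch in PAIRS:
            stack.append((PAIRS[ch], pos))
        elif ch in CLOSERS:
            if not stack:
                return f"extraneous '{ch}' at column {pos}"
            expected, opened = stack.pop()
            if ch != expected:
                return (f"expected '{expected}' (opened at column {opened}) "
                        f"but found '{ch}' at column {pos}")
    if stack:
        expected, opened = stack[-1]
        return f"missing '{expected}' for bracket opened at column {opened}"
    return None


if __name__ == "__main__":
    for line in ("Set(foo=${bar})", "Set(foo=${bar)"):
        print(line, "->", check_brackets(line) or "ok")
```

Run at load time over each dialplan line, something like this would flag
the Set(foo=${bar) case before a call ever reaches it.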

>
> * asterisk.conf directory variables for things like audio files are
> not always honored requiring symlinks as a work around (though this
> might be the OpenSuSE build of Asterisk causing issues)

Never encountered this.  And we make heavy use of it (e.g. running
multiple, generally < 100, instances on the same physical host and using
astspooldir => /var/spool/asterisk.uls).  If this was not being honoured
we'd have issues that we could only describe as insanely critical.
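
For anyone wanting to try the multi-instance layout, a sketch of generating
the [directories] section per instance could look like this in Python.
Only the astspooldir path pattern comes from our setup as described above;
the other paths, the function name render_directories and the choice of
keys are illustrative assumptions, so adjust to taste.

```python
def render_directories(instance):
    """Return an asterisk.conf [directories] section for one instance."""
    dirs = {
        # Pattern taken from the example above (instance "uls").
        "astspooldir": f"/var/spool/asterisk.{instance}",
        # Hypothetical paths, following the same naming pattern.
        "astetcdir": f"/etc/asterisk.{instance}",
        "astlogdir": f"/var/log/asterisk.{instance}",
        "astrundir": f"/var/run/asterisk.{instance}",
    }
    lines = ["[directories]"]
    lines += [f"{key} => {value}" for key, value in dirs.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_directories("uls"))
```

Each instance is then started against its own configuration file, e.g.
with asterisk -C pointing at that instance's asterisk.conf.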

>
> * Reliable module reloading without core restarts
>      Example: Client lets their SSL certificate lapse on an Asterisk
> server and they only figure this out when their
>      agents attempt to log in using WebRTC clients. They have
> dozens or even hundreds of customer calls in queue,
>      but their agents cannot login. On Asterisk 13 we cannot fix the
> SSL certs without a full restart of Asterisk
>      which drops these calls. A reload of the http module does not fix
> this.

Sean mentioned this is fixed; looking at the diff, an http module reload
will now be adequate.  And PJSIP, from what I can tell, doesn't suffer from
this issue.  chan_sip loads certificates at accept() time.

While trying to confirm this for chan_sip, I did find that sip_reloading
= FALSE is set in an odd place ... will check that out a bit later.

And then I'd like to also add:

* reduction of idle-instance CPU usage (which generally seems to sit at
~0.7% of a core when Asterisk is doing "nothing" - obviously
variable based on CPU clock speed).  Not a major problem when running
one or two instances, but it does create an artificial upper limit, and
there are measurable power implications when running hundreds of
instances in the same rack.
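
To put a number on that, a back-of-the-envelope calculation in Python.
The ~0.7% figure is the one quoted above; the instance counts are just
example values, and idle_cores is a made-up helper name.

```python
def idle_cores(instances, idle_fraction=0.007):
    """Cores consumed by idle instances, at idle_fraction of a core each."""
    return instances * idle_fraction


if __name__ == "__main__":
    for count in (1, 100, 500):
        print(f"{count} idle instances use about "
              f"{idle_cores(count):.2f} cores")
```

At 100 instances per host that is already most of a core spent on doing
nothing, before a single call is handled.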

Kind Regards,
Jaco
