[asterisk-dev] Methodologies for validating dialplan

Fri Jan 7 07:10:08 CST 2022

Hi, Jaco,

    Thanks for considering this.

To answer your question, currently most validation ignores things with 
variables (basically if strchr(data, '$'), we bail out with 0). This is 
because processing and keeping track of variables would most likely be 
quite difficult to do accurately and correctly and in a way that is 
robust enough to use (and probably require some kind of simple call 
simulation). So at the moment, those aren't processed; only static 
Goto/Gosub/etc. are, which means there will be some false negatives. 
However, each individual app validation callback can do whatever it 
likes. If somebody wanted his module to try to parse variables for some 
reason, nothing would stop him from doing so.

I rehauled the architecture of this yesterday so that modules can now 
add their own validation callbacks when registering applications: 
https://gerrit.asterisk.org/c/asterisk/+/17719

Each module can internally then validate dialplan calls as it sees fit. 
It is the individual callbacks that log WARNINGs or what not; currently, 
they are indeed WARNINGs, though maybe a case could be made that some of 
them should really be ERRORs since we know for a fact that the dialplan 
code in question will cause a problem.

So hopefully the architecture now is flexible enough to allow different 
modules to do what they want or need to do and accommodate the different 
directions this might go in the future. At this point, I think the 
review is more about the mechanism which facilitates modules to do 
validation rather than any specific validation, since that could change 
or be added at any point without requiring changes to the core.

Music on hold is on my list of things to add; I didn't add it to this 
review since the above seemed like enough for this concept to flesh out. 
In a future review, though, I do plan to add class existence validation 
to res_musiconhold. There are a lot of other apps that could likewise 
use some validation, but similarly I'll hold off on those for the moment 
and just stick to the ones in the review at the moment.

On 1/7/2022 7:52 AM, Jaco Kroon wrote:
> Hi,
>
> I find this discussion very interesting.  I do have one question, and
> this was implied by another poster:
>
> How do you intend to deal with dynamic stuff, for example:
>
> exten => ??,n,Goto(foo-${bar})
>
> In this case, what's the possible values for ${bar}?
>
> Would it make sense to enhance the dialplan execution system to
> basically collect error points?  For example, in the above, an entry to
> the "failure" list could be added providing the context, ${EXTEN} and
> values of expanded variables.  And number of times this has "hit", so
> assuming the above is in context foobar, and ?? is something like _0Z.
> and the actual value of exten was 0111, and bar=fneh, then the entry
> could store:
>
> {
>     execution_point: foobar,_0Z.,n
>     EXTEN=0111
>     bar=fneh
> }
>
> This could be hashed in some form such that a hit counter for the tuple
> can be created. Then something like "diaplan show failure points".  Not
> sure it makes sense to log all variables, but it could, and possibly the
> dialplan execution leading up to the failure, so perhaps just the call
> id (C-??????) so that it can be looked up in the verbose logs (which we
> do log as part of full, even if we only keep it for 72 hours).
>
> Special handing for Hangup() and others that are expected to hangup the
> channel could be made (I don't recall if there are different return
> codes from functions/applications for indicating failure vs continue vs
> hangup etc ...
>
> This is a bulk catch-all, and can possibly get stats about dynamic stuff
> failing too.  A static analyzer would still be great to catch obvious
> failure cases such as Goto(invalid) at load time.
>
> An API to add warnings/non-fatal errors from applications/functions
> would be great too (eg, MOH not found in some cases won't cause a
> failure, just silence instead of moh, but if there is an API then in
> such a case an entry could still be logged, ideally then with an extra
> variable such as message="MOH ${moh_class} wasn't found").
>
> Kind Regards,
> Jaco
>
> On 2022/01/06 16:24, asterisk at phreaknet.org wrote:
>
>> On 1/5/2022 6:06 PM, Mark Murawski wrote:
>>> Hi!
>>>
>>> Throwing my .02 in here.  Adjust to .10 for inflation!
>>>
>>>
>>> On 1/4/22 14:53, asterisk at phreaknet.org wrote:
>>>> .... However, there are a lot of dialplan problems that represent
>>>> potentially valid syntax that will cause an error at runtime, such
>>>> as branching to somewhere that doesn't exist. The dialplan will
>>>> reload with no errors, since there isn't a syntax issue, but at
>>>> runtime, the call will fail (and most likely crash). I found over
>>>> the years that a lot of these were often simple typos or issues that
>>>> were easily fixed but wasted a lot of time in finding solely in the
>>>> "test, test, test" approach. Another common grievance I hear time to
>>>> time about the dialplan is most issues are caught at runtime, not
>>>> "compile time" (i.e. dialplan reload).
>>> I love it!
>>>
>>> And... take an example from a basic cli call that has a validator
>>> built-in:
>>>
>>> static char *meetme_kick_cmd(struct ast_cli_entry *e, int cmd, struct
>>> ast_cli_args *a)
>>> {
>>>          switch (cmd) {
>>>          case CLI_INIT:
>>>                  e->command = "meetme kick";
>>>                  e->usage =
>>>                          "Usage: meetme kick <confno> all|<userno>\n"
>>>                          "       Kick a conference or a user in a
>>> conference.\n";
>>>                  return NULL;
>>>          case CLI_GENERATE:
>>>                  return complete_meetmecmd_mute_kick(a->line, a->word,
>>> a->pos, a->n);
>>>          }
>>>
>>>          if (a->argc != 4) {
>>>                  return CLI_SHOWUSAGE;
>>>          }
>>>
>>>          return meetme_cmd_helper(a);
>>> }
>>>
>>>
>>> This design pattern has self-contained validation.  And it's up to
>>> the function to validate as much or as little as it wants to.
>>>
>>> We would need a new ast_register_application_xml2 and
>>> ast_custom_function_register2 to have a new design pattern for
>>> self-validating functions and applications.
>>>
>>> Each application would know what kind of parameters it takes, and
>>> what's valid and not valid
>>>
>>> new-stle for say... Goto Application
>>>
>>> static int gosub_exec(int cmd, struct ast_channel *chan, const char
>>> *data)
>>> {
>>>      if (cmd == APP_PARAMS_VALIDATE)
>>>          ... do some validation!
>>>
>>> ideally gosub knows it's being passed an exten/context/priority and
>>> can call ast_exists_extension at dialplan parse time via the
>>> validator flow
>> I think your line of thinking here makes a lot of sense. However, I
>> don't think it makes sense to do validation in the actual execution
>> function like CLI handlers, as you've shown.
>>
>> I think an enhanced application (and function) register function would
>> make sense, accepting another function to do the validation, e.g.:
>>
>> ast_register_application_xml_validate(app, app_exec, app_validate);
>>
>> Some of these validation functions can be quite involved, and
>> separating them out that way makes more sense.
>>
>> I slept on this overnight, and here are some other thoughts:
>>
>> The current approach on the review is to have an external module
>> (currently res_pbx_analyze) use core PBX APIs to crawl the dialplan
>> and execute callbacks to find fallthroughs and missing audio files.
>> Obviously, there are a lot of other things we could look for as well.
>> Currently, my approach is looking for "types" of problems, rather than
>> analyzing specific applications, but now that I think about it, the
>> latter is really a subset of the former. Each application really has
>> *one* type of validation that it's doing... Goto/Gosub etc. will check
>> the location... MusicOnHold will check the MOH class...
>> Playback/Background/Read will check the audio file... Voicemail would
>> check the voicemail box, and so on... Dial would validate that a
>> registered channel tech is being used, etc. (validating that any
>> options use also exist could also be another component).
>>
>> Some of these naturally fall into groups, but a good way to deal with
>> this might be CLI aliases, e.g.:
>>
>> dialplan analyze fallthrough => dialplan analyze Goto, dialplan
>> analyze GotoIf, dialplan analyze Gosub, dialplan analyze GosubIf
>>
>> dialplan analyze audio => dialplan analyze Playback, dialplan analyze
>> ControlPlayback, dialplan analyze BackGround, dialplan analyze Read
>>
>> Obviously, the validation functions for some of these are going to be
>> almost identical. To deal with code duplication, each validation
>> function could call the same function within the core, even though the
>> entry point would be different.
>>
>> Of course, for registering applications in the same module (like Goto
>> and GotoIf), the same validation function could directly be use for
>> both. We'd want to use the same function though from within app_stack
>> for Gosub/GosubIf, most likely, so maybe for rare cases of multiple
>> modules sharing validation functions, those could go in a core file
>> like pbx_validate that contains common-to-many-modules validation
>> functions. The vast majority of modules that do this, though, I
>> expect, would have their own validation function within the module.
>>
>> (By the way, I'm not set on dialplan analyze... still debating whether
>> that or dialplan inspect or dialplan validate would make the most
>> sense...)
>>
>> Another reason I think doing validation in module is a better
>> approach. For one, not all the internals of modules are exposed
>> outside of it. For instance, say we want to verify that MusicOnHold()
>> and StartMusicOnHold(), if they contain an argument, contain the name
>> of a valid in-memory MOH class. This is only exposed within
>> res_tonedetect currently. To do validation in another module would
>> mean that needs to be exposed somehow. Obviously, this can get very
>> messy.
>>
>> If it's done in-module, then it just works out cleanly. So, we could
>> do away with a dedicated external module and simply register
>> validation functions from within particular modules.
>>
>>> This would be a major undertaking to add validation for every
>>> function and every application but since dialplan is so ubiquitous
>>> and is probably never going away... would make huge inroads to making
>>> sure everything is as correct as possible at parse/load time.
>> I don't really see this as something that would be done for every
>> single module necessarily, but some that come to mind that could benefit:
>>
>> Goto, GotoIf, Gosub, GosubIf, ExecIf, Verbose, Log, Dial, VoiceMail,
>> ConfBridge, Playback, Read, BackGround, etc.
>>
>>> There needs to be an externally-facing way to call
>>> function/application validators for those of us who write dynamic
>>> dialplan generators.  Maybe an AMI command?
>> Considering it would just be executing the registered validation
>> function for each app, function, etc. should be doable.
>>
>> At the moment, I'd envisioned it mainly be CLI commands, first and
>> foremost, since that goes along nicely with dialplan reload. And, as I
>> said above, there could be CLI aliases for each "group" of validation
>> functions that naturally go well together, to find specific categories
>> of errors. There could also be a "dialplan analyze all" to run all
>> registered validation functions and report any problems.
>>
>>> Strict: false    <-- allow for generic 'does it look okay' testing vs
>>> does the destination actually exist
>> I'm with you until here... this I'm not sure I understand the point as
>> much of the "generic" validation. If we're not validating the
>> destination actually exists, in this case, then we're not really doing
>> much at all. At that point, it's basically counting commas and
>> reporting whether there's no more than 2.
>>> And then more of a full validator via a new application and AMI Command
>>> ValidateDialplanFile(/tmp/new_extensions.conf,/tmp/path_to_errors.log);
>> As far as specific error logging, currently I'm just throwing WARNINGs
>> that would get sent out to all the relevant consoles and log files. A
>> simple grep might be able to filter all of those out.
>>
>>