[asterisk-dev] Methodologies for validating dialplan

Tue Jan 4 15:15:38 CST 2022

Thanks for the advice - however, I personally like the dialplan and 
don't intend to stop using it. The dialplan/AGI/AEL/Lua (and now ARI) 
"config war" goes back a long time now, and I don't think it'll ever get 
resolved. That said, the vast majority of people *do* use 
extensions.conf dialplan, and I like it fine as a general approach. 
That's just my opinion, though.

There *are* obvious limitations to the dialplan, which is what this 
helps to address - not to make the workflow perfect, but better. I'm not 
sure using AGI would really get around the underlying problem here... 
and C performs a lot better than Python does.
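
For anyone trying to picture the regex-style check described in the 
original post quoted below, here is a rough sketch in Python. It is not 
the patch and not the actual scripts: the function name 
find_missing_contexts and both regexes are made up for illustration. It 
assumes the three-argument Goto(context,exten,priority) / Gosub form, 
only checks that the target context exists, and skips any call that 
contains a variable rather than guessing.

```python
import re

# Matches a context header such as "[from-internal]".
CONTEXT_RE = re.compile(r"^\s*\[([^\]]+)\]")
# Matches only the three-argument form Goto(context,exten,priority) or
# Gosub(context,exten,priority...). GotoIf/GosubIf are not handled here.
BRANCH_RE = re.compile(
    r"\b(Goto|Gosub)\(([^,()]+),([^,()]+),([^,()]+)", re.IGNORECASE
)


def find_missing_contexts(dialplan_text):
    """Return (line_number, app, context) for branches to undefined contexts."""
    lines = dialplan_text.splitlines()

    # First pass: collect every context name defined in the file.
    defined = set()
    for line in lines:
        m = CONTEXT_RE.match(line)
        if m:
            defined.add(m.group(1).strip())

    # Second pass: look for literal branch targets whose context is unknown.
    problems = []
    for number, line in enumerate(lines, start=1):
        code = line.split(";", 1)[0]  # naive: drops everything after ';'
        for m in BRANCH_RE.finditer(code):
            if "$" in m.group(0):
                continue  # variable or expression in the target: skip it
            app, context = m.group(1), m.group(2).strip()
            if context not in defined:
                problems.append((number, app, context))
    return problems
```

Fed a dialplan in which line 4 reads "same => n,Gosub(mising,s,1)" and 
no [mising] context exists, this reports that line, while a target such 
as Goto(${DEST},s,1) is ignored. That mirrors the trade-off in the 
original post: false negatives are accepted in order to avoid false 
positives.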

On 1/4/2022 4:01 PM, Nikša Baldun wrote:
> Hi,
>
> I apologize for not commenting on the actual issue. However, after 
> having the experience of writing a complex dialplan, I feel strongly 
> compelled to say that it shouldn't be done at all. Any non-trivial 
> call flow should be written in Fast AGI. I can't see any upside of 
> using extensions.conf or AEL. Using a real programming language is 
> considerably easier, faster and more powerful, all the necessary tools 
> already exist and most importantly, execution is significantly faster. 
> In my case, after rewriting my dialplan in Python, call preparation 
> time fell from 2.5 seconds to a mere 50 milliseconds.
>
> On 04. 01. 2022. 20:53, asterisk at phreaknet.org wrote:
>> Hi, folks,
>>
>>     Hope everyone's year is off to a good start. It was suggested on 
>> one of my code reviews that I post here for discussion, so here it is:
>>
>> The PBX core, when it parses the dialplan on reload, catches a small 
>> number of syntax errors, such as forgetting a trailing ) or priority 
>> number, things like that.
>>
>> However, there are a lot of dialplan problems that represent 
>> potentially valid syntax that will cause an error at runtime, such as 
>> branching to somewhere that doesn't exist. The dialplan will reload 
>> with no errors, since there isn't a syntax issue, but at runtime, the 
>> call will fail (and most likely crash). Over the years I've found that 
>> a lot of these were simple typos or easily fixed issues that 
>> nonetheless wasted a lot of time to track down using only the "test, 
>> test, test" approach. Another common grievance I hear from time to 
>> time about the dialplan is that most issues are caught at runtime, 
>> not at "compile time" (i.e. dialplan reload).
>>
>> One thing I've done to catch typos and syntax errors is run some 
>> scripts that try to validate my dialplan for me by using a number of 
>> regex-based scripts which scan the dialplan. Among other things, this 
>> finds branches to places that don't exist, unused/dead code in the 
>> dialplan that isn't referenced anywhere, attempts to play audio files 
>> that don't exist, etc. In doing so, we can catch an even greater 
>> percentage of these kinds of issues in advance, rather than sitting 
>> around and waiting for a fallthrough at runtime, then remedying the 
>> problem after it has already caused trouble.
>>
>> It works *okay* - this has helped A LOT in finding these problems 
>> before they are encountered at runtime, and finding problems I didn't 
>> even know existed - but it is *very* slow and probably takes 30 
>> seconds to run on my dialplan (which is a few 10,000s of lines).
>>
>> To try to improve on this, I wrote a patch that adds the CLI commands 
>> 'dialplan analyze fallthrough' and 'dialplan analyze audio'. It scans 
>> the dialplan using Asterisk APIs and finds Goto/GotoIf/Gosub/GosubIf 
>> application calls that try to access a nonexistent location in the 
>> dialplan, and Playback/ControlPlayback/Read calls that try to play a 
>> file that doesn't exist. Instead of taking half a minute, it's 
>> essentially instantaneous. You can take a look at the patch/apply it 
>> from here: https://gerrit.asterisk.org/c/asterisk/+/17719
>>
>> There are obvious limitations to doing this; if variables are used in 
>> these calls, then it's very difficult - maybe impossible - to 
>> determine if something will fail just by crawling the config, so at 
>> the moment I ignore calls that contain variables in the relevant 
>> area. As such, there will be false negatives, but the goal is to not 
>> have false positives, and hopefully to expose most of the issues 
>> that could be caught in advance in this manner.
>>
>> Right now, the patch adds some commands to the PBX core, which Josh 
>> suggested might not be the best way to do this additional level of 
>> verifying the dialplan and trying to preemptively find issues with 
>> it. For one, it relies on knowing the usage of different 
>> applications, not all of which are PBX builtins. It might be safe to 
>> say that the way to parse "Goto" or "Playback" in this case will not 
>> change. A suggestion was to expose a way for modules to define how 
>> they could be verified.
>>
>> I don't have any specific thoughts at the moment about how to 
>> proceed, but I'm interested in whether anyone has thoughts on what 
>> kind of architecture or approach might make sense here. Something to consider 
>> is that these validations may touch multiple different modules, maybe 
>> multiple times for the same module - and somehow this needs to be 
>> exposed to the PBX core for processing. For instance, the fallthrough 
>> check looks at Goto and Gosub, which are in completely different 
>> modules. Additionally, this is focused on the dialplan, meaning that 
>> running the rules in the module itself probably doesn't make any 
>> sense (but defining them there somehow might). However, ultimately 
>> there is an opportunity to find a lot of these issues preemptively, 
>> improve the user experience, reduce frustration, etc.
>>
>> Thanks!