[asterisk-dev] Methodologies for validating dialplan

Tue Jan 4 16:13:13 CST 2022

Yeah, sorry, it's just that the experience is fresh in my mind, and I 
wanted to help anyone avoid the pain I've been through. Whatever 
anyone's preference is, I don't think the difference in performance can 
be disputed. It's not really C vs Python, it's dialplan interpreter vs 
Python. Asterisk generates a costly and unavoidable Newexten event for 
every executed line of dialplan. If there are a lot of them, which 
obviously there will be in a large dialplan, performance suffers greatly.

On 04. 01. 2022. 22:15, asterisk at phreaknet.org wrote:
> Thanks for the advice - however, I personally like the dialplan and 
> don't intend to stop using it. The dialplan/AGI/AEL/Lua (and now ARI) 
> "config war" goes back a long time now, and I don't think it'll ever 
> get resolved. That said, the vast majority of people *do* use 
> extensions.conf dialplan, and I like it fine as a general approach. 
> That's just my opinion, though.
>
> There *are* obvious limitations to the dialplan, which is what this 
> helps to address - not to make the workflow perfect, but better. I'm 
> not sure using AGI would really get around the underlying problem 
> here... and C performs a lot better than Python does.
>
> On 1/4/2022 4:01 PM, Nikša Baldun wrote:
>> Hi,
>>
>> I apologize for not commenting on the actual issue. However, after 
>> having the experience of writing a complex dialplan, I feel strongly 
>> compelled to say that it shouldn't be done at all. Any non-trivial 
>> call flow should be written in Fast AGI. I can't see any upside of 
>> using extensions.conf or AEL. Using a real programming language is 
>> considerably easier, faster and more powerful, all the necessary 
>> tools already exist and most importantly, execution is significantly 
>> faster. In my case, after rewriting my dialplan in Python, call 
>> preparation time fell from 2.5 seconds to a mere 50 milliseconds.
>>
>> On 04. 01. 2022. 20:53, asterisk at phreaknet.org wrote:
>>> Hi, folks,
>>>
>>>     Hope everyone's year is off to a good start. It was suggested on 
>>> one of my code reviews that I post here for discussion, so here it is:
>>>
>>> The PBX core, when it parses the dialplan on reload, catches a small 
>>> number of syntax errors, such as forgetting a trailing ) or priority 
>>> number, things like that.
>>>
>>> However, there are a lot of dialplan problems that represent 
>>> potentially valid syntax that will cause an error at runtime, such 
>>> as branching to somewhere that doesn't exist. The dialplan will 
>>> reload with no errors, since there isn't a syntax issue, but at 
>>> runtime, the call will fail (and most likely crash). I found over 
>>> the years that a lot of these were simple typos or issues that were 
>>> easily fixed but took a lot of time to find using only the "test, 
>>> test, test" approach. Another common grievance I hear from time to 
>>> time about the dialplan is that most issues are caught at runtime, 
>>> not at "compile time" (i.e. dialplan reload).
>>>
>>> One thing I've done to catch typos and syntax errors is run a number 
>>> of regex-based scripts that scan my dialplan and try to validate it. 
>>> Among other things, this finds branches to places that don't exist, 
>>> unused/dead code in the dialplan that isn't referenced anywhere, 
>>> attempts to play audio files that don't exist, etc. In doing so, we 
>>> can catch an even greater percentage of these kinds of issues in 
>>> advance, rather than sitting around and waiting for a fallthrough at 
>>> runtime, then remedying the issue after it has already caused a 
>>> problem.
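To make the regex-based approach concrete, here is a minimal sketch in Python. It is not the author's script: the function name, the regexes, and the sample dialplan are all made up for illustration. It only checks that the context named in a three-argument Goto(context,exten,priority) exists somewhere in the text. It deliberately does not check extensions or priorities, since pattern extensions and includes would make a naive literal comparison report false positives, and it strips comments naively at the first semicolon.

```python
import re

# Hypothetical patterns for this sketch; real dialplans have more syntax than this.
CONTEXT_RE = re.compile(r"^\[([^\]]+)\]")      # a "[context]" header line
GOTO_RE = re.compile(r"\bGoto\(([^()]*)\)")    # Goto(...) with no nested parentheses


def find_missing_goto_contexts(dialplan_text):
    """Return (line_number, context) pairs for three-argument Goto calls
    whose target context is not defined anywhere in dialplan_text."""
    contexts = set()
    gotos = []  # (line number, target context)
    for lineno, raw in enumerate(dialplan_text.splitlines(), start=1):
        line = raw.split(";", 1)[0].strip()  # naive comment stripping
        header = CONTEXT_RE.match(line)
        if header:
            contexts.add(header.group(1))
            continue
        for call in GOTO_RE.finditer(line):
            if "$" in call.group(1):
                continue  # variables involved: cannot be resolved statically
            args = [a.strip() for a in call.group(1).split(",")]
            if len(args) == 3:
                gotos.append((lineno, args[0]))
    # Compare only after the whole file is read, so forward references work.
    return [(n, ctx) for n, ctx in gotos if ctx not in contexts]


SAMPLE = """\
[main]
exten => s,1,NoOp(start)
 same => n,Goto(menu,s,1)
 same => n,Goto(mneu,s,1) ; typo
 same => n,Goto(${DEST},s,1)

[menu]
exten => s,1,Hangup()
"""

print(find_missing_goto_contexts(SAMPLE))  # [(4, 'mneu')]
```

Only the typo on line 4 of the sample is reported; the Goto that uses ${DEST} is skipped rather than guessed at.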
>>>
>>> It works *okay* - this has helped A LOT in finding these problems 
>>> before they are encountered at runtime, and finding problems I 
>>> didn't even know existed - but it is *very* slow and probably takes 
>>> 30 seconds to run on my dialplan (which is a few tens of thousands of 
>>> lines).
>>>
>>> To try to improve on this, I wrote a patch that adds the CLI 
>>> commands 'dialplan analyze fallthrough' and 'dialplan analyze 
>>> audio'. It scans the dialplan using Asterisk APIs and finds 
>>> Goto/GotoIf/Gosub/GosubIf application calls that try to access a 
>>> nonexistent location in the dialplan, and 
>>> Playback/ControlPlayback/Read calls that try to play a file that 
>>> doesn't exist. Instead of taking half a minute, it's essentially 
>>> instantaneous. You can take a look at the patch/apply it from here: 
>>> https://gerrit.asterisk.org/c/asterisk/+/17719
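The audio check described above can be sketched outside Asterisk as well. The following Python is an illustration only and does not reproduce how the patch works: the function name is hypothetical, the sounds directory is passed in by the caller, and the sketch assumes a file counts as present if any file with that base name and some extension exists under that directory. Language subdirectories and other lookup rules are ignored.

```python
import glob
import os
import tempfile


def missing_sound_files(playback_args, sounds_dir):
    """Return the literal file names in a Playback() argument string that
    have no matching file (any extension) under sounds_dir."""
    # Playback(file1&file2,options): file names come before the first comma.
    names = playback_args.split(",", 1)[0].split("&")
    missing = []
    for name in (n.strip() for n in names):
        if not name or "$" in name:
            continue  # empty or variable-based name: not checkable statically
        pattern = glob.escape(os.path.join(sounds_dir, name)) + ".*"
        if not glob.glob(pattern):
            missing.append(name)
    return missing


# Demo against a throwaway directory standing in for the sounds directory.
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "hello-world.gsm"), "w").close()
    DEMO_RESULT = missing_sound_files("hello-world&no-such-file,noanswer", demo_dir)

print(DEMO_RESULT)  # ['no-such-file']
```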
>>>
>>> There are obvious limitations to doing this; if variables are used 
>>> in these calls, then it's very difficult - maybe impossible - to 
>>> determine if something will fail just by crawling the config, so at 
>>> the moment I ignore calls that contain variables in the relevant 
>>> area. As such, there will be false negatives, but the goal is to not 
>>> have false positives, and hopefully expose maybe the majority of 
>>> issues that could be caught in advance in this manner.
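As an illustration of ignoring variables only "in the relevant area", here is a small Python sketch for GotoIf, whose arguments have the form condition?label_if_true:label_if_false. The function name is made up, and this is not taken from the patch. The condition almost always contains variables, so it is discarded; each branch label is kept only if it contains no "$". The sketch assumes the condition itself contains no "?" character.

```python
def static_gotoif_targets(args):
    """Split GotoIf arguments and return the branch labels that contain
    no variables or expressions, i.e. the ones that can be checked statically."""
    _condition, sep, branches = args.partition("?")
    if not sep:
        return []  # malformed call: no "?" separating condition from labels
    true_label, _, false_label = branches.partition(":")
    return [
        label.strip()
        for label in (true_label, false_label)
        if label.strip() and "$" not in label
    ]


# The condition uses ${X}, but the true branch is still a literal target.
print(static_gotoif_targets("$[${X} = 1]?menu,s,1:${FALLBACK},s,1"))  # ['menu,s,1']
```

Skipping a branch that contains a variable produces a false negative at worst, which matches the stated goal of never reporting a false positive.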
>>>
>>> Right now, the patch adds some commands to the PBX core, which Josh 
>>> suggested might not be the best way to do this additional level of 
>>> verifying the dialplan and trying to preemptively find issues with 
>>> it. For one, it relies on knowing the usage of different 
>>> applications, not all of which are PBX builtins. It might be safe to 
>>> say that the way to parse "Goto" or "Playback" in this case will not 
>>> change. A suggestion was to expose a way for modules to define how 
>>> they could be verified.
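One possible shape for "modules define how they could be verified" is a registry keyed by application name, which the core walks during analysis. The Python below is purely a sketch of that idea: Asterisk modules are written in C, and none of these names (VALIDATORS, register_validator, check_goto, validate_call) exist in Asterisk. Lowercasing the application name for lookup is a choice made for this sketch.

```python
VALIDATORS = {}  # hypothetical registry: lowercased application name -> validator


def register_validator(app_name):
    """Decorator a module would use to attach a validator to its application."""
    def decorator(func):
        VALIDATORS[app_name.lower()] = func
        return func
    return decorator


@register_validator("Goto")
def check_goto(args, known_contexts):
    """Return an error string, or None if the call is fine or not checkable."""
    parts = [p.strip() for p in args.split(",")]
    if "$" in args or len(parts) != 3:
        return None  # variables or short forms: skipped in this sketch
    if parts[0] not in known_contexts:
        return f"Goto target context '{parts[0]}' does not exist"
    return None


def validate_call(app_name, args, known_contexts):
    """What the core would do: look up the validator, if any, and run it."""
    validator = VALIDATORS.get(app_name.lower())
    return validator(args, known_contexts) if validator else None


print(validate_call("Goto", "mneu,s,1", {"menu"}))
print(validate_call("Playback", "hello-world", {"menu"}))  # no validator registered
```

With this shape, the rules live next to the application that understands its own argument syntax, while the walk over the dialplan stays in one place.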
>>>
>>> I don't have any specific thoughts at the moment about how to 
>>> proceed, but I'm interested if anyone has any thoughts on what kind of 
>>> architecture or approach here might make sense. Something to 
>>> consider is that these validations may touch multiple different 
>>> modules, maybe multiple times for the same module - and somehow this 
>>> needs to be exposed to the PBX core for processing. For instance, 
>>> the fallthrough check looks at Goto and Gosub, which are in 
>>> completely different modules. Additionally, this is focused on the 
>>> dialplan, meaning that running the rules in the module itself 
>>> probably doesn't make any sense (but defining them there somehow 
>>> might). However, ultimately there is an opportunity to find a lot of 
>>> these issues in advance and improve the user 
>>> experience, reduce frustration, etc.
>>>
>>> Thanks!
>>>
>>>
>>