[asterisk-dev] Parsing in Asterisk

Tue Mar 13 10:25:11 MST 2007

On Mon, 2007-03-12 at 14:33 -0600, Tilghman Lesher wrote:
> We need a revamp of parsing in Asterisk.  What we have now creates
> serious problems when writing complex extensions.conf logic.  For 
> example, I recently had to coach someone to write the following in his
> extensions.conf to make it work correctly:
> 
> exten => s,n,Set(ODBC_TEST3(foo)=123\,\\"456\,789\\"\,012)
> 
> Dissecting this, we're passing 3 arguments (as VAL1, VAL2, VAL3) to a
> dialplan function named ODBC_TEST3.  The second argument contains a
> comma, which needs 4 different escaping sequences to be passed correctly
> into the function.  First, the comma itself needs to be escaped, to
> keep it from being translated into a '|' by the extensions.conf parser.
> Second, the entire second argument needs to be quoted so that it is seen
> as a complete entity (as opposed to two different arguments).  Third,
> the quotes need to be escaped (with a '\') to prevent them from being
> eaten by the argument parser called by Set (because Set takes multiple
> variable pairs).  And last, the backslashes themselves need to be
> escaped, to protect them from the extensions.conf parser, which itself
> sees them as escape characters.
> 
> If you can't tell already, this is madness, and we need to do something
> about it.
> 
> To understand where all of these layers came from, we have to go back
> a bit into Asterisk dialplan history.  The first extensions.conf syntax
> looked a bit different than it does today (ignoring changes in the app
> names):
> 
> exten => s,3,SetVar,foo=1
> 
> Back then, we didn't have the nice syntax of the arguments passed to
> the application enclosed in nice parentheses; rather, we had a list of
> values to the exten keyword.  The application arguments just happened to
> be the fourth argument.  Today, we rely on a transparent translation
> from the new easier-to-read syntax to the old syntax.  This maintains
> backward compatibility, although at a cost in future complexity.
> 
> This leads me to my first conclusion:  we need to remove the transparent
> translation, preferring the comma as our argument delimiter.  I am aware
> that this is a significant change and it will require a flag day,
> because all applications currently parse on the pipe character.
> However, I believe that this is necessary to allow Asterisk dialplans to
> become more complex without sacrificing readability.
> 
> The second problem is that at some point, the Set app was made to accept
> multiple variable name/value pairs.  While this may seem like a good
> idea, it really complicates things, because it means that we cannot
> easily set values with pipes.  Worse, because it uses the standard
> parser, it strips a level of escape characters, which means that in
> order to pass such an argument as a value to a dialplan function, you
> need to escape it twice.
> 
> At the end of the day, what I'd like to happen is for dialplan parsing
> to become far more sensible than it is currently.  This will allow
> dialplan developers to create more complexity, without continually
> running into the wall of needing endless amounts of escape characters,
> to get Asterisk to perform as expected.
> 
> I have gotten limited approval, at least in passing, for the change to
> Set, but I wanted to put the issue in front of all the developers and
> ask for feedback before we go and make any of these changes.  Obviously,
> I think this is important, so at a meta-level, I need to ensure that
> others are on board (or if I've missed something, I need feedback on
> that, as well).
> 
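
The four escaping layers in the quoted example can be made concrete with a small model. The Python sketch below is a simplified model, not Asterisk's actual parsing code: `conf_pass`, `split_args`, and `run_set` are hypothetical names. Based on the description quoted above, it assumes that the extensions.conf pass turns `\x` into `x` and a bare comma into `|`, that Set's parser splits on `|` outside quotes while stripping one level of quotes and backslashes, and that the dialplan function splits its value on commas the same way.

```python
def conf_pass(data: str) -> str:
    """Model of the extensions.conf pass: '\\x' -> 'x', bare ',' -> '|'."""
    out, i = [], 0
    while i < len(data):
        ch = data[i]
        if ch == "\\" and i + 1 < len(data):
            out.append(data[i + 1])          # drop one level of backslash escaping
            i += 2
            continue
        out.append("|" if ch == "," else ch)  # unescaped comma becomes the old delimiter
        i += 1
    return "".join(out)


def split_args(data: str, delim: str) -> list[str]:
    """Model of an argument parser: split on `delim` outside double quotes,
    dropping unescaped quotes and one level of backslash escapes."""
    fields, cur, in_quotes, i = [], [], False, 0
    while i < len(data):
        ch = data[i]
        if ch == "\\" and i + 1 < len(data):
            cur.append(data[i + 1])          # escaped char is kept literally
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes        # quote groups text but is itself removed
        elif ch == delim and not in_quotes:
            fields.append("".join(cur))
            cur = []
        else:
            cur.append(ch)
        i += 1
    fields.append("".join(cur))
    return fields


def run_set(dialplan_args: str) -> tuple[str, list[str]]:
    """Push the text inside Set(...) through all three passes."""
    after_conf = conf_pass(dialplan_args)     # pass 1: extensions.conf
    pair = split_args(after_conf, "|")[0]     # pass 2: Set's parser (first pair only)
    name, value = pair.split("=", 1)
    return name, split_args(value, ",")       # pass 3: the function's parser


print(run_set(r'ODBC_TEST3(foo)=123\,\\"456\,789\\"\,012'))
# ('ODBC_TEST3(foo)', ['123', '456,789', '012'])

print(run_set('ODBC_TEST3(foo)=123,"456,789",012'))
# ('ODBC_TEST3(foo)', ['123'])
```

With the fully escaped line from the example, the function receives the three intended values. Typed without escapes, the first pass turns every comma into a pipe, so in this model Set's first pair is just `ODBC_TEST3(foo)=123` and the rest is split off as separate Set arguments.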

Tilghman--

I've had some definite ideas about parsing in Asterisk, and am beginning
to prototype a new way of going about it. It's mostly for the sake of
AEL, but some underlying concepts will eventually affect the
extensions.conf format to a degree.

From a high-level perspective, Asterisk does a lot of work at lower
levels, and something like 99% of the time it just repeats that same
parsing work over and over. I've got a plan to shift the parsing out of
the apps, and hopefully speed things up a little.

Also, AEL (and to a degree the stuff in extensions.conf) has some
context-sensitive grammar. I want to get rid of that, and provide a
uniform grammar that extends all the way down to expressions. I want to
fold the $[...] and ${...} constructs directly into AEL, thereby unifying
these three syntaxes into a single syntax and grammar under AEL. By doing
this, the $[...] and ${...} constructs will no longer be necessary.
Writing complicated expressions and making sure all the matching ]'s and
}'s are in the right places is pretty tricky!

It's not backwards compatible with current AEL, though, so I'm not
expecting a huge welcome mat for what I'm planning.

Also, I plan to restrict the naming conventions for variables, labels,
etc. Fixed strings will be enclosed in double quotes, with escapes for
special characters, and Unicode/UTF-8 will be available. Simple
concatenation operators will be available, to keep down the disruption
in forming complex strings. I have not yet written the exact specifics
of the grammar, and they are still subject to negotiation!
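
To illustrate why quoted fixed strings make the comma problem go away, here is a hypothetical tokenizer in Python. The grammar does not exist yet, so the token rules, the escape set (`\"`, `\\`, `\n`, `\t`, `\uXXXX`), and the use of `+` as the concatenation operator are all assumptions made for this sketch.

```python
import re

# Hypothetical token rules: a double-quoted literal (backslash escapes allowed),
# an identifier, or a one-character operator. "+" as concatenation is assumed.
TOKEN_RE = re.compile(
    r'\s*(?:(?P<str>"(?:[^"\\]|\\.)*")'
    r'|(?P<name>[A-Za-z_][A-Za-z0-9_]*)'
    r'|(?P<op>[,+()]))'
)

# Single-character escapes assumed for this sketch; \uXXXX is handled separately.
ESCAPES = {'"': '"', "\\": "\\", "n": "\n", "t": "\t"}


def decode_string(literal: str) -> str:
    """Turn a double-quoted literal (quotes included) into its value."""
    body, out, i = literal[1:-1], [], 0
    while i < len(body):
        if body[i] != "\\":
            out.append(body[i])
            i += 1
        elif body[i + 1] == "u":                  # \uXXXX: a Unicode code point
            out.append(chr(int(body[i + 2:i + 6], 16)))
            i += 6
        else:
            out.append(ESCAPES[body[i + 1]])      # KeyError for unknown escapes
            i += 2
    return "".join(out)


def tokenize(src: str) -> list:
    """Split source text into (kind, value) tokens; kinds are STR, NAME, OP."""
    tokens, pos, end = [], 0, len(src.rstrip())
    while pos < end:
        m = TOKEN_RE.match(src, pos)
        if not m:
            raise SyntaxError(f"unexpected input at offset {pos}: {src[pos:]!r}")
        if m.group("str") is not None:
            tokens.append(("STR", decode_string(m.group("str"))))
        elif m.group("name") is not None:
            tokens.append(("NAME", m.group("name")))
        else:
            tokens.append(("OP", m.group("op")))
        pos = m.end()
    return tokens


print(tokenize('"123", "456,789", "012"'))
# [('STR', '123'), ('OP', ','), ('STR', '456,789'), ('OP', ','), ('STR', '012')]

print(tokenize(r'"say \"hi\"" + "caf\u00e9"'))
# [('STR', 'say "hi"'), ('OP', '+'), ('STR', 'café')]
```

Since `"456,789"` comes out as a single STR token, in a design like this the comma inside it needs no escaping at all.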

At the top level, everything you would pass to an app or func will be
parsed into an expression tree structure, and instead of passing a
string, you pass an expression tree.

As is done now, the tree will be evaluated before being passed to the
app: all the variables and expressions will be reduced to constants, and
concatenations will be performed to produce single strings. Lists of
args will be represented as a binary tree with separator nodes (like the
comma operator notation).

Eval/parse funcs will be available to reparse expression strings to such
trees, and evaluate them, so you can do all the interesting things you
will want/need to do. I also plan to have list notation, so lists can be
passed as a single item.
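
The tree described above might look something like the following Python sketch. The node classes (`Const`, `Var`, `Concat`, `Comma`, `ListNode`) and the functions `evaluate` and `eval_args` are hypothetical: `Comma` plays the role of the separator node, `ListNode` stands in for the proposed list notation, and evaluation reduces everything to plain strings (or a list of strings) before an app would see it.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Const:          # a fixed string
    value: str


@dataclass
class Var:            # a variable reference, looked up at run time
    name: str


@dataclass
class Concat:         # concatenation of two subtrees into one string
    left: "Node"
    right: "Node"


@dataclass
class Comma:          # separator node: left args, then right args
    left: "Node"
    right: "Node"


@dataclass
class ListNode:       # list notation: several values passed as ONE argument
    items: list


Node = Union[Const, Var, Concat, Comma, ListNode]


def evaluate(node: Node, variables: dict):
    """Reduce one argument subtree to a constant (a str, or a list for ListNode)."""
    if isinstance(node, Const):
        return node.value
    if isinstance(node, Var):
        return variables.get(node.name, "")   # unset variables expand to ""
    if isinstance(node, Concat):
        return evaluate(node.left, variables) + evaluate(node.right, variables)
    if isinstance(node, ListNode):
        return [evaluate(item, variables) for item in node.items]
    raise TypeError(f"separator {node!r} is not a single value")


def eval_args(node: Node, variables: dict) -> list:
    """Flatten Comma separator nodes into the positional argument list."""
    if isinstance(node, Comma):
        return eval_args(node.left, variables) + eval_args(node.right, variables)
    return [evaluate(node, variables)]


# The ODBC_TEST3 values from Tilghman's example: no escaping involved.
odbc = Comma(Comma(Const("123"), Const("456,789")), Const("012"))
print(eval_args(odbc, {}))
# ['123', '456,789', '012']

# A concatenation plus a list passed as a single argument.
dial = Comma(Concat(Const("SIP/"), Var("PEER")), ListNode([Const("a"), Const("b")]))
print(eval_args(dial, {"PEER": "1000"}))
# ['SIP/1000', ['a', 'b']]
```

Because separators are tree nodes rather than characters, a comma inside a value and a comma between arguments can never be confused.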

The extensions.conf parser would call the above parser, and the
add_extension2() func would be modified to accept the tree pointer
instead of the data string it now accepts for the app arguments.
The hope is that all this will force all apps onto a single form of
notation, so that stuff like URL notation will no longer freak out the
IF function, the CUT function, etc.

Of course, this would involve a bit of refactoring of the apps, but as
someone has already pointed out, it would hopefully be as simple as
redoing the arg-parsing macros to pull the right elements from the
tree.
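
On the app side, the arg-parsing macros (today, many apps split their data string with `AST_STANDARD_APP_ARGS`) could become a lookup on the evaluated tree. The sketch below is hypothetical Python, not a proposed C API: nodes are encoded as tuples such as `("comma", left, right)`, the tree is built by hand in place of the not-yet-written parser, the "Dial" record is only a Dial-like illustration, and `standard_app_args` merely mimics the role of the C macro, padding missing arguments with empty strings as an assumption.

```python
from types import SimpleNamespace

# Compact, hypothetical node encoding:
#   ("const", text)  ("var", name)  ("comma", left, right)


def flatten(node, variables):
    """Evaluate leaves and flatten separator nodes into positional args."""
    kind = node[0]
    if kind == "comma":
        return flatten(node[1], variables) + flatten(node[2], variables)
    if kind == "var":
        return [variables.get(node[1], "")]
    if kind == "const":
        return [node[1]]
    raise ValueError(f"unknown node kind {kind!r}")


def standard_app_args(tree, variables, *names):
    """Tree-based stand-in for the role AST_STANDARD_APP_ARGS plays today:
    pull positional args off the tree instead of splitting a data string."""
    values = flatten(tree, variables)
    padded = values + [""] * (len(names) - len(values))  # missing args -> "" (assumption)
    return SimpleNamespace(argc=len(values), **dict(zip(names, padded)))


# "Load time": built once and stored with the extension (the job a modified
# add_extension2() would do); hand-built here because no parser exists yet.
extension = {"app": "Dial", "args": ("comma", ("var", "PEER"), ("const", "20"))}

# "Run time": per call, only variable lookup and a short tree walk remain.
args = standard_app_args(extension["args"], {"PEER": "SIP/1000"},
                         "dest", "timeout", "options")
print(args.dest, args.timeout, repr(args.options), args.argc)
# SIP/1000 20 '' 2
```

The point of the split is that the string work happens once, when the extension is added, and each call only looks up variables and walks a small tree.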

I know this is radical. Probably nobody will want it. But, as research,
it should be valuable to see what kind of performance improvement we can
get by shifting a lot of the parsing work to the compile phase, and it
should not reduce the power of the extension language to do what you
want to do. There would also be a gain from funneling all parsing
through a single parsing engine, with a fairly good set of error
checking and reporting capabilities. And, finally, I hope to make the
extension engine as fast as, if not faster than, most interpreted
languages you would use via AGI scripts.

I hope I've explained things clearly. I hope not to cause a firestorm of
outrage either. If the community objects to these kind of future
changes, I'll drop it! I promise!

murf
