[asterisk-dev] Better pattern matching
Steve Murphy
murf at parsetree.com
Wed Aug 1 17:08:25 CDT 2007
On Wed, 2007-08-01 at 14:53 -0500, John Lange wrote:
> On Wed, 2007-08-01 at 14:48 -0400, Jared Smith wrote:
> > they're worried
> > that this will severely impact the time it takes for Asterisk to do
> > pattern matching.
>
> There is no reason why Asterisk can't continue to support it's existing
> system using the "_" (underscore) to indicate the old system and some
> other character (maybe enclose the regex in "/" like many other
> languages) to indicated the next expressions is a regex.
>
> That provides complete backwards compatibility, causes zero performance
> hit for anyone who doesn't need regex, and accomplishes the goal of
> better pattern matching.
>
> > there's
> > even an RFC with a very complete pattern matching syntax defined (which
> > I can't find right now, of course).
> >
> > Hopefully, one of the Asterisk developers will stumble across this post
> > and take pity on us and help us out!
>
> Personally I don't support writing a regex engine from scratch for
> implementation in Asterisk if that is what is required. I'm under the
> assumption (perhaps wrongly) that there is a regex library available in
> a suitable (BSD?) style license.
>
> This must be true since the various BSDs certainly have regex pattern
> matching and one of the posts in the archives mentions that very option.
>
> So if that's the case it should be fairly easily to add a call to a
> library function that returns true/false if a match is found?
>
> Briefly looking at the existing code I see this is already the way it
> works with the existing pattern matching. Mind you I'm not much of a C
> programmer so I could be out to lunch.
>
> John
>
I've written a report on the effects of the current algorithm on the
speed of
processing in Asterisk. See http://www.asterisk.org/node/112
On the graph, look at the green line at the bottom. That's the affect
of
the number of extensions in a context on the processing speed of
asterisk.
So, for as few as 100 extensions in a context, you suffer a degradation
of
80-90% in speed in the dialplan. A roughly 50% slowdown occurs with only
about 20 or 30 or so extensions in a context!
The reason for the slowdown is that the algorithm has to try every
pattern, and see which is best. Throwing a regular expression parser
into the mix will slow it down even harder. And, no matter what regex
parser you use, you'll have to beef it up to provide some metric of
"closest", so you can make a decision on which pattern is the 'best
match'.
In order to speed up the algorithm, and make response more flat in
asterisk, no matter the size or composition of the dialplan, I resorted
to using hash tables
for priorities and context names, but the extension matching I handled
by moving to a special tree structure, where all the patterns are
embedded in that tree. The traversal algorithm follows all possible
paths to their leaves, and then chooses the highest (or is it the
lowest? I can't remember) score to determine the 'best match'. It's
response time is relatively flat for even 1000 extensions
in a context, which is expected, because this pattern matching algorithm
is more a function of the length of the patterns than how many patterns
there are.
To add regex's to the mix, will require nothing less than somehow
gutting the flex (or even lex) program for what it's doing-- returning
the best match from a set of regex's. The state machine, the character
classes, etc. I might be able to carve out a tree representation, but
general regex's can include variable lengths, etc, and these are better
represented with a state machine for traversal, than a tree.
Again, while it might be possible to use re's instead of the simple
syntax we have now, you can pretty much write off that you will ever
optimize it, and Asterisk will be chained forever to an n-squared loss
of productivity when large dialplans are used. That algorithm, repeated
every time a step is taken in the dialplan, will eat up tremendous
numbers of cpu cycles and make Asterisk a slow beast under large loads.
(BTW, realtime users with large dialplans are even worse off-- every
extension
in a context must be fetched from the DB to find the best match, which
gets slow quickly!).
We might be able to extend the current algorithm to handle a few more
simple cases without resorting to such huge machinery. We shall see.
murf
>
>
> _______________________________________________
> --Bandwidth and Colocation Provided by http://www.api-digital.com--
>
> asterisk-dev mailing list
> To UNSUBSCRIBE or update options visit:
> http://lists.digium.com/mailman/listinfo/asterisk-dev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/x-pkcs7-signature
Size: 3239 bytes
Desc: not available
Url : http://lists.digium.com/pipermail/asterisk-dev/attachments/20070801/c66533e8/attachment.bin
More information about the asterisk-dev
mailing list