[asterisk-dev] Better pattern matching
Kurt Lidl
kurt.lidl at cello.com
Thu Aug 2 15:10:08 CDT 2007
Steve Murphy wrote:
> On Thu, 2007-08-02 at 13:09 -0400, Kurt Lidl wrote:
>> Vasil Kolev wrote:
>>> В ср, 2007-08-01 в 16:08 -0600, Steve Murphy написа:
>>>
>>>> We might be able to extend the current algorithm to handle a few more
>>>> simple cases without resorting to such huge machinery. We shall see.
>>>>
>>> Going through this thread, I'm thinking - regular expressions are parsed
>>> to finite state machines, and it shouldn't be hard (for some values of
>>> hard, I have to brush on my discrete maths memories) to build one
>>> automaton to handle the full dialplan for a context and to be able on
>>> each digit to cut the possible places to go. There's some stuff that
>>> can't be done this way for sure (as we all know, all kind of weird shit
>>> is possible with regex), but at least the current syntax can get a bit
>>> extended, and probably a lot of IVRs simplified.
>> The data structure that you're grasping for is the 'trie':
>> http://en.wikipedia.org/wiki/Trie
>>
>> At a previous job, one of my developers wrote a regex -> trie generator,
>> so we could do high-speed lookups of prefix and suffix matches on
>> usernames coming into a radius proxy. Once we finally got it debugged,
>> it was amazingly fast. For that particular appplication we had several
>> hundred prefix and suffix patterns, and generated one giant tree to
>> handle them all.
>>
>> This was pretty slow to generate. So we did the work in a helper
>> program, and wrote the binary trie into a disk file. The main app
>> would stat() the file periodically, and if it was newer than
>> the last one read in, just suck the entire thing into memory, and
>> change the pointer to the root of the trie. I guess we called
>> free on the old one. Seamless, fast and worked extremely well...
>>
>> Generating one trie per context would probably work really well.
>>
>> -Kurt
>>
>
> Kurt--
>
> Yes, what I did in fast-ast is, according to definition, a trie.
> And, yes, one per context. And, yes, it works really well.
>
> No, it's not a binary tree. But that's not a requirement.
I wasn't trying to suggest that this was necessary or required. I
was just pointing out for our application of yesteryear, running on
the machines we had at the time, using the input regular expressions
that we needed to support we had to go the route of doing "off-line"
trie calculation.
As I recall it took on the order of 5 to 10 minutes to generate
the trie for our application. Which was an unacceptable delay
whenever the prefix/suffix table changed (about once a week)
or when the machine(s) running the application were restarted
or the proxy restarted.
I was putting that out there more so that people knew there was an
easy way to avoid to do extremely compute intensive updates without
compromising the basic operation of the application. Some people just
throw up their hands and say "it's too hard to calculate this at
runtime", whereas you really only need to do the recalculations when
the contents of a context change. (I have no idea how this intersects
with the 'realtime' method of doing things.)
-Kurt
More information about the asterisk-dev
mailing list