[asterisk-dev] Better pattern matching

Steve Murphy murf at digium.com
Thu Aug 2 15:49:36 CDT 2007


On Thu, 2007-08-02 at 16:10 -0400, Kurt Lidl wrote:
> Steve Murphy wrote:
> > On Thu, 2007-08-02 at 13:09 -0400, Kurt Lidl wrote:
> >> Vasil Kolev wrote:
> >>> В ср, 2007-08-01 в 16:08 -0600, Steve Murphy написа:
> >>>
> >>>> We might be able to extend the current algorithm to handle a few more
> >>>> simple cases without resorting to such huge machinery. We shall see.
> >>>>
> >>> Going through this thread, I'm thinking - regular expressions are parsed
> >>> to finite state machines, and it shouldn't be hard (for some values of
> >>> hard, I have to brush on my discrete maths memories) to build one
> >>> automaton to handle the full dialplan for a context and to be able on
> >>> each digit to cut the possible places to go. There's some stuff that
> >>> can't be done this way for sure (as we all know, all kind of weird shit
> >>> is possible with regex), but at least the current syntax can get a bit
> >>> extended, and probably a lot of IVRs simplified.
> >> The data structure that you're grasping for is the 'trie':
> >> 	http://en.wikipedia.org/wiki/Trie
> >>
> >> At a previous job, one of my developers wrote a regex -> trie generator,
> >> so we could do high-speed lookups of prefix and suffix matches on
> >> usernames coming into a radius proxy.  Once we finally got it debugged,
> >> it was amazingly fast.  For that particular appplication we had several
> >> hundred prefix and suffix patterns, and generated one giant tree to
> >> handle them all.
> >>
> >> This was pretty slow to generate.  So we did the work in a helper
> >> program, and wrote the binary trie into a disk file.  The main app
> >> would stat() the file periodically, and if it was newer than
> >> the last one read in, just suck the entire thing into memory, and
> >> change the pointer to the root of the trie.  I guess we called
> >> free on the old one.  Seamless, fast and worked extremely well...
> >>
> >> Generating one trie per context would probably work really well.
> >>
> >> -Kurt
> >>
> > 
> > Kurt--
> > 
> > Yes, what I did in fast-ast is, according to definition, a trie.
> > And, yes, one per context. And, yes, it works really well.
> > 
> > No, it's not a binary tree. But that's not a requirement.
> 
> I wasn't trying to suggest that this was necessary or required.  I
> was just pointing out for our application of yesteryear, running on
> the machines we had at the time, using the input regular expressions
> that we needed to support we had to go the route of doing "off-line"
> trie calculation.
> 
> As I recall it took on the order of 5 to 10 minutes to generate
> the trie for our application.  Which was an unacceptable delay
> whenever the prefix/suffix table changed (about once a week)
> or when the machine(s) running the application were restarted
> or the proxy restarted.
> 
> I was putting that out there more so that people knew there was an
> easy way to avoid to do extremely compute intensive updates without
> compromising the basic operation of the application.  Some people just
> throw up their hands and say "it's too hard to calculate this at
> runtime", whereas you really only need to do the recalculations when
> the contents of a context change.  (I have no idea how this intersects
> with the 'realtime' method of doing things.)
> 
> -Kurt

Kurt--

I only made the comment about the binary tree, because the example in
the article showed a binary tree...

Your comments are very good. You hit the required data structure
dead-center.

As far as generation time, With a large 1000+ input set, the trie/tree
is fairly quick to build, as the pattern length is usually only 10-16
chars long if that. If you do the trie/tree building at load time, this
shouldn't be a problem, and even if you do it at run time (as I think my
code does), it hasn't been a problem.

For the sake of stuff like Realtime, I feel that "editing" the tree, to
remove a pattern (probably by just marking it as dead), or adding a new
pattern, should be both possible, and lightening fast.

murf

-- 
Steve Murphy
Software Developer
Digium
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/x-pkcs7-signature
Size: 3227 bytes
Desc: not available
Url : http://lists.digium.com/pipermail/asterisk-dev/attachments/20070802/11cf99d4/attachment-0001.bin 


More information about the asterisk-dev mailing list