[asterisk-dev] Better pattern matching

John Lange john.lange at open-it.ca
Wed Aug 1 18:42:39 CDT 2007


Steve, first let me say that your work at measuring the performance
issues surrounding the existing system in Asterisk is quite amazing.
Good work.

At first blush the mgcp RFC would seem like the natural choice because
it is already implemented in most SIP devices, it's much easier to learn
than regexp, and as such it is something that many VOIP admins would
already be familiar with.

Unfortunately, as you mentioned, it does not cover alphabetic characters,
so it is still not a comprehensive answer. Given unlimited time and
resources, I think it should be one of the options for pattern matching.
However, the priority should still be to do regexp first. For one thing,
PCRE already exists as a library that can be used, so it should be easier
to implement.

Given some of your comments below I would just like to reiterate that I
don't think _anything_ should be done to the current system. New systems
of pattern matching should be added using different designations to
trigger different types of matching.

So, for example, the existing system uses an underscore:

exten => _1NXXNXXXXXX,1, ...

mgcp could use brackets:

exten => (0T| 00T|1xxxxxxxxxx.T),1, ...

regexp could use its traditional slashes:

exten => /1[2-9][0-9]+/,1, ...

etc.
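
To make the idea concrete, here is a minimal sketch in Python (not
Asterisk code) of how the leading character could select the matcher.
The names classify_pattern and regex_matches are hypothetical, and
anchoring the regexp at both ends of the dialed string is an assumption,
not something that has been decided.

```python
import re


def classify_pattern(exten: str) -> str:
    """Return which matcher an extension string would select (hypothetical)."""
    if exten.startswith("_"):
        return "asterisk"   # existing underscore patterns, left untouched
    if exten.startswith("(") and exten.endswith(")"):
        return "digitmap"   # assumed designation for MGCP-style digit maps
    if len(exten) >= 2 and exten.startswith("/") and exten.endswith("/"):
        return "regex"      # assumed designation for regular expressions
    return "literal"        # plain extension, compared as-is


def regex_matches(exten: str, dialed: str) -> bool:
    """Match a dialed string against a /.../ extension, anchored at both ends."""
    return re.fullmatch(exten[1:-1], dialed) is not None
```

With this, the three example extensions above classify as "asterisk",
"digitmap" and "regex", and a plain extension such as 100 stays
"literal", so existing dialplans would behave exactly as before.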

> In order to speed up the algorithm, and  make response more flat in
> asterisk, no matter the size or composition of the dialplan, I
> resorted to using hash tables

Does this imply that this is code that is already in Asterisk?

John

On Wed, 2007-08-01 at 17:03 -0600, Steve Murphy wrote:
> On Wed, 2007-08-01 at 15:15 -0400, Clod Patry wrote:
> > you probably are searching for RFC 2705 ?
> > 
> > That would be great if Asterisk could support that kind of
> > digitmaps/patterns.
> > 
> 
> I've got this rfc in front of me now... it's the mgcp spec.
> 
> DigitMap = DigitString  / "(" DigitStringList ")"
> DigitStringList = DigitString 0*( "|" DigitString )
> DigitString = 1*(DigitStringElement)
> DigitStringElement = DigitPosition ["."]
> DigitPosition = DigitMapLetter / DigitMapRange
> DigitMapLetter = DIGIT / "#" / "*" / "A" / "B" / "C" / "D" / "T"
> DigitMapRange =  "x" / "[" 1*DigitLetter "]"
> DigitLetter ::= *((DIGIT "-" DIGIT ) / DigitMapLetter)
> 
> Example:
>      (0T| 00T|[1-7]xxx|8xxxxxxx|#xxxxxxx|*xx|91xxxxxxxxxx|9011x.T)
> 
> The use of '|' to "or" together multiple choices would just be shorthand
> for more extensions. This would not affect my algorithm, but will
> slow down the original algorithm by adding another choice to the list
> to test.
> 
> It's just shorthand for defining several extensions with the same
> contents.
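
A rough illustration of that shorthand, in Python rather than Asterisk
code: a digit map can be split on '|' into the individual patterns, each
of which would become its own extension. The helper name
expand_digit_map is hypothetical, and the sketch does not interpret T, x
or ranges; it only separates the alternatives.

```python
from typing import List


def expand_digit_map(digit_map: str) -> List[str]:
    """Split an MGCP-style digit map into its individual alternatives."""
    text = digit_map.strip()
    # Strip the optional surrounding parentheses of a DigitStringList.
    if text.startswith("(") and text.endswith(")"):
        text = text[1:-1]
    # The grammar does not allow '|' inside a [...] range, so a plain
    # split is enough; stray whitespace around alternatives is dropped.
    return [alt.strip() for alt in text.split("|") if alt.strip()]
```

Applied to the RFC example above, this yields eight separate patterns,
starting with 0T and 00T and ending with 9011x.T.
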
> 
> The use of the Timeout (T) stuff-- uh, I'd rather not think about that
> kinda thing.
> 
> The mgcp spec doesn't really cover the full alphanumeric range that
> you'd like to cover (Jared's Wish).
> 
> The trouble with doing full-alphanumeric pattern matching is that we
> already use X and Z for matching digits.... otherwise, we could allow
> stuff like _XerceZ[a-zA-Z0-9]. Maybe we could trade X for @ and Z for %
> or somesuch and get back those letters! And the - in character groups --
> we need to allow similar syntax like _XerceZ[-a-zA-Z0-9], to allow
> dashes.
> 
> Next, variable length stuff. We could use some sort of notation to
> allow wildcarding any length of string if we have a clear idea of what
> ends it.
> The example of _0X.#, where any number of non-# chars followed by #
> could be workable in both the old and new pattern matchers, without
> resorting to regexes.
> Even _0X.#1X could be possible. If one char can come after a variable
> length string, then why not more?
> 
> Let's see... (123)* means 0 or more 123 patterns. This would be
> difficult, as it forms loops in the matcher. Again, a matter for a
> state-machine matcher. The same occurs with (123)+, 1 or more 123
> patterns.
> 
> Repetition notation like (123){0,5} (0 to 5 repetitions of 123)
> explodes out to a choice of fixed patterns in the matcher: this would
> be equivalent to
> nothing | 123 | 123123 | 123123123 | 123123123123 | 123123123123123 
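
The expansion described above can be sketched in a few lines of Python
(an illustration only; expand_repetition is a hypothetical helper, not
part of any existing matcher):

```python
from typing import List


def expand_repetition(group: str, low: int, high: int) -> List[str]:
    """Expand group{low,high} into the list of fixed strings it stands for."""
    # One fixed pattern per allowed repetition count, shortest first.
    return [group * count for count in range(low, high + 1)]
```

For (123){0,5} this gives six fixed patterns, from the empty string up
to five copies of 123, which is why bounded repetition needs no loops in
the matcher.
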
> 
> The current algorithm would have to be re-engineered for some of this,
> and so would the speedup algorithm, but at least some stuff could be
> added that wouldn't require resorting to regex packages.
> 
> murf
> 
> 
> > 
> > On 8/1/07, Jared Smith <jsmith at digium.com> wrote:
> >         On Wed, 2007-08-01 at 11:44 -0500, John Lange wrote:
> >         > Searching the archives I can see that better pattern
> >         > matching has been discussed a number of times.
> >         >
> >         > There have even been patches submitted and discussions
> >         > about how there are regexp libraries available under BSD
> >         > style license that could be incorporated into Asterisk.
> >         >
> >         > Yet, nothing ever makes it into the released code.
> >
> >         Yes, I've been one of the people who have been more vocal
> >         than most about this.  I really think it's a good idea to be
> >         a little more flexible in our pattern matching ability.  That
> >         being said, I can see where the Asterisk developers are
> >         coming from too -- they're worried that this will severely
> >         impact the time it takes for Asterisk to do pattern matching.
> >         The current pattern matching syntax works fairly well for
> >         numeric extensions, but is pretty difficult to work with if
> >         you're trying to do pattern matching on names or alphanumeric
> >         extensions.  For example, try coming up with a pattern match
> >         that'll match the name Nancy (either upper or lower case)
> >         followed by two digits.  It becomes even more difficult when
> >         dealing with letters that fall outside the English alphabet.
> >
> >         While I personally think regular expressions are a pretty
> >         cool idea, I know not everyone is sold on them.  If I
> >         remember correctly, there's even an RFC with a very complete
> >         pattern matching syntax defined (which I can't find right
> >         now, of course).
> >
> >         Hopefully, one of the Asterisk developers will stumble across
> >         this post and take pity on us and help us out!
> >         
> >         
> >         --
> >         Jared Smith
> >         Community Relations Manager
> >         Digium, Inc.
> >         
> >         
> >         _______________________________________________
> >         --Bandwidth and Colocation Provided by
> >         http://www.api-digital.com--
> >         
> >         asterisk-dev mailing list
> >         To UNSUBSCRIBE or update options visit:
> >            http://lists.digium.com/mailman/listinfo/asterisk-dev
> > 
> > 
> > 
> > -- 
> > Clod Patry 



