[asterisk-dev] Better pattern matching

Steve Murphy murf at parsetree.com
Wed Aug 1 18:03:50 CDT 2007


On Wed, 2007-08-01 at 15:15 -0400, Clod Patry wrote:
> you probably are searching for RFC 2705 ?
> 
> That would be great if Asterisk could support that kind of
> digitmaps/patterns.
> 

I've got this RFC in front of me now... it's the MGCP spec.

DigitMap = DigitString  / "(" DigitStringList ")"
DigitStringList = DigitString 0*( "|" DigitString )
DigitString = 1*(DigitStringElement)
DigitStringElement = DigitPosition ["."]
DigitPosition = DigitMapLetter / DigitMapRange
DigitMapLetter = DIGIT / "#" / "*" / "A" / "B" / "C" / "D" / "T"
DigitMapRange =  "x" / "[" 1*DigitLetter "]"
DigitLetter ::= *((DIGIT "-" DIGIT ) / DigitMapLetter)

Example:
     (0T| 00T|[1-7]xxx|8xxxxxxx|#xxxxxxx|*xx|91xxxxxxxxxx|9011x.T)

The use of '|' to "or" together multiple choices would just be shorthand
for defining several extensions with the same contents. This would not
affect my algorithm, but it would slow down the original algorithm by
adding another choice to the list to test.
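
To illustrate the "shorthand" point, here is a minimal Python sketch (not
Asterisk code; the function name split_digit_map is made up for this
example) that splits an RFC 2705 digit map into its separate digit
strings, each of which would become its own extension:

```python
def split_digit_map(digit_map):
    """Split an MGCP-style digit map into its individual digit strings."""
    text = digit_map.strip()
    # The grammar allows either a bare DigitString or a parenthesized list.
    if text.startswith("(") and text.endswith(")"):
        text = text[1:-1]
    # '|' is not part of any DigitString, so a plain split is enough.
    return [alt.strip() for alt in text.split("|") if alt.strip()]


alternatives = split_digit_map(
    "(0T| 00T|[1-7]xxx|8xxxxxxx|#xxxxxxx|*xx|91xxxxxxxxxx|9011x.T)"
)
for alt in alternatives:
    print(alt)  # one line per would-be extension
```

Each printed line would be tested as a separate pattern, which is why the
list-walking matcher gets slower as alternatives are added.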

The use of the Timeout (T) stuff-- uh, I'd rather not think about that
kinda thing.

The MGCP spec doesn't really cover the full alphanumeric range that
you'd like to cover (Jared's Wish).

The trouble with doing full-alphanumeric pattern matching is that we
already use X and Z for matching digits... otherwise, we could allow
stuff like _XerceZ[a-zA-Z0-9]. Maybe we could trade X for @ and Z for %
or somesuch and get back those letters! And the - in character groups:
we need to allow similar syntax like _XerceZ[-a-zA-Z0-9], to allow
dashes.
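
A rough Python sketch of what such a syntax could look like. Everything
here is an assumption for illustration, not existing Asterisk behavior:
'@' stands for 0-9, '%' stands for 1-9, every other character outside
brackets matches itself, and a dash at the start or end of a bracket
group is taken literally. The function names are made up.

```python
def parse_class(body):
    """Expand a character-group body such as '-a-zA-Z0-9' into a set."""
    allowed = set()
    i = 0
    while i < len(body):
        # A range needs a char on each side of the dash, so a leading
        # or trailing dash falls through and is taken literally.
        if i + 2 < len(body) and body[i + 1] == "-":
            lo, hi = ord(body[i]), ord(body[i + 2])
            allowed.update(chr(c) for c in range(lo, hi + 1))
            i += 3
        else:
            allowed.add(body[i])
            i += 1
    return allowed


def parse_pattern(pattern):
    """Return one set of allowed characters per pattern position."""
    positions = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "@":                      # hypothetical stand-in for X
            positions.append(set("0123456789"))
        elif ch == "%":                    # hypothetical stand-in for Z
            positions.append(set("123456789"))
        elif ch == "[":
            end = pattern.index("]", i)
            positions.append(parse_class(pattern[i + 1:end]))
            i = end
        else:                              # anything else matches itself
            positions.append({ch})
        i += 1
    return positions


def matches(pattern, extension):
    """Fixed-length match; patterns start with '_' as in the dialplan."""
    if not pattern.startswith("_"):
        return pattern == extension        # no underscore: exact match
    positions = parse_pattern(pattern[1:])
    return len(positions) == len(extension) and all(
        ch in allowed for ch, allowed in zip(extension, positions)
    )


print(matches("_XerceZ[-a-zA-Z0-9]", "XerceZ-"))       # True
print(matches("_[Nn][Aa][Nn][Cc][Yy]@@", "nancy42"))   # True
print(matches("_@@%", "120"))                          # False, % is 1-9
```

With X and Z freed up, the "XerceZ" prefix is plain literal text, and the
Nancy-plus-two-digits case becomes a one-line pattern.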

Next, variable length stuff. We could use some sort of notation to
allow wildcarding any length of string if we have a clear idea of what
ends it.
The example of _0X.#, where any number of non-# chars is followed by #,
could be workable in both the old and new pattern matchers, without
resorting to regexes.
Even _0X.#1X could be possible. If one char can come after a
variable-length string, then why not more?
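
A small Python sketch of the terminated-wildcard idea, under some
assumptions of mine: the pattern is given without its leading
underscore, '.' must consume at least one character (as it does today),
and the character right after '.' is a literal such as '#' rather than
another class. The names DIGIT_CLASSES and match are made up.

```python
# Existing single-character classes: X = 0-9, Z = 1-9, N = 2-9.
DIGIT_CLASSES = {"X": "0123456789", "Z": "123456789", "N": "23456789"}


def match(pattern, text):
    """Match text against a pattern that may contain a terminated '.'."""
    if not pattern:
        return not text                     # both must run out together
    head, rest = pattern[0], pattern[1:]
    if head == ".":
        if not rest:
            return len(text) >= 1           # trailing '.': rest of string
        stop = rest[0]                      # literal that ends the run
        end = text.find(stop)
        # Require at least one wildcard char before the terminator,
        # then keep matching from the terminator onward.
        return end >= 1 and match(rest, text[end:])
    if not text:
        return False
    allowed = DIGIT_CLASSES.get(head, head)  # class, or the char itself
    return text[0] in allowed and match(rest, text[1:])


print(match("0X.#", "012345#"))     # True
print(match("0X.#1X", "0123#15"))   # True
print(match("0X.#1X", "0123#1"))    # False, nothing left for the last X
```

Because the run stops at the first terminator, no backtracking is
needed, which is what keeps this out of regex territory.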

Let's see... (123)* means 0 or more 123 patterns. This would be
difficult, as it forms loops in the matcher. Again, a matter for a
state-machine matcher. The same occurs with (123)+, 1 or more 123
patterns.
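
To show what "loops in the matcher" means, here is a toy Python state
machine for a repeated literal group. It is only a sketch: it handles a
single group of literal characters, and the name make_loop_matcher and
its at_least_one flag are invented for this example.

```python
def make_loop_matcher(body, at_least_one=False):
    """Build a matcher for (body)* or, with at_least_one, (body)+."""

    def matcher(text):
        state = 0                  # index of the next expected char
        loops = 0                  # completed passes through the body
        for ch in text:
            if ch != body[state]:
                return False
            state += 1
            if state == len(body):
                state = 0          # the loop edge: back to the start
                loops += 1
        # Accept only on a group boundary; '+' also needs one full pass.
        return state == 0 and (loops >= 1 or not at_least_one)

    return matcher


star = make_loop_matcher("123")
plus = make_loop_matcher("123", at_least_one=True)
print(star(""), star("123123"), star("1231"))   # True True False
print(plus(""), plus("123"))                    # False True
```

The jump back to state 0 is the loop; a matcher that only walks a list of
fixed-length patterns has no way to express it.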

Repetition notation like (123){0,5} (0 to 5 repetitions of 123) explodes
out to a choice of fixed patterns in the matcher. This would be
equivalent to:
nothing | 123 | 123123 | 123123123 | 123123123123 | 123123123123123
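
The expansion can be sketched in a few lines of Python. This is an
illustration only: the (group){min,max} syntax is the notation from this
message, not something the dialplan parser accepts, and the function
name expand_repetition is made up. Nested groups are not handled.

```python
import re

# Matches "(body){low,high}" where body contains no parentheses.
REPEAT = re.compile(r"\(([^()]*)\)\{(\d+),(\d+)\}")


def expand_repetition(pattern):
    """Expand bounded repetition groups into a list of fixed patterns."""
    m = REPEAT.search(pattern)
    if m is None:
        return [pattern]
    body, low, high = m.group(1), int(m.group(2)), int(m.group(3))
    prefix, suffix = pattern[:m.start()], pattern[m.end():]
    expanded = []
    for count in range(low, high + 1):
        # Recurse so that any later group in the suffix is expanded too.
        for tail in expand_repetition(suffix):
            expanded.append(prefix + body * count + tail)
    return expanded


for fixed in expand_repetition("(123){0,5}"):
    print(repr(fixed))   # '' first, then '123', '123123', and so on
```

Each resulting string is an ordinary fixed pattern, so the matcher itself
would not need to change, only the code that loads the patterns.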

The current algorithm would have to be re-engineered for some of this,
and so would the speedup algorithm, but at least some stuff could be
added that wouldn't require resorting to regex packages.

murf


> 
> On 8/1/07, Jared Smith <jsmith at digium.com> wrote:
>         On Wed, 2007-08-01 at 11:44 -0500, John Lange wrote:
>         > Searching the archives I can see that better pattern
>         > matching has been discussed a number of times.
>         >
>         > There have even been patches submitted and discussions
>         > about how there are regexp libraries available under BSD
>         > style license that could be incorporated into Asterisk.
>         >
>         > Yet, nothing ever makes it into the released code.
>         
>         Yes, I've been one of the people who have been more vocal than
>         most about this.  I really think it's a good idea to be a
>         little more flexible in our pattern matching ability.  That
>         being said, I can see where the Asterisk developers are coming
>         from too -- they're worried that this will severely impact the
>         time it takes for Asterisk to do pattern matching.  The current
>         pattern matching syntax works fairly well for numeric
>         extensions, but is pretty difficult to work with if you're
>         trying to do pattern matching on names or alphanumeric
>         extensions.  For example, try coming up with a pattern match
>         that'll match the name Nancy (either upper or lower case)
>         followed by two digits.  It becomes even more difficult when
>         dealing with letters that fall outside the English alphabet.
>         
>         While I personally think regular expressions are a pretty cool
>         idea, I know not everyone is sold on them.  If I remember
>         correctly, there's even an RFC with a very complete pattern
>         matching syntax defined (which I can't find right now, of
>         course).
>         
>         Hopefully, one of the Asterisk developers will stumble across
>         this post and take pity on us and help us out!
>         
>         
>         --
>         Jared Smith
>         Community Relations Manager
>         Digium, Inc.
>         
>         
>         _______________________________________________
>         --Bandwidth and Colocation Provided by
>         http://www.api-digital.com--
>         
>         asterisk-dev mailing list
>         To UNSUBSCRIBE or update options visit:
>            http://lists.digium.com/mailman/listinfo/asterisk-dev
> 
> 
> 
> -- 
> Clod Patry 