[asterisk-dev] [asterisk-commits] murf: trunk r89272 - /trunk/main/pbx.c
murf at digium.com
Fri Nov 16 11:07:54 CST 2007
On Fri, 2007-11-16 at 09:34 -0600, Kevin P. Fleming wrote:
> SVN commits to the Asterisk project wrote:
> > Author: murf
> > Date: Wed Nov 14 12:05:50 2007
> > New Revision: 89272
> > URL: http://svn.digium.com/view/asterisk?view=rev&rev=89272
> > Log:
> > Rescaled the weights of the patterns to give something more independent of pattern length; and make . less likely to win. Question: which should win for 14102241145-- _1xxxxxxx. or _XXXXXXXXXXX -- right now, the pure X pattern will win.
> That's a tough question to answer; in general, more specific patterns
> should be given higher scores, so the pattern starting with '1' is
> better. However, it is shorter and ends with '.', where the longer
> pattern is expecting a specific number of digits... so it is better.
> In my opinion the presence of more-specific characters should carry more
> weight than pattern length, but not so much weight that "_91." has a
> higher score than "_XXXXXX".
> We really need to get some community feedback on this... how about
> taking a poll and posing the question with 5 or 6 combinations of A/B
> patterns and asking (in each case) which one should win, then tabulating
> the results?
I think this is a good idea; from "expected" behavior we can generate an
algorithm that describes how we calculate the best match. But we need to
make the algorithm fairly simple, or nobody will be able to follow it,
and there'll be surprises at runtime.
I did my best to honor the only spec I know of about the pattern
matching: the comments in main/pbx.c:
* The extension match rules defined in the devmeeting
* quite simple: WE SELECT THE LONGEST MATCH.
* In detail, "longest" means the number of matched characters
* the extension. In case of ties (e.g. _XXX and 333) in the
* of a pattern, we give priority to entries with the smallest
* (e.g, [5-9] comes before [2-8] before the former has only 5
* while the latter has 7, etc.
* In case of same cardinality, the first element in the range
* If we still have a tie, any final '!' will make this as a
* less specific pattern.
So, my system interprets this by assigning weights to each pattern char,
where, right now, N gets 98, Z gets 99, X gets 100, . and ! get 200 (per
A single non-pattern char gets '1', and char ranges ([ ]) get a
count of the number of chars in the range. The LOWEST score wins.
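
To make the arithmetic concrete, here is a rough sketch of the weighting described above. It is written in Python for brevity and is not the code in main/pbx.c; the names pattern_score and range_cardinality are made up for illustration. It assumes '.' and '!' are charged their 200 once per pattern, and that pattern letters are case-insensitive.

```python
# Weights as listed in this mail; every other character counts as a literal (1).
WEIGHTS = {"N": 98, "Z": 99, "X": 100, ".": 200, "!": 200}


def range_cardinality(body):
    """Count the distinct chars in a range body such as '5-9' or '13-5'."""
    chars = set()
    i = 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == "-":
            # Expand a span like 5-9 into 5, 6, 7, 8, 9.
            lo, hi = ord(body[i]), ord(body[i + 2])
            chars.update(chr(c) for c in range(lo, hi + 1))
            i += 3
        else:
            chars.add(body[i])
            i += 1
    return len(chars)


def pattern_score(pattern):
    """Sum the per-character weights of an extension; lower is more specific."""
    if not pattern.startswith("_"):
        # No leading underscore: a plain extension, every char is a literal.
        return len(pattern)
    score = 0
    i = 1  # skip the leading underscore
    while i < len(pattern):
        c = pattern[i]
        if c == "[":
            end = pattern.index("]", i)  # raises ValueError if unterminated
            score += range_cardinality(pattern[i + 1:end])
            i = end + 1
        else:
            # Assumption: 'x' is treated the same as 'X', and so on.
            score += WEIGHTS.get(c.upper(), 1)
            i += 1
    return score
```

With these numbers, pattern_score("_XXX") is 300 and pattern_score("333") is 3, so the literal extension wins the tie mentioned in the pbx.c comment, and "_[5-9]" (5) beats "_[2-8]" (7).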
Why the big numbers like 100? Because the length of average dialing
strings can range from 11 chars (in the US) to longer
sequences, and if a 10-digit CID pattern is also supplied, the length of
the patterns can grow by another 10 or 20 chars.
The effect of a single non-specific or specific character in the mix can
be lost in the sea of the other chars. So, scaling seemed like a good
trick to make the pattern chars stand out in the count. Besides, the
'cardinality' of a . pattern really is 256, as it can match all possible
bytes... It's highly non-specific. I guess I could get the same effect
by keeping separate counts of pattern chars vs literal matches...
The length of the matched string is also kept track of, and is the #1
criterion for the best match. In most cases, for an input string, all
patterns that match will have the same length, though. But this does
weed out patterns that would match only the first part of the string.
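
The two-level ordering (matched length first, then lowest weight) can be sketched as a single sort key. This is only an illustration in Python, not what pbx.c does internally; the Candidate tuple, pick_best, and the sample numbers are hypothetical.

```python
from collections import namedtuple

# Hypothetical record for one pattern that matched the dialed string.
Candidate = namedtuple("Candidate", ["pattern", "matched_len", "score"])


def pick_best(candidates):
    """Longest match wins; among equal lengths, the lowest score wins."""
    return min(candidates, key=lambda c: (-c.matched_len, c.score))


# Made-up candidates for the dialed string "333".
candidates = [
    Candidate("_XXX", 3, 300),
    Candidate("333", 3, 3),
    Candidate("_3X", 2, 101),  # matches only the first two digits
]
best = pick_best(candidates)
```

Here the shorter match is discarded first even though its score is lower than that of _XXX, and the literal 333 then beats _XXX on score.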
At any rate, some simple verbal rules (like the above), with an
arithmetic description should make this sort of pattern matching easier
to implement and understand.
For instance, I could specify that '.' or '!' at the end of a pattern
will almost always lower it in priority below patterns that have no '.'.
Thus _1. will not be matched before:
to match 13075878001
_XXXXXXXXXXX should match before
_1XXXXXXX. or even before