[Asterisk-Users] Blocking the 'Do Not Call" List

Dana Nowell DanaNowell at cornerstonesoftware.com
Thu Aug 12 08:38:07 MST 2004


This is interesting but did you check space consumption?  On my system it
appears that each directory is about 4K bytes.  One million phone numbers
at 10 digits (or 1 number) is 10 million directories (or one million
'number' directories) at 4K bytes per that's 40Gig (or 4Gig).  I'm not
saying that's impossible but it seems to be an expensive why to store 10
meg of data (OK, 11 meg with a delimiter).


Seems to me an in-memory structure like:

struct leaf_node
  {
/* treat as bit array */
  char num_exists[125000];
  }

struct areacode_node
  {
  struct leaf_node *p[1000];
  }
or
a linked list/btree/quadtree/red-black of
struct areacode_node
  {
  char areacode[3];
  struct leaf_node *p;
  }

once you find the leaf node via the areacode node(s) you can check to see
if the bit is on/off via

charindex = (int) number/8
bitindex = number - charindex * 8

ICare = (p->num_exists[charindex] >> bitindex) & 1

total memory consumed is about a 128 megabyte stick for all possible US
numbers, it is in memory, and low latency (index into array, index into
array, shift, is pretty quick).  Requires that you periodically refresh the
values from a flat file/DB/whatever.  You can do that during a 'downtime'
OR use a 256 megabyte stick of memory and have two versions, active and
'building' then switch over quickly (flip 'active' pointer to the areacode
node).

Write it as a standalone process that accepts requests over a TCP/IP (or
UDP) socket and the world is a good place.

Interface might look like:

struct Iface
{
int IFaceVersion;
int RequestCode;
int tag; /* reply contains same tag as request */
int Length;
char RequestSpecificData[1]; /* is actually length bytes */
}

/* potential request codes */
#define RC_LOOKUP_REQ 1
#define RC_LOOKUP_REP 101
#define RC_RELOAD_REQ 2
#define RC_RELOAD_REP 102


Should be quick, small, simple to share, and easy to replicate or move to a
separate box if performance requires.  Now if I only had time ...


At 12:37 PM 8/12/2004 +1000, Adam Goryachev wrote:
>
>Speaking of which, how about a simple structure of directories, take the
>first number for each entry in the dnc, create a directory (dnc/5), then
>take the next number (dnc/5/5) and so on. Eventually you get
>dnc/5/5/5/1/2/3/4.
>
>When you need to dial the number, you simply check if the directory
>exists, if( -d dnc/5/5/5/1/2/3/4)
>
>If it exists, then skip it, if any directory doesn't exist, then call
>it.
>
>I am pretty sure that any filesystem should be able to handle 10
>directories in a directory, with 10 levels of sub-directory, each with a
>maximum of 10 directories.... (How many number do US numbers have??)
>
>Another option would be to just create a file for each full phone number
>(dnc/5551234) all in the same directory. Use a filesystem like reiserfs
>which does a binary search for the dirents... Well, something like that,
>reiserfs seems pretty good to me with a large number of files in a
>single directory...
>
>Actually, maybe you should test this yourself before you use the
>reiserfs method... (this is my maildir...)
>time ls|wc -l
>213222
>
>real    0m2.587s
>user    0m2.450s
>sys     0m0.200s
>
>Although I *think* ls will do a stat on every single file... someone
>with more knowledge than me can feel free to comment. At least this
>would remove any TCP setup time, plus network dependency etc...
>
>If you have multiple servers, just rsync every hour or whatever...
>
>Just my 0.02c worth...
>

-- 
Dana Nowell     Cornerstone Software Inc.
Voice: 603-595-7480 Fax: 603-882-7313
email: DanaNowell_at_CornerstoneSoftware.com



More information about the asterisk-users mailing list