[asterisk-dev] A problem with Hash Tables...

Steve Murphy murf at digium.com
Fri Sep 7 22:53:39 CDT 2007


Luigi--

I have a big Hash Table problem, and your ast_obj2 is part of it! ;)
You see, I've developed a competing Hash table implementation (I call
them hashtabs), and now I have to decide whether to scrap mine, merge
the neat stuff in mine into yours, or vice-versa, or let the two coexist
peaceably.

So, perhaps you can help me with this baffling situation!

I have done a fairly involved evaluation of the ast_obj2 code,
from reading the .c and .h files, to doing a test implementation (which
I did wrong, btw, and filed a bug, and then closed it when my
misconception was pointed out).

Our hash table implementations are VERY similar. Almost call-for-call
equivalent, but not quite.

Both appear fairly solid.

Your code has two levels of abstraction: container and object.
Alas, mine just worries about the container level. No ref-counting.

Mine is noticeably faster in some situations. For a test set of 10
independent threads, each doing 100,000 random operations on the hashtab
(which slowly builds from 0 to ~220K objects), mine takes about 2.17
sec, and ast_obj2 takes about 3.32 sec. With a 1% mix of complete
traversals, mine goes to 52 secs, and ast_obj2 takes about 15 minutes. I
don't know why this is; at first I thought it was because I use a
read-lock during the hashtab traversal, which allows other threads to do
queries during the traversal; but I tried a write-lock version of
hashtab traversal, and it finishes in about the same amount of time (52
sec)... 

The main complaint I have against ast_obj2 is the lack of a resize
capability.
the hashtab run times I quoted include about 7 resize operations, each
doubling the size of the table. And it still runs faster than the
statically sized ast_obj2 table, which is intriguing to me.

I implement resizing via 2 callbacks: a resize boolean, which would
return a 1 if the table needs to be resized, and a newsize function,
that returns an integer specifying the desired size. I guess I could
have combined them into a single func, but splitting the two so the
calculations for the new size would not have to be performed all the
time seemed better.

Hashtabs also only allow the number of table entries to be a prime
number. If you specify a hash table size of 8, it will increase the size
until it is 11. This allows the modulus to be a little better at
spreading out the hash values.

Hashtabs use doubly-linked lists to link together the buckets. Faster
delete times. And another doubly-linked list that links together all
objects stored in the container, which comes in handy both for a simple
traversal routine, and for resizing, because you can relink all the
buckets without having to gather their objects first.

For instance, in the ast_obj2 regime, because the table cannot be
resized, most folks will try to size the table for its estimated maximum
size. The traversal, assuming objects are evenly distributed thru the
table, will take the same amount of time to traverse, when there there
only a few objects stored in the table, or when the table is nearly
full. This is because it has to look at every entry in the bucket array,
to see if there are buckets there. Hashtabs just follow the linked list
of all objects. If that list contains only 3 objects, then after 3 steps
it is finished.


Ok, so much for my evaluation. What do you think I should do? Can we
combine some of the hashtab features (like the two doubly-linked lists,
and the resize capability) into ast_obj2? Or are these concepts
incompatible with your current locking regime? Or, Can we live with two
hashtable implementations?

murf


-- 
Steve Murphy
Software Developer
Digium
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/x-pkcs7-signature
Size: 3227 bytes
Desc: not available
Url : http://lists.digium.com/pipermail/asterisk-dev/attachments/20070907/003dfca9/attachment-0001.bin 


More information about the asterisk-dev mailing list