[Asterisk-Dev] potentially nasty bugs in string pools usage

Steve Murphy murf at e-tools.com
Thu Jan 5 06:13:17 MST 2006


On Thu, 2006-01-05 at 05:51 -0700, asterisk-dev-request at lists.digium.com
wrote:


> first of all, despite the subject, i think that the new approach
> to handle string is a great step forward, because it hides the
> implementation details and a lot of trouble in playing with fixed
> or variable-size strings.
> 
> What i don't understand/like is the mechanism used for handling
> storage, i.e. the use of a single pool of memory for all strings
> in the same object, because it has the potential of causing extremely
> nasty and subtle bugs in the code, as i will show below.
> 
> Nevertheless, because the approach hides implementation
> details, i think this can be fixed with reasonable ease.
> 
> To come to the problem: i kind of understand the motivation for
> using a single pool -- this way, one does not have to call
> a malloc() for each string in a large object e.g. sip_pvt and the
> like.
> 
> However, look at the potentially nasty side effects:
> 
>  - at least in the current implementation, setting a string field
>    may cause a realloc(), which in turn may change the base addresses
>    of all string fields in the group, which in turn means that
>    we cannot safely use these string fields by reference if
>    something calls ast_string_field_set() in the middle.
> 
>         For instance, this specific bug is present in
>         chan_sip.c::initreqprep(), where at some point the
>         code saves a pointer l = p->fromuser, but then
>         we can have a call to ast_string_field_set(p, fromname, n)
>         thus invalidating the previous value of l, which however
>         is used a few lines below.
> 
>    I haven't checked if there are other places like this, but this was
>    just an example to show that even the author of this API will
>    have a hard time using it because of this kind of subtle side
> effects;
> 
>  - at least in the current implementation,
> ast_string_field_index_build()
>    ends up calling s[n]printf() twice, which is very prone to
>    errors e.g. when computing the arguments has side effects,
>    e.g. x++ . Additionally, s[n]printf() is an expensive function
>    so one would rather not have to pay the price twice.
>    This can be optimized a bit (e.g. by keeping track, for each
>    string field, of the available buffer space, and trying to reuse
>    it if possible, so only one s[n]printf is needed.
>    But the problem with side effects is actually aggravated because
>    at this point one wouldn't know if the argument has been evaluated
>    once or twice.
> 
>  - at least in the current implementation, freeing a string
>    does not allow to recover the storage (this can be overcome,
>    but i doubt we want to do that);
> 
> Given all the above, i do not feel very safe in using a single pool
> of memory for all strings.
> As a first step, I'd rather change the allocator so that
> each string is allocated independently, and pay the price for the
> extra malloc's() while the new macros start to get used in the code.
> At a later time, we can always replace the allocator underneath with
> one that may use pools if needed, but trying to address the
> correctness problem related to realloc and
> ast_string_field_index_build(),
> and maybe even to memory efficiency if we really want to.

I've written code that uses a central pool for string storage. The pool
area is a linked list of 1 meg memory blocks. Each string is registered
via a hash table. If a new string to be entered is bigger than what will
fit in the memory pool, a new block is allocated, and it is put there.
No realloc, no changing of all the string addresses. And not that hard
to implement.

In the application I did this for, I found a 10:1 ratio of commonality
of strings entered. Thus, the savings on heap space of about 90%... your
mileage may vary considerably, but in general, string pools like this
are great, they reduce memory requirements, they speed things up by not
calling malloc/free as much, they reduce the complexity of your
interface, as far as copying a string vs. storing the address of the
string type issues. 

murf


-- 
Steve Murphy <murf at e-tools.com>
Electronic Tools Company
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/x-pkcs7-signature
Size: 3317 bytes
Desc: not available
Url : http://lists.digium.com/pipermail/asterisk-dev/attachments/20060105/6c970eee/smime.bin


More information about the asterisk-dev mailing list