[asterisk-dev] [Code Review] Append two container for dialogs to delete and for rtp timeout checks to replace all dialogs container in ao2 callback for dialogs_needdestroy

Fri Sep 17 15:39:09 CDT 2010

On 09/17/2010 03:23 PM, sst at sil.at wrote:
> 
> 
>> On 2010-09-17 14:22:49, David Vossel wrote:
>>> For some reason I was under the impression this patch was against Trunk instead of 1.6.2.
>>>
>>> This is not entirely my call to make, but I do not believe this has a high probability of being accepted for merger into 1.8, and virtually no probability of being merged into 1.6.2.
>>>
>>> If the patch is targeted for merger only into trunk, solving the RTP container issue is easy as we have already identified where linking should occur.  Were you expecting this to go into 1.6.2?  If so, I'd begin by discussing this with Russell.
>>
>> schmidts wrote:
>>     sorry for the missunderstanding. as far as i have seen in trunk these part of chan_sip.c is nearly the same like in 1.6.2.
>>     
>>     i will make a patch and test it against trunk. and also i will talk to Russell. 
>>     
>>     As far as i have understood him, and atleast why i have started with this, this slow iterate could cause deadlocks in chan_sip.c
>>     
>>     thank you David
>>
>> David Vossel wrote:
>>     When there are issues with deadlocks occurring with this monitor loop, they usually occur outside of this thread because of invalid locking order.  This optimization may lessen the probability of a deadlock occurring since it reduces the number of dialogs traversed at a single time, but if invalid locking order exists this patch does nothing to help that... At least I can't think of any locking issues resolved by this patch, I could be wrong :)
> 
> i am not sure if its a deadlock or atleast just a lock which tooks too long, but i have seen the following problem with 1.6.2, live on my production system.
> if the ao2 callback to iterate through all dialogs needs to much time, the ast_sched_wait timeout will sometimes be near to 0 ms, then ast_io_wait polls only 1 incoming message from the fd (if sip runs only on udp, no tcp involved) and then in the ast_sched_runq would be many messages send. 
> after several seconds the ratio from reading incoming messages and sending messages will sometimes grows up to 1:100. This cause several events of unreachable poke peers and so on.
> you can simple try this registering for example 10k peers on a normal 1.6.2 system. at a point the first user gets unrechable (in my tests, around 7k) then every further second more and more peers goes offline cause there is no time left to read all incoming messages, but send out much.
> 
> there are two parts which slow down the do_monitor loop, first is allready fixed by russell after a peers registered, the contact data is written into the astdb. the second part is this patch.
> with both parts the ratio between reading and sending is overall 1:1 cause its fast enough.
> i am with you, a real deadlock or lock happens on another place/thread, and this patch fixes nothing against this locks. but its a big different in waiting 40 usec for a pvt to come free or 2000 usec.

There's no reason to only process one incoming datagram when
ast_io_wait() says the fd is ready... the do_monitor loop could easily
read and process more messages before going back to sleep or checking
for scheduler entries.

-- 
Kevin P. Fleming
Digium, Inc. | Director of Software Technologies
445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
skype: kpfleming | jabber: kfleming at digium.com
Check us out at www.digium.com & www.asterisk.org