[asterisk-dev] [Code Review] Split up SIP UDP receiving and proceeding
Vadim Lebedev
vadim at mbdsys.com
Tue Oct 12 12:13:28 CDT 2010
Hi,
I wonder: what is the practical difference between packets waiting in
the kernel queue and packets waiting in an internal Asterisk queue?
Do you save on the deadlock avoidance code in handle_request_do?
Thanks
Vadim
On 11 Oct 2010, at 11:09, sst at sil.at wrote:
>
>
>> On 2010-10-09 18:06:52, Russell Bryant wrote:
>>> I haven't looked at the code yet, but as a general comment, if you
>>> want to split up UDP packet processing and scheduler processing, I
>>> think moving the scheduler into its own thread makes more sense.
>>> That way, you don't have to do so much copying of data and then
>>> signaling another thread for each packet that comes in.
>>>
>>> Moving the scheduler to its own thread shouldn't be too hard.
>>> I've done it once before, but got crashes with load testing. It
>>> was a long time ago, though. There is an ast_sched_thread API you
>>> can use to automatically handle the scheduler thread. However, it
>>> does require touching a lot of places in chan_sip that schedule
>>> callbacks.
>>>
>>> I've actually been thinking about re-doing how ast_sched_thread
>>> works. I'd like to get rid of the sched_thread data structure and
>>> instead just optionally have the thread data in the
>>> sched_context. I'd like to be able to use all of the same API
>>> calls for the scheduler, but just have an optional API call to
>>> start running a thread on a scheduler context. If we did that,
>>> the changes to chan_sip to put the scheduler in its own thread
>>> would be _trivial_.
>>
>> schmidts wrote:
>> That's not exactly what I have tried here. I split up the
>> receiving of a UDP packet and the processing of that packet. I don't
>> touch the scheduler part for now, except that it now works more
>> reliably and on time.
>>
>> I will try to show what I have done.
>> This is how incoming SIP UDP packets are handled now:
>> thread do_monitor
>> -> ast_io_wait (polls the SIP UDP socket)
>> -> if a packet is received, a callback to sipsock_read
>> is fired
>> -> sipsock_read does a recvfrom on the socket and
>> then calls handle_request_do with the received request
>> (this can be slow).
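For reference, the pre-patch path can be sketched like this (a simplified, hypothetical reconstruction with a stubbed handle_request_do; not the literal chan_sip.c code):

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

static char last_request[2048];   /* stub state so the sketch is runnable */

/* Stub: the real handle_request_do() parses the SIP request, does
 * deadlock avoidance, and may block for several milliseconds. */
static void handle_request_do(const char *buf, ssize_t len)
{
    snprintf(last_request, sizeof(last_request), "%.*s", (int)len, buf);
}

/* I/O callback fired when the SIP UDP socket becomes readable: it reads
 * the datagram and processes it inline, so a slow handle_request_do()
 * stalls the whole do_monitor loop until it returns. */
int sipsock_read(int fd)
{
    char buf[2048];
    struct sockaddr_storage from;
    socklen_t fromlen = sizeof(from);

    ssize_t res = recvfrom(fd, buf, sizeof(buf) - 1, 0,
                           (struct sockaddr *)&from, &fromlen);
    if (res < 0)
        return 1;                 /* keep the fd registered */
    buf[res] = '\0';
    handle_request_do(buf, res);  /* inline: blocks the monitor loop */
    return 1;
}
```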
>>
>> What I have done looks like this:
>> thread do_monitor
>> -> ast_io_wait (polls the SIP UDP socket)
>> -> if a packet is received, a callback to sipsock_read
>> is fired
>> -> sipsock_read does a recvfrom on the socket and
>> then adds the packet to the receive queue
>>
>> The SIP request thread checks this list (that's the one with high
>> CPU):
>> -> if there is a request in the list, call handle_request_do
>> with it
>>
>> With my patch, receiving a request and storing it in the
>> request_queue takes at most 30 usec; then the next loop of
>> ast_io_wait can start, waiting for the next request or for the
>> timeout of scheduled events. So if an incoming request takes a long
>> time to be handled in handle_request_do (like an INVITE), other
>> packets are still received during that time and stored in the queue,
>> to be processed as soon as possible. As it stands now, if
>> handle_request_do takes too long, first the scheduler runs (and may
>> send out many events at once because it is already late), and then
>> in the next loop of do_monitor only one packet is handled.
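The receive-queue split described above can be sketched roughly as follows (hypothetical names, sizes, and structure; the actual change is in the diff on Review Board):

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Producer/consumer queue: sipsock_read() only does the recvfrom() and
 * enqueues the raw packet; a separate request thread dequeues and runs
 * the (potentially slow) handle_request_do(). */

struct sip_pkt {
    struct sip_pkt *next;
    size_t len;
    char data[2048];            /* enough for a typical SIP UDP datagram */
};

static struct sip_pkt *queue_head, *queue_tail;
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t queue_cond = PTHREAD_COND_INITIALIZER;

/* Producer side: called from the do_monitor I/O callback. Fast: copy + append. */
void enqueue_request(const char *buf, size_t len)
{
    struct sip_pkt *pkt = calloc(1, sizeof(*pkt));
    if (len > sizeof(pkt->data))
        len = sizeof(pkt->data);  /* truncate oversized datagrams */
    pkt->len = len;
    memcpy(pkt->data, buf, len);

    pthread_mutex_lock(&queue_lock);
    if (queue_tail)
        queue_tail->next = pkt;
    else
        queue_head = pkt;
    queue_tail = pkt;
    pthread_cond_signal(&queue_cond);    /* wake the request thread */
    pthread_mutex_unlock(&queue_lock);
}

/* Consumer side: the request thread blocks here instead of spinning. */
struct sip_pkt *dequeue_request(void)
{
    pthread_mutex_lock(&queue_lock);
    while (!queue_head)                  /* guard against spurious wakeups */
        pthread_cond_wait(&queue_cond, &queue_lock);
    struct sip_pkt *pkt = queue_head;
    queue_head = pkt->next;
    if (!queue_head)
        queue_tail = NULL;
    pthread_mutex_unlock(&queue_lock);
    return pkt;        /* caller runs handle_request_do() and frees it */
}
```

Because the consumer sleeps in pthread_cond_wait() while the queue is empty, it uses no CPU between packets.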
>>
>> But I give you a +1 for the scheduler thread idea too ;)
>>
>> Russell Bryant wrote:
>> What problem are you solving, though? From your description it
>> sounds like you're saying it is because of the scheduler that it
>> takes too long to get around to processing the next packet. Are
>> you worried that the kernel won't buffer enough so packets will be
>> dropped? What problems have you observed that led you to want to
>> make these changes? I think the "right" thing to do to improve UDP
>> packet processing is to move to a thread pool. It's not trivial,
>> but I think we have learned a lot of lessons about the right and
>> wrong ways to do it with experience doing the same thing in
>> chan_iax2.
>>
>> schmidts wrote:
>> No, not the scheduler; it is handle_request_do that takes so long,
>> because there is also deadlock avoidance done in there and some
>> other things that simply take time.
>> As I said above, when a packet is received and handle_request_do
>> is started with that request, nothing else can happen in the
>> do_monitor loop until it is finished.
>>
>> I will try to explain where I see the problem.
>> In do_monitor, three functions are called:
>> *) ast_sched_wait, which returns the number of ms until the next
>> scheduled event should be processed.
>> *) ast_io_wait, which polls the file descriptors and waits until
>> the timeout returned above. If a packet arrives in the meantime, it
>> is processed by the callback to sipsock_read, which calls
>> handle_request_do.
>> *) ast_sched_runq, which looks for all events due within a 1 ms
>> timeframe and executes them.
>>
>> So what I have seen in my tests, and also on a production system,
>> is the following (the numbers are only an example):
>> 1) ast_sched_wait returns 2 ms.
>> 2) ast_io_wait polls the fd until the 2 ms are reached. Meanwhile a
>> packet is received, for example an INVITE, which causes a
>> handle_statechange and other things. Because of the deadlock
>> avoidance, handle_request_do needs, say, 5 ms to process this
>> packet. That means we are already 3 ms late for the next event,
>> and in that time no other packet is processed (yes, the kernel
>> buffers them, but they are not processed).
>> 3) ast_sched_runq sees that we are late and now has, for example,
>> already 4 events to execute.
>> 4) We loop again in do_monitor; ast_sched_wait returns maybe 0 ms
>> for the next event.
>> 5) ast_io_wait already has a packet in the buffer, so the
>> callback is started again, and it is possible that this callback
>> again needs some time, say only 1 ms.
>> 6) ast_sched_runq sees that we are late again (only 1 ms this
>> time), so, for example, only 2 events are executed.
>>
>> In statistics this would look like this:
>> in a timeframe of 7 ms only 2 packets are received, but 6 are
>> sent out via scheduled events.
>>
>> If you look at the example above, you see that the number of
>> received and processed inbound packets can differ from the number
>> of executed scheduled events. With my patch the scheduler can
>> hardly be late, and even if handle_request_do takes long, further
>> incoming packets are received just in time, though perhaps processed
>> later.
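The effect in steps 1) to 6) can be modeled with a toy simulation (purely illustrative; the names, the simulated clock, and the 1 ms event spacing are assumptions, not Asterisk code):

```c
/* Toy model of one do_monitor iteration: the I/O callback "takes" cb_ms
 * of simulated time, and we count how many 1 ms-spaced scheduler events
 * are already due by the time ast_sched_runq() finally runs. */

typedef struct {
    int now_ms;          /* simulated clock */
    int next_event_ms;   /* when the next scheduler event is due */
} toy_monitor;

int toy_monitor_iteration(toy_monitor *m, int cb_ms)
{
    int wait_ms = m->next_event_ms - m->now_ms;      /* ast_sched_wait() */
    if (wait_ms < 0)
        wait_ms = 0;

    /* ast_io_wait(): we block either until the timeout, or for however
     * long the inline callback (handle_request_do) takes, if longer. */
    m->now_ms += cb_ms > wait_ms ? cb_ms : wait_ms;

    int due = 0;                                     /* ast_sched_runq() */
    while (m->next_event_ms <= m->now_ms) {
        due++;
        m->next_event_ms += 1;       /* events scheduled every 1 ms */
    }
    return due;
}
```

With the numbers from the example (next event due in 2 ms, a 5 ms callback), the first iteration finds 4 events already due, matching the walkthrough.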
>>
>
> GRRRRR, I found my error. It wasn't in this thread; I had made a
> mistake in the monitor thread itself.
> And I now know how to handle a condition variable: I hadn't known
> that I have to hold the lock to wait on a condition. I don't know
> why, but I thought that looked wrong; now I know that waiting on the
> condition unlocks the mutex until the condition is signaled and then
> locks it again.
> I will do some heavy load testing now, after solving the CPU problem
> I had, to show you what this means for Asterisk SIP message handling
> and why I think it could help to improve performance.
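What schmidts describes here is the standard POSIX condition-variable idiom: lock the mutex first, then pthread_cond_wait() atomically releases it while sleeping and re-acquires it before returning. A minimal self-contained demonstration (generic names, not chan_sip code):

```c
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int ready;

/* Second thread: sets the flag under the lock, then signals the waiter. */
static void *signaler(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    ready = 1;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
    return NULL;
}

int wait_for_ready(void)
{
    pthread_t tid;
    pthread_create(&tid, NULL, signaler, NULL);

    pthread_mutex_lock(&lock);           /* hold the lock BEFORE waiting */
    while (!ready)                       /* loop: wakeups can be spurious */
        pthread_cond_wait(&cond, &lock); /* unlocks, sleeps, relocks */
    pthread_mutex_unlock(&lock);

    pthread_join(tid, NULL);
    return ready;
}
```

Without the lock held around the check-and-wait, the flag could be set and signaled between the check and the wait, and the waiter would sleep forever; conversely, re-checking the flag in a loop without blocking at all is what produces a 100% CPU spin.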
>
>
> - schmidts
>
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviewboard.asterisk.org/r/970/#review2819
> -----------------------------------------------------------
>
>
> On 2010-10-09 16:31:06, schmidts wrote:
>>
>> -----------------------------------------------------------
>> This is an automatically generated e-mail. To reply, visit:
>> https://reviewboard.asterisk.org/r/970/
>> -----------------------------------------------------------
>>
>> (Updated 2010-10-09 16:31:06)
>>
>>
>> Review request for Asterisk Developers.
>>
>>
>> Summary
>> -------
>>
>> I need help because I got a little stuck on this.
>> With this patch, the receiving and processing of SIP packets over
>> UDP is split up, and the order in which the ast_io_wait function is
>> called in do_monitor changes as well.
>> My idea is that if slow handling occurs in handle_request, it should
>> not block further incoming packets and scheduled events.
>>
>> My problem is that the thread I created, which waits for requests
>> to process, creates a CPU load of 100%, and I don't know why.
>> I have tried using ast_cond_wait for this, but I can't get it to
>> work because I can't get the locking order right. :(
>>
>> Have I missed something essential here, or why does this thread
>> need that much CPU time?
>>
>> This patch is against the 1.6.2 branch because I need it for 1.6.2.
>> If it works as I hope, I will move it against trunk.
>>
>>
>> Diffs
>> -----
>>
>> branches/1.6.2/channels/chan_sip.c 291037
>>
>> Diff: https://reviewboard.asterisk.org/r/970/diff
>>
>>
>> Testing
>> -------
>>
>>
>> Thanks,
>>
>> schmidts
>>
>>
>
>
> --
> _____________________________________________________________________
> -- Bandwidth and Colocation Provided by http://www.api-digital.com --
>
> asterisk-dev mailing list
> To UNSUBSCRIBE or update options visit:
> http://lists.digium.com/mailman/listinfo/asterisk-dev