[asterisk-dev] [Code Review] Split up SIP UDP receiving and proceeding
Vadim Lebedev
vadim at mbdsys.com
Tue Oct 12 12:13:28 CDT 2010
Hi,
I wonder: what is the practical difference between packets waiting in
the kernel queue and packets waiting in an internal Asterisk queue?
Do you save on the deadlock avoidance code in handle_request_do?
Thanks
Vadim
On 11 Oct 2010, at 11:09, sst at sil.at wrote:
>
>
>> On 2010-10-09 18:06:52, Russell Bryant wrote:
>>> I haven't looked at the code yet, but as a general comment, if you
>>> want to split up UDP packet processing and scheduler processing, I
>>> think moving the scheduler into its own thread makes more sense.
>>> That way, you don't have to do so much copying of data and then
>>> signaling another thread for each packet that comes in.
>>>
>>> Moving the scheduler to its own thread shouldn't be too hard.
>>> I've done it once before, but got crashes with load testing. It
>>> was a long time ago, though. There is an ast_sched_thread API you
>>> can use to automatically handle the scheduler thread. However, it
>>> does require touching a lot of places in chan_sip that schedule
>>> callbacks.
>>>
>>> I've actually been thinking about re-doing how ast_sched_thread
>>> works. I'd like to get rid of the sched_thread data structure and
>>> instead just optionally have the thread data in the
>>> sched_context. I'd like to be able to use all of the same API
>>> calls for the scheduler, but just have an optional API call to
>>> start running a thread on a scheduler context. If we did that,
>>> the changes to chan_sip to put the scheduler in its own thread
>>> would be _trivial_.
>>
>> schmidts wrote:
>> That's not exactly what I have tried here. I split up the
>> receiving of a UDP packet and the processing of that packet. I don't
>> touch the scheduler part for now, except that it now works more
>> reliably and on time.
>>
>> I will try to show what I have done.
>> This is how incoming SIP UDP packets are handled now:
>> thread do_monitor
>> -> ast_io_wait (polls the SIP UDP socket)
>> -> if a packet is received, a callback to sipsock_read
>> is fired
>> -> sipsock_read does a recvfrom on the socket and
>> then calls handle_request_do with the received request
>> (this can be slow).
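For reference, the pre-patch path can be sketched like this (a simplified, hypothetical reconstruction with a stubbed handle_request_do; not the literal chan_sip.c code):

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

static char last_request[2048];   /* stub state so the sketch is runnable */

/* Stub: the real handle_request_do() parses the SIP request, does
 * deadlock avoidance, and may block for several milliseconds. */
static void handle_request_do(const char *buf, ssize_t len)
{
    snprintf(last_request, sizeof(last_request), "%.*s", (int)len, buf);
}

/* I/O callback fired when the SIP UDP socket becomes readable: it reads
 * the datagram and processes it inline, so a slow handle_request_do()
 * stalls the whole do_monitor loop until it returns. */
int sipsock_read(int fd)
{
    char buf[2048];
    struct sockaddr_storage from;
    socklen_t fromlen = sizeof(from);

    ssize_t res = recvfrom(fd, buf, sizeof(buf) - 1, 0,
                           (struct sockaddr *)&from, &fromlen);
    if (res < 0)
        return 1;                 /* keep the fd registered */
    buf[res] = '\0';
    handle_request_do(buf, res);  /* inline: blocks the monitor loop */
    return 1;
}
```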
>>
>> What I have done looks like this:
>> thread do_monitor
>> -> ast_io_wait (polls the SIP UDP socket)
>> -> if a packet is received, a callback to sipsock_read
>> is fired
>> -> sipsock_read does a recvfrom on the socket and
>> then adds the packet to the receive queue
>>
>> The SIP request thread checks this list (that's the one with high
>> CPU):
>> -> if there is a request in the list, call handle_request_do
>> with it
>>
>> With my patch, receiving a request and storing it in the
>> request_queue takes at most 30 usec; then the next loop of
>> ast_io_wait can start, waiting for the next request or for the
>> timeout of scheduled events. So if an incoming request takes a long
>> time to be handled in handle_request_do (like an INVITE), other
>> packets are still received during that time and stored in the queue,
>> to be processed as soon as possible. As it stands now, if
>> handle_request_do takes too long, first the scheduler runs (and may
>> send out many events at once because it is already late), and then
>> in the next loop of do_monitor only one packet is handled.
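The receive-queue split described above can be sketched roughly as follows (hypothetical names, sizes, and structure; the actual change is in the diff on Review Board):

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Producer/consumer queue: sipsock_read() only does the recvfrom() and
 * enqueues the raw packet; a separate request thread dequeues and runs
 * the (potentially slow) handle_request_do(). */

struct sip_pkt {
    struct sip_pkt *next;
    size_t len;
    char data[2048];            /* enough for a typical SIP UDP datagram */
};

static struct sip_pkt *queue_head, *queue_tail;
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t queue_cond = PTHREAD_COND_INITIALIZER;

/* Producer side: called from the do_monitor I/O callback. Fast: copy + append. */
void enqueue_request(const char *buf, size_t len)
{
    struct sip_pkt *pkt = calloc(1, sizeof(*pkt));
    if (len > sizeof(pkt->data))
        len = sizeof(pkt->data);  /* truncate oversized datagrams */
    pkt->len = len;
    memcpy(pkt->data, buf, len);

    pthread_mutex_lock(&queue_lock);
    if (queue_tail)
        queue_tail->next = pkt;
    else
        queue_head = pkt;
    queue_tail = pkt;
    pthread_cond_signal(&queue_cond);    /* wake the request thread */
    pthread_mutex_unlock(&queue_lock);
}

/* Consumer side: the request thread blocks here instead of spinning. */
struct sip_pkt *dequeue_request(void)
{
    pthread_mutex_lock(&queue_lock);
    while (!queue_head)                  /* guard against spurious wakeups */
        pthread_cond_wait(&queue_cond, &queue_lock);
    struct sip_pkt *pkt = queue_head;
    queue_head = pkt->next;
    if (!queue_head)
        queue_tail = NULL;
    pthread_mutex_unlock(&queue_lock);
    return pkt;        /* caller runs handle_request_do() and frees it */
}
```

Because the consumer sleeps in pthread_cond_wait() while the queue is empty, it uses no CPU between packets.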
>>
>> But I give you a +1 for the scheduler thread idea too ;)
>>
>> Russell Bryant wrote:
>> What problem are you solving, though? From your description it
>> sounds like you're saying it is because of the scheduler that it
>> takes too long to get around to processing the next packet. Are
>> you worried that the kernel won't buffer enough so packets will be
>> dropped? What problems have you observed that led you to want to
>> make these changes? I think the "right" thing to do to improve UDP
>> packet processing is to move to a thread pool. It's not trivial,
>> but I think we have learned a lot of lessons about the right and
>> wrong ways to do it with experience doing the same thing in
>> chan_iax2.
>>
>> schmidts wrote:
>> No, not the scheduler; it is handle_request_do that takes so long,
>> because there is also deadlock avoidance done in there and some
>> other things that simply take time.
>> As I said above, when a packet is received and handle_request_do
>> is started with that request, nothing else can happen in the
>> do_monitor loop until it is finished.
>>
>> I will try to explain where I see the problem.
>> In do_monitor, three functions are called:
>> *) ast_sched_wait, which returns the number of ms until the next
>> scheduled event should be processed.
>> *) ast_io_wait, which polls the file descriptors and waits until
>> the timeout returned above. If a packet arrives in the meantime, it
>> is processed by the callback to sipsock_read, which calls
>> handle_request_do.
>> *) ast_sched_runq, which looks for all events due within a 1 ms
>> timeframe and executes them.
>>
>> So what I have seen in my tests, and also on a production system,
>> is the following (the numbers are only an example):
>> 1) ast_sched_wait returns 2 ms.
>> 2) ast_io_wait polls the fd until the 2 ms are reached. Meanwhile a
>> packet is received, for example an INVITE, which causes a
>> handle_statechange and other things. Because of the deadlock
>> avoidance, handle_request_do needs, say, 5 ms to process this
>> packet. That means we are already 3 ms late for the next event,
>> and in that time no other packet is processed (yes, the kernel
>> buffers them, but they are not processed).
>> 3) ast_sched_runq sees that we are late and now has, for example,
>> already 4 events to execute.
>> 4) We loop again in do_monitor; ast_sched_wait returns maybe 0 ms
>> for the next event.
>> 5) ast_io_wait already has a packet in the buffer, so the
>> callback is started again, and it is possible that this callback
>> again needs some time, say only 1 ms.
>> 6) ast_sched_runq sees that we are late again (only 1 ms this
>> time), so, for example, only 2 events are executed.
>>
>> In statistics this would look like this:
>> in a timeframe of 7 ms only 2 packets are received, but 6 are
>> sent out via scheduled events.
>>
>> If you look at the example above, you see that the number of
>> received and processed inbound packets can differ from the number
>> of executed scheduled events. With my patch the scheduler can
>> hardly be late, and even if handle_request_do takes long, further
>> incoming packets are received just in time, though perhaps processed
>> later.
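The effect in steps 1) to 6) can be modeled with a toy simulation (purely illustrative; the names, the simulated clock, and the 1 ms event spacing are assumptions, not Asterisk code):

```c
/* Toy model of one do_monitor iteration: the I/O callback "takes" cb_ms
 * of simulated time, and we count how many 1 ms-spaced scheduler events
 * are already due by the time ast_sched_runq() finally runs. */

typedef struct {
    int now_ms;          /* simulated clock */
    int next_event_ms;   /* when the next scheduler event is due */
} toy_monitor;

int toy_monitor_iteration(toy_monitor *m, int cb_ms)
{
    int wait_ms = m->next_event_ms - m->now_ms;      /* ast_sched_wait() */
    if (wait_ms < 0)
        wait_ms = 0;

    /* ast_io_wait(): we block either until the timeout, or for however
     * long the inline callback (handle_request_do) takes, if longer. */
    m->now_ms += cb_ms > wait_ms ? cb_ms : wait_ms;

    int due = 0;                                     /* ast_sched_runq() */
    while (m->next_event_ms <= m->now_ms) {
        due++;
        m->next_event_ms += 1;       /* events scheduled every 1 ms */
    }
    return due;
}
```

With the numbers from the example (next event due in 2 ms, a 5 ms callback), the first iteration finds 4 events already due, matching the walkthrough.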
>>
>
> GRRRRR, I found my error. It wasn't in this thread; I had made a
> mistake in the monitor thread itself.
> And I now know how to handle a condition variable: I hadn't known
> that I have to hold the lock to wait on a condition. I don't know
> why, but I thought that looked wrong; now I know that waiting on the
> condition unlocks the mutex until the condition is signaled and then
> locks it again.
> I will do some heavy load testing now, after solving the CPU problem
> I had, to show you what this means for Asterisk SIP message handling
> and why I think it could help to improve performance.
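What schmidts describes here is the standard POSIX condition-variable idiom: lock the mutex first, then pthread_cond_wait() atomically releases it while sleeping and re-acquires it before returning. A minimal self-contained demonstration (generic names, not chan_sip code):

```c
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int ready;

/* Second thread: sets the flag under the lock, then signals the waiter. */
static void *signaler(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    ready = 1;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
    return NULL;
}

int wait_for_ready(void)
{
    pthread_t tid;
    pthread_create(&tid, NULL, signaler, NULL);

    pthread_mutex_lock(&lock);           /* hold the lock BEFORE waiting */
    while (!ready)                       /* loop: wakeups can be spurious */
        pthread_cond_wait(&cond, &lock); /* unlocks, sleeps, relocks */
    pthread_mutex_unlock(&lock);

    pthread_join(tid, NULL);
    return ready;
}
```

Without the lock held around the check-and-wait, the flag could be set and signaled between the check and the wait, and the waiter would sleep forever; conversely, re-checking the flag in a loop without blocking at all is what produces a 100% CPU spin.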
>
>
> - schmidts
>
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviewboard.asterisk.org/r/970/#review2819
> -----------------------------------------------------------
>
>
> On 2010-10-09 16:31:06, schmidts wrote:
>>
>> -----------------------------------------------------------
>> This is an automatically generated e-mail. To reply, visit:
>> https://reviewboard.asterisk.org/r/970/
>> -----------------------------------------------------------
>>
>> (Updated 2010-10-09 16:31:06)
>>
>>
>> Review request for Asterisk Developers.
>>
>>
>> Summary
>> -------
>>
>> I need help because I got a little stuck on this.
>> With this patch, the receiving and processing of SIP packets over
>> UDP is split up, and the order in which the ast_io_wait function is
>> called in do_monitor changes as well.
>> My idea is that if slow handling occurs in handle_request, it should
>> not block further incoming packets and scheduled events.
>>
>> My problem is that the thread I created, which waits for requests
>> to process, creates a CPU load of 100%, and I don't know why.
>> I have tried using ast_cond_wait for this, but I can't get it to
>> work because I can't get the locking order right. :(
>>
>> Have I missed something essential here, or why does this thread
>> need that much CPU time?
>>
>> This patch is against the 1.6.2 branch because I need it for 1.6.2.
>> If it works as I hope, I will move it against trunk.
>>
>>
>> Diffs
>> -----
>>
>> branches/1.6.2/channels/chan_sip.c 291037
>>
>> Diff: https://reviewboard.asterisk.org/r/970/diff
>>
>>
>> Testing
>> -------
>>
>>
>> Thanks,
>>
>> schmidts
>>
>>
>
>
> --
> _____________________________________________________________________
> -- Bandwidth and Colocation Provided by http://www.api-digital.com --
>
> asterisk-dev mailing list
> To UNSUBSCRIBE or update options visit:
> http://lists.digium.com/mailman/listinfo/asterisk-dev