[asterisk-dev] [Code Review] Split up SIP UDP receiving and proceeding

Sun Oct 10 06:20:42 CDT 2010

> On 2010-10-09 18:06:52, Russell Bryant wrote:
> > I haven't looked at the code yet, but as a general comment, if you want to split up UDP packet processing and scheduler processing, I think moving the scheduler into its own thread makes more sense.  That way, you don't have to do so much copying of data and then signaling another thread for each packet that comes in.
> > 
> > Moving the scheduler to its own thread shouldn't be too hard.  I've done it once before, but got crashes with load testing.  It was a long time ago, though.  There is an ast_sched_thread API you can use to automatically handle the scheduler thread.  However, it does require touching a lot of places in chan_sip that schedule callbacks.
> > 
> > I've actually been thinking about re-doing how ast_sched_thread works.  I'd like to get rid of the sched_thread data structure and instead just optionally have the thread data in the sched_context.  I'd like to be able to use all of the same API calls for the scheduler, but just have an optional API call to start running a thread on a scheduler context.  If we did that, the changes to chan_sip to put the scheduler in its own thread would be _trivial_.
> 
> schmidts wrote:
>     thats not exactly what i have tried here. i split up the receiving of a udp packet and the processing of this packet. i dont touch the scheduler part by now, except it works more reliable and in time.
>     
>     i try to show what i have done:
>     this is how incoming sip udp packets are handled by now:
>     thread do_monitor 
>         -> ast_io_wait (polls the sip udp socket)
>              -> if a packet is recieved a callback to sipsock_read is fired
>                   -> sipsock_read does a recfrom on the socket and then calles handle_request_do with the request which is received (this could be slow).
>     
>     what i have done looks like this:
>     thread do_monitor
>         -> ast_io_wait (polls the sip udp socket)
>              -> if a packet is recieved a callback to sipsock_read is fired
>                   -> sipsock_read does a recfrom on the socket and then adds the packet to the receive queue
>     
>     the sip request thread checks this list (thats the one with high cpu)
>         -> if there is a request in the list call handle_request_do with it
>     
>     with my patch receiving a request and store it in the request_queue tooks max 30 usec then the next loop of ast_io_wait could be started to wait for the next request or timeout for scheduled events. so if an incoming request tooks long time to be handled in handle_request_do (like an invite) in this time other packets would be received and stored in the queue to be processed as soon as possible. by now if handle_request_do tooks too long, first the scheduler will be done (and maybe send out many events cause its allready late) then in the next loop of do_monitor only one packet will be done.
>     
>     but i give you a +1 for the scheduler thread idea too ;)
> 
> Russell Bryant wrote:
>     What problem are you solving, though?  From your description it sounds like you're saying it is because of the scheduler that it takes too long to get around to processing the next packet.  Are you worried that the kernel won't buffer enough so packets will be dropped?  What problems have you observed that led you to want to make these changes?  I think the "right" thing to do to improve UDP packet processing is to move to a thread pool.  It's not trivial, but I think we have learned a lot of lessons about the right and wrong ways to do it with experience doing the same thing in chan_iax2.

No not the scheduler, the handle_request_do takes so long. Cause there is also a deadlock avoidance done in here and some other things which just tooks time.
as i said above when a package is received and handle_request_do is started with this request, until its finished nothing else could be done in the do_monitor loop.

i will try to explain where i see this problems.
in do_monitor there are 3 functions called.
*) ast_sched_wait which gives back when the next sched event should be processed in ms.
*) ast_io_wait polls the file descriptors and wait until the timeout which is the result from above. if a package receives meanwhile it is processed by the callback to sipsock_read which calles handle_request_do.
*) ast_sched_runq this looks for all events in a timeframe for 1 ms and execute them.

so what i have seen in my tests and also in production system is the following:
data are only for example:
1) ast_sched_wait return with 2 ms.
2) ast_io_wait polls the fd until 2 ms are reached. meanwhile a package is received for example an INVITE which cause a handle_statechange and other things. Cause of the deadlock avoidance handle_request_do needs for example 5 ms to proceed this package. This means we are allready 3 ms late for the next event and in this time no other package is received (yes the kernel buffers this but they are not processed)
3) ast_sched_runq see that we are late and now have for example allready 4 events to exec.
4) we loop again in do_monitor ast_sched_wait returns maybe 0 ms for the next event.
5) ast_io_wait allready has a package in the buffer so the callback is started again and its possible that this callback again needs some time for example only 1 ms
6) ast_sched_runq see that wer are again late (only 1 ms this time) so for example only 2 events are execed.

in stats this would look like this:
in a timeframe of 7 ms only 2 packages are received but 6 are sent out via scheduled events.

if you look at the above example you see that the amount of received and proceeded inbound packets could differ from the amount of execed scheduled events. with my patch the scheduler could hardly be late and also if handle_request_do needs long, further incoming packages are received just in time but maybe processed later.

- schmidts

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/970/#review2819
-----------------------------------------------------------

On 2010-10-09 16:31:06, schmidts wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviewboard.asterisk.org/r/970/
> -----------------------------------------------------------
> 
> (Updated 2010-10-09 16:31:06)
> 
> 
> Review request for Asterisk Developers.
> 
> 
> Summary
> -------
> 
> i need help cause i got a little stuck about this.
> with this patch the receiving and proceeding of sip packages over udp is split up and also in do_monitor the order how the ast_io_wait function is called.
> my idea is, if a slow handling in handle_request happens it would not block further incoming packets and scheduled events.
> 
> my problem is that the thread i have created which waits for requests to proceed creates a cpu load of 100% and i dont know why.
> i have tried using ast_cond_wait on this but i cant get this run, cause i cant hit the right locking order. :(
> 
> have i missed something essential here or why does this thread needs that much cpu time?
> 
> this patch is against 1.6.2 branch cause i need it for 1.6.2. if this works as i hope i moved it against trunk.
> 
> 
> Diffs
> -----
> 
>   branches/1.6.2/channels/chan_sip.c 291037 
> 
> Diff: https://reviewboard.asterisk.org/r/970/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> schmidts
> 
>