[asterisk-dev] Scheduling Issues in 1.6.2 and also 1.8

Wed Jul 28 14:49:49 CDT 2010

Hello,

Because of my issue (https://issues.asterisk.org/view.php?id=17699) I have
investigated the new heap-based scheduler more deeply and found two points
that could make Asterisk perform better. I still have not found a solution
for my problem, but one simple patch would prevent sending too many SIP
(NOTIFY) packets.

The biggest improvement, in my opinion, is in chan_sip.c in the
sip_poke_noanswer function
(http://www.asterisk.org/doxygen/trunk/chan__sip_8c-source.html#l24429).
When a poke of a peer is not answered within its maximum time (lastms*2 or
maxms), the function is called and does the following.

The ast_devstate_changed call in these rows:

24452 peer->lastms = -1;
24453 ast_devstate_changed(AST_DEVICE_UNKNOWN, "SIP/%s", peer->name);

fires every time the function is called, which in a normal config happens
every 14 seconds (4 seconds for maxms plus 10 seconds for DEFAULT_FREQ_NOTOK).

My small change is just an additional check:

24452 if (peer->lastms > -1)
24453     ast_devstate_changed(AST_DEVICE_UNKNOWN, "SIP/%s", peer->name);
24454 peer->lastms = -1;

This could perhaps be done in the if block a few rows above, but I am not
sure what happens in the dialog_unlink function in between.

With this change the device state handling, which includes a list lock and
sending NOTIFYs, is done only once, when the peer becomes unreachable, and
not again every 14 seconds, so there will be fewer scheduled entries on the
heap.
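
To illustrate the effect of the guard outside of chan_sip.c, here is a minimal standalone sketch. It is not Asterisk code: demo_peer, demo_devstate_changed and demo_poke_noanswer are hypothetical stand-ins, and the notification is reduced to a counter. It only shows that the state change is reported on the transition to unreachable and not on every later timeout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant part of struct sip_peer. */
struct demo_peer {
    const char *name;
    int lastms;         /* last qualify time in ms, -1 means unreachable */
    int notifications;  /* how often the device state change was reported */
};

/* Hypothetical stand-in for ast_devstate_changed(); it only counts calls. */
static void demo_devstate_changed(struct demo_peer *peer)
{
    peer->notifications++;
}

/* Sketch of the guarded timeout handler: report the state change only
 * when the peer was reachable before this timeout. */
static void demo_poke_noanswer(struct demo_peer *peer)
{
    assert(peer != NULL);
    if (peer->lastms > -1)
        demo_devstate_changed(peer);
    peer->lastms = -1;
}
```

Calling demo_poke_noanswer repeatedly on a peer that stays unreachable raises the counter once. It rises again only after lastms has been set back to a non-negative value, that is, after the peer answered a poke in between.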

The other issue is the scheduler's runq function: it can start too many
events, because the time range is recalculated on each loop iteration. If an
event takes too long (more than 1 ms), the next event may already be due, so
it is started as well, without checking the incoming FD in the meantime.
In my tests I have seen around 1500 events in one single run, which should
not take longer than one ms.
My patch just calculates the time range once, before the for loop, so control
goes back to the scheduler function, which polls the incoming FD with
timeout 0 and then starts runq again.
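
The difference between the two loop variants can be shown with a small simulation. This is only a sketch under my own assumptions, not the scheduler code itself: sim_event, runq_recompute and runq_fixed are hypothetical names, the clock is a simulated counter in microseconds, and each event simply advances that clock by its cost.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event: when it is due and how long it takes to run. */
struct sim_event {
    long when_us;
    long cost_us;
};

/* Variant 1: the 1 ms window is recalculated on every iteration, so slow
 * events keep pulling later events into the same run. */
static size_t runq_recompute(const struct sim_event *ev, size_t n, long *now_us)
{
    size_t ran = 0;

    assert(ev != NULL && now_us != NULL);
    while (ran < n) {
        long limit = *now_us + 1000;   /* recalculated each loop */
        if (ev[ran].when_us >= limit)
            break;
        *now_us += ev[ran].cost_us;    /* "run" the event */
        ran++;
    }
    return ran;
}

/* Variant 2: the window is calculated once before the loop, so the run
 * ends after the events that were due when it started. */
static size_t runq_fixed(const struct sim_event *ev, size_t n, long *now_us)
{
    size_t ran = 0;
    long limit;

    assert(ev != NULL && now_us != NULL);
    limit = *now_us + 1000;            /* calculated once */
    while (ran < n && ev[ran].when_us < limit) {
        *now_us += ev[ran].cost_us;
        ran++;
    }
    return ran;
}
```

With four events due at 0, 1500, 3000 and 4500 us that each take 2000 us, the first variant runs all four in one call, while the second runs only the first one and returns, so the caller can poll the FD before the next run.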

Should I open a separate issue for each patch I have made, or is it better
to submit them all at once?

best regards

steve smith