[asterisk-dev] chan_sip -- improving the speed of sip reloads
murf at digium.com
Fri Nov 16 09:21:46 CST 2007
Olle closed 11210, where the reporter has 5000 entries in his sip.conf file, and network processing is put on hold for over 40 seconds while a reload is being done, which causes some fairly significant problems.
This looked like a fairly juicy problem, so I started work on it, but about 1/3 of the way thru it, Olle closed that bug, preferring that the issue be taken to asterisk-dev as an architectural issue instead.
So, here I am. Here is what I've done, and what I'm planning to do; let me know if I'm going in the wrong direction.
Firstly, I plan to do the same sort of optimizations to chan_sip that I did to the pbx core, and that Russell did to the iax stuff: move from linked lists to hash tables, by using the astobj2 facilities. I am about 1/2 of the way thru this. Most of the work will be in replacing those things that astobj provided and astobj2 does not (marking, etc.).
Since astobj2 hash tables are not automatically resizable, you need a good estimate of how many elements each hash table will hold. Right now, the size is set to 256 (smaller for low-memory cases). If you put more than 512 elements into a 256-sized hash table, things will start to slow down noticeably. The number of elements is going to be site-specific, and as things stand, users who need more than 256 would have to modify the code to make the tables bigger. So I plan to add config file options that set the hash table sizes, so bigger sites can raise them easily, without source code modification.
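To illustrate the sizing issue, here is a minimal sketch of a chained hash table whose bucket count is fixed at creation time, as it would be if it came from a config option. This is not the astobj2 API; struct peer, struct peer_table and the peer_table_* functions are hypothetical names made up for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical peer entry; not the real struct sip_peer. */
struct peer {
    char name[80];
    struct peer *next;          /* chain within one bucket */
};

struct peer_table {
    size_t nbuckets;            /* fixed at creation, e.g. from a config option */
    size_t count;               /* number of entries stored */
    struct peer **buckets;
};

/* Simple multiplicative string hash (djb2 style). */
static unsigned long hash_name(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

static struct peer_table *peer_table_create(size_t nbuckets)
{
    struct peer_table *t;

    assert(nbuckets > 0);
    t = calloc(1, sizeof(*t));
    if (!t)
        return NULL;
    t->buckets = calloc(nbuckets, sizeof(*t->buckets));
    if (!t->buckets) {
        free(t);
        return NULL;
    }
    t->nbuckets = nbuckets;
    return t;
}

/* Insert at the head of the bucket's chain; does not check for duplicates. */
static int peer_table_insert(struct peer_table *t, const char *name)
{
    size_t b = hash_name(name) % t->nbuckets;
    struct peer *p = calloc(1, sizeof(*p));

    if (!p)
        return -1;
    strncpy(p->name, name, sizeof(p->name) - 1);
    p->next = t->buckets[b];
    t->buckets[b] = p;
    t->count++;
    return 0;
}

/* Only one chain is walked, so the cost grows with count / nbuckets. */
static struct peer *peer_table_find(const struct peer_table *t, const char *name)
{
    struct peer *p = t->buckets[hash_name(name) % t->nbuckets];

    while (p && strcmp(p->name, name))
        p = p->next;
    return p;
}

/* Average chain length: entries per bucket. */
static double peer_table_load(const struct peer_table *t)
{
    return (double)t->count / (double)t->nbuckets;
}
```

With 5000 entries in 256 buckets, peer_table_load() comes out at about 19.5 entries per bucket; the same 5000 entries in 8192 buckets would average under one per bucket, which is the kind of adjustment a config option would allow.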
I have already converted the registry list into a hash table, because it was in the ASTOBJ_ domain, but I plan to reverse this and put it back into ASTOBJ, as it never searches the list for a match by name. Since ASTOBJ is just linked lists, it occurred to me that AST_LIST might be more appropriate, but ASTOBJ does have the refcounts, and perhaps some nice iterator functionality that the registry is using, that make ASTOBJ a better alternative. BUT, ASTOBJ does not use read/write locks (although it appears to), and AST_LIST does... hmmm.
Josh suggested that if I move to astobj2, I should also put the dialog list (the sip_pvt's) in a hash table, as it is searched by callid. I plan to do this also.
I noticed that chan_iax2 has several traversals of hash tables, and those in iax2_getpeername and iax2_getpeertrunk look as though a new hash table, keyed on the sin_addr and sin_port fields (IP address and port), could speed up the searches. I don't know if these functions are called often or not... if they are, it might be worth the work to form and maintain this additional hash table. The same situation seems to exist in chan_sip, where traversals of peers (at least) are done, looking for a match on sin_addr and sin_port.
Such hash-table traversals in astobj2 make me nervous. Some time back, I reported that a 1% mix of traversals in a multi-threaded environment slowed astobj2 down by 15x. In other words, a benchmark that normally took 60 seconds to run took 15 minutes when a traversal of the entire hash table was done once every 100 operations. This behavior hasn't been corrected yet, as far as I know. So, my point here is that the fewer the traversals, the better. Also, astobj2 uses only mutex locks, which worries me. My hashtab implementation uses read/write locks, and does not suffer the 15x slowdown with frequent (1 in 100 hashtab ops) traversals. I can't help wondering whether read/write locks in astobj2 might improve this situation.... hmmm.
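A secondary index keyed on address and port might look roughly like the sketch below. It only shows the hashing and lookup; locking and refcounting are left out, and struct addr_entry, addr_index_add() and addr_index_find() are hypothetical names, not existing chan_sip or chan_iax2 code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <netinet/in.h>

#define ADDR_BUCKETS 256        /* illustrative size only */

/* Hypothetical index entry; peername stands in for a pointer to the peer. */
struct addr_entry {
    struct sockaddr_in sin;
    const char *peername;
    struct addr_entry *next;    /* chain within one bucket */
};

static struct addr_entry *addr_index[ADDR_BUCKETS];

/* Mix the IPv4 address and the port into one bucket number. */
static size_t addr_hash(const struct sockaddr_in *sin)
{
    uint32_t h = (uint32_t)sin->sin_addr.s_addr;

    h ^= (uint32_t)sin->sin_port * 2654435761u;
    h ^= h >> 16;
    return h % ADDR_BUCKETS;
}

static int addr_equal(const struct sockaddr_in *a, const struct sockaddr_in *b)
{
    return a->sin_addr.s_addr == b->sin_addr.s_addr &&
           a->sin_port == b->sin_port;
}

static void addr_index_add(struct addr_entry *e)
{
    size_t b;

    assert(e != NULL);
    b = addr_hash(&e->sin);
    e->next = addr_index[b];
    addr_index[b] = e;
}

/* Walks a single bucket instead of traversing every peer. */
static const char *addr_index_find(const struct sockaddr_in *sin)
{
    struct addr_entry *e = addr_index[addr_hash(sin)];

    while (e && !addr_equal(&e->sin, sin))
        e = e->next;
    return e ? e->peername : NULL;
}
```

The cost is the maintenance: an entry would have to be removed and re-added whenever a peer's address or port changes, for example on registration.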
The above mods should speed up chan_sip in large cases, and will give general advantages, including higher call processing power, etc. -- but I'm mostly concerned with speeding up the reload process.
After the above optimizations are made, I have another set of optimizations to do... read on.
Josh tells me that config file processing is performed in the same thread that does the network processing. Moving config file processing into a different thread could help, but if the reload involves deleting everything and building everything again, the gain may be small: the peers, users and registry lists will probably be locked for most of the reload, and the network thread would still be stuck waiting on them.
So, I plan to investigate whether the approach taken by the pbx, in the merge_and_delete() function there, might apply to the sip environment. I suspect that in most reloads, only a small percentage of the entries change. In another thread, we can read in the config file and compare it against the current situation, resulting in delete/add lists that can be swiftly applied to the current environment. Some time ago, I also did some work on merge_and_delete() that deferred the actual destruction of structures until after the reload was finished and the locks released, as that destruction seemed to be a major part of the reload processing time. I threw this upgrade away, as the original reporter was too busy to test it, and I didn't want to commit it without testing. But I may just pull it back out and apply it anyway.
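The compare step could be sketched as a merge over two name-sorted lists, as below. This is only an illustration of the idea, not the pbx code: struct entry, struct diff and diff_configs() are hypothetical, the conf string stands in for all of a peer's settings, and a changed peer is simply reported as a delete plus an add.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DIFF_MAX 64             /* fixed capacity keeps the sketch short */

/* Hypothetical config entry. */
struct entry {
    const char *name;           /* peer name, the lookup key */
    const char *conf;           /* stand-in for all of the peer's settings */
};

struct diff {
    size_t nadd, ndel;
    const struct entry *add[DIFF_MAX];  /* to create from the new file */
    const struct entry *del[DIFF_MAX];  /* to remove from the running set */
};

/* Both arrays must be sorted by name. Unchanged entries appear in neither list. */
static void diff_configs(const struct entry *cur, size_t ncur,
                         const struct entry *next, size_t nnext,
                         struct diff *out)
{
    size_t i = 0, j = 0;

    assert(ncur <= DIFF_MAX && nnext <= DIFF_MAX);
    out->nadd = out->ndel = 0;

    while (i < ncur || j < nnext) {
        int c;

        if (i == ncur)
            c = 1;              /* only new entries remain */
        else if (j == nnext)
            c = -1;             /* only old entries remain */
        else
            c = strcmp(cur[i].name, next[j].name);

        if (c < 0) {
            out->del[out->ndel++] = &cur[i++];      /* gone from the new file */
        } else if (c > 0) {
            out->add[out->nadd++] = &next[j++];     /* not present before */
        } else {
            if (strcmp(cur[i].conf, next[j].conf)) {
                /* Same name, different settings: replace old with new. */
                out->del[out->ndel++] = &cur[i];
                out->add[out->nadd++] = &next[j];
            }
            i++;
            j++;
        }
    }
}
```

The comparison itself needs no locks on the live lists if it works from a snapshot; only applying the resulting add and delete lists does, and freeing the deleted structures can wait until after the locks are released.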
So, are there any objections to this approach? Now is the time to put in your thoughts!