On 11/17/07, Matthew Rubenstein <email@mattruby.com> wrote:
> Ultimately a standard test suite like the one so helpfully published in
> this thread would benchmark the baseline config, then the transcoding
> config, as this one did. But for each of the combinations of the various
> codecs. Then a benchmark adding each of various options, like
> conferencing, recording, etc. That grid would be repeated on each of
> directly comparable HW configs, like a single CPU with single core at
> x-GHz, multiple of those CPUs, the same benchmarks repeated for
> single/multiple multicore CPUs, each at increasing GHz.
>
>	That 3D stack of benchmark data could be crunched to find the linear
> (or other simple formula) ranges for scaling Asterisk capacity for
It will not scale linearly though; it gets slower once you hit a higher contention rate of threads wanting to run versus the total number of available CPUs. It is also not just a function of clock speed. One thing the Woodcrest has in its favour (which the 5140 is) is that it can do more at the same clock speed than other Xeons. So instruction optimization, cache (and whether the cache is warm or not), branch prediction failures (generally a compiler issue), and everything else influence how much it can do in a given clock cycle.
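As a rough way to watch that contention ratio during a run, something like this could sit alongside the load generator (a minimal sketch, Linux-only; the sampling interval and the use of the runnable-task field in /proc/loadavg are my own assumptions, not part of the published test setup):

    #!/usr/bin/env python
    # Sample /proc/loadavg during a test run and flag periods where the
    # number of runnable tasks exceeds the number of available CPUs.
    import os
    import time

    CPUS = os.cpu_count() or 1     # CPUs the scheduler has available
    INTERVAL = 5                   # seconds between samples (arbitrary)

    while True:
        with open("/proc/loadavg") as f:
            fields = f.read().split()
        load1 = float(fields[0])                     # 1-minute load average
        runnable = int(fields[3].split("/")[0])      # tasks currently wanting to run
        state = "CONTENDED" if runnable > CPUS else "ok"
        print("load1=%.2f runnable=%d cpus=%d %s" % (load1, runnable, CPUS, state))
        time.sleep(INTERVAL)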
Then you have other issues, such as ALOC (average length of call) being high or low: low ALOCs cause more call setup/teardown requests, and are more common with one type of traffic than another. Once you measure all the things that make up the base, you can start doing per-feature runs, but again a conference with 2 people will perform differently than one with 20, which will perform differently depending on which of the 3 major conferencing modules you use.
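To make the ALOC point concrete (the numbers here are invented, purely to show the relationship): for a fixed number of concurrent channels, the setup rate is roughly concurrent calls divided by ALOC, so halving the ALOC doubles the setup/teardown load even though the channel count looks identical.

    # Rough illustration with made-up numbers.
    concurrent_calls = 600              # channels kept busy at steady state
    for aloc in (30, 60, 180):          # average length of call, in seconds
        cps = concurrent_calls / float(aloc)
        print("ALOC %3ds -> roughly %.1f setups/teardowns per second" % (aloc, cps))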
But yes, if the base is done in a good way, with a wide range of criteria, you can see what the cost of the base is; then, by taking fewer samples for new features, you can try to extrapolate against that base. For example, if you did 1, 2, 4, 8, 16, 32 and 64 cps tests as a base and rated performance, then did 1, 8 and 64 cps tests for a specific module, let's say conferencing, where you tossed 2 participants in one time, 8 another and maybe 30 for a final run, you could start to see the relative impacts.
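Something along these lines is what I mean by extrapolating against the base (a sketch only; the values are placeholders, not measurements from the published run):

    # Compare sparse per-feature samples against the matching base samples
    # to get a relative cost at each cps level.  Values are %CPU, invented.
    base = {1: 2.0, 2: 4.1, 4: 8.5, 8: 17.0, 16: 36.0, 32: 78.0, 64: 170.0}
    conferencing = {1: 3.5, 8: 29.0, 64: 290.0}

    for cps in sorted(conferencing):
        ratio = conferencing[cps] / base[cps]
        print("at %2d cps, conferencing costs about %.1fx the base" % (cps, ratio))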
Then again, it's never that simple, but it could give you a much better feel, although it's a much more involved test.

Since the criteria for what was done are public, if others were to perform similar testing then slow spots could be identified, as well as some better quality metrics. I have already commented on what I would like to see in addition, such as one or more programs monitoring RTP for quality issues (possibly via port replication; with multiple tools you are more likely to be able to average out the results, since a missed packet may be the switch not replicating it, the program not catching it, or whatever).
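By way of example, one of those monitoring programs could be as simple as this (a sketch, assuming the replicated RTP actually reaches a local UDP port -- with a real SPAN/mirror port you would more likely sniff with libpcap/tcpdump, since mirrored packets are not addressed to the monitoring host; the port number is a placeholder):

    # Count RTP sequence-number gaps per SSRC (header layout per RFC 3550).
    import socket
    import struct

    RTP_PORT = 4000                     # placeholder port
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", RTP_PORT))

    last_seq = {}                       # SSRC -> last sequence number seen
    missing = {}                        # SSRC -> apparently lost packets

    while True:
        pkt, _addr = sock.recvfrom(2048)
        if len(pkt) < 12:
            continue                    # too short to be an RTP header
        seq = struct.unpack("!H", pkt[2:4])[0]      # bytes 2-3: sequence number
        ssrc = struct.unpack("!I", pkt[8:12])[0]    # bytes 8-11: SSRC
        if ssrc in last_seq:
            gap = (seq - last_seq[ssrc] - 1) & 0xFFFF
            if 0 < gap < 100:           # ignore wraparound and gross reordering
                missing[ssrc] = missing.get(ssrc, 0) + gap
                print("SSRC %08x: %d packet(s) missing so far" % (ssrc, missing[ssrc]))
        last_seq[ssrc] = seq

Run two or three different tools against the same mirrored stream and average, as above, so one tool's dropped captures don't get blamed on the switch or on Asterisk.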
> 	That's the kind of benchmarking that I'd expect Digium to do. They
> probably have already done at least a limited subset, but haven't

It would certainly help them identify where the code is a little slow, so they can work on it. Profiling is good, but sometimes you also need to use it in more real-world situations to identify what exactly is going on that is causing problems. One call can work for profiling, but 1000 can show lock contention and other issues that a single call would never reveal.
-- 
Trixter http://www.0xdecafbad.com Bret McDanel
Belfast +44 28 9099 6461 US +1 516 687 5200
http://www.trxtel.com the phone company that pays you!