[asterisk-dev] What happened with the latest round of releases: or, "whoops"

Russell Bryant russell at russellbryant.net
Fri Jun 13 10:29:28 CDT 2014


On Fri, Jun 13, 2014 at 10:42 AM, Matthew Jordan <mjordan at digium.com> wrote:

> On Fri, Jun 13, 2014 at 4:41 AM, Steven Howes <steve-lists at geekinter.net>
> wrote:
>
>> On 13 Jun 2014, at 08:12, Matthew Jordan <mjordan at digium.com> wrote:
>> > Apologies if this e-mail gets a bit rambling; by the time I send this
>> > it will be past 2 AM here in the US, and we've been scrambling for the
>> > past 9 hours or so to fix the regression caused by r415972 without
>> > reintroducing the vulnerability it fixed.
>> >
>> > Clearly, there are things we should have done better to catch this
>> > before the security releases went out yesterday. The regression was
>> > serious enough that plenty of tests in the Test Suite caught the error -
>> > in fact, developing a test on a local dev machine was how we discovered
>> > that the regression had occurred.
>>
>>
These are *really* hard problems.  I think the responsiveness to the
regression and the openness to do a retrospective are awesome, though.
Asterisk is not the only open source project that struggles with the
security release process.  When all of your processes and tooling are built
around being as open as possible, having the occasional patch that has to
be handled in a non-open way is a giant pain.  As another example, we
struggle with some of this in OpenStack.  We rely heavily on our public
code review and CI systems.  Security patches have to be reviewed on a
private bug, which is a pain, and we can't run CI on the patches in advance
(which we normally do for every revision of every other patch, before
commit).  So we have to rely on manual local testing, and it bites us
occasionally.  Anyway ... the point is, this is hard.

Overall, I think Asterisk does a very good job with security issues.
Issues are fixed and released promptly, with a proper CVE and a detailed
security advisory.  So, despite a slip-up here, I still think Asterisk does
an amazing job overall.  Nice work.


>> I’ve not been directly involved with the whole commit/testing procedure,
>> so excuse me if I’m misreading anything...
>>
>> If it fails the tests, how was it released? I understand the whole
>> reduced transparency/communications thing; it’s an unfortunate necessity
>> of dealing with security issues. I can’t see how that would exclude the
>> testing carried out by the Test Suite, though.
>>
>> Kind regards,
>>
>>
> Disregarding local test suite runs, a few things happened here:
>

Yeah, not running the tests locally in advance seems like the biggest,
most obvious issue, but I know that can be a pain when you're used to
relying on CI to do it for you.
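
For what it's worth, a small wrapper can make the local run a one-command
habit.  Here's a rough sketch in Python - the test names are made up for
illustration, and it assumes the Test Suite checkout sits next to the
Asterisk tree and exposes its usual runtests.py entry point with a -t
option for picking individual tests:

    #!/usr/bin/env python
    # pre_push_check.py - hypothetical local gate: build Asterisk, then run
    # a small, fast subset of the Test Suite before a patch leaves the box.
    import subprocess
    import sys

    # Illustrative test names; real ones would come from the Test Suite tree.
    SMOKE_TESTS = [
        "tests/channels/SIP/sip_basic_call",
        "tests/manager/originate",
    ]

    def run(cmd, cwd=None):
        print("==> %s" % " ".join(cmd))
        return subprocess.call(cmd, cwd=cwd)

    def main():
        if run(["make"]) != 0:
            sys.exit("build failed; not running tests")
        for test in SMOKE_TESTS:
            if run(["./runtests.py", "-t", test], cwd="../testsuite") != 0:
                sys.exit("FAILED: %s - do not push" % test)
        print("local smoke run passed")

    if __name__ == "__main__":
        main()

Even something that crude, run by habit before every security merge, might
well have flagged this one.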


> (1) Four security patches were made at roughly the same time.
> Unfortunately, the patch with the issue was the last one to get committed -
> and by the time that occurred, there were a large number of jobs scheduled
> in front of it.
>
> (2) The order of execution of jobs in Bamboo is the following:
>      (a) Basic build (simple compile test) on first available build agent
> =>
>      (b) Full build (multiple compile options, e.g., parallel builds) on
> all different flavors of build agent =>
>      (c) Unit test run =>
>      (d) Channel driver tests in the Test Suite =>
>      (e) ARI tests in the Test Suite
>     Nightly, a full run of the test suite takes place.
>
>     This issue would have been caught by step (d) - but each of the
> previous steps takes a while to complete (Asterisk doesn't compile
> quickly). A test suite run takes a long time - even with the reduced sets
> of tests in steps (d) and (e). Each merge into a branch kicks this process
> off - and there were at least 7 such runs queued in front of the offending
> patch. Which leads to point #3:
>

Any thoughts on speeding up the test suite?  It looks like you're already
running separate subsets of the test suite.  Running those subsets in
parallel seems like the best bang-for-the-buck improvement.  A longer-term
option would be to rework all of the tests so that they can run in
parallel on the same host.
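
To make the first option concrete, here's a rough sketch (Python, standard
library only) of fanning independent subsets out to workers, reusing the
same runtests.py invocation as above.  The subset paths are illustrative,
and it assumes each subset is self-contained - no shared ports or spool
directories - which is exactly what the longer-term rework would have to
guarantee:

    # parallel_subsets.py - hypothetical runner that executes independent
    # Test Suite subsets concurrently instead of back to back.
    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    # Illustrative subset names, mirroring the per-step splits above.
    SUBSETS = ["tests/channels", "tests/rest_api", "tests/apps"]

    def run_subset(path):
        # Threads suffice here: each worker just blocks on a child process.
        rc = subprocess.call(["./runtests.py", "-t", path])
        return path, rc

    def main():
        with ThreadPoolExecutor(max_workers=len(SUBSETS)) as pool:
            for path, rc in pool.map(run_subset, SUBSETS):
                print("%-20s %s" % (path, "ok" if rc == 0 else "FAILED"))

    if __name__ == "__main__":
        main()

The wall-clock win is roughly the longest subset instead of the sum of
all of them.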


> (3) The merge process on the offending patch was slowed down due to merge
> conflicts between branches. The merging of the patch into all branches
> wasn't complete until nearly 3 PM, which meant we had very little time to
> get the releases out - generally, we strive hard to get the security
> releases out the door as early as possible, so system administrators have
> time that day to upgrade their systems if they are affected.
>

The merge conflicts could be mitigated by preparing all of the patches in
advance, right?  You may still hit some conflicts depending on what has
merged in the last day or so, but porting the patches ahead of time seems
pretty important.  Besides letting you resolve conflicts without so much
time pressure, it allows careful code review of each patch being released
and, of course, running the tests on them in advance.
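
A small script can take some of the drudgery out of that, too.  This is
just a sketch - Python driving git for brevity (the same loop would work
with svn merge, since Asterisk is on Subversion), and the branch names and
patch file are placeholders - but the idea is to surface every conflict
days before release day:

    # port_security_patch.py - hypothetical helper that applies a security
    # fix to every release branch ahead of time so conflicts surface early.
    import subprocess
    import sys

    BRANCHES = ["1.8", "11", "12", "trunk"]   # illustrative branch names
    PATCH = "fix-CVE-XXXX-YYYY.patch"         # placeholder file name

    def git(*args):
        return subprocess.call(["git"] + list(args))

    def main():
        conflicts = []
        for branch in BRANCHES:
            # Work on a private topic branch so nothing public changes.
            git("checkout", branch)
            git("checkout", "-b", "security-" + branch)
            # 'git apply --3way' leaves conflict markers and exits non-zero
            # when the patch doesn't apply cleanly.
            if git("apply", "--3way", PATCH) != 0:
                conflicts.append(branch)
        if conflicts:
            sys.exit("resolve conflicts on: " + ", ".join(conflicts))
        print("patch applies cleanly everywhere; now run the tests")

    if __name__ == "__main__":
        main()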


> All of that aside, there are a few things (again, beyond running the test
> suite locally) that could be done to improve the situation:
>
> (a) Add a 'smoke test' to the Test Suite that gets run in either the Basic
> Build or Full Build step. This would do some very simple things: originate
> a call over AMI with a Local channel, use a SIP channel to connect to
> another instance of Asterisk, pass media/DTMF, bounce back to the test
> using AGI, and maybe a few other things. Such a test would hit a lot of
> our normal 'hot spots' and - if run early enough in the cycle - would flag
> developers more quickly than the current process.
>
> (b) Throw some more hardware at the problem. Right now, we have a single
> 32-bit/64-bit CentOS 6 machine - we could easily double that up, which
> would get results faster.
>
>
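
The smoke test in (a) sounds very doable.  Just to show how little the AMI
half of it takes, here's a raw-socket sketch in Python - the host, port,
credentials, and dialplan extension are made-up values, and a real test
would parse the Response/Event stream rather than just peeking at it:

    # smoke_originate.py - hypothetical AMI smoke check: log in and
    # originate a call over a Local channel, per the proposed smoke test.
    import socket

    def send_action(sock, fields):
        # AMI actions are CRLF-delimited "Key: Value" lines ending with a
        # blank line; Action must be the first key, hence the ordered list.
        msg = "".join("%s: %s\r\n" % kv for kv in fields) + "\r\n"
        sock.sendall(msg.encode("ascii"))

    def main():
        sock = socket.create_connection(("127.0.0.1", 5038), timeout=10)
        print(sock.recv(128).decode("ascii").strip())  # AMI banner
        send_action(sock, [("Action", "Login"),
                           ("Username", "smoketest"),
                           ("Secret", "secret")])
        send_action(sock, [("Action", "Originate"),
                           ("Channel", "Local/100@default"),
                           ("Application", "Echo")])
        print(sock.recv(4096).decode("ascii"))  # peek at the responses
        sock.close()

    if __name__ == "__main__":
        main()
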
Interested in resource donations here?  I could provide a VM in a public
cloud.  Further, I'm pretty sure Rackspace offers some free resources to
open source projects.  You could use that for CI capacity. [1]

[1] https://twitter.com/jessenoller/status/355453772803211264

-- 
Russell Bryant

