[asterisk-dev] [Code Review] Resolve overflow in time difference calculation that caused audio not to be recorded by MixMonitor

Fri Apr 27 10:50:25 CDT 2012

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/1889/
-----------------------------------------------------------

Review request for Asterisk Developers and Joshua Colp.

Summary
-------

MixMonitor uses an audiohook that samples audio from both directions on a channel.  Such audiohooks retrieve samples of audio from two audio factories, a 'read' factory and a 'write' factory.  When an audio factory receives audio from a channel, it places the received audio samples on a queue, and updates the time it received the audio.  When the audiohook is later called by the MixMonitor thread (or whatever happens to care about the audio on the hook) to retrieve the audio, it checks the number of samples on the queue to see if the number of samples received is equal to the number of samples it expects to pull.  If, for a given factory, the number of samples is less then what is expected, the audiohook retrieval code checks the time that the audio was placed on the queue.  If the difference between the current time and that time is less then a particular delta (based on the number of samples we want to retrieve), the audiohook defers pulling the audio off the queues, noting that we checked the factories a little fast and that the next time through, the number of samples we want should be available.

In situations where both the read/write factories are providing data, this works just fine.

In situations where one of those factories is never updated (implying that the audio update time is 0), bad things happen on a 32-bit machine.  (This can happen in a variety of situations, but happens most frequently with ConfBridge, where the MixMonitor is placed on the Bridge channel)

The offending code is in ast_tvdiff_ms.  This method attempts to retrieve the difference, in milliseconds, between two timeval structs, and return the difference in a 64-bit integer.  Unfortunately, it assumes that the long tv_sec/tv_usec members in the timeval struct are large enough to hold the calculated values before it returns.  On 64-bit machines, this might be the case, as a long may be 64-bits.  On 32-bit machines, however, a long may be less (32-bits), in which case, the calculation can overflow.

As an example, consider a timeval struct containing 1335550904 as the value in tv_sec in the current time, and a value of 0 in tv_sec in the audio update time.  The calculation first subtracts 0 from 1335550904, then multiplies the result by 1000.  On a 32-bit machines, this results in a value of 1335550904000, which is greater then 2^31.  Hilarity ensues, as we now have a negative time difference.

This patch fixes the behavior by first casting the result of the difference to an int64_t before multiplying it by 1000.  This guarantees that the calculation is stored in a 64-bit quantity, preventing the overflow from happening.

(As an aside, this method is used in a lot of places in the code.  I'm not going to speculate on other failure scenarios.)

This addresses bugs ASTERISK-19426, ASTERISK-19471, ASTERISK-19497, and ASTERISK-19727.
    https://issues.asterisk.org/jira/browse/ASTERISK-19426
    https://issues.asterisk.org/jira/browse/ASTERISK-19471
    https://issues.asterisk.org/jira/browse/ASTERISK-19497
    https://issues.asterisk.org/jira/browse/ASTERISK-19727

Diffs
-----

  /branches/1.8/include/asterisk/time.h 363307 

Diff: https://reviewboard.asterisk.org/r/1889/diff

Testing
-------

Testing done by Ben Klang, who reported this issue on ASTERISK-19497.  Also reproduced the behavior on a 32-bit CentOS 6 machine, and resolved the issue using this patch.

Thanks,

Matt

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.digium.com/pipermail/asterisk-dev/attachments/20120427/1c24c67b/attachment.htm>