[asterisk-dev] Asterisk goes Spatial Conferencing: the STEAK project

Fri Aug 5 10:19:35 CDT 2016

Hi Dennis:

Apologies for not responding to your e-mail more quickly; I've been on
vacation for the last week and a half. Some comments inline below.

On Sat, Jul 23, 2016 at 3:50 AM, Dennis Guse
<dennis.guse at alumni.tu-berlin.de> wrote:
> Then let's get started.
> Due to the relatively large number of changes, we will split them up
> into individual patches:

Individual patches are great! Smaller changes are always easier to review.

> 1) Support for interleaved stereo (mainly softmix and struct channel)
> 2) Extension of confbridge with binaural synthesis via convolution (if
> channel supports stereo)
>
> For the patches, we will remove the hard dependency to OPUS (although
> we will stick to using it) and also enable L16 with stereo.
>
> Nevertheless, there are still some open questions:
>
> 1. Storage and description of HRTFs
> Impulse responses are at the moment compiled into Asterisk as a header file.
> This header file is generated using a custom C-program converting a
> multi-channel wave file into a float-array - hrirs_fabian.wav was
> taken from the SoundScapeRenderer
> https://github.com/SoundScapeRenderer/ssr/tree/master/data/impulse_responses/hrirs
>
> For positioning a sound source, the HRTFs for the left and for the
> right ear need to be selected according to the desired angle.
> This information is at the moment hard-coded as we use a 720-channel
> wave (interleaved: left, right) to cover 360 degrees completely.
>
> Would you prefer to compile the HRTFs into Asterisk (incl. the
> hard-coded description) or rather make this configurable?
> For the second option, we would need some support in terms of how to
> add configuration options.

For an initial implementation, I would recommend hard-coding it. If a
different data set is needed for different listeners, that would be an
improvement that someone could make at a later time.
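
To illustrate what the hard-coded description could look like, here is
a rough sketch in C. It assumes the 720 channels are laid out as 360
(left, right) pairs, one pair per degree, and that angles wrap modulo
360. The names hrir_pair and hrir_select are made up for this example
and are not taken from your patch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout: 720 interleaved channels = 360 (left, right) pairs,
 * one pair per degree. This is an assumption, not confirmed. */
#define HRIR_CHANNELS 720
#define HRIR_PAIRS (HRIR_CHANNELS / 2)

/* Channel indices of the impulse responses for one direction. */
struct hrir_pair {
    size_t left;   /* channel holding the left-ear response */
    size_t right;  /* channel holding the right-ear response */
};

/* Map an angle in whole degrees (negative values allowed) to a pair. */
struct hrir_pair hrir_select(int angle_deg)
{
    struct hrir_pair p;
    int a = angle_deg % HRIR_PAIRS;

    if (a < 0) {
        a += HRIR_PAIRS;  /* C's % keeps the sign of the dividend */
    }
    assert(a >= 0 && a < HRIR_PAIRS);

    p.left = (size_t)a * 2;
    p.right = p.left + 1;
    return p;
}
```

With this layout, hrir_select(-90) and hrir_select(270) pick the same
pair, so callers would not need to normalize angles first.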

> 2. Configuration of positioning of conference participants
> The positioning of individual participants is at the moment hard-coded
> (compiled via header-file).
> This is basically an array containing the angles at which participant
> _n_ is going to be seated.
> This could also be made configurable via configuration file.
>
> Furthermore, all participants of a conference receive the _same_
> acoustical environment (i.e., participant 1 always sits in front of
> the listener etc.).
> This limits the computational requirements while individual listeners
> cannot configure their desired seating order.
> In fact, the own signal of a participant is subtracted after rendering
> the whole environment before sending the signals back.

Again, I would punt on making this configurable. If the default
experience is "good enough", effort put into making this configurable
will be wasted. If it is not "good enough", then effort could be
expended on making it configurable.
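
For reference, the shared-environment approach you describe (render
the scene once, then subtract each participant's own contribution)
could be sketched as follows. This is only an illustration under
assumptions: frames are interleaved stereo floats, each participant's
signal has already been spatialized, and mix_all / mix_minus are
hypothetical names, not existing softmix functions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sum the already-spatialized frames of all participants.
 * rendered[p] points to `samples` floats (interleaved L, R).
 */
void mix_all(const float *const *rendered, size_t participants,
             size_t samples, float *mix)
{
    size_t p, i;

    assert(rendered != NULL && mix != NULL);

    for (i = 0; i < samples; i++) {
        mix[i] = 0.0f;
    }
    for (p = 0; p < participants; p++) {
        for (i = 0; i < samples; i++) {
            mix[i] += rendered[p][i];
        }
    }
}

/* Remove one participant's own rendered signal from the shared mix. */
void mix_minus(const float *mix, const float *own, size_t samples,
               float *out)
{
    size_t i;

    assert(mix != NULL && own != NULL && out != NULL);

    for (i = 0; i < samples; i++) {
        out[i] = mix[i] - own[i];
    }
}
```

Each source is spatialized once rather than once per listener; the
per-listener work is a single subtraction pass.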

> 3. Internal sampling rate of Confbridge
> The binaural synthesis is at the moment conducted at 48kHz.
> This is actually due to our use of OPUS, which always uses 48kHz for
> the decoded signals.
> Is this ok?

I'm not sure I have a good answer to this. I think it depends on what
the experience is like if a lower sampling rate is used. What happens
if the sampling rate used in the conference is 8kHz or 16kHz?

> 4. Is the dependency to libfftw3 an issue?

libfftw3 is GPLv2, so no, it should not be an issue.

https://github.com/FFTW/fftw3/blob/master/COPYING

That being said, it should not be a hard dependency. If libfftw3 is
not installed, the feature should simply not be available.
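
One way to structure this in code, sketched under assumptions:
HAVE_FFTW3 stands in for whatever macro the build system would define
when the library is found, and the binaural_* functions are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* HAVE_FFTW3 is a hypothetical build-time macro for this sketch. */
#ifdef HAVE_FFTW3
#include <fftw3.h>
#endif

/* Report whether binaural synthesis was compiled in. */
bool binaural_available(void)
{
#ifdef HAVE_FFTW3
    return true;
#else
    return false;
#endif
}

/*
 * Try to switch a conference to binaural rendering.
 * Returns 0 on success (or when nothing was requested), -1 when the
 * feature was requested but this build lacks libfftw3.
 */
int binaural_enable(bool requested)
{
    if (!requested) {
        return 0;
    }
    return binaural_available() ? 0 : -1;
}

/* Placeholder for the convolution; callers must check availability. */
void binaural_render(void)
{
    assert(binaural_available());
    /* FFT-based convolution would go here when built with libfftw3. */
}
```

Without the library, the conference would keep working with the
ordinary mix and only the binaural option would be refused.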

-- 
Matthew Jordan
Digium, Inc. | CTO
445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
Check us out at: http://digium.com & http://asterisk.org