[Asterisk-Dev] Voice energy detection: coder wanted

Sat Nov 8 12:49:56 MST 2003

Hi,
I hope that you can use a couple of hints from this. I have worked for a
global carrier for more than 10 years, and part of the job was to do
'soft-answer supervision' and call progress detection. We had equipment in
more than 90 countries, so I have seen quite a few different ways of doing
things.
In my experience, an answering machine will always trip a simple 'voice
energy/cadence detection', since what it plays back is actually recorded
voice. The answer detection we used was based simply on the fact that a
'conversation' is normally 'bi-directional'. So our 'answer-detector' was
actually two VADs (voice activity detectors) monitored by a
'speech-direction' detector. To say that the speech direction was from A to
B, the VAD 'from A' had to say 'speech present' while the VAD 'from B' said
'no speech'.
Our criterion was typically set to '3 direction shifts within 30 seconds'
for installations in the US and Europe.
I still have pattern-matching call progress info for most of the countries
we worked in, if anyone is still interested in this stuff.
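
To make the direction-shift idea concrete, here is a minimal sketch in Python. It is not the carrier's implementation: the function name, the frame format (one pair of VAD decisions per frame) and the 20 ms frame length are assumptions for illustration, and '3 direction shifts within 30 seconds' is read here as a sliding window, which is also an assumption.

```python
from collections import deque


def detect_answer(frames, frame_sec=0.02, shifts_needed=3, window_sec=30.0):
    """Return the time in seconds at which the call is judged answered, or None.

    frames: iterable of (speech_from_a, speech_from_b) booleans, one pair per
    frame, i.e. the outputs of the two VADs (hypothetical input format).
    """
    direction = None                           # last unambiguous speech direction
    shift_times = deque(maxlen=shifts_needed)  # times of the most recent shifts
    for i, (a, b) in enumerate(frames):
        t = i * frame_sec
        if a and not b:
            current = "A->B"
        elif b and not a:
            current = "B->A"
        else:
            continue                           # silence or double-talk: no direction
        if direction is not None and current != direction:
            shift_times.append(t)
            # Answered once the last `shifts_needed` shifts fit inside the window.
            if len(shift_times) == shifts_needed and t - shift_times[0] <= window_sec:
                return t
        direction = current
    return None
```

A recording that talks in one direction only never produces a shift, so the function returns None for it, while a caller and callee taking turns reach the threshold after the third change of direction.
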
b.r.
Freddi

>Message: 8
>Date: Sat, 8 Nov 2003 06:09:39 +0000 (GMT)
>From: Stephen Davies <steve at daviesfam.org>
>To: asterisk-dev at lists.digium.com
>Subject: Re: [Asterisk-Dev] Voice energy detection: coder wanted
>Reply-To: asterisk-dev at lists.digium.com
>
>On Fri, 7 Nov 2003, John Todd wrote:
>
>>I have a requirement from one of my customers (in the emergency 
>>services arena, I am told) to develop a voice energy detection system 
>>for Asterisk.  This would be to detect the difference between an 
>>answering machine, and a human.  This detection need only be very 
>>basic, and probably will hook into the existing routines in dsp.c 
>>(unless you have a cadence and tonal module already built.)
>
>So I'm curious as to the algorithms used.  All I can think of is that
>an answering machine talks for longer than a real human caller.
>
>dsp.c can already detect voice as opposed to various tones.  So
>wouldn't answering machine detection go something like:
>
>if <start detecting voice for the first time>
>  note that the call is answered
>  if you don't hear say 1sec silence within 5 secs then
>    note that it was probably an answering machine
>
>I'd allow the possibility that people talk "differently" when
>recording an announcement - i.e. in "posh telephone voice" which perhaps
>has a different spectrum to their usual voice - but seeing you don't
>know their usual voice I'm not sure how you could use that.
>
>Steve
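
For comparison, the silence-based heuristic from the quoted message can be sketched as follows. This is only an illustration of the quoted pseudocode, not code from dsp.c: the function name, the per-frame boolean input and the 20 ms frame length are assumptions, and the 1 second / 5 second values are the 'say' figures from the quoted message.

```python
def classify_callee(voice_frames, frame_sec=0.02, silence_sec=1.0, listen_sec=5.0):
    """Classify a call as 'human', 'machine', 'unanswered' or 'undecided'.

    voice_frames: iterable of booleans, True where voice was detected in that
    frame (hypothetical input; a real detector would supply these decisions).
    """
    silence_needed = round(silence_sec / frame_sec)  # frames of continuous silence
    listen_frames = round(listen_sec / frame_sec)    # frames to listen after answer
    start = None                                     # frame index of first voice
    silent_run = 0
    for i, voiced in enumerate(voice_frames):
        if start is None:
            if not voiced:
                continue
            start = i                                # first voice: call is answered
        if i - start >= listen_frames:
            return "machine"                         # no long pause within the limit
        silent_run = 0 if voiced else silent_run + 1
        if silent_run >= silence_needed:
            return "human"                           # a pause long enough was heard
    return "unanswered" if start is None else "undecided"
```

Note that this looks at one direction only, so a recorded greeting that contains a one-second pause would be classified as 'human'.
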