[Asterisk-Dev] Voice energy detection: coder wanted

Sat Nov 8 21:19:54 MST 2003

Freddi -
   Thanks for the data.  However, your experience is with detecting 
two-way conversations, where you are attempting to determine whether 
both legs of a call have humans attached to them.  My customer's 
problem is only one half of that issue: the system needs to detect 
whether a human (rather than an answering machine) is at one end of 
the call, which is a bit more difficult.  Do your pattern matching 
archives cover any such instances for reliable detection?

JT

>Hi,
>I hope that you can use a couple of hints from this. I have been
>working for a global carrier for more than 10 years, and part of the
>job was to do 'soft-answer supervision' and call progress detection.
>We had equipment in more than 90 countries, so I have seen quite a
>few different ways of doing things.
>It was my experience that an answering machine would always trip a
>simple 'voice energy/cadence detection', since what it plays back is
>actually recorded voice. The answer detection we used was simply
>based upon the fact that a 'conversation' would normally be
>'bi-directional'. So our 'answer-detector' was actually 2 VADs
>monitored by a 'speech-direction' detector. In order to say that
>speech direction was from A to B, the VAD 'from A' should say
>'speech present' while the VAD 'from B' should say 'no speech'.
>Our criterion was typically set to '3 direction shifts within 30
>seconds' for installations in the US and Europe.
>I do still have pattern matching call progress info for most of the
>countries we worked in, if this stuff is still of interest to anyone.
>b.r.
>Freddi
>
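The two-VAD scheme Freddi describes above can be sketched roughly as
follows. This is only an illustration in Python, not Asterisk code and
not the carrier's actual implementation. The function name, the 20 ms
frame length, and the choice to ignore frames where both or neither
VAD reports speech are all assumptions; only the "3 direction shifts
within 30 seconds" criterion comes from the message.

```python
from collections import deque


def detect_answer(frames, frame_ms=20, shifts_needed=3, window_s=30.0):
    """Return the time (seconds) at which the call is judged answered, or None.

    frames: iterable of (vad_a, vad_b) booleans, one pair per audio frame,
            where vad_a is "speech present from A" and vad_b likewise for B.
    frame_ms is an assumed frame length; the defaults for shifts_needed and
    window_s follow the "3 direction shifts within 30 seconds" criterion.
    """
    last_direction = None
    shift_times = deque()  # timestamps of recent direction shifts

    for i, (vad_a, vad_b) in enumerate(frames):
        now = i * frame_ms / 1000.0

        if vad_a and not vad_b:
            direction = "A->B"
        elif vad_b and not vad_a:
            direction = "B->A"
        else:
            # Silence or double-talk: direction is undetermined (assumption).
            continue

        if last_direction is not None and direction != last_direction:
            shift_times.append(now)
            # Forget shifts that fell out of the sliding window.
            while shift_times and now - shift_times[0] > window_s:
                shift_times.popleft()
            if len(shift_times) >= shifts_needed:
                return now

        last_direction = direction

    return None
```

With alternating one-second talk spurts (A, B, A, B) the third shift
arrives as B starts speaking the second time, so the call is flagged
as answered; a stream where only one side ever speaks never produces a
shift and returns None.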
>>Message: 8
>>Date: Sat, 8 Nov 2003 06:09:39 +0000 (GMT)
>>From: Stephen Davies <steve at daviesfam.org>
>>To: asterisk-dev at lists.digium.com
>>Subject: Re: [Asterisk-Dev] Voice energy detection: coder wanted
>>Reply-To: asterisk-dev at lists.digium.com
>>
>>On Fri, 7 Nov 2003, John Todd wrote:
>>
>>
>>>I have a requirement from one of my customers (in the emergency 
>>>services arena, I am told) to develop a voice energy detection 
>>>system for Asterisk.  This would be to detect the difference 
>>>between an answering machine, and a human.  This detection need 
>>>only be very basic, and probably will hook into the existing 
>>>routines in dsp.c (unless you have a cadence and tonal module 
>>>already built.)
>>>   
>>>
>>
>>So I'm curious as to the algorithms used.  All I can think of is that
>>an answering machine talks for longer than a real human caller.
>>
>>dsp.c can already detect voice as opposed to various tones.  So
>>wouldn't answering machine detection go something like:
>>
>>if <start detecting voice for the first time>
>>  note that the call is answered
>>  if you don't hear say 1sec silence within 5 secs then
>>    note that it was probably an answering machine
>>
>>I'd allow the possibility that people talk "differently" when
>>recording an announcement - i.e. in "posh telephone voice", which
>>perhaps has a different spectrum to their usual voice - but seeing
>>you don't know their usual voice I'm not sure how you could use that.
>>
>>Steve
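
Steve's pseudocode above could be fleshed out along these lines. Again
this is a hedged Python sketch rather than anything taken from dsp.c:
it assumes some upstream detector already yields one voice/no-voice
flag per frame, and the function name, the 20 ms frame length, and the
"unknown" result for streams that end early are assumptions. The
1 second of silence and the 5 second window are the figures from the
pseudocode.

```python
def classify_callee(voice_frames, frame_ms=20, silence_ms=1000, window_ms=5000):
    """Classify the far end as "human", "machine", or "unknown".

    voice_frames: iterable of booleans, True when a frame contains voice.
    The call counts as answered at the first voiced frame. If a continuous
    silence of silence_ms is heard within window_ms of that point, guess
    "human"; if the window expires first, guess "machine".
    """
    start = None     # index of the first voiced frame
    silent_run = 0   # length of the current silence, in milliseconds

    for i, is_voice in enumerate(voice_frames):
        if start is None:
            if is_voice:
                start = i  # note that the call is answered
            continue

        if (i - start) * frame_ms >= window_ms:
            return "machine"  # talked through the whole window

        if is_voice:
            silent_run = 0
        else:
            silent_run += frame_ms
            if silent_run >= silence_ms:
                return "human"  # a pause long enough to be a greeting

    return "unknown"  # stream ended before a decision (assumption)
```

A short "hello" followed by a pause classifies as "human", while ten
seconds of uninterrupted speech classifies as "machine". The thresholds
are only starting points and would need tuning against real calls.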