[Asterisk-Users] Answering Machine Detection

Wed Oct 29 08:41:21 MST 2003

Hi Chris,

That is exactly the Dialogic implementation I was referring to that was 
utterly useless. It works OK when people are demoing, as they always 
follow a certain pattern. In real life I've always found it a recipe 
for screaming angry users. Depending on your use, it can get over 90% of 
calls wrong. It is just so dependent on exactly how people behave.

Regards,
Steve

Chris Ziomkowski wrote:

> Actually,
>
> Back in '99, Dialogic used a very simple algorithm, and it was 
> surprisingly accurate. You simply watch and see how long the initial 
> greeting is. If it is short (say, only a few seconds), then it is 
> generally a live person. However, if the initial greeting lasts for 
> much longer (say 20 seconds) then you have contacted an answering 
> machine.
>
> That is one of the big reasons CPA on Dialogic used to give so many 
> headaches on drop and insert applications. It would sometimes wait 10 
> seconds before returning answer supervision to the application and the 
> talk path would be cut through (Had to wait to determine whether it 
> was a human or an answering machine). In this time, if a human 
> answered, he would sometimes hang up because he wouldn't hear any 
> response from the remote side.
>
> Properly tuned, just watching how many seconds of energy you get in 
> the initial greeting before silence sets in will give you 90% accuracy 
> in determining answering machine or live person. There are always 
> exceptions however. As a first guess though, you can assume anything 
> less than 5-10 seconds is human, anything greater is a machine.
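
A minimal sketch of the greeting-length heuristic described above, for
illustration only. It is not Dialogic's code. The function names, the
20 ms frame size, the energy threshold of 500, the 800 ms of trailing
silence and the 5 second cutoff are all assumptions picked for the
example; the input is assumed to be one energy value per audio frame.

```python
def greeting_seconds(frame_energy, frame_ms=20, energy_threshold=500.0,
                     end_silence_ms=800):
    """Seconds from the first active frame to the last active frame
    before a trailing silence of at least end_silence_ms sets in."""
    silence_needed = end_silence_ms // frame_ms
    start = None          # index of first frame above the threshold
    last_active = None    # index of most recent frame above the threshold
    quiet_run = 0         # consecutive quiet frames since last activity
    for i, energy in enumerate(frame_energy):
        if energy >= energy_threshold:
            if start is None:
                start = i
            last_active = i
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= silence_needed:
                break     # silence has set in; the greeting is over
    if start is None:
        return 0.0
    return (last_active - start + 1) * frame_ms / 1000.0


def classify_greeting(frame_energy, machine_after_s=5.0, **kwargs):
    """First guess only: short greeting = human, long greeting = machine."""
    seconds = greeting_seconds(frame_energy, **kwargs)
    if seconds == 0.0:
        return "no_speech"
    return "machine" if seconds > machine_after_s else "human"
```

A one second "hello" followed by silence comes back as "human", and
eight seconds of continuous energy as "machine". Short pauses inside
the greeting (under the trailing-silence limit) do not end it. The
cutoff is exactly the part that needs tuning per deployment.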
>
> Lots of ways to get it wrong though. Not recognizing a SIT tone and 
> returning "answering machine" for circuit failure, not recognizing 
> when ringing has ended and misinterpreting the "hello....hello" as 
> still being ringing cadence (Dialogic did this about 3% of the time). 
> But in theory it should be trivial to implement in Asterisk. Might 
> want to write a new "energy detector" algorithm in dsp.c though based 
> on a wideband/low Q resonator approach (move the pole way in towards 
> the origin) as opposed to narrow-band Goertzel filters (pole on the unit 
> circle). More robust for this type of work.
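
One possible reading of the resonator idea, sketched under assumptions:
a two-pole resonator y[n] = x[n] + 2 r cos(w0) y[n-1] - r^2 y[n-2]
whose pole radius r sets the bandwidth. With r = 1 the recursion is the
Goertzel one (poles on the unit circle); pulling r in towards the
origin lowers the Q and widens the passband. This is not code from
Asterisk's dsp.c; the function names and the 8 kHz sample rate are
hypothetical choices for the example.

```python
import math


def resonator_energy(samples, center_hz, r, sample_rate=8000):
    """Mean squared output of a two-pole resonator with pole radius r.

    Keep r < 1 so the filter is stable; smaller r means wider bandwidth.
    """
    w0 = 2.0 * math.pi * center_hz / sample_rate
    a1 = 2.0 * r * math.cos(w0)
    a2 = r * r
    y1 = y2 = 0.0   # previous two outputs
    total = 0.0
    for x in samples:
        y = x + a1 * y1 - a2 * y2
        total += y * y
        y2, y1 = y1, y
    return total / len(samples) if samples else 0.0


def tone(freq_hz, n, sample_rate=8000):
    """Unit-amplitude sine test signal of n samples."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]
```

Feeding a 1400 Hz tone into a resonator centred on 1000 Hz shows the
difference: with r = 0.5 the off-centre tone still produces a large
share of the on-centre energy, while with r = 0.99 it is almost
entirely rejected. For "is anybody making noise" detection the wide
response is the one you want.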
>
> Chris
>
> At 08:24 PM 10/29/2003 +0800, you wrote:
>
>> Alastair Maw wrote:
>>
>>> On 27/10/03 21:57, DUSTIN WILDES wrote:
>>>
>>>> Does anyone have any recommendations on implementing Answering
>>>> Machine detection for call generation programs?
>>>
>>>
>>>
>>> There's obviously no nice way of doing this.
>>> If you're doing telemarketing, and you're playing pre-recorded 
>>> audio, which of course is a nasty thing to do, the algorithm is 
>>> something like:
>>>
>>> 1. Dial out.
>>> 2. Wait for answer.
>>> 3. Start playing audio.
>>> 4. If you hear something that sounds like a beep, either hang up
>>>    and try again later, or stop the audio, pause for two seconds
>>>    and start playing it again.
>>> 5. Hang up when finished playing audio.
>>>
>>> Step 4 is accomplished by doing a FFT on the incoming audio into 
>>> frequency buckets and taking a rolling average of the mean and 
>>> standard deviation, such that you can detect when a fixed monotone 
>>> beep occurs at the other end.
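
A rough sketch of how step 4 might look, under stated assumptions. The
description above leaves the details open, so the choices here are
guesses: 256-sample frames, a Hann window, exponentially weighted
rolling averages of the per-frame mean and standard deviation of the
bucket magnitudes, and a beep declared when the same bucket stays more
than k rolling standard deviations above the rolling mean for
min_frames frames in a row. All names and parameter values are
hypothetical.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def detect_beep(samples, frame_len=256, k=5.0, min_frames=6, alpha=0.2):
    """Return the start index of the frame at which a steady tone is
    confirmed, or None if no such tone is found."""
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / frame_len)
              for i in range(frame_len)]
    roll_mean = roll_std = None
    run_bin, run_len = None, 0
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = [s * w for s, w in
                 zip(samples[start:start + frame_len], window)]
        # Magnitudes of the positive-frequency buckets, skipping DC.
        mags = [abs(c) for c in fft(frame)[1:frame_len // 2]]
        mean = sum(mags) / len(mags)
        std = math.sqrt(sum((m - mean) ** 2 for m in mags) / len(mags))
        if roll_mean is None:
            roll_mean, roll_std = mean, std
        peak = max(mags)
        peak_bin = mags.index(peak)
        tonal = peak > roll_mean + k * roll_std
        if tonal and peak_bin == run_bin:
            run_len += 1          # same bucket still dominant
        elif tonal:
            run_bin, run_len = peak_bin, 1
        else:
            run_bin, run_len = None, 0
        if run_len >= min_frames:
            return start
        # Update the rolling statistics after the comparison.
        roll_mean += alpha * (mean - roll_mean)
        roll_std += alpha * (std - roll_std)
    return None
```

At 8 kHz, six 256-sample frames is about 190 ms, so shorter beeps would
need a smaller min_frames. Speech with a sustained vowel can also hold
one bucket for a while, which is one more way this kind of detector
gets fooled.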
>>
>>
>> How very inefficient. Looking for peaks in the autocorrelation 
>> function requires much less compute.
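
To illustrate the autocorrelation alternative, here is a minimal
sketch. It assumes a single frame of samples at 8 kHz and searches lags
2 to 40 (roughly 4 kHz down to 200 Hz); the function name, the lag
range and any decision threshold applied to the result are assumptions
made for the example, not a tuned detector.

```python
def autocorr_peak(frame, min_lag=2, max_lag=40):
    """Return (lag, r) for the lag with the largest normalised
    autocorrelation r. A steady tone gives r close to 1 at its period;
    noise gives r close to 0 at every lag."""
    energy = sum(s * s for s in frame)
    if energy == 0.0:
        return None, 0.0
    best_lag, best_r = None, -1.0
    for lag in range(min_lag, max_lag + 1):
        acc = sum(frame[i] * frame[i + lag]
                  for i in range(len(frame) - lag))
        r = acc / energy   # lag-0 energy as the normaliser
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r
```

A 1000 Hz tone at 8 kHz has a period of 8 samples, so the peak lands at
lag 8 with r a little under 1. A beep candidate would then be a run of
frames whose peak lag stays put and whose r stays above some threshold
such as 0.9.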
>>
>>> If you don't want to play audio files and wait for beeps, and want 
>>> to connect real humans to each other, then there's no decent way to 
>>> do this, as the only difference between humans and arbitrary 
>>> answering machines is that the answering machines give you a beep 
>>> prompt to record your message.
>>
>>
>> Right. Dialogic and others make a big fuss of the super detection 
>> algorithms, and quote 90+% accuracy. In the real world they are 
>> utterly useless. Call answering just doesn't fall into a sufficiently 
>> regular pattern.
>>
>> Regards,
>> Steve
>