[Asterisk-Users] Answering Machine Detection

Wed Oct 29 06:18:08 MST 2003

Actually,

Back in '99, Dialogic used a very simple algorithm, and it was surprisingly 
accurate. You simply watch and see how long the initial greeting is. If it 
is short (say, only a few seconds), then it is generally a live person. 
However, if the initial greeting lasts for much longer (say 20 seconds) 
then you have contacted an answering machine.

That is one of the big reasons CPA (call progress analysis) on Dialogic 
used to give so many headaches on drop and insert applications. It would 
sometimes wait 10 seconds before returning answer supervision to the 
application and cutting the talk path through, because it had to wait to 
determine whether it was a human or an answering machine. In that time, if 
a human answered, he would sometimes hang up because he couldn't hear any 
response from the remote side.

Properly tuned, just watching how many seconds of energy you get in the 
initial greeting before silence sets in will give you 90% accuracy in 
telling an answering machine from a live person. There are always 
exceptions, however. As a first guess, though, pick a cutoff somewhere in 
the 5-10 second range: anything shorter is human, anything longer is a 
machine.
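
To make the rule of thumb concrete, here is a rough Python sketch. It is not 
Dialogic's CPA code, just one way the idea could be written down. It assumes 
8 kHz linear PCM samples as a list of numbers; the function names, the RMS 
threshold, the 1-second "silence has set in" window and the 5-second cutoff 
are all made-up starting points that would need tuning on real calls.

```python
import math

def greeting_seconds(samples, rate=8000, frame_ms=20,
                     rms_threshold=500.0, end_silence_ms=1000):
    """Length of the initial greeting: first energetic frame to last
    energetic frame before a sustained silence (all values assumed)."""
    frame_len = rate * frame_ms // 1000
    silence_frames_needed = end_silence_ms // frame_ms
    first_active = None
    last_active = None
    silent_run = 0
    for i in range(len(samples) // frame_len):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        if rms >= rms_threshold:
            if first_active is None:
                first_active = i
            last_active = i
            silent_run = 0
        elif first_active is not None:
            # Short pauses between words stay inside the greeting;
            # only a long enough silent run ends it.
            silent_run += 1
            if silent_run >= silence_frames_needed:
                break
    if first_active is None:
        return 0.0
    return (last_active - first_active + 1) * frame_ms / 1000.0

def classify_answer(samples, cutoff_s=5.0, **kwargs):
    """First guess only: long greeting means machine, short means human."""
    if greeting_seconds(samples, **kwargs) > cutoff_s:
        return "machine"
    return "human"
```

Note that the decision cannot be made until the silence window has elapsed, 
which is exactly the delay that caused the talk path problem described above.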

There are lots of ways to get it wrong, though: not recognizing a SIT 
(special information tone) and returning "answering machine" for a circuit 
failure, or not recognizing when ringing has ended and misinterpreting the 
"hello....hello" as still being ringing cadence (Dialogic did this about 3% 
of the time). But in theory it should be trivial to implement in Asterisk. 
You might want to write a new "energy detector" algorithm in dsp.c, though, 
based on a wideband/low-Q resonator approach (move the pole way in towards 
the origin) as opposed to narrowband Goertzel filters (pole on the unit 
circle). That is more robust for this type of work.
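
As an illustration of the pole-placement point, here is a small Python 
sketch of a two-pole resonator whose pole radius is a parameter. This is not 
code from dsp.c; the function names, the 1000 Hz center, the 700 Hz test 
tone and the radii 0.7 and 0.99 are assumptions picked only to show the 
effect. A radius near 1 stands in for the narrowband, Goertzel-like case.

```python
import math

def resonator_energy(samples, center_hz, pole_radius, rate=8000):
    """Mean output power of y[n] = x[n] + 2r*cos(w0)*y[n-1] - r^2*y[n-2].

    The poles sit at radius r and angle +/-w0. Pulling r in towards the
    origin lowers the Q and widens the band the detector responds to.
    """
    w0 = 2.0 * math.pi * center_hz / rate
    a1 = 2.0 * pole_radius * math.cos(w0)
    a2 = pole_radius * pole_radius
    y1 = y2 = 0.0
    total = 0.0
    for x in samples:
        y = x + a1 * y1 - a2 * y2
        total += y * y
        y2, y1 = y1, y
    return total / len(samples) if samples else 0.0

def tone(freq_hz, seconds, rate=8000, amplitude=1.0):
    """Synthetic sine tone used only to probe the filter."""
    n = int(seconds * rate)
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * i / rate)
            for i in range(n)]

def selectivity(pole_radius, center_hz=1000.0, off_hz=700.0):
    """Off-center energy divided by on-center energy.

    Values near 1.0 mean the detector barely cares about frequency;
    values near 0.0 mean it only responds close to center_hz.
    """
    on = resonator_energy(tone(center_hz, 0.5), center_hz, pole_radius)
    off = resonator_energy(tone(off_hz, 0.5), center_hz, pole_radius)
    return off / on
```

With these numbers, selectivity(0.7) comes out close to 1 while 
selectivity(0.99) is a small fraction, so the low-Q version keeps 
registering energy when the voice drifts away from the center frequency.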

Chris

At 08:24 PM 10/29/2003 +0800, you wrote:
>Alastair Maw wrote:
>
>>On 27/10/03 21:57, DUSTIN WILDES wrote:
>>
>>>Does anyone have any recommendations on implementing Answering
>>>Machine detection for call generation programs?
>>
>>
>>There's obviously no nice way of doing this.
>>If you're doing telemarketing, and you're playing pre-recorded audio, 
>>which of course is a nasty thing to do, the algorithm is something like:
>>
>>1. Dial out.
>>2. Wait for answer.
>>3. Start playing audio.
>>4. If you hear something that sounds like a beep, either hang up
>>    and try again later, or stop the audio, pause for two seconds
>>    and start playing it again.
>>5. Hang up when finished playing audio.
>>
>>Step 4 is accomplished by doing an FFT on the incoming audio into 
>>frequency buckets and taking a rolling average of the mean and standard 
>>deviation, such that you can detect when a fixed monotone beep occurs at 
>>the other end.
>
>How very inefficient. Looking for peaks in the autocorrelation function 
>requires much less compute.
>
>>If you don't want to play audio files and wait for beeps, and want to 
>>connect real humans to each other, then there's no decent way to do this, 
>>as the only difference between humans and arbitrary answering machines is 
>>that the answering machines give you a beep prompt to record your message.
>
>Right. Dialogic and others make a big fuss of the super detection 
>algorithms, and quote 90+% accuracy. In the real world they are utterly 
>useless. Call answering just doesn't fall into a sufficiently regular pattern.
>
>Regards,
>Steve
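
For the beep step in the quoted thread, the autocorrelation idea might look 
something like the Python sketch below. It is only a guess at what was 
meant, not anyone's production detector. It assumes 8 kHz samples, and the 
names, the 30 ms frame, the 0.95 peak threshold and the 200 ms minimum beep 
length are invented values. Voiced speech is also periodic over short 
stretches, so the check requires the peak to stay high for several frames 
in a row.

```python
import math

def periodicity(frame, min_lag, max_lag):
    """Highest normalized autocorrelation between min_lag and max_lag."""
    n = len(frame) - max_lag
    if n <= 0:
        return 0.0
    head = frame[:n]
    head_energy = sum(s * s for s in head)
    if head_energy == 0:
        return 0.0
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        tail = frame[lag:lag + n]
        tail_energy = sum(s * s for s in tail)
        if tail_energy == 0:
            continue
        corr = sum(a * b for a, b in zip(head, tail))
        best = max(best, corr / math.sqrt(head_energy * tail_energy))
    return best

def detect_beep(samples, rate=8000, frame_ms=30,
                min_beep_ms=200, threshold=0.95):
    """True if a steady tone lasts at least min_beep_ms (assumed values)."""
    frame_len = rate * frame_ms // 1000
    min_lag = rate // 2000  # skip tiny lags, where most signals correlate
    max_lag = rate // 100   # longest period checked: 100 Hz
    frames_needed = -(-min_beep_ms // frame_ms)  # ceiling division
    run = 0
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        if periodicity(frame, min_lag, max_lag) >= threshold:
            run += 1
            if run >= frames_needed:
                return True
        else:
            run = 0
    return False
```

A pure tone gives a peak very close to 1.0 at a lag of one period, while 
noise stays far below the threshold.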