<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri",sans-serif;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style>
</head>
<body lang="EN-US" link="#0563C1" vlink="#954F72" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal">Over the weekend, AWS had an outage while several of our customers were running there.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">This customer is running Asterisk 16.23.0 (which has the STUN timeout crash fix).<o:p></o:p></p>
<p class="MsoNormal">I have been told that other customers running the newer Asterisk 18.12.1 encountered similar issues, but I haven’t had a chance to verify this.<o:p></o:p></p>
<p class="MsoNormal">All of these customers should be running PJSIP, but I haven’t had a chance to verify that either.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">The logs show Asterisk reporting problems communicating with the STUN address configured in rtp.conf:<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">[02/04 00:15:03.812] NOTICE[5943] stun.c: Attempt 1 to send STUN request to 'x.x.x.x' timed out.<o:p></o:p></p>
<p class="MsoNormal">[02/04 00:15:06.812] NOTICE[5943] stun.c: Attempt 2 to send STUN request to 'x.x.x.x' timed out.<o:p></o:p></p>
<p class="MsoNormal">[02/04 00:15:09.813] WARNING[5943] stun.c: Attempt 3 to send STUN request to 'x.x.x.x' timed out. Check that the server address is correct and reachable.<o:p></o:p></p>
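<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
<p class="MsoNormal">To check from the same host whether the STUN server answers at all, independently of Asterisk, something like the following Python sketch could be used. It is not how Asterisk implements STUN; it only sends a bare RFC 5389 Binding Request over UDP. The function names are made up for illustration, port 3478 is the standard STUN port (an assumption about our setup), and the 3 attempts with a 3-second timeout simply mirror the spacing in the log lines above.<o:p></o:p></p>
<pre>
```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442   # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101


def build_binding_request(transaction_id=None):
    """Return (packet, transaction_id) for a Binding Request with no attributes."""
    if transaction_id is None:
        transaction_id = os.urandom(12)
    # Header: type (2 bytes), length (2 bytes), magic cookie (4 bytes),
    # followed by the 12-byte transaction ID.
    header = struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE)
    return header + transaction_id, transaction_id


def is_matching_success(data, transaction_id):
    """True if data is a Binding Success Response for our transaction."""
    if not len(data) >= 20:
        return False
    msg_type, _length, cookie = struct.unpack("!HHI", data[:8])
    return (msg_type == BINDING_SUCCESS
            and cookie == MAGIC_COOKIE
            and data[8:20] == transaction_id)


def probe_stun(host, port=3478, attempts=3, timeout=3.0):
    """Return the 1-based attempt that got a valid reply, or None if none did."""
    packet, tid = build_binding_request()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for attempt in range(1, attempts + 1):
            # Passing a hostname here makes Python resolve it on every attempt.
            sock.sendto(packet, (host, port))
            try:
                data, _addr = sock.recvfrom(2048)
            except socket.timeout:
                print(f"Attempt {attempt} to {host}:{port} timed out")
                continue
            if is_matching_success(data, tid):
                return attempt
            # A reply that does not match our transaction uses up the attempt.
    return None
```
</pre>
<p class="MsoNormal">Calling probe_stun() with the address from rtp.conf while the errors are occurring would at least show whether the server is unreachable from the box itself or only from Asterisk.<o:p></o:p></p>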
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">The same pattern repeated for every call until Asterisk was restarted:<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Asterisk receives an INVITE.<o:p></o:p></p>
<p class="MsoNormal">It immediately sends a 100 Trying.<o:p></o:p></p>
<p class="MsoNormal">7 seconds later, Asterisk receives a CANCEL from the SIP provider.<o:p></o:p></p>
<p class="MsoNormal">Half a second after that, Asterisk receives a second CANCEL.<o:p></o:p></p>
<p class="MsoNormal">A second after that, Asterisk receives a third CANCEL.<o:p></o:p></p>
<p class="MsoNormal">After the third STUN request attempt times out, Asterisk sends a 200 OK response for the CANCEL CSeq.<o:p></o:p></p>
<p class="MsoNormal">This is followed by a 487 Request Terminated.<o:p></o:p></p>
<p class="MsoNormal">Then comes a second 200 OK response for the CANCEL CSeq.<o:p></o:p></p>
<p class="MsoNormal">Then comes a third 200 OK response for the CANCEL CSeq.<o:p></o:p></p>
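<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
<p class="MsoNormal">The timing in this sequence lines up with the STUN retries if the first STUN request goes out when the INVITE arrives. That starting point is my assumption; I have not confirmed it in the Asterisk source. A small Python sketch of the arithmetic, using the 3-second spacing from the log lines and the CANCEL offsets from the trace:<o:p></o:p></p>
<pre>
```python
# All offsets are in seconds after the INVITE arrives.
STUN_ATTEMPTS = 3
STUN_TIMEOUT_S = 3.0   # spacing seen between the stun.c log lines

# CANCEL arrivals from the trace: 7 s, then half a second later, then 1 s later.
cancel_offsets = [7.0, 7.0 + 0.5, 7.0 + 0.5 + 1.0]

# Assumption: the first STUN request is sent as soon as the INVITE arrives.
stun_timeouts = [STUN_TIMEOUT_S * n for n in range(1, STUN_ATTEMPTS + 1)]
stun_done = stun_timeouts[-1]

# CANCELs that have already arrived by the time the last STUN attempt gives up.
queued = [t for t in cancel_offsets if stun_done >= t]

print("STUN attempts time out at:", stun_timeouts)
print("CANCELs arrive at:", cancel_offsets)
print("CANCELs waiting when STUN gives up:", len(queued))
```
</pre>
<p class="MsoNormal">Under that assumption, all three CANCELs arrive before the third STUN attempt times out at the 9-second mark. That would explain why the three 200 OK responses and the 487 only go out afterwards.<o:p></o:p></p>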
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">We have an AMI connection. At this point, we see the Newchannel event for this channel.<o:p></o:p></p>
<p class="MsoNormal">Asterisk then immediately sends various events for the channel, including the Hangup event indicating that the channel has ended.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">63 ms later, Asterisk receives an ACK, which completes processing for that Call-ID.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">This went on for over 8 hours.<o:p></o:p></p>
<p class="MsoNormal">Once they restarted the Asterisk box, everything was fine. I have been told that they had to restart every Asterisk instance we had running at AWS to clear the STUN request timeout errors, and that no calls/channels would work until they did.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I wonder whether the STUN address lookup happens only once, at startup, and whether AWS DNS changed something during the outage/recovery so that the resolved address went stale?<o:p></o:p></p>
<p class="MsoNormal">Is there a recommendation on how to prevent this from happening?<o:p></o:p></p>
<p class="MsoNormal">Any thoughts?<o:p></o:p></p>
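<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
<p class="MsoNormal">If this happens again, one way to test the stale-DNS theory before restarting would be to compare the IP address in the stun.c log lines with what the STUN hostname resolves to at that moment. A minimal Python sketch, where the function names are hypothetical and port 3478 is assumed:<o:p></o:p></p>
<pre>
```python
import socket


def resolve_ipv4(hostname, port=3478):
    """Return the sorted IPv4 addresses the resolver gives for hostname right now."""
    infos = socket.getaddrinfo(hostname, port, socket.AF_INET, socket.SOCK_DGRAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def address_is_stale(cached_ip, hostname, port=3478):
    """True if cached_ip is no longer among the addresses hostname resolves to."""
    return cached_ip not in resolve_ipv4(hostname, port)
```
</pre>
<p class="MsoNormal">Here cached_ip would be the address Asterisk reports in the timeout messages and hostname the value configured in rtp.conf. A result of True would support the idea that Asterisk is still using an address from an earlier lookup.<o:p></o:p></p>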
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Dan<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</body>
</html>