[asterisk-bugs] [JIRA] (ASTERISK-26941) ARI WebSocket forcibly closed due to fatal write error on repeated mute/unmute requests

Colin (JIRA) noreply at issues.asterisk.org
Sun Apr 30 05:03:57 CDT 2017


    [ https://issues.asterisk.org/jira/browse/ASTERISK-26941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=236799#comment-236799 ] 

Colin commented on ASTERISK-26941:
----------------------------------

We've spent some time experimenting. Increasing the timeout seems to be effective - we can mute and unmute 60 callers without a crash if we increase the ARI WebSocket timeout to 1 second - but only when we don't write channel variables. In our app we write additional state back to channel variables so that, if the app is disconnected from Stasis, it can reconnect and 'audit' the channels in Asterisk by reading that state, and thus dynamically restore functionality. So, for example, if we mute a channel, we write a variable to that channel with some additional service information. This is read on reconnection, or on the start of a new instance of our app, to resume the service. When we mute 60 channels that are in bridges, we also write 60 channel variables, and that is enough to take down the Stasis app. We're looking at working around this by using a different caching mechanism - for example a Redis cloud cache.
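
For reference, the shape of the operation that triggers this is roughly the following - a minimal Python sketch against the documented ARI endpoints, where the base URL, credentials and variable name are placeholders rather than our real values:

{noformat}
import requests

ARI = "http://localhost:8088/ari"        # placeholder ARI base URL
AUTH = ("ariuser", "arisecret")          # placeholder credentials

def mute_and_tag(channel_ids):
    """Mute a batch of channels, writing one state variable per channel."""
    for chan in channel_ids:
        # POST /channels/{channelId}/mute - mute the channel
        requests.post(f"{ARI}/channels/{chan}/mute",
                      params={"direction": "both"}, auth=AUTH)
        # POST /channels/{channelId}/variable - the extra per-channel write;
        # with ~60 channels this appears to be the step that pushes the
        # event WebSocket past its write timeout
        requests.post(f"{ARI}/channels/{chan}/variable",
                      params={"variable": "X_SERVICE_STATE",  # placeholder name
                              "value": "muted"},
                      auth=AUTH)
{noformat}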

It seems that bulk serial writing of channel variables following a bulk channel operation (mute, move between bridges) is the cause of the crash. When we write the channel variables, we're still seeing the pattern in (1), (2) and (3) above - a large number of WebSocket writes on the Asterisk side, followed by a WebSocket failure in res_http_websocket.c, followed by the deactivation of the Stasis app. We are then entirely unable to reconnect via REST or WebSocket without restarting the entire Asterisk instance. Given that Asterisk is still functioning and we have live callers, this is not workable - our aim is to reconnect, not to restart. Is this normal behaviour, i.e. is there any way we can avoid the Stasis application becoming moribund when the WebSocket faults in res_http_websocket.c? The context for the question is that after a WebSocket fault our application attempts to reconnect at 3-second intervals - how does ARI behave in that case?
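
To make the reconnect behaviour concrete, the client side of this is essentially the loop below - a simplified Python sketch using the websocket-client package, with the app name, credentials and URL as placeholders:

{noformat}
import time
import websocket   # pip package "websocket-client"

# Placeholder event WebSocket URL for a Stasis app named "voice_2"
WS_URL = ("ws://localhost:8088/ari/events"
          "?app=voice_2&api_key=ariuser:arisecret")

def handle_event(raw):
    print(raw)   # placeholder for the real event handling

def run_forever():
    """Stay subscribed to the ARI event WebSocket, retrying every 3 s."""
    while True:
        try:
            ws = websocket.create_connection(WS_URL)
            while True:
                handle_event(ws.recv())   # blocks until the next event
        except (websocket.WebSocketException, OSError):
            # Socket died (e.g. closed with 1011 after the write fault);
            # wait and reconnect, which we hope re-registers the app
            time.sleep(3)
{noformat}

The question above is essentially whether this loop can ever succeed once the app has been deactivated, or whether a full Asterisk restart is always required.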

Finally, some of the ARI event messages we receive back when we carry out bridge operations are extremely verbose. "ChannelEnteredBridge", for example, sends the channel IDs of all the other channels already present in the bridge. When we are moving 30 calls between bridges, each new event effectively sends its own ID plus the IDs of all the channels already there - so by the thirtieth message we have a lot of superfluous information. Is there any way we can 'unregister' for events from Stasis? We've looked at 'decline' in stasis.conf, but it doesn't seem granular enough to correspond with the events we are receiving.
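
For clarity, what we would rather not have to do is filter these events client-side after Asterisk has already serialised and sent them - i.e. something like this (Python sketch; the event type names are from the ARI documentation, the rest is illustrative):

{noformat}
import json

# Events that add little for us once bridge membership is known;
# illustrative list, not exhaustive
IGNORED_EVENTS = {"ChannelEnteredBridge", "ChannelLeftBridge"}

def handle_event(raw):
    """Parse one ARI event and drop the noisy ones before further handling."""
    event = json.loads(raw)
    if event.get("type") in IGNORED_EVENTS:
        return            # discarded - but Asterisk has already done the write
    print(event["type"])  # placeholder for the real handling
{noformat}

That drops the data in our app but does nothing about the write volume on the Asterisk side, which is why we're asking about unregistering at the source.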


> ARI WebSocket forcibly closed due to fatal write error on repeated mute/unmute requests
> ---------------------------------------------------------------------------------------
>
>                 Key: ASTERISK-26941
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-26941
>             Project: Asterisk
>          Issue Type: New Feature
>      Security Level: None
>          Components: Resources/res_http_websocket
>    Affects Versions: 14.3.0
>         Environment: Ubuntu 16.04.2 LTS on Windows Azure VM
>            Reporter: Colin
>            Assignee: Unassigned
>
> We're testing a Stasis conference application and seeing forced WebSocket closes from res_http_websocket.c. Testing with SIPp, we're calling mute/unmute asynchronously for batches of callers (POST /channels/{channelId}/mute).
> We're running with about 60 SIPp connections. Other batch operations (for example move callers between bridges) complete correctly. This issue seems to be specific to mute and unmute commands. 
> Asterisk does not crash, but the Stasis app effectively becomes moribund. We're muting and unmuting callers as one group, using async tasks which we marshal before any further requests are sent to ARI. 
> We set up a test server to get some forensic information, using SIPp to simulate 60 connections. We are finding that about 10-15 successful mute/unmute operations take place from each group, before the WebSocket and HTTP connections become moribund. I can provide a debug log file. Some salient events that we've picked out when we do a group mute/unmute, however, are:
> 1) A large number of websocket writes (on the order of n x n, where n is the number of clients being muted/unmuted), like:
> {noformat}
> [Apr 10 14:36:29] DEBUG[35342] taskprocessor.c: Taskprocessor 'subm:voice_2-00000065' cleared the high water alert.
> [Apr 10 14:36:29] DEBUG[35342] res_http_websocket.c: Writing websocket string of length 668
> [Apr 10 14:36:29] DEBUG[35342] res_http_websocket.c: Writing websocket text frame, length 668
> {noformat}
> 2) After a small number of successful mute/unmute operations, the websocket faults, and closes
> {noformat}
> [Apr 10 14:36:30] DEBUG[35342] res_http_websocket.c: Closing WS with 1011 because we can't fulfill a write request
> [Apr 10 14:36:30] DEBUG[35342] utils.c: Timed out trying to write 
> [Apr 10 14:36:30] DEBUG[35342] res_http_websocket.c: WebSocket connection from 'xxx.xxx.xxx.xxx:18133' forcefully closed due to fatal write error 
> {noformat}
> 3) A large number of attempted writes, followed by deactivation of the Stasis application
> {noformat}
> [Apr 10 14:36:30] DEBUG[35342] res_http_websocket.c: Writing websocket text frame, length 668
> [Apr 10 14:36:30] NOTICE[35342] ari/ari_websockets.c: Problem occurred during websocket write to :18133, websocket closed
> [Apr 10 14:36:30] WARNING[35340] ari/ari_websockets.c: WebSocket read error: Software caused connection abort
> [Apr 10 14:36:30] VERBOSE[35340] stasis/app.c: Deactivating Stasis app 'voice_2'
> {noformat}
> We can replicate this at will, and we have extensive verbose/debug logs if these are needed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


