[asterisk-bugs] [JIRA] (ASTERISK-23319) Segmentation fault in queue_exec at app_queue.c

Matt Jordan (JIRA) noreply at issues.asterisk.org
Wed Mar 11 17:28:37 CDT 2015


    [ https://issues.asterisk.org/jira/browse/ASTERISK-23319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=225343#comment-225343 ] 

Matt Jordan commented on ASTERISK-23319:
----------------------------------------

Dropping this back into Triage, as Stefan26 thinks he can reproduce the issue.

Note that I took a gander through {{app_queue}} with him, and there may be some code paths where {{qe.pr}} is left dangling, that is, still pointing at rule memory that {{leave_queue}} has already freed.

{quote}
{noformat}
(04:55:35 PM) Stefan26: Can leave_queue(qe) and queue_exec(chan, data) run concurrently (where qe->chan=chan)? It seems leave_queue(qe) can do a ast_free on my qe.pr
(05:00:39 PM) Stefan26: that ast_free is enclosed in a block with ao2_lock(q) {...} ao2_unlock(q); what is that meant to protect it from? i wonder if i should test putting a similar ao2_lock/ao2_unlock in queue_exec
(05:00:51 PM) Stefan26: but it's such a wild goose chase if i can't reproduce it with confidence
(05:02:00 PM) Stefan26: reproduce the crash*
(05:07:12 PM) mjordan: Stefan26: so, qe.pr points to the first rule in qe.qe_rules
(05:07:30 PM) mjordan: which is a list allocated without a lock. That implies something else should provide the synchronization.
(05:08:31 PM) mjordan: leave_queue can be called in multiple locations.
(05:09:09 PM) mjordan: in wait_our_turn, queue_exec (which shouldn't be a problem), and try_calling
(05:10:17 PM) mjordan: since those in turn are called from queue_exec, my guess would be that there shouldn't need to be any synchronization.
(05:10:34 PM) mjordan: do you still have the core file from the crash?
(05:11:48 PM) mjordan: now...
(05:11:49 PM) mjordan: hm.
(05:12:01 PM) Stefan26: no but I can get it from a colleague tomorrow, only he managed to produce it. from my understanding the use case was that some kind of hang_up action was performed on the chan_sip channel which corresponded to the queue_ent qe for which the seg-fault happened
(05:12:39 PM) mjordan: One way this could go all nutso on us is if, somehow, leave_queue was called from try_calling/wait_our_turn, *AND* somehow we managed to loop back around in the queue such that the 'see if we need to move to the next penalty level' was called again
(05:13:58 PM) mjordan: Stefan26: which version of Asterisk?
(05:14:39 PM) Stefan26: Asterisk 13.1.0
(05:14:44 PM) Stefan26: but source from 13.1.0 looked similar
(05:14:45 PM) mjordan: so, it isn't impossible for try_calling to call leave_queue and return non-zero
(05:14:51 PM) Stefan26: 13.2.0*
(05:14:51 PM) mjordan: er, zero
(05:15:21 PM) mjordan: Assuming everything actually is 'answered', try_calling returns the value of ast_bridge_call_with_flags
(05:15:53 PM) mjordan: which returns 0 on success, -1 on failure
(05:16:50 PM) mjordan: that could cause a problem if it returns successfully and we don't actually leave the queue somehow.
(05:17:24 PM) mjordan: assuming the inbound channel never hung up.
(05:19:15 PM) mjordan: the same issue exists in wait_our_turn however
(05:19:36 PM) Stefan26: I can log (to file) all return values of res = try_calling(&qe, opts, opt_args, args.announceoverride, args.url, &tries, &noption, args.agi, args.macro, args.gosub, ringing) and check that log next crash, what else can I log to test your ideas?
(05:19:56 PM) mjordan: I would log out the return value of try_calling and wait_our_turn
(05:20:20 PM) Stefan26: and calls to leave_queue?
(05:20:22 PM) mjordan: yes
(05:20:39 PM) mjordan: I'm pretty sure that's the problem though.
(05:21:01 PM) mjordan: Both try_calling and wait_our_turn need to make sure that their callers treat a call to leave_queue as "get the hell out"
(05:21:17 PM) mjordan: to that point, leave_queue should also set qe->pr to NULL when done
(05:21:24 PM) mjordan: leaving it as garbage in this case is bad
(05:23:07 PM) Stefan26: OK, ill log those items, if it ever crashes again, ill try patching qe->pr to NULL after next crash if log suggests it would have helped
(05:24:06 PM) Stefan26: I can't make a JIRA if the use-case is unknown?
(05:24:39 PM) mjordan: well, since we had the same crash elsewhere, we can just comment on that issue
(05:24:49 PM) mjordan: and re-open it
(05:24:56 PM) mjordan: of course, in that case, it was against 1.8
(05:25:15 PM) mjordan: so it may be worth looking at 11 to see if it has the same issues. I'd suspect that it does.
(05:25:34 PM) mjordan: it would be nice to know how it occurred before we try modifying app_queue: it has a tendency to be fragile :-)
(05:25:45 PM) Stefan26: If you re-open it I can post my crashlog tomorrow if collague gives it too much
(05:26:06 PM) Stefan26: I can't comment on a closed JIRA otherwise?
(05:26:24 PM) Stefan26: worst case scenario it never happens again and ull have to close it once more :)
(05:26:42 PM) Stefan26: s/too much/to me/
{noformat}
{quote}
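
To make the suggestion from the log concrete (have {{leave_queue}} clear {{qe->pr}}, and have the rule-update path bail out once the entry has left the queue), here is a minimal sketch. It is not the actual {{app_queue.c}} code: every {{sketch_}} name, the simplified structs, and the {{left_queue}} flag are hypothetical stand-ins chosen for illustration, and the real fix may well look different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins. These are NOT the real app_queue.c types. */
struct sketch_rule {
    int time;                  /* seconds in queue before the next rule applies */
    struct sketch_rule *next;
};

struct sketch_queue_ent {
    struct sketch_rule *rules; /* stand-in for qe->qe_rules */
    struct sketch_rule *pr;    /* stand-in for qe->pr, the current rule */
    int left_queue;            /* assumption: set once the entry has left the queue */
};

/* Append a rule; returns 0 on success, -1 on allocation failure. */
static int sketch_add_rule(struct sketch_queue_ent *qe, int time)
{
    struct sketch_rule *rule = calloc(1, sizeof(*rule));
    struct sketch_rule **tail = &qe->rules;

    if (!rule) {
        return -1;
    }
    rule->time = time;
    while (*tail) {
        tail = &(*tail)->next;
    }
    *tail = rule;
    if (!qe->pr) {
        qe->pr = qe->rules;    /* pr starts at the first rule in the list */
    }
    return 0;
}

/* Free the rule list and clear pr, so no caller can see freed memory. */
static void sketch_leave_queue(struct sketch_queue_ent *qe)
{
    struct sketch_rule *rule = qe->rules;

    while (rule) {
        struct sketch_rule *next = rule->next;
        free(rule);
        rule = next;
    }
    qe->rules = NULL;
    qe->pr = NULL;             /* the key point: do not leave pr as garbage */
    qe->left_queue = 1;
}

/* Move to the next rule if enough time has passed. Returns 1 if pr moved. */
static int sketch_update_rule(struct sketch_queue_ent *qe, int elapsed)
{
    if (qe->left_queue || !qe->pr) {
        return 0;              /* already left the queue: get out, touch nothing */
    }
    if (elapsed < qe->pr->time || !qe->pr->next) {
        return 0;
    }
    qe->pr = qe->pr->next;
    return 1;
}
```

With this shape, looping back around after {{sketch_leave_queue}} is harmless: the update function sees a NULL {{pr}} and returns, and calling {{sketch_leave_queue}} a second time is a no-op rather than a double free.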

> Segmentation fault in queue_exec at app_queue.c
> -----------------------------------------------
>
>                 Key: ASTERISK-23319
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-23319
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Applications/app_queue
>    Affects Versions: 1.8.25.0
>         Environment: CentOS 6.2
>            Reporter: Vadim
>         Attachments: trace.txt
>
>
> gdb /usr/sbin/asterisk /tmp/coreXX.XXX
> ....
> [Edit by Rusty Newton - removed inline debug as per the guidelines...]



--
This message was sent by Atlassian JIRA
(v6.2#6252)