[Asterisk-Dev] Deadlock/mutex conflicts in chan_agent.c

steve at daviesfam.org steve at daviesfam.org
Sat Jan 1 05:09:33 MST 2005


Hi,

I've started trying to use Agents with CVS-HEAD, and I'm having problems 
with mutex deadlocks between the "AgentLogin" side and the 
"Queue" side.

All the mutex twiddling in chan_agent.c is complicated, so its hard for me
to tell what is supposed to happen and how best to fix it.

Anyway - here's an explanation of the deadlock:


In __login_exec, there's a lot of locking and unlocking of &agentlock and 
the &p->lock for the particular agent_pvt structure.

This code locks &agentlock first and then the agent_pvt structure.

In the middle is the following:

                        ast_mutex_lock(&agentlock);
                        ast_mutex_lock(&p->lock);
                        if (p->lastdisc.tv_sec) {
                                gettimeofday(&tv, NULL);
                                if ((tv.tv_sec - p->lastdisc.tv_sec) * 1000 +
                                        (tv.tv_usec - p->lastdisc.tv_usec) / 1000 > p->wrapuptime) {
                                                ast_log(LOG_DEBUG, "Wrapup time expired!\n");
                                        memset(&p->lastdisc, 0, sizeof(p->lastdisc));
                                        if (p->ackcall > 1)
                                                check_beep(p, 0);
                                        else
                                                check_availability(p, 0);
                                }
                        }
                        ast_mutex_unlock(&p->lock);
                        ast_mutex_unlock(&agentlock);


Meanwhile, on the "other side", in agent_call, the code holds a lock 
on the agent_pvt structure from start to finish without first locking the 
agentlock:

static int agent_call(struct ast_channel *ast, char *dest, int timeout)
{
	struct agent_pvt *p = ast->pvt->pvt;
	struct ast_channel *chan;
	int res = -1;

	ast_mutex_lock(&p->lock);

...
... Code omitted
...

	ast_verbose( VERBOSE_PREFIX_3 "agent_call, call to agent '%s' call on '%s'\n", p->agent, p->chan->name);
	ast_log( LOG_DEBUG, "Playing beep, lang '%s'\n", p->chan->language);
	res = ast_streamfile(p->chan, beep, p->chan->language);
	ast_log( LOG_DEBUG, "Played beep, result '%d'\n", res);
	if (!res) {
		res = ast_waitstream(p->chan, "");
		ast_log( LOG_DEBUG, "Waited for stream, result '%d'\n", res);
	}
	if (!res) {
		res = ast_set_read_format(p->chan, ast_best_codec(p->chan->nativeformats));
		ast_log( LOG_DEBUG, "Set read format, result '%d'\n", res);
		if (res)
			ast_log(LOG_WARNING, "Unable to set read format to %s\n", ast_getformatname(ast_best_codec(p->chan->nativeformats)));
	} else {
		/* Agent hung-up */
		p->chan = NULL;
	}

	if (!res) {
		ast_set_write_format(p->chan, ast_best_codec(p->chan->nativeformats));
		ast_log( LOG_DEBUG, "Set write format, result '%d'\n", res);
		if (res)
			ast_log(LOG_WARNING, "Unable to set write format to %s\n", ast_getformatname(ast_best_codec(p->chan->nativeformats)));
	}
	if( !res )
	{
		/* Call is immediately up, or might need ack */
		if (p->ackcall > 1)
			ast_setstate(ast, AST_STATE_RINGING);
		else {
			ast_setstate(ast, AST_STATE_UP);
			if (recordagentcalls)
				agent_start_monitoring(ast,0);
			p->acknowledged = 1;
		}
		res = 0;
	}
	CLEANUP(ast,p);
	ast_mutex_unlock(&p->lock);
	return res;
}

Now during the execution of agent_call, something calls agent_devicestate 
- I suppose during execution of ast_waitstream.

agent_devicestate then tries to lock agentlock....

End result is that the agent_call thread has the agent_pvt struct locked 
and is trying to lock agentlock; the __login_exec thread has agentlock 
locked and is trying to lock the agent_pvt.  Result: deadlock.


Now I've got two possible fixes:

Option 1:

  agent_call releases its &p->lock lock before doing the ast_waitstream 
  etc.  Perhaps some code will be necessary to be sure the agent_pvt 
  doesn't disappear underneath us.

Option 2:

  agent_call should lock agentlock up front before &p->lock and hold that 
  from start to finish just like &p->lock.

I tried both and they both fixed the problem.  In which case I 
guess the option 2 is safer.

Comments?  Otherwise I'll send the patch.

Steve





More information about the asterisk-dev mailing list