<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>First of all, I know that 11.25.x is in security fix mode, if not
already EOL.</p>
<p>I am providing support for a client that runs a call center on an
  Asterisk 11.25.0 installation. Upgrading to a later branch is not
  possible right now, both because the client is very cautious about
  the stability of the software and because the call center software
  (written by me) uses chan_agent extensively and has not (yet) been
  ported to the new method of proxy agents under Asterisk 13 and
  later.</p>
<p>The call center currently works by issuing an AMI Originate
  between Local/XXXXX@context (where XXXXX is the outbound number to
  dial) and an extension that drops the call into a Queue. Several
  agents, logged in using chan_agent as Agent/YYYYY, are members of
  that queue and are available to connect calls. The call center
  supervisors move queue members in and out of the queues several
  times per day, as required by the ongoing workload. The agents
  become available in the queues by logging into the Agent channel;
  they cannot choose to log into the queue itself.</p>
<p>Most of the time, this setup works correctly. However, about
  once or twice per week, one or more agents are logged in and idle
  (both in reality and according to the output of "agent show"), yet
  the queues they belong to show them as Busy. An (anonymized)
  example is shown below:</p>
<pre>[root@CALLCENTERSERVER ~]# asterisk -rnx 'queue show 108' | grep Busy ; asterisk -rnx 'agent show' | grep 1562
Agent/4248 (ringinuse enabled) (Busy) has taken no calls yet
Agent/4275 (ringinuse enabled) (Busy) has taken no calls yet
Agent/1562 (ringinuse enabled) (Busy) has taken no calls yet
Agent/4259 (ringinuse enabled) (Busy) has taken no calls yet
Agent/4286 (ringinuse enabled) (Busy) has taken no calls yet
1562 (XXXXXX XXXXXXXX XXXXXXXX XXXXX) logged in on SIP/9028-0003f3e6 is idle (musiconhold is 'none')
[root@CALLCENTERSERVER ~]#
</pre>
<p>Here, agent channel Agent/1562 is logged in and idle according to
  "agent show". However, queue "108" shows this agent (as well as a
  few others) as Busy. In this particular instance, at least
  Agent/4248 is not even logged in (again according to "agent
  show"). In my experiments, no amount of moving agents between
  queues clears this condition for the affected agents; the only
  remedy that has worked so far is a full Asterisk restart.</p>
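<p>To catch this condition without checking by eye, the comparison
  above can be scripted. The following is only a minimal sketch: the
  function names are hypothetical, and both regular expressions are
  assumptions derived from the sample output shown above, so they may
  need adjusting for other output formats.</p>

```python
import re

# Both patterns are assumptions based on the sample output above;
# other Asterisk versions may format these lines differently.
QUEUE_BUSY = re.compile(r"Agent/(\d+)\b.*\(Busy\)")
AGENT_IDLE = re.compile(r"^\s*(\d+)\s.*\blogged in on\b.*\bis idle\b")


def busy_in_queue(queue_show_output):
    """Return the agent IDs that 'queue show' lists as Busy."""
    found = set()
    for line in queue_show_output.splitlines():
        match = QUEUE_BUSY.search(line)
        if match:
            found.add(match.group(1))
    return found


def idle_agents(agent_show_output):
    """Return the agent IDs that 'agent show' lists as logged in and idle."""
    found = set()
    for line in agent_show_output.splitlines():
        match = AGENT_IDLE.match(line)
        if match:
            found.add(match.group(1))
    return found


def stale_members(queue_show_output, agent_show_output):
    """Agents that the queue calls Busy while 'agent show' calls them idle."""
    busy = busy_in_queue(queue_show_output)
    return sorted(busy.intersection(idle_agents(agent_show_output)))
```

<p>Fed with the text of "queue show 108" and "agent show" (for
  example captured through "asterisk -rnx"), stale_members() returns
  the agents whose queue status disagrees with chan_agent.</p>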
<p>I have a copy of the git repository and am able to recompile at
will. By studying the relevant code, I have found that the device
state is cached in at least two levels:</p>
<ol>
<li>The queue member has a "status" field in the struct
representing it in app_queue. This field is supposed to be
updated by a subscription to the device change event supplied by
the Asterisk core.</li>
<li>The Asterisk core maintains a cache of the device state for
all devices, including Agent/XXXXX from chan_agent.</li>
</ol>
<p>Because of this, I had to check which of the two levels is out of
  sync with reality. On wiki.asterisk.org I learned of the
  DEVICE_STATE() function, which returns the current (cached) device
  state of its argument. Therefore, for the example above, I queried
  it through AMI over telnet:</p>
<pre>[root@CALLCENTERSERVER ~]# telnet 127.0.0.1 5038
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
Asterisk Call Manager/1.3
Action: Login
Username: admin
Secret: *********
Response: Success
Message: Authentication accepted
Event: FullyBooted
Privilege: system,all
Status: Fully Booted
Action: Events
EventMask: off
Response: Success
Events: Off
Action: GetVar
Variable: DEVICE_STATE(Agent/1562)
Response: Success
Variable: DEVICE_STATE(Agent/1562)
Value: BUSY
Action: Logoff
Response: Goodbye
Message: Thanks for all the fish.
Connection closed by foreign host.
[root@CALLCENTERSERVER ~]#
</pre>
<p>DEVICE_STATE() also reports the agent channel as Busy when it is
  actually idle. I therefore conclude that the issue lies somewhere
  in chan_agent and in how the Asterisk core obtains its cached value
  from it.</p>
<br>
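<p>The same AMI query can be automated instead of typed into telnet.
  This is a minimal sketch, not a finished tool: all function names
  are hypothetical, the host, port and credentials are whatever the
  manager interface is configured with, and it assumes the Login
  action is sent with "Events: off" so that events do not interleave
  with responses (any that still arrive are skipped).</p>

```python
import socket


def build_action(action, **headers):
    """Serialize one AMI action: 'Key: Value' lines ended by a blank line."""
    lines = ["Action: " + action]
    lines.extend("{}: {}".format(key, value) for key, value in headers.items())
    return ("\r\n".join(lines) + "\r\n\r\n").encode("utf-8")


def parse_message(text):
    """Parse the header lines of one AMI message into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def read_response(stream):
    """Read AMI messages until one has a Response header (events are skipped)."""
    while True:
        block = []
        for raw in stream:
            line = raw.decode("utf-8", "replace").rstrip("\r\n")
            if not line:
                break  # a blank line terminates the current message
            block.append(line)
        else:
            raise ConnectionError("AMI connection closed")
        fields = parse_message("\n".join(block))
        if "Response" in fields:
            return fields


def query_device_state(device, host, port, username, secret):
    """Log in to AMI, read DEVICE_STATE(device) via GetVar, and log off."""
    with socket.create_connection((host, port), timeout=5) as sock:
        stream = sock.makefile("rb")
        stream.readline()  # greeting line, e.g. "Asterisk Call Manager/1.3"
        sock.sendall(build_action("Login", Username=username,
                                  Secret=secret, Events="off"))
        if read_response(stream).get("Response") != "Success":
            raise PermissionError("AMI login failed")
        variable = "DEVICE_STATE({})".format(device)
        sock.sendall(build_action("GetVar", Variable=variable))
        state = read_response(stream).get("Value")
        sock.sendall(build_action("Logoff"))
        return state
```

<p>For the session above, the equivalent call would be
  query_device_state("Agent/1562", "127.0.0.1", 5038, "admin",
  secret), which can then be compared against the output of "agent
  show" for the same agent.</p>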
<p>Do you have any advice (other than just "update Asterisk") on
  where to go from here? As far as I can see, the chan_agent source
  code only ever sets the device state to Unknown, and never to any
  other state. Is there a way to force Asterisk to flush or refresh
  the device state cache, either globally or per device? If none
  exists, what do you think of implementing such a command as a
  workaround while I search for the true solution?</p>
</body>
</html>