[asterisk-dev] [Code Review] Initial review request for CCSS architecture and generic implementations

Mark Michelson mmichelson at digium.com
Tue Nov 3 17:53:56 CST 2009


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/410/
-----------------------------------------------------------

(Updated 2009-11-03 17:53:56.171388)


Review request for Asterisk Developers, Russell Bryant, Kevin Fleming, and rmudgett.


Changes
-------

After the last review was posted, I had a sit-down meeting with Richard Mudgett to discuss, at a high level, the way the code worked. The discussion showed several potential problem areas, and so they have been addressed in this latest update. Note that at this point, I have no reasons to need to do any immediate changes in the CCSS branch, so it should be safe to review this code without the fear that I may possibly make some huge commits which nullify your comments.

Here is a list of the major changes that occurred since that meeting:

* The CC agent and CC core instance are now created when we receive the first AST_CONTROL_CC frame from an outbound device. We need to have the agent created at this point so that we can alert the caller's channel driver that he should be offering CC to the caller. It needs to be done early because the offer needs to be embedded in the ringing, answer, or hangup frame sent to the caller. Prior to this change, the agent and core instance were allocated at the end of wait_for_answer in app_dial, which was too late.

* With relation to the above bullet point, a new agent callback, offer_cc, has been added. This is called for each AST_CONTROL_CC frame read. The purpose of this callback is to allow the agent's channel driver to at least mark that CC should be offered in upcoming messages. The channel driver may even attempt to send specific messages to the caller during this callback as well if it is necessary.

* The core no longer will attempt to request the status of the calling party when the called party becomes available. Instead, the core calls a new callee_available agent callback. In this, the agent is expected to send appropriate signaling to the caller so that the caller may attempt to initiate a recall. If the agent is able to make the determination that the caller is busy, then the agent may call ast_cc_caller_busy to indicate to the core the caller's busy status. The callee_available callback essentially replaces the recall callback.

* With relation to the above point, the status request mechanism has been drastically changed. Through discussions, we found that status requests can possibly block for longer than was desired, plus in certain cases, a single status request may result in multiple statuses being returned. The new model is that a monitor (or code pertaining to monitor logic) may call ast_cc_status_request in order to request the current status of the calling party. The core then calls the agent's status_request callback. The agent is expected to send whatever necessary messages to request a status update and then return. As responses to the status request come into the agent, the agent may call ast_cc_status_response. The core then can call the monitor's status_response callback in order to communicate the status(es) of the caller.

* The function ast_cc_monitor_announce_availability has been changed to ast_cc_callee_available. This was done both to shorten the name of the function and to make it align better with the names of functions on the caller side.

* The stop_monitoring agent callback has been removed. It didn't make a whole lot of sense for the agent to detect that the caller had once again become available, notify the core of the change, and then wait for the core to tell it to stop monitoring the caller. Instead, at the time that the agent detects that the caller has become available, it is expected to take the necessary steps to stop monitoring the caller.

* The function ast_cc_get_recall_core_id has been removed in favor of ast_cc_is_recall. This function does a much deeper check to see if the channel is actually involved in a CC recall and will return a boolean to indicate if it is. In addition, if the channel is involved in a CC recall, the core ID of the recall will be placed in an output parameter.

* The timing of the call to ast_cc_completed has been changed. The previous behavior was to call it as soon as we had received any frame from any outbound device during a CC recall. However, this is not guaranteed to be safe in certain oddball setups and under certain oddball network conditions. This function is now called when wait_for_answer exits in app_dial. We know at this point that we have received any and all responses from endpoints by now.

After fixing the above points, I re-ran most of the tests described in the "Testing done" section and found and fixed a few bugs as a result. To reiterate, I don't have any immediate plans to continue updating code in this branch (at least not with code related to the CC core or generic CC), so you can feel safe reviewing this code now ;)


Summary
-------

This is the first attempt at call completion code for Asterisk. It follows the architecture document which was sent to the dev list for the most part, but there is one significant change. That is, instead of modifying the Hangup() dialplan application to take an argument stating whether to offer CC, the decision is now made based on the cc_agent_policy of the calling channel. If it is set to anything other than "never," then CC will be offered.

There is an opaque structure used for defining CC configuration parameters and an API for getting and setting each one. There is also an API call to set a parameter's value based on the name of the parameter and a passed-in value. Thus far, the SIP channel driver is the only channel driver which has had the necessary code changes to be able to configure CC parameters within the channel driver's configuration file. Other channel types can still get and set CC parameters by using the CALLCOMPLETION dialplan function, though.

Now onto the process used.

An incoming channel has a datastore attached to it which contains information about the CC-capable interfaces dialed. This information is stored in a linked-list (with metadata that allows it to act as a "tree"). In addition, a "core_id" is assigned to the call. The core_id is used as a unique identifier for the CC transaction. When dialing is completed, a cc_core_instance structure and an ast_cc_agent structure are created for the call. The interface tree on the channel's datastore also resides on the agent structure.

The caller can request CC using the CallCompletionRequest dialplan application. This, plus all other core state machine changes (see the architecture document for information about the state machine) are handled by a taskprocessor. The reason for this is that it allows for the core to handle all CC-related requests one at a time and not have to worry about corruption of data due to poor locking techniques.

The core will create an ast_cc_monitor for each outbound interface on which CC has been requested. The monitors form a tree-like structure with ast_cc_monitor_links being used as edges. In addition, the monitors are stored in an ao2 container so that an individual monitor may be found without needing to traverse the tree. All operations that work on the monitor tree happen in the core's taskprocessor thread.

The generic monitor works by using a device state subscription. When a monitored device becomes available, the original caller is called back. When the original caller answers, a new datastore is created on the caller's channel which contains the same interface tree that was stored in the original call and on the agent structure. This is used so that the CC_INTERFACES channel variable can be properly populated at each step of the dialplan.

Currently, the code is in a state where it works and it works well. However, it is lacking a lot of bells and whistles, and the code itself could benefit from some more work being done on it. Here are some of my ideas for improving the code.

1) There are several XXX comments throughout the code. Rather than reiterate them here in the review request, I'll let them speak for themselves.
2) There currently are no CLI commands to do things such as checking on the current state of CC requests.
3) There are no manager events yet.
4) There are a bunch of LOG_NOTICE messages currently in the code that will need to be changed before going live. The easy way to go is to change them to debug messages, but what I'd really like to do is to create a custom CC log level instead for them.
5) ast_cc_request_state_change takes a debug string that will be printed so that you can see why a state change occurred. I want to change this to a printf-style argument list so that it will be easier to write more specific information.
6) There are two very similar functions in ccss.c, cc_device_tree_item_init and cc_extension_tree_item_init, which should probably be combined into a single function.
7) Despite my efforts to group like functions together, I still feel that ccss.c is a total mess and is hard to follow. I would like to fix this by doing one of two things which are very unAsterisk-like. Either a) separate the functions into separate files, or b) place function prototypes for every internal function at the top of ccss.c so that I can place things into an order where I don't have to make sure that I only call static functions defined above the one I'm currently in.

If any reviewers find other areas which can use improvement, or find any places where I'm obviously doing things incorrectly, then I would appreciate it.


Diffs (updated)
-----

  /trunk/apps/app_dial.c 227494 
  /trunk/channels/chan_local.c 227494 
  /trunk/channels/chan_sip.c 227494 
  /trunk/configs/ccss.conf.sample PRE-CREATION 
  /trunk/doc/tex/asterisk.tex 227494 
  /trunk/doc/tex/ccss.tex PRE-CREATION 
  /trunk/funcs/func_callcompletion.c PRE-CREATION 
  /trunk/include/asterisk/ccss.h PRE-CREATION 
  /trunk/include/asterisk/channel.h 227494 
  /trunk/include/asterisk/channelstate.h PRE-CREATION 
  /trunk/include/asterisk/devicestate.h 227494 
  /trunk/include/asterisk/frame.h 227494 
  /trunk/include/asterisk/manager.h 227494 
  /trunk/main/asterisk.c 227494 
  /trunk/main/ccss.c PRE-CREATION 
  /trunk/main/channel.c 227494 
  /trunk/tests/test_amihooks.c 227494 

Diff: https://reviewboard.asterisk.org/r/410/diff


Testing
-------

1) Configuration testing.
- chan_sip correctly reads and sets CC configuration settings set in chan_sip.conf
- A channel driver (chan_console in my case) that does not have CC configuration settings gets the default settings.
- The CALLCOMPLETION function correctly sets parameters and overwrites those set in the channel driver configuration settings.
- The CALLCOMPLETION function correctly sets parameters for channel drivers which cannot parse channel driver configuration settings.

2) Simple Call from one SIP phone to one other SIP phone.
- CCBS and CCNR are offered to the caller at proper times
- The tree of interfaces on the incoming channel's datastore is correct.
- The proper CC service is requested of the outbound interface
- The CallCompletionRequest application operates properly
- The CallCompletionCancel application operates properly
- The monitor tree formed from the interface tree is correctly structured and its data is correct
- The core properly detects when the called party has become available and calls the caller back
- The configured callback macro is executed on the caller's channel when he answers.
- The proper extension is dialed on the CC recall
- The CC_INTERFACES channel variable has the proper interfaces on it (to include Local channel specifics such as /b and /n)

3) Same test as 2), but the calling phone is busy when the called phone becomes available
- All points from 2) hold true
- The called phone's monitoring is properly suspended and the caller's phone is monitored
- When the caller becomes available again, the caller's phone is properly called back.
- Tested back-and-forth situations (i.e. the callee becomes available but the caller is busy, then the caller becomes available but the callee is busy) and they worked fine.

4) Ran tests 2) and 3) but with several layers of Local channels used before dialing an endpoint
- All points from 2) and 3) hold true.

5) Ran tests 2), 3), and 4) but dialed two SIP phones instead of just one.
- All points from 2), 3), and 4) hold true.
- Specifically, the core picks up when the first callee becomes available and calls the caller back.

6) Ran test 2) and had the phone forward the call
- Confirmed that CCNR was offered to the caller
- Confirmed that on recall, the forwarding phone was called back

7) Ran test where two Dials were made in a single extension
- Confirmed that CC was only offered for interfaces dialed in the first instance of Dial

8) Tested testable failure points to be sure that nothing catastrophic happened.
- Called CallCompletionRequest under atypical settings (e.g. Call when there was no previous failed call, call after cancelling CC, call when the CC process has progressed further)
- Called CallCompletionCancel when there was no CC request to cancel.
- Allowed both offer and available timers to expire.
- Caller hung up during execution of the callback macro.

9) Ran all of the above tests with MALLOC_DEBUG enabled and under valgrind
- Confirmed that the memory used by ccss.c before and after all tests was the same.
- Found that no invalid read or invalid writes were reported by valgrind.

Untested:

- Limitations, such as cc_max_requests, cc_max_agents, and cc_max_monitors.


Thanks,

Mark




More information about the asterisk-dev mailing list