[asterisk-dev] [Code Review] Rework CDR API Internals to Prevent Loss of CDR Records

Russell Bryant russell at digium.com
Tue Mar 23 14:29:20 CDT 2010



> On 2010-03-15 18:54:21, Tilghman Lesher wrote:
> > I think there's some refactoring of cdr_adaptive_odbc that still needs to be done in order to prevent record loss.

Thanks for the review!


> On 2010-03-15 18:54:21, Tilghman Lesher wrote:
> > /trunk/cdr/cdr_adaptive_odbc.c, lines 168-170
> > <https://reviewboard.asterisk.org/r/531/diff/3/?file=8399#file8399line168>
> >
> >     In the current module, we avoid unloading during execution through the use of a rwlock that cannot be write-locked while a routine is executing.  What prevents an unload while the callback is executing?
> >     
> >     Also, changing the uniqueid to a hash of the context name may prove to be more helpful in making sure that a reload doesn't cause lost records.

This is a good observation.  The old CDR API did nothing to prevent modules from being unloaded during a callback, and neither does this new API.
A lock in the module itself does not appropriately protect against unloading a module.  An unload can occur and grab the lock before the callback is able to acquire the readlock.  While the use of the lock lessens the window of time unloading the module while in use is possible, it does not remove it completely.

As is implemented for other interfaces that can be implemented by Asterisk modules, the way to solve the problem is by using module reference counting.  I will add that here.


> On 2010-03-15 18:54:21, Tilghman Lesher wrote:
> > /trunk/cdr/cdr_adaptive_odbc.c, line 386
> > <https://reviewboard.asterisk.org/r/531/diff/3/?file=8399#file8399line386>
> >
> >     The readlock currently prevents the record from disappearing via a reload event.  What prevents the record from disappearing via a reload while a sink callback is executing?

Good point.  I'm going to need some time to think through how to address this problem.


> On 2010-03-15 18:54:21, Tilghman Lesher wrote:
> > /trunk/include/asterisk/cdr.h, lines 154-161
> > <https://reviewboard.asterisk.org/r/531/diff/3/?file=8405#file8405line154>
> >
> >     What happens if a configuration file is reloaded, causing a drop of the sink, followed by a repopulation of the sink?  If the records are not able to be posted at that stage, are they kept for reposting later or are they lost?
> >     
> >     It looks like the way that you're populating the uniqueid, they will be lost.  Additionally, this means that a reload that previously took a fraction of a second may now take minutes to complete.  In that time, records may be posted that fail to hit certain backends (notably adaptive ODBC) because of this lag.

Thanks for the feedback.  The reload situation is a mess.  I'll work to address it.


- Russell


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/531/#review1673
-----------------------------------------------------------


On 2010-03-23 13:41:09, Russell Bryant wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviewboard.asterisk.org/r/531/
> -----------------------------------------------------------
> 
> (Updated 2010-03-23 13:41:09)
> 
> 
> Review request for Asterisk Developers.
> 
> 
> Summary
> -------
> 
> This code comes from the team/russell/cdr-q branch.  It is something I did on a couple of flights this week.
> 
> The current CDR implementation in Asterisk does nothing to ensure that every CDR record is successfully logged to configured logging destinations.  A single attempt is made to log a record.  If for some reason it fails, such as connectivity to a database being temporarily disrupted, then the records are lost.  Obviously, this is very bad in many cases where these records are used for things like billing.
> 
> The primary purpose behind these changes is to ensure that every record is successfully logged to every destination that has been configured.  If posting a CDR fails, it will be queued up for a retry later.
> 
> A new method has been added for backends to register callbacks for receiving CDR records.  For CDR modules that allow posting CDRs to multiple destinations, such as multiple DB tables or multiple text files, they now make a registration into the core for every destination.  That way, the success or failure of the attempt to each destination can be tracked.  All backends that need it have been updated to operate this way.
> 
> main/cdr.c has also been updated to use more modern Asterisk APIs.  It has also been updated with various other bits of code cleanup.
> 
> One side effect of these changes is that the code that previously implemented the batch mode has been completely removed, as that code is now deprecated.  The biggest benefit from batch mode was handling CDR posting asynchronously from channel threads.  In this code, that is the only behavior that exists.
> 
> 
> This addresses bug 17042.
>     https://issues.asterisk.org/view.php?id=17042
> 
> 
> Diffs
> -----
> 
>   /trunk/UPGRADE.txt 249057 
>   /trunk/cdr/cdr_adaptive_odbc.c 249058 
>   /trunk/cdr/cdr_custom.c 249058 
>   /trunk/cdr/cdr_odbc.c 249058 
>   /trunk/cdr/cdr_radius.c 249058 
>   /trunk/cdr/cdr_syslog.c 249058 
>   /trunk/configs/cdr.conf.sample 249057 
>   /trunk/include/asterisk/cdr.h 249057 
>   /trunk/main/asterisk.c 249057 
>   /trunk/main/cdr.c 249057 
>   /trunk/main/features.c 249057 
>   /trunk/main/manager.c 249057 
> 
> Diff: https://reviewboard.asterisk.org/r/531/diff
> 
> 
> Testing
> -------
> 
> I have done some manual testing.  However, before I consider this done, I would like to implement some automated tests, as well as touch on more backends than what I have tested so far.
> 
> I configured cdr_custom to log CDRs to multiple files.  I made calls and verified that records were logged successfully.  I made some modifications on disk such that attempts to log to one file would fail, while the other would continue to succeed.  I made more calls and verified that the CDR queue grew for the specific file that had failures.  I fixed the problem and continued making calls.  I verified that all previously failed CDRs were posted and that both files were in sync with the number of CDRs that they had.
> 
> 
> Thanks,
> 
> Russell
> 
>




More information about the asterisk-dev mailing list