[asterisk-dev] Re: [SIP 0009612]: I believe this fixes a BAD BAD BAD Scenario

C. Maj cmaj-SPAM at freedomcorpse.com
Wed May 2 08:08:38 MST 2007


I believe this should be re-opened.  There's no harm in this
patch other than the trivial cost of an additional pointer.
The CDR is not touched in ast_channel_free, so there's no
reason why it can't be called before the CDR data is
released.  Reducing time in locks -- without requiring user
intervention (in this case, enabling batch mode in cdr.conf)
-- should always be a design goal.

--Chris


On Wed, May 02, 2007 at 07:46:29AM -0500, bugs at digium.com waxed:
> 
> The following issue has been CLOSED. 
> ====================================================================== 
> http://bugs.digium.com/view.php?id=9612 
> ====================================================================== 
> Reported By:                one47
> Assigned To:                
> ====================================================================== 
> Project:                    SIP
> Issue ID:                   9612
> Category:                   General
> Reproducibility:            always
> Severity:                   minor
> Priority:                   normal
> Status:                     closed
> Asterisk Version:           1.4.3 
> SVN Branch (only for SVN checkouts, not tarball releases): N/A  
> SVN Revision (number only!):  
> Disclaimer on File?:        Yes 
> Resolution:                 no change required
> Fixed in Version:           
> ====================================================================== 
> Date Submitted:             04-27-2007 04:46 CDT
> Last Modified:              05-02-2007 07:46 CDT
> ====================================================================== 
> Summary:                    I believe this fixes a BAD BAD BAD Scenario
> Description: 
> I discussed a BAD BAD BAD case with Olle in June last year. It is low
> priority because it happens during a hangup, so all that really happens is
> that a final ACK is ignored. I think I worked out what is happening, so I
> am including my original verbose/debug log, and a possible patch.
> ====================================================================== 
> 
> ---------------------------------------------------------------------- 
>  Corydon76 - 04-27-07 07:52  
> ---------------------------------------------------------------------- 
> If you have a slow database, I'd suggest that you rewrite your CDR driver
> to queue messages to be written via a separate thread, instead of posting
> them directly.  Changing the order of deallocation within the channel
> driver can also be BAD BAD BAD, as it changes assumptions about when events
> occur. 
> 
> ---------------------------------------------------------------------- 
>  one47 - 04-27-07 08:21  
> ---------------------------------------------------------------------- 
> The attached trace seems to hold a lock for a significant part of 1 second
> (WAY too long in VoIP terms), and is using purely Asterisk-basic CDR code -
> 1 copy of cdr record sent to text file, and one sent to cdr_pgsql.
> 
> Also, the call to "ast_cdr_end" is unchanged, so the contents of the CDR
> record will be unchanged by the patch.
> 
> It is only the act of writing it to disk that is moved. The move is
> similarly very small, as it just moves the "CDR push and free" to the other
> side of the call to "chan->tech->hangup(chan)" so that the channel-proper
> is finished with and the locks are cleared before some potentially blocking
> disk activity and SQL preparation is required.
> 
> I appreciate what you say abbout changing the order of deallocation, but
> the CDR record pointer is deallocated (set to NULL) in exactly the same
> place after this patch. We just take a copy of the data so we can write it
> very slightly later.
> 
> The trace shows 2 attempts to get a channel lock (one from the SIP
> protocol thread, and one from the devstate thread) while the hangup process
> is holding that lock and updating the CDR. 
> 
> ---------------------------------------------------------------------- 
>  oej - 04-27-07 09:07  
> ---------------------------------------------------------------------- 
> one47: Please change cdr.conf so we can test with the cdr caching - that
> propably should not lock for a long time and will cause the BAD BAD BAD
> messages to disappear if your theory is right. We need this feedback. 
> 
> ---------------------------------------------------------------------- 
>  one47 - 04-27-07 10:03  
> ---------------------------------------------------------------------- 
> Hi,
> 
> I could not find any reference to CDR caching, even in SVN trunk, so
> instead I tested as follows using a 1.2.18 system:
> 
> 1) CDR to .txt and PGSQL, database idle, and revently VACUUM'ed for
> performance.
>    Problem occurred occasionally.
> 2) CDR to .txt only
>    Could not reproduce problem
> 3) CDR to .txt and PGSQL, database made artificially busy.
>    Problem occurred on nearly every BUSY call.
> 
> Applied patch as supplied, then retest:
> 
> 4) CDR to .txt and PGSQL, database idle, and revently VACUUM'ed for
> performance.
>    Could not reproduce problem
> 5) CDR to .txt only
>    Could not reproduce problem
> 6) CDR to .txt and PGSQL, database made artificially busy.
>    Could not reproduce problem
> 
> I am happy to try CDR caching or any other tests if you can explain how. I
> could perhaps add log entries that time the ast_hangup process to verify
> where the time goes? 
> 
> ---------------------------------------------------------------------- 
>  oej - 04-30-07 02:06  
> ---------------------------------------------------------------------- 
> It's called "batch" mode, sorry.
> 
> cdr.conf.sample:
> ----
> ; Define the CDR batch mode, where instead of posting the CDR at the end
> of
> ; every call, the data will be stored in a buffer to help alleviate load
> on the
> ; asterisk server.  Default is "no".
> ;
> ; WARNING WARNING WARNING
> ; Use of batch mode may result in data loss after unsafe asterisk
> termination
> ; ie. software crash, power failure, kill -9, etc.
> ; WARNING WARNING WARNING
> ;
> ;batch=no 
> 
> ---------------------------------------------------------------------- 
>  one47 - 04-30-07 03:54  
> ---------------------------------------------------------------------- 
> I gave this a fairly quick test with good results. I tried creating big
> batches and causing quite high call-volume without seeing any obvious
> problems.
> 
> There is of course the downside in fixing it this way, in that your CDR
> records are written after a delay. 
> 
> ---------------------------------------------------------------------- 
>  cmaj - 04-30-07 14:29  
> ---------------------------------------------------------------------- 
> If you set the batch size = 1, then that should get those CDRs submitted
> fairly quickly, at the cost of more threads.  But they're cheap threads. 
> 
> ---------------------------------------------------------------------- 
>  one47 - 04-30-07 16:27  
> ---------------------------------------------------------------------- 
> Yes, batch=1 does help, and is approximately equivalent to the attached
> patch. The difference being that this patch does it without an extra thread
> being created for almost every CDR write, which is beneficial, even if it
> is only a small benefit, and it also means that the _default_ case is
> holding channel locks for less time... 
> 
> ---------------------------------------------------------------------- 
>  oej - 05-02-07 04:48  
> ---------------------------------------------------------------------- 
> Corydon76: Since this can be handled by the cdr batching, do you agree with
> me that this bug is resolved or do you see some value in the proposed
> patch? I don't really have much insights into the CDR subsystem. 
> 
> ---------------------------------------------------------------------- 
>  Corydon76 - 05-02-07 07:46  
> ---------------------------------------------------------------------- 
> No change required, as this can be fixed with batching. 
> 
> Issue History 
> Date Modified   Username       Field                    Change               
> ====================================================================== 
> 04-27-07 04:46  one47          New Issue                                    
> 04-27-07 04:46  one47          Issue Monitored: one47                       
> 04-27-07 04:47  one47          File Added: badbadbad.txt                    
> 04-27-07 04:47  one47          Asterisk Version          => 1.4.3           
> 04-27-07 04:47  one47          SVN Branch (only for SVN checkou => N/A          
>   
> 04-27-07 04:47  one47          Disclaimer on File?       => Yes             
> 04-27-07 04:47  one47          File Added: badbadbad_1.2.18.patch               
>     
> 04-27-07 04:47  one47          File Added: badbadbad_1.4.trunk.patch            
>        
> 04-27-07 07:52  Corydon76      Note Added: 0062741                          
> 04-27-07 08:21  one47          Note Added: 0062745                          
> 04-27-07 09:07  oej            Note Added: 0062754                          
> 04-27-07 09:11  oej            Status                   new => feedback     
> 04-27-07 10:03  one47          Note Added: 0062774                          
> 04-30-07 02:06  oej            Note Added: 0062848                          
> 04-30-07 03:54  one47          Note Added: 0062853                          
> 04-30-07 14:25  cmaj           Issue Monitored: cmaj                        
> 04-30-07 14:29  cmaj           Note Added: 0062879                          
> 04-30-07 16:27  one47          Note Added: 0062884                          
> 05-02-07 04:48  oej            Note Added: 0062953                          
> 05-02-07 07:46  Corydon76      Note Added: 0062959                          
> 05-02-07 07:46  Corydon76      Status                   feedback => closed  
> 05-02-07 07:46  Corydon76      Resolution               open => no change
> required
> ======================================================================



More information about the asterisk-dev mailing list