[Asterisk-code-review] chan sip: Don't unload module while TCP/TLS threads are stil... (asterisk[master])

Corey Farrell asteriskteam at digium.com
Fri Dec 2 19:06:04 CST 2016


Corey Farrell has posted comments on this change. ( https://gerrit.asterisk.org/4536 )

Change subject: chan_sip: Don't unload module while TCP/TLS threads are still running
......................................................................


Patch Set 2:

> I think I'm giving up on this. I needed to touch almost every line
 > in unload_module and still couldn't get it right so that module
 > unload chan_sip could be re-run after all threads had become
 > unblocked and had terminated -- every time I fixed one thing, two
 > others seemed to pop up. After all, it's not the most important bug
 > to fix and these race conditions should be very rare (e.g. if a
 > thread is blocked but doesn't respond to SIGURG, then magically
 > becomes unstuck after the unload).
 > 
 > One thing I would like to keep is the second attempt at
 > pthread_kill(thread, SIGURG) in the current patch set. That gives
 > threads that hadn't reached their blocking syscall yet at the point
 > the signal was first sent the opportunity to terminate properly. If
 > I add a time limit of, say, 10 seconds to the while loop in line
 > 35527, would that be acceptable to merge?
 > 
 > The problems that will remain are the leaking of deadlocked threads
 > (unfixable at this level, but it's mainly a memory leak, therefore
 > it doesn't matter during shutdown) and the crashing of threads that
 > become unblocked after being blocked on a call that cannot return
 > EINTR (messy to fix, see above, but probably very unlikely to
 > occur; could crash Asterisk during shutdown if the thread becomes
 > unstuck before shutdown is completed).

I'm not sure if additional calls to pthread_kill help.  If it didn't work the first time why would it work the second time?  I'm not a fan of expanding the time limit either, in fact I think the goal should be to shorten the time limit.  Module unload should not be taking long periods of time to run - it should just succeed or fail.

Please give me a chance to try patching to allow unload_module to fail gracefully (return -1).  I think I should be able to reorder it to safely protect against threads that become unstuck.  I will need your assistance testing it as the only place I've ever used TCP based SIP is in the testsuite.

-- 
To view, visit https://gerrit.asterisk.org/4536
To unsubscribe, visit https://gerrit.asterisk.org/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I23a8abf9e8f73b57ff7ada61984b8506aea6cbde
Gerrit-PatchSet: 2
Gerrit-Project: asterisk
Gerrit-Branch: master
Gerrit-Owner: Michael Kuron <m.kuron at gmx.de>
Gerrit-Reviewer: Anonymous Coward #1000019
Gerrit-Reviewer: Corey Farrell <git at cfware.com>
Gerrit-Reviewer: Michael Kuron <m.kuron at gmx.de>
Gerrit-HasComments: No



More information about the asterisk-code-review mailing list