[Asterisk-code-review] res pjsip pubsub: Add more locking and checking around termi... (asterisk[13])

George Joseph asteriskteam at digium.com
Sun Jun 12 13:31:40 CDT 2016


George Joseph has uploaded a new patch set (#2).

Change subject: res_pjsip_pubsub: Add more locking and checking around terminates
......................................................................

res_pjsip_pubsub: Add more locking and checking around terminates

A subsciption can be terminated by either a SUBSCRIBE with Expires = 0
or by the subscription timer expiring.  In heavily loaded systems
it's possible for the timer to fire at the same time we get a SUBSCRIBE
from the client either to terminate or refresh the subscription.

Client --> on_evsub_state     push --|
                                     serialized_pubsub_on_server_timeout
Timer  --> on_server_timeout  push --|

Both paths result in the pushing of a task to the subscription's
serializer, but on_server_timeout wasn't checking the last_notify flag
to see if a client-requested terminate was already in progress and
although on_evsub_state was checking it, it wasn't setting it.
This could result in 2 terminate tasks being placed on the serializer.
Since the last_notify flag was set only after the NOTIFY was sent
and the terminate task wasn't checking the flag either, both tasks
could run.  Additionally, some of the other pubsub tasks weren't
locking the dialog or checking last_notify so they could also be
running while pjproject was running one of the callbacks in another
thread.

In the end, there just wasn't enough locking and state checking and
we were seeing SEGVs in the pjproject code usually related to the dialog
being invalid.

Some of the checking in this patch is probably overkill but they're
simple boolean or null pointer checks so they're cheap.  Also some of
the debug levels in existing code were updated.

Changes:
 * Debug levels were changed so 1 is infrequent possible errors and
   3 is high-volume informational messages.
 * The evsub_state and server_timeout pjproject callbacks now increment
   last_notify while holding the dialog lock but before pushing the
   terminate task on the serializer.
 * The server_timeout callback now checks last_notify before pushing
   the terminate task.
 * The other callbacks and tasks now check that evsub, dlg and
   last_notify are valid before doing anything and the tasks now lock
   the dialog.
 * The serialized_pubsub_on_server_timeout task was renamed to
   serialized_pubsub_terminate and now checks the last_notify flag.  If
   it's > 1, then somehow 2 terminate tasks were pushed so this task
   decrements the flag and returns. The second task will now see the
   flag as 1 and will process.

ASTERISK-26099 #close
Reported-by: Ross Beer.

Change-Id: I779d11802cf672a51392e62a74a1216596075ba1
---
M res/res_pjsip_pubsub.c
1 file changed, 121 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.asterisk.org:29418/asterisk refs/changes/18/3018/2
-- 
To view, visit https://gerrit.asterisk.org/3018
To unsubscribe, visit https://gerrit.asterisk.org/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I779d11802cf672a51392e62a74a1216596075ba1
Gerrit-PatchSet: 2
Gerrit-Project: asterisk
Gerrit-Branch: 13
Gerrit-Owner: George Joseph <gjoseph at digium.com>



More information about the asterisk-code-review mailing list