[asterisk-dev] [Code Review] 4414: res_pjsip_session: Fix double re-INVITE collision crash.

rmudgett reviewboard at asterisk.org
Fri Feb 13 11:21:48 CST 2015






This change has been marked as submitted.

Review request for Asterisk Developers.


Committed in revision 431734

Bugs: ASTERISK-24727

Repository: Asterisk


A multi-Asterisk box setup with direct media enabled would occasionally
crash when two re-INVITE collisions happened in a row on a call leg.

The re-INVITE logic only had one timer struct to defer the re-INVITE.
When the second collision happens the timer struct is overwritten and put
into the timer heap again.  Resources for the first timer are leaked and
the heap has two positions occupied by the same timer struct.  Now the
heap ordering is potentially corrupted, the timer will fire twice, and any
resources allocated for the second timer will be released twice.
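
The failure mode above can be illustrated with a toy model.  This is only
a sketch, not the pjsip timer heap or the actual res_pjsip_session.c code:
toy_timer, toy_heap, toy_schedule, and toy_count_refs are hypothetical
names invented for the illustration.  It shows how scheduling the same
timer struct twice overwrites the first timer's data and leaves two heap
slots pointing at one struct.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a timer entry; not the pjsip type. */
struct toy_timer {
    int delay_ms;       /* when the timer should fire */
    const char *data;   /* resource owned by the timer */
};

/* Hypothetical stand-in for a timer heap: just an array of pointers. */
#define TOY_HEAP_SLOTS 8
struct toy_heap {
    struct toy_timer *slots[TOY_HEAP_SLOTS];
    size_t count;
};

/* Schedule an entry without checking whether it is already queued. */
static void toy_schedule(struct toy_heap *heap, struct toy_timer *entry,
                         int delay_ms, const char *data)
{
    assert(heap->count < TOY_HEAP_SLOTS);
    entry->delay_ms = delay_ms;  /* overwrites the first timer's settings */
    entry->data = data;          /* the first timer's resource is lost */
    heap->slots[heap->count++] = entry;
}

/* Count how many heap slots point at the given entry. */
static size_t toy_count_refs(const struct toy_heap *heap,
                             const struct toy_timer *entry)
{
    size_t i;
    size_t refs = 0;

    for (i = 0; i < heap->count; i++) {
        if (heap->slots[i] == entry) {
            refs++;
        }
    }
    return refs;
}
```

Calling toy_schedule() twice with the same toy_timer leaves
toy_count_refs() at 2 while only the second call's data survives, which
mirrors the leak and the double firing described above.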

* The solution is to put the collided re-INVITE into the delayed requests
queue with all the other delayed requests and cherry pick the next request
that can come off the queue when an event happens.

* Changed to put delayed BYE requests at the head of the delayed queue.
There is no sense in processing delayed UPDATEs and re-INVITEs when a BYE
has been requested.

* Made the start of a BYE request flush the delayed requests queue to
prevent a delayed request from overlapping the BYE transaction.  I saw a
few cases where a delayed re-INVITE got started after the BYE transaction
had started.

* Changed the delayed_request struct to use an enum instead of a string
for the request method.  Cherry picking the queue is easier with an enum
than string comparisons and the compiler can warn if a switch statement
does not cover all defined enum values.

* Improved the debug output to give more information.  It helps to know
which channel is involved with an endpoint.  Trunks can have many channels
associated with the endpoint at the same time.
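
The queue changes listed above can be sketched roughly as follows.  This
is a minimal illustration, not the code in the patch: the enum values, the
struct layout, the function names, and the invite_allowed flag are all
assumptions made for the example, and the real conditions for releasing a
request depend on the session state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical method enum; the names in the patch may differ. */
enum delayed_method {
    DELAYED_METHOD_INVITE,
    DELAYED_METHOD_UPDATE,
    DELAYED_METHOD_BYE,
};

struct delayed_request {
    enum delayed_method method;
    struct delayed_request *next;
};

struct delayed_queue {
    struct delayed_request *head;
    struct delayed_request *tail;
};

/* A BYE goes to the head of the queue; everything else is appended. */
static void delayed_queue_push(struct delayed_queue *q,
                               struct delayed_request *req)
{
    assert(q != NULL && req != NULL);
    req->next = NULL;
    if (req->method == DELAYED_METHOD_BYE) {
        req->next = q->head;
        q->head = req;
        if (!q->tail) {
            q->tail = req;
        }
        return;
    }
    if (q->tail) {
        q->tail->next = req;
    } else {
        q->head = req;
    }
    q->tail = req;
}

/* Cherry pick: unlink and return the first request that may be sent now. */
static struct delayed_request *delayed_queue_pick(struct delayed_queue *q,
                                                  int invite_allowed)
{
    struct delayed_request *prev = NULL;
    struct delayed_request *cur;

    for (cur = q->head; cur; prev = cur, cur = cur->next) {
        int ok = 0;

        /* No default case, so the compiler can warn about a missing value. */
        switch (cur->method) {
        case DELAYED_METHOD_INVITE: ok = invite_allowed; break;
        case DELAYED_METHOD_UPDATE: ok = 1; break;  /* toy assumption */
        case DELAYED_METHOD_BYE:    ok = 1; break;
        }
        if (!ok) {
            continue;
        }
        if (prev) {
            prev->next = cur->next;
        } else {
            q->head = cur->next;
        }
        if (q->tail == cur) {
            q->tail = prev;
        }
        cur->next = NULL;
        return cur;
    }
    return NULL;
}

/* Drop everything still queued, as done when a BYE starts.  Requests are
 * caller-owned in this sketch, so nothing is freed here. */
static size_t delayed_queue_flush(struct delayed_queue *q)
{
    size_t dropped = 0;
    struct delayed_request *cur;

    for (cur = q->head; cur; cur = cur->next) {
        dropped++;
    }
    q->head = NULL;
    q->tail = NULL;
    return dropped;
}
```

With a re-INVITE and an UPDATE queued and invite_allowed set to 0,
delayed_queue_pick() skips the re-INVITE and returns the UPDATE.  A BYE
pushed afterwards lands at the head, ahead of the still-waiting re-INVITE.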


  /branches/13/res/res_pjsip_session.c 431716 

Diff: https://reviewboard.asterisk.org/r/4414/diff/


Ran the full testsuite with the patch.  All but one test passes and the failing test fails with or without the patch.

Set up a test environment with two Asterisk boxes and direct media enabled for all legs to perform this call repeatedly:
PJSIP/100 --> box 1 --> box 2 --> box 1 --> PJSIP/200

1) 100 calls 200
2) 200 answers
3) 100 hangs up
4) repeat call

A crash no longer happens on either Asterisk box with the patch.



