[asterisk-bugs] [JIRA] (ASTERISK-24983) IAX deadlock between hangup and scheduled actions (e.g. lagrq)

Y Ateya (JIRA) noreply at issues.asterisk.org
Mon Apr 20 17:05:32 CDT 2015


     [ https://issues.asterisk.org/jira/browse/ASTERISK-24983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Y Ateya updated ASTERISK-24983:
-------------------------------

    Description: 
Randomly, some of my Asterisk servers (SIP-to-IAX) _freeze_. After some investigation I found that this is caused by a deadlock between {{iax2_hangup}} and {{send_lagrq}} (it can happen with {{send_ping}} too).

Here is the sequence of _unfortunate_ events that leads to this deadlock:
  - When a call starts, {{send_lagrq}} is scheduled to run after some time.
  - {{iax2_hangup}} is called.
  - It takes the call number lock with {{ast_mutex_lock(&iaxsl\[callno\])}}. Note that later in the hangup procedure, it will try to delete the scheduled {{send_lagrq}}.
  - Before {{send_lagrq}} is deleted, a context switch happens and the scheduler finds that it is time to run the scheduled {{send_lagrq}}!
  - {{send_lagrq}} is called and tries to acquire the call number lock with {{ast_mutex_lock(&iaxsl\[callno\])}}, so {{send_lagrq}} is now waiting for the hangup to finish.
  - After some time, {{iax2_hangup}} reaches the point where it deletes the scheduled lagrq and ping events. This happens in {{iax2_destroy_helper}}, which calls {{AST_SCHED_DEL_SPINLOCK(sched, pvt->lagid, &iaxsl\[pvt->callno\])}}, which calls {{ast_sched_del}}, which finds that {{send_lagrq}} is still being served ({{else if (con->currently_executing && (id == con->currently_executing->id))}}), so it *waits indefinitely*.
  - *Scheduler is blocked*: all events in the scheduler are waiting for this event to finish.
  - *IAX call is blocked*: every thread that tries to take the call lock is blocked too. Within minutes I ended up with hundreds of blocked threads.
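
The sequence above can be reproduced in miniature with plain pthreads. The following is a minimal sketch, not Asterisk code: {{call_lock}}, {{executing}}, {{sched_del_timed}} and {{demo_hangup_vs_scheduler}} are hypothetical stand-ins for {{iaxsl\[callno\]}}, the scheduler's {{currently_executing}} entry, {{ast_sched_del}} and {{iax2_hangup}}. Unlike the real delete, the sketch waits with a timeout, so it reports the stuck delete instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-ins: call_lock plays the role of iaxsl[callno],
 * "executing" the role of the scheduler's currently_executing entry. */
static pthread_mutex_t call_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t sched_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t sched_cond = PTHREAD_COND_INITIALIZER;
static int executing;

/* Scheduled callback (the send_lagrq role): it needs the call lock. */
static void lagrq_callback(void)
{
	pthread_mutex_lock(&call_lock);
	pthread_mutex_unlock(&call_lock);
}

/* Scheduler thread: mark the entry as executing, run it, clear the mark. */
static void *scheduler_thread(void *arg)
{
	(void) arg;
	pthread_mutex_lock(&sched_lock);
	executing = 1;
	pthread_cond_broadcast(&sched_cond);
	pthread_mutex_unlock(&sched_lock);

	lagrq_callback();

	pthread_mutex_lock(&sched_lock);
	executing = 0;
	pthread_cond_broadcast(&sched_cond);
	pthread_mutex_unlock(&sched_lock);
	return NULL;
}

/* Delete that waits for a running entry but gives up after timeout_ms.
 * Returns 0 once the entry is idle, -1 if it is still running at the deadline. */
static int sched_del_timed(int timeout_ms)
{
	struct timespec deadline;
	int res = 0;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_nsec += (long) timeout_ms * 1000000L;
	deadline.tv_sec += deadline.tv_nsec / 1000000000L;
	deadline.tv_nsec %= 1000000000L;

	pthread_mutex_lock(&sched_lock);
	while (executing && res != ETIMEDOUT) {
		res = pthread_cond_timedwait(&sched_cond, &sched_lock, &deadline);
	}
	res = executing ? -1 : 0;
	pthread_mutex_unlock(&sched_lock);
	return res;
}

/* Hangup role: returns 1 if the delete got stuck while the call lock was held. */
int demo_hangup_vs_scheduler(void)
{
	pthread_t tid;
	int rc, stuck;

	pthread_mutex_lock(&call_lock);        /* hangup takes the call lock */
	rc = pthread_create(&tid, NULL, scheduler_thread, NULL);
	assert(rc == 0);
	(void) rc;

	/* Force the unlucky interleaving: wait until the callback has started. */
	pthread_mutex_lock(&sched_lock);
	while (!executing) {
		pthread_cond_wait(&sched_cond, &sched_lock);
	}
	pthread_mutex_unlock(&sched_lock);

	stuck = (sched_del_timed(200) == -1);  /* delete while holding call_lock */

	pthread_mutex_unlock(&call_lock);      /* let the callback finish */
	pthread_join(tid, NULL);
	return stuck;
}
```

Calling {{demo_hangup_vs_scheduler()}} should return 1: the delete cannot complete while the caller holds the lock that the running callback is waiting for. With an untimed wait in place of {{sched_del_timed}}, both threads would block forever.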

  
I don't know which is better:
  - Fixing chan_iax2 to prevent this deadlock.
  - Fixing the scheduler to prevent this deadlock.

Changing scheduler behavior will impact many people, so I decided to change chan_iax2 to fix the problem AND change the scheduler to report when this deadlock happens.

Patch attached; a Gerrit review has been added too (https://gerrit.asterisk.org/#/c/169/).
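
The attached patch is not reproduced here. Purely as an illustration of how a channel-side change can break the cycle, the sketch below shows one generic technique: the scheduled callback uses a trylock and backs off when the call lock is busy, so it never blocks the scheduler thread. This is an assumption for illustration and not necessarily what the patch does; {{call_mutex}}, {{try_send_lagrq}} and {{demo_try_send}} are hypothetical names.

```c
#include <pthread.h>

/* Hypothetical stand-in for iaxsl[callno]. */
static pthread_mutex_t call_mutex = PTHREAD_MUTEX_INITIALIZER;
static int lagrq_sent;

/* Scheduled callback that never blocks on the call lock.
 * Returns 0 if the request was "sent", -1 if the lock was busy and
 * the work was skipped (a real driver might reschedule instead). */
int try_send_lagrq(void)
{
	if (pthread_mutex_trylock(&call_mutex) != 0) {
		return -1;	/* busy: do not stall the scheduler thread */
	}
	lagrq_sent++;	/* placeholder for the actual work */
	pthread_mutex_unlock(&call_mutex);
	return 0;
}

/* Exercise both paths; returns 1 if they behave as described. */
int demo_try_send(void)
{
	int before = lagrq_sent;
	int busy, idle;

	pthread_mutex_lock(&call_mutex);	/* hangup holds the call lock */
	busy = try_send_lagrq();		/* callback backs off */
	pthread_mutex_unlock(&call_mutex);

	idle = try_send_lagrq();		/* lock is free: callback runs */
	return busy == -1 && idle == 0 && lagrq_sent == before + 1;
}
```

The trade-off is that a skipped callback does no work for that tick, so this only suits periodic actions such as lag requests or pings that can safely be retried or dropped.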

> IAX deadlock between hangup and scheduled actions (e.g. lagrq)
> --------------------------------------------------------------
>
>                 Key: ASTERISK-24983
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-24983
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Channels/chan_iax2
>    Affects Versions: 13.3.2
>         Environment: Ubuntu
>            Reporter: Y Ateya
>         Attachments: iax_hangup_deadlock.diff