[asterisk-bugs] [JIRA] (ASTERISK-28695) minmemfree watermark calculated incorrectly
Kevin Flyn (JIRA)
noreply at issues.asterisk.org
Wed Jan 15 10:49:25 CST 2020
Kevin Flyn created ASTERISK-28695:
-------------------------------------
Summary: minmemfree watermark calculated incorrectly
Key: ASTERISK-28695
URL: https://issues.asterisk.org/jira/browse/ASTERISK-28695
Project: Asterisk
Issue Type: Bug
Security Level: None
Components: PBX/General
Affects Versions: 16.3.0
Environment: Manjaro Linux 4.19.28-1
Reporter: Kevin Flyn
Severity: Minor
The other day my Asterisk system stopped accepting incoming and outgoing calls and was giving my home phone a "fast busy" tone. I fired up the CLI, tried an outgoing call with SIP debugging enabled, and got the following output:
{quote}
...
[2020-01-15 10:33:12.524] DEBUG[111414][C-00000003]: chan_sip.c:3801 __sip_xmit: Trying to put 'SIP/2.0 100' onto UDP socket destined for XXX.XXX.X.XX:5060
[2020-01-15 10:33:12.524] WARNING[111414][C-00000003]: pbx.c:4623 increase_call_count: Available system memory (~423MB) is below the configured low watermark (500MB)
[2020-01-15 10:33:12.525] DEBUG[111388]: chan_sip.c:30608 sip_devicestate: Checking device state for peer grandstream1
[2020-01-15 10:33:12.525] DEBUG[111414][C-00000003]: chan_sip.c:3457 sip_alreadygone: Setting SIP_ALREADYGONE on dialog 581189762-5060-5 at BJC.BGI.F.ED
[2020-01-15 10:33:12.525] WARNING[111414][C-00000003]: chan_sip.c:26866 handle_request_invite: Failed to start PBX (call limit reached)
[2020-01-15 10:33:12.525] DEBUG[111388]: devicestate.c:466 do_state_change: Changing state for SIP/grandstream1 - state 2 (In use)
...
{quote}
As the output shows, Asterisk reports that my system is low on memory and has dropped below the "minmemfree" watermark, which I had set to 500MB in asterisk.conf. I know the server is only lightly loaded, so I logged in via ssh and ran the "free" command to check RAM usage:
{quote}
...
[hpz230]# free
total used free shared buff/cache available
Mem: 15792 786 424 8 14581 14668
Swap: 8192 0 8192
...
{quote}
The amount of "free" memory is indeed below 500MB. That is only because the system has been up for 120+ days and the kernel is using most of the RAM as a filesystem cache (the 14581MB in the buff/cache column). RAM used as cache is still available to applications without triggering the kernel OOM killer, which is why free reports 14668MB as "available". Given that, it seems to me that Asterisk is calculating the amount of free RAM for this setting incorrectly.
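The same distinction shows up in /proc/meminfo. On Linux 3.14 and later it exports both MemFree and MemAvailable. MemFree is the same figure sysinfo() returns as freeram. MemAvailable is the kernel's estimate of memory that can be used without swapping, and it is what free prints in its "available" column. The sketch below is not Asterisk code. It pulls one field out of meminfo-formatted text and converts it from kB to MB. The function name meminfo_field_mb is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Find a "Key:   value kB" line in /proc/meminfo-style text and return the
 * value converted from kB to MB, or -1 if the key is not present. */
static long meminfo_field_mb(const char *text, const char *key)
{
    size_t keylen = strlen(key);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        long kb;

        /* Require an exact key match followed by ':' so that, e.g.,
         * "Mem" does not match the "MemTotal" line. */
        if (strncmp(line, key, keylen) == 0 && line[keylen] == ':' &&
            sscanf(line + keylen + 1, "%ld", &kb) == 1) {
            return kb / 1024;
        }

        /* Advance to the start of the next line, if any. */
        line = strchr(line, '\n');
        if (line != NULL) {
            line++;
        }
    }
    return -1;
}
```

Suppose this were fed the contents of /proc/meminfo from a machine in the state shown above. MemFree would come out near the 424MB "free" figure, and MemAvailable near the 14668MB "available" figure.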
I ran a quick "sync" to flush all filesystem data to disk, dropped the Linux filesystem caches, and ran free again:
{quote}
...
[hpz230]# sync
[hpz230]# echo "3" > /proc/sys/vm/drop_caches
[hpz230]# free
total used free shared buff/cache available
Mem: 15792 785 14605 1 401 14736
Swap: 8192 0 8192
...
{quote}
Now the "free" RAM is 14GB+ (14605MB), and sure enough, Asterisk happily started accepting phone calls again.
Asterisk makes this decision in the following code from main/pbx.c on GitHub:
https://github.com/asterisk/asterisk/blob/391aafb97172e3beb9a779458456d2e75ecf4610/main/pbx.c
{quote}
#if defined(HAVE_SYSINFO)
    if (option_minmemfree) {
        if (!sysinfo(&sys_info)) {
            /* make sure that the free system memory is above the configured low watermark
             * convert the amount of freeram from mem_units to MB */
            curfreemem = sys_info.freeram * sys_info.mem_unit;
            curfreemem /= 1024 * 1024;
            if (curfreemem < option_minmemfree) {
                ast_log(LOG_WARNING, "Available system memory (~%ldMB) is below the configured low watermark (%ldMB)\n", curfreemem, option_minmemfree);
                failed = -1;
            }
        }
    }
#endif
{quote}
As you can see, Asterisk compares the watermark against a free-memory figure computed only from the sysinfo() fields "freeram" and "mem_unit". It ignores the amount of memory being used as a filesystem cache.
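To check the arithmetic in isolation, the quoted logic can be pulled out into two small functions. This is a sketch, not a patch. The names freeram_mb and below_watermark are hypothetical, and the sketch does the multiplication in unsigned long.

```c
#include <sys/sysinfo.h>

/* Same arithmetic as the pbx.c excerpt: freeram is counted in units of
 * mem_unit bytes, so multiply to get bytes, then divide down to MB. */
static long freeram_mb(const struct sysinfo *si)
{
    unsigned long bytes = si->freeram * (unsigned long)si->mem_unit;
    return (long)(bytes / (1024UL * 1024UL));
}

/* Mirror the excerpt's convention: -1 when free memory is below the
 * watermark (in MB), 0 otherwise. */
static int below_watermark(const struct sysinfo *si, long minmemfree_mb)
{
    return freeram_mb(si) < minmemfree_mb ? -1 : 0;
}
```

Take freeram covering 424MB (for instance, freeram = 434176 with mem_unit = 1024) and a minmemfree of 500. In that case below_watermark() returns -1, which matches the warning in the log above, even though free showed 14668MB available.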
A quick look at the sysinfo() manpage shows that it also returns a "bufferram" field, described as "Memory used by buffers". It stands to reason, then, that the line above that calculates the amount of free RAM should read:
bq. curfreemem = (sys_info.freeram + sys_info.bufferram) * sys_info.mem_unit;
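A sketch of the proposed calculation is below. The name freeram_plus_buffers_mb is hypothetical. There is one caveat to check before adopting it. On Linux, sysinfo()'s bufferram corresponds to the "Buffers" line of /proc/meminfo, not to the page cache ("Cached"), and sysinfo() has no field for the page cache at all. free's buff/cache column includes both. Adding bufferram may therefore recover only a small part of the 14581MB shown above. The figure free reports as "available" is the kernel's MemAvailable estimate from /proc/meminfo.

```c
#include <sys/sysinfo.h>

/* The proposed formula: treat buffer memory as free.
 * Caveat: bufferram is the kernel's "Buffers" figure only; page cache
 * ("Cached") is not reported by sysinfo() at all. */
static long freeram_plus_buffers_mb(const struct sysinfo *si)
{
    unsigned long units = si->freeram + si->bufferram;
    unsigned long bytes = units * (unsigned long)si->mem_unit;
    return (long)(bytes / (1024UL * 1024UL));
}
```

With 424MB free and 100MB of buffers, this returns 524. That clears a 500MB watermark only because the buffers happen to be large enough. Memory held in the page cache would still not be counted.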
Any system running the current code base that stays powered up for an extended period is likely to hit this issue eventually, unless it periodically clears the kernel filesystem cache.
--
This message was sent by Atlassian JIRA
(v6.2#6252)