[asterisk-bugs] [JIRA] (ASTERISK-27835) Asterisk Memory consumption after load

Joshua Colp (JIRA) noreply at issues.asterisk.org
Tue Jun 19 04:22:55 CDT 2018


     [ https://issues.asterisk.org/jira/browse/ASTERISK-27835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joshua Colp updated ASTERISK-27835:
-----------------------------------

    Assignee: Biz Biz  (was: Unassigned)
      Status: Waiting for Feedback  (was: Triage)

Per the comment from Richard, we really do need to see MALLOC_DEBUG information from a failing instance to see what is using all of the memory so we can narrow it down. Failing that, we would need further information about the precise usage of the system to see if we could narrow it down that way.
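For example, once a MALLOC_DEBUG build is running, the outstanding allocations can be dumped from the CLI:

{noformat}
# per-file summary of outstanding allocations
asterisk -rx "memory show summary"

# full list of outstanding allocations (verbose)
asterisk -rx "memory show allocations"
{noformat}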

> Asterisk Memory consumption after load
> --------------------------------------
>
>                 Key: ASTERISK-27835
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-27835
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Core/General
>    Affects Versions: 13.17.0
>         Environment: Virtual machine: 8 CPUs, 16 GB RAM, on VMware ESXi 6
> OS: Red Hat 7.1
> Asterisk : 13.17.0
>            Reporter: Biz Biz
>            Assignee: Biz Biz
>         Attachments: AsteriskMemoryConsumption.png, asteriskmemorysummary.zip
>
>
> 70 SIP users registered
> 120 active SIP calls (SIP only); the calls remain active the whole time
> 300 active internal calls between conferences
> 110 active conferences (meetme)
> Hundreds of AMI messages per minute (mainly mute/unmute of channels between conferences, plus Originate to place calls); a representative exchange is sketched below
> Channel status polled several times per minute with: asterisk -rx 'sip show channels' and 'meetme list <conference>'
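> A typical pair of AMI actions looks like this (the ActionIDs, user number, Context, Exten and Priority are illustrative; the conference numbers and Local channel context are taken from the logs further down):
> {noformat}
> Action: MeetmeMute
> ActionID: mute-0001
> Meetme: 9280
> Usernum: 3
>
> Action: Originate
> ActionID: orig-0001
> Channel: Local/9280@cnf_patch_not_radioonly
> Context: default
> Exten: 9281
> Priority: 1
> {noformat}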
> Asterisk modules:
> {noformat}
> [modules]
> autoload=yes
> noload => chan_alsa.so
> noload => chan_oss.so
> noload => chan_console.so
> noload => res_stun_monitor.so
> noload => res_ari_applications.so
> noload => res_ari_bridges.so
> noload => res_ari_device_states.so
> noload => res_ari_events.so
> noload => res_ari_playbacks.so
> noload => res_ari.so
> noload => res_ari_asterisk.so
> noload => res_ari_channels.so
> noload => res_ari_endpoints.so
> noload => res_ari_model.so
> noload => res_ari_recordings.so
> noload => res_ari_sounds.so
> noload => app_stasis.so
> noload => res_stasis_answer.so
> noload => res_stasis_device_state.so
> noload => res_stasis_playback.so
> noload => res_stasis_recording.so
> noload => res_stasis_snoop.so
> noload => res_stasis.so
> noload => func_sorcery.so
> noload => res_sorcery_astdb.so
> noload => res_sorcery_config.so
> noload => res_sorcery_memory_cache.so
> noload => res_sorcery_memory.so
> noload => res_sorcery_realtime.so
> noload => res_hep_rtcp.so
> noload => res_hep.so
> noload => app_mysql.so
> noload => cdr_mysql.so
> noload => cdr_csv.so
> noload => cdr_custom.so
> noload => cdr_mysql.so
> noload => cdr_odbc.so
> noload => cdr_sqlite3_custom.so
> noload => cdr_syslog.so
> noload => app_test.so
> noload => res_config_mysql.so
> noload => chan_iax2.so
> noload => app_voicemail.so
> noload => chan_dahdi.so
> noload => res_config_ldap.so
> noload => res_http_websocket.so
> {noformat}
> After 6-7 days of operation, memory usage starts growing until it saturates all system memory (RAM and swap) within half an hour to an hour. The growth isn't linear over time (I monitored the process with pmap); it starts increasing very fast after a few days.
> Extract of the asterisk pmap output (Italian-locale timestamps; the three figures are pmap's total Kbytes, RSS and Dirty columns, in kB):
> {noformat}
> gio 26 apr 2018, 16.50.47, CEST total kB 5007200 106968 94372
> gio 26 apr 2018, 17.00.47, CEST total kB 5007200 107072 94476
> gio 26 apr 2018, 17.10.48, CEST total kB 5007200 107148 94552
> gio 26 apr 2018, 17.20.48, CEST total kB 5007200 107112 94516
> gio 26 apr 2018, 17.30.48, CEST total kB 5011664 107812 95116
> gio 26 apr 2018, 17.40.48, CEST total kB 5011664 107928 95232
> gio 26 apr 2018, 17.50.49, CEST total kB 5011664 108272 95576
> gio 26 apr 2018, 18.00.49, CEST total kB 5011664 108556 95860
> gio 26 apr 2018, 18.10.50, CEST total kB 5011664 108956 96260
> dom 29 apr 2018, 07.52.54, CEST total kB 5011664 111332 98612
> dom 29 apr 2018, 08.02.55, CEST total kB 5011664 111320 98600
> dom 29 apr 2018, 08.12.55, CEST total kB 5011664 111320 98600
> dom 29 apr 2018, 08.22.55, CEST total kB 5011664 111320 98600
> dom 29 apr 2018, 08.32.56, CEST total kB 5011664 111320 98600
> mer 2 mag 2018, 20.15.35, CEST total kB 5171376 141488 128768
> mer 2 mag 2018, 20.25.35, CEST total kB 5171376 141488 128768
> mer 2 mag 2018, 20.35.36, CEST total kB 5171376 141484 128764
> mer 2 mag 2018, 20.45.36, CEST total kB 5171376 141592 128872
> mer 2 mag 2018, 20.55.37, CEST total kB 5171376 141484 128764
> mer 2 mag 2018, 21.05.37, CEST total kB 5171376 141484 128764
> mer 2 mag 2018, 21.15.38, CEST total kB 5171376 144160 131440
> mer 2 mag 2018, 21.25.38, CEST total kB 7595844 3186200 3173180
> mer 2 mag 2018, 21.35.46, CEST total kB 9825056 5620656 5607628
> mer 2 mag 2018, 21.45.59, CEST total kB 11109020 7040680 7027652
> mer 2 mag 2018, 21.56.17, CEST total kB 11649240 7890956 7877928
> mer 2 mag 2018, 22.06.36, CEST total kB 12237640 8642528 8629500
> mer 2 mag 2018, 22.16.56, CEST total kB 13152596 9343332 9330304
> mer 2 mag 2018, 22.27.17, CEST total kB 13677312 10007388 9994360
> mer 2 mag 2018, 22.37.39, CEST total kB 14136988 10616112 10603084
> mer 2 mag 2018, 22.48.03, CEST total kB 14924344 11220692 11207664
> {noformat}
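> The samples above are roughly ten minutes apart; a loop along these lines can produce them (the exact script isn't part of this report, and pmap -x plus a 600-second sleep are assumptions that match the output format and timestamps):
> {noformat}
> while true; do
>     # pmap -x ends with the "total kB <Kbytes> <RSS> <Dirty>" row
>     echo "$(date), $(pmap -x $(pidof asterisk) | tail -n 1)"
>     sleep 600
> done
> {noformat}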
> From this point on, asterisk is unresponsive.
> I ran many tests (also with asterisk 13.20) but the behaviour is always the same.
> How it ends differs:
> sometimes the operating system kills Asterisk because of the excessive memory consumption:
> {noformat}
> apr 11 22:18:36 hermes kernel: Out of memory: Kill process 3586 (asterisk) score 885 or sacrifice child
> apr 11 22:18:36 hermes kernel: Killed process 3586 (asterisk) total-vm:24097556kB, anon-rss:15277360kB, file-rss:0kB, shmem-rss:0kB
> {noformat}
> after some errors appear in the log (messages):
> {noformat}
> [Apr 11 22:18:04] NOTICE[10758] chan_sip.c:    -- Registration for 'hsgw-trunk0@svr-trunk0' timed out, trying again (Attempt #3)
> [Apr 11 22:18:11] ERROR[11426][C-00000021] astobj2.c: Excessive refcount 100000 reached on ao2 object 0x7f7b94001cb8
> [Apr 11 22:18:11] ERROR[11426][C-00000021] astobj2.c: FRACK!, Failed assertion Excessive refcount 100000 reached on ao2 object 0x7f7b94001cb8 (0)
> [Apr 11 22:18:12] ERROR[11426][C-00000021] astobj2.c: Excessive refcount 100000 reached on ao2 object 0x7f7b94001cb8
> [Apr 11 22:18:12] ERROR[11426][C-00000021] astobj2.c: FRACK!, Failed assertion Excessive refcount 100000 reached on ao2 object 0x7f7b94001cb8 (0)
> [Apr 11 22:18:14] ERROR[11497][C-0000003b] astobj2.c: Excessive refcount 100000 reached on ao2 object 0x7f7c24004848
> [Apr 11 22:18:14] ERROR[11497][C-0000003b] astobj2.c: FRACK!, Failed assertion Excessive refcount 100000 reached on ao2 object 0x7f7c24004848 (0)
> {noformat}
> I recompiled asterisk with debug flags to trace memory (as requested in the Asterisk issue reporting guidelines):
> {noformat}
>  [*] DONT_OPTIMIZE
>  [*] COMPILE_DOUBLE
>  [ ] DEBUG_THREADS
>  [ ] DEBUG_FD_LEAKS
>  [*] BETTER_BACKTRACES
>  [ ] LOTS_OF_SPANS
>  [*] MALLOC_DEBUG
>  ( ) DEBUG_CHAOS
>  [*] BUILD_NATIVE
>      --- Extended ---
>  [ ] REF_DEBUG
>  [ ] AO2_DEBUG
>  [ ] STATIC_BUILD
>  [ ] REBUILD_PARSERS
>  [ ] LOW_MEMORY
>  [ ] DISABLE_INLINE
>  [*] OPTIONAL_API
>  XXX USE_HOARD_ALLOCATOR
>  [ ] RADIO_RELAX
>  [ ] G711_NEW_ALGORITHM
>  < > G711_REDUCED_BRANCHING
>  < > TEST_CODING_TABLES
>  < > TEST_TANDEM_TRANSCODING
>  ( ) ADDRESS_SANITIZER
>  [ ] THREAD_SANITIZER
>  XXX LEAK_SANITIZER
>  XXX UNDEFINED_SANITIZER
>  [ ] BUSYDETECT_TONEONLY
>  [ ] BUSYDETECT_COMPARE_TONE_AND_SILENC
>  [ ] BUSYDETECT_DEBUG
>  [ ] INTEGER_CALLERID
> {noformat}
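> (The same flags can be enabled non-interactively with menuselect from the source tree, e.g.:)
> {noformat}
> make menuselect.makeopts
> menuselect/menuselect --enable MALLOC_DEBUG --enable DONT_OPTIMIZE \
>     --enable BETTER_BACKTRACES menuselect.makeopts
> make && make install
> {noformat}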
> I then repeated the test.
> After 6 days asterisk showed the same behaviour (within a short time, about one hour, memory grew to the limit), but this time asterisk remained frozen, with 14 GB of memory allocated, and totally unresponsive:
> {noformat}
> [root@hermes asterisk-13.17.0]# ps aux | grep asterisk
> root      4925  0.0  0.0 115252   276 ?        S    apr26   0:00 /bin/sh /usr/sbin/safe_asterisk
> root      4927  142 80.2 17531648 13095380 ?   Sl   apr26 13878:51 /usr/sbin/asterisk -f -vvvg -c
> {noformat}
> the mmlog traces nothing beyond session starts (mmlog records detected memory corruption, so an empty log does not by itself rule out a leak):
> {noformat}
> [root@hermes asterisk]# cat mmlog
> 1523538186 - New session
> 1523601658 - New session
> 1523627081 - New session
> 1523814040 - New session
> 1524740535 - New session
> 1524740807 - New session
> 1524740809 - New session
> 1524741524 - New session
> 1524745950 - New session
> 1524746405 - New session
> 1524746417 - New session
> 1524746418 - New session
> 1524746493 - New session
> 1524746516 - New session
> 1524747598 - New session
> 1524751289 - New session
> 1524751337 - New session
> {noformat}
> In the messages log there are a lot of warnings of this type:
> {noformat}
> [May  3 07:42:46] WARNING[6032] asterisk.c: No more connections allowed
> [May  3 07:42:46] WARNING[24523][C-00000401] channel.c: Exceptionally long voice queue length queuing to Local/9283@cnf_patch_not_radioonly-00000168;2
> [May  3 07:42:46] WARNING[29799][C-00000043] channel.c: Exceptionally long voice queue length queuing to Local/9280@cnf_patch_not_radioonly-0000001d;2
> [May  3 07:42:46] WARNING[19315][C-0000034f] channel.c: Exceptionally long voice queue length queuing to Local/9280@cnf_patch_not_radioonly-0000011c;2
> [May  3 07:42:46] WARNING[1481][C-00000065] channel.c: Exceptionally long voice queue length queuing to Local/9280@cnf_patch_not_radioonly-0000002c;2
> [May  3 07:42:46] WARNING[3161][C-0000006c] channel.c: Exceptionally long voice queue length queuing to Local/9281@cnf_patch_not_radioonly-0000002f;1
> [May  3 07:42:46] WARNING[1372][C-00000062] channel.c: Exceptionally long voice queue length queuing to Local/9282@cnf_patch_not_radioonly-0000002b;1
> [May  3 07:42:46] WARNING[21649][C-0000042a] channel.c: Exceptionally long voice queue length queuing to Local/9282@cnf_patch_not_radioonly-0000017a;1
> [May  3 07:42:46] WARNING[9594][C-00000393] channel.c: Exceptionally long voice queue length queuing to Local/9280@cnf_patch_not_radioonly-00000139;2
> [May  3 07:42:46] WARNING[6032] asterisk.c: No more connections allowed
> [May  3 07:42:46] VERBOSE[24547] manager.c: Manager 'astlog' logged on from 192.168.3.230
> [May  3 07:42:46] WARNING[6032] asterisk.c: No more connections allowed
> [May  3 07:42:46] WARNING[7810][C-000003db] channel.c: Exceptionally long voice queue length queuing to Local/9281@cnf_patch_not_radioonly-00000158;2
> [May  3 07:42:46] WARNING[6032] asterisk.c: No more connections allowed
> [May  3 07:43:36] WARNING[6032] asterisk.c: No more connections allowed
> [May  3 07:43:36] WARNING[6032] asterisk.c: No more connections allowed
> [May  3 07:43:36] WARNING[12603][C-00000465] channel.c: Exceptionally long voice queue length queuing to Local/9283@cnf_patch_not_radioonly-00000193;2
> [May  3 07:43:36] WARNING[23367][C-0000035c] channel.c: Exceptionally long voice queue length queuing to Local/9281@cnf_patch_not_radioonly-00000122;1
> [May  3 07:43:36] WARNING[2601][C-00000335] channel.c: Exceptionally long voice queue length queuing to Local/9283@cnf_patch_not_radioonly-00000111;2
> {noformat}
> The core problem is that memory starts growing after a few days.
> I've seen that other people have had the same problem, but I've found no definitive solution.
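> The "Excessive refcount 100000" FRACK errors above point at an ao2 reference leak. A possible next step (REF_DEBUG is unchecked in the menuselect list above) would be rebuilding with REF_DEBUG and feeding the resulting refs log to the refcounter script shipped in the source tree; the log path and invocation below are assumptions for Asterisk 13:
> {noformat}
> menuselect/menuselect --enable REF_DEBUG menuselect.makeopts
> make && make install
> # after reproducing the problem:
> python contrib/scripts/refcounter.py -f /var/log/asterisk/refs
> {noformat}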



--
This message was sent by Atlassian JIRA
(v6.2#6252)


