[asterisk-bugs] [JIRA] (ASTERISK-27170) segfault in pj_sockaddr_in_set_str_addr

nappsoft (JIRA) noreply at issues.asterisk.org
Thu Aug 17 06:00:08 CDT 2017


    [ https://issues.asterisk.org/jira/browse/ASTERISK-27170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=238113#comment-238113 ] 

nappsoft edited comment on ASTERISK-27170 at 8/17/17 5:59 AM:
--------------------------------------------------------------

Unfortunately this does not help. By the way: on the test system I managed to change the hangup order for some experiments by hacking the asterisk sources a bit, but this had no influence (I just slightly delayed the thread that cleans up the channel with the extension name in question). I still have not been able to make the version compiled with DONT_OPTIMIZE crash inside gdb or valgrind; I only managed to make it crash outside of debuggers ;) (But in that case it somehow never creates a core dump, even though asterisk was started with -g and ulimit -Hc and -Sc were set to unlimited. Strange, but I'll have a look at that.)
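
As an aside on the missing core dump: below is a minimal C sketch, assuming a Linux system, of two checks that can be run inside a process to see whether a core file is permitted at all. It reads the soft RLIMIT_CORE limit and the kernel's "dumpable" flag (which Linux clears when a process changes its user ID). The function names are made up for illustration and are not part of Asterisk. Whether either check explains the behavior described here is an assumption; the kernel's core_pattern setting and a non-writable working directory are other usual suspects.

```c
#include <sys/prctl.h>
#include <sys/resource.h>

/* Hypothetical helper: returns 1 if the soft core-size limit is non-zero
 * (a core file may be written), 0 if it is zero, -1 on error. */
int core_limit_allows_dump(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    return rl.rlim_cur != 0;
}

/* Hypothetical helper: raise the soft core-size limit to the hard limit.
 * Returns 0 on success, -1 on error. */
int raise_core_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}

/* Hypothetical helper: returns the process "dumpable" flag as reported by
 * the kernel (0 means no core dump will be produced), or -1 on error. */
int process_dumpable_flag(void)
{
    return prctl(PR_GET_DUMPABLE, 0, 0, 0, 0);
}
```

Calling process_dumpable_flag() after the daemon has switched to its runtime user would show whether the flag was cleared at that point; a value of 0 there would be consistent with no core file appearing despite unlimited ulimits.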

One side note (which might have nothing to do with this problem at all, but I wanted to mention it): when starting the software under valgrind I often need to try 3-4 times, as it usually crashes while loading the modules. Not always in the same module, but always during the same operation. (Just to mention: apart from the behavior discussed above, which occurs in some rare situations, our distribution bundled with asterisk 13.16/17 is running rock solid on hundreds of virtual machines, and 11.25.1 even ran for months without a crash on any system, so I doubt that we have a basic memory allocation problem in musl or similar.)

valgrind output (no matter whether I set a bigger stack size or not):

.
.
.
==11086== Conditional jump or move depends on uninitialised value(s)                  
==11086==    at 0x401ADBC: strlen (in /lib/libc.so)                                   
==11086==    by 0x533F3D: load_modules (loader.c:1355)                                
==11086==    by 0x45BD83: asterisk_daemon (asterisk.c:4692)                           
==11086==    by 0x45B30E: main (asterisk.c:4444)                                      
==11086==                                                                             
==11086== Conditional jump or move depends on uninitialised value(s)                  
==11086==    at 0x401ADBC: strlen (in /lib/libc.so)                                   
==11086==    by 0x531269: find_resource (loader.c:405)                                
==11086==    by 0x53302B: load_resource (loader.c:1040)                               
==11086==    by 0x5334DF: load_resource_list (loader.c:1166)                          
==11086==    by 0x534073: load_modules (loader.c:1376)                                
==11086==    by 0x45BD83: asterisk_daemon (asterisk.c:4692)                           
==11086==    by 0x45B30E: main (asterisk.c:4444) 
==11086==
==11086== Thread 7:                                                   
==11086== Invalid read of size 1                                          
==11086==    at 0x4053151: ??? (in /lib/libc.so)                            
==11086==    by 0x402472F: __copy_tls (in /lib/libc.so)                
==11086==    by 0x7DBFF: ???                                           
==11086==    by 0x7DFFF: ???                                        
==11086==    by 0x10D95FFF: ??? (in /usr/lib/asterisk/modules/res_timing_pthread.so)
==11086==    by 0xFFF: ???                                                   
==11086==    by 0x10E13A3F: ???                                               
==11086==    by 0x4054D5D: pthread_create (in /lib/libc.so)                 
==11086==    by 0x6DB082F: ???                                         
==11086==    by 0x10D96FFF: ???                                     
==11086==    by 0x6DB0737: ???                                         
==11086==    by 0x8E4A8F: ???                                       
==11086==  Address 0x12db6438 is not stack'd, malloc'd or (recently) free'd  
==11086==                                                           
==11086==                                                              
==11086== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==11086==  Access not within mapped region at address 0x12DB6438    
==11086==    at 0x4053151: ??? (in /lib/libc.so)                        
==11086==    by 0x402472F: __copy_tls (in /lib/libc.so)             
==11086==    by 0x7DBFF: ???                                        
==11086==    by 0x7DFFF: ???                                           
==11086==    by 0x10D95FFF: ??? (in /usr/lib/asterisk/modules/res_timing_pthread.so)
==11086==    by 0xFFF: ???                                          
==11086==    by 0x10E13A3F: ???                                               
==11086==    by 0x4054D5D: pthread_create (in /lib/libc.so)           
==11086==    by 0x6DB082F: ???                                               
==11086==    by 0x10D96FFF: ???                                        
==11086==    by 0x6DB0737: ???                                              
==11086==    by 0x8E4A8F: ???                                          
==11086==  If you believe this happened as a result of a stack      
==11086==  overflow in your program's main thread (unlikely but            
==11086==  possible), you can try to increase the size of the       
==11086==  main thread stack using the --main-stacksize= flag.      
==11086==  The main thread stack size used in this run was 50003968.


gdb output:

(gdb) bt full
#0  0x0000000004053151 in memcpy () from /lib/ld-musl-x86_64.so.1
No symbol table info available.
#1  0x0000000004024730 in __copy_tls () from /lib/ld-musl-x86_64.so.1
No symbol table info available.
#2  0x000000000007dc00 in ?? ()
No symbol table info available.
#3  0x000000000007e000 in ?? ()
No symbol table info available.
#4  0x0000000010d96000 in ?? ()
No symbol table info available.
#5  0x0000000000001000 in ?? ()
No symbol table info available.
#6  0x0000000010e13a40 in ?? ()
No symbol table info available.
#7  0x0000000004054d5e in pthread_create () from /lib/ld-musl-x86_64.so.1
No symbol table info available.
#8  0x0000000006db0830 in ?? ()
No symbol table info available.
#9  0x0000000010d97000 in ?? ()
No symbol table info available.
#10 0x0000000006db0738 in ?? ()
No symbol table info available.
#11 0x00000000008e4a90 in consoles ()
No symbol table info available.
#12 0x00000000005f44f2 in ast_inet_ntoa (ia=...) at utils.c:617
        buf = 0x8e4a90 <consoles+16> "\260\212\071\a"
#13 0x00000000079e0860 in ?? ()
No symbol table info available.
#14 0x0000000000000000 in ?? ()
No symbol table info available.
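
Both traces above end in __copy_tls called from pthread_create, which is the step where the C library sets up thread-local storage for the new thread. The following is a minimal sketch of that code path in isolation: one thread-local variable with a non-zero initializer and one thread created with an explicit stack size. The function name, the variable and the stack size passed by the caller are illustrative assumptions, not taken from Asterisk or res_timing_pthread. It only exercises the same operation; it is not a reproduction of the crash.

```c
#include <pthread.h>
#include <stddef.h>

/* Thread-local variable (GCC/Clang __thread extension) with a non-zero
 * initializer. Every new thread gets its own copy, set up from the
 * initialization image while pthread_create() runs. */
static __thread int tls_counter = 42;

/* Thread body: report the value this thread sees in its own TLS copy. */
static void *read_tls(void *arg)
{
    int *out = arg;

    *out = tls_counter;
    return NULL;
}

/* Hypothetical helper: start one thread with the given stack size and
 * return the value it observed in tls_counter, or -1 on failure. */
int tls_value_in_new_thread(size_t stack_size)
{
    pthread_attr_t attr;
    pthread_t tid;
    int seen = -1;

    /* Change only the calling thread's copy; the new thread must still
     * see the initializer. */
    tls_counter = 7;

    if (pthread_attr_init(&attr) != 0)
        return -1;
    if (pthread_attr_setstacksize(&attr, stack_size) != 0 ||
        pthread_create(&tid, &attr, read_tls, &seen) != 0) {
        pthread_attr_destroy(&attr);
        return -1;
    }
    pthread_attr_destroy(&attr);
    pthread_join(tid, NULL);
    return seen;
}
```

Running something like tls_value_in_new_thread(256 * 1024) in a loop under valgrind on the same musl build might help tell whether thread creation with TLS fails on its own there, or only once the asterisk modules are loaded.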




> segfault in pj_sockaddr_in_set_str_addr
> ---------------------------------------
>
>                 Key: ASTERISK-27170
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-27170
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: PBX/General
>    Affects Versions: 13.16.0
>         Environment: 64bit linux musl 1.1.15
>            Reporter: nappsoft
>            Assignee: Unassigned
>         Attachments: crashlog.txt, trace_cel_crash.txt, trace.txt, valgrind2.txt
>
>
> From time to time asterisk crashes in pj_sockaddr_in_set_str_addr. The asterisk version we use is 13.16.0 with some stability patches that went into 13.17.0 (we will update to 13.17.0 soon). But we already had the same crashes with unpatched 13.16.0 versions and with older versions as well.
> According to the SIP traces, the last thing that happened was a SIP transfer. The message flow was:
> REFER (Phone) -> 202 Accepted (PBX) -> NOTIFY Trying (PBX) -> NOTIFY OK (PBX) -> BYE (Phone) -> OK (PBX for the BYE message) -> OK (Phone for the NOTIFY Trying) -> OK (Phone for the NOTIFY OK)
> As these are embedded systems with limited resources, it's always difficult to produce crash dumps there or to run asterisk in gdb... I'll try to get some complete backtraces in the future, but maybe somebody has an idea based on the described scenario. => Maybe there is a race condition when the phone sends OK messages for the NOTIFY messages after it has already sent a BYE for the same call?
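
To make the suspected ordering concrete, here is a small single-threaded C sketch of the message flow above: two NOTIFYs are outstanding, the BYE arrives, and the two OK responses arrive afterwards. All types and function names are hypothetical and are not Asterisk or PJSIP APIs. It only illustrates the general idea that a late response is harmless as long as each pending NOTIFY holds its own reference to the session; it makes no claim about how the real code manages lifetimes or where the actual bug is.

```c
#include <stdlib.h>

/* Hypothetical session object; not an Asterisk or PJSIP type. */
struct session {
    int refcount;    /* plain int: a real multi-threaded server would need
                      * an atomic counter or a lock here */
    int terminated;  /* set once the BYE has been processed */
};

/* Counts freed sessions so the lifetime can be observed from outside. */
int sessions_destroyed = 0;

struct session *session_create(void)
{
    struct session *s = calloc(1, sizeof(*s));

    if (s)
        s->refcount = 1;  /* reference owned by the dialog */
    return s;
}

static void session_unref(struct session *s)
{
    if (--s->refcount == 0) {
        sessions_destroyed++;
        free(s);
    }
}

/* A NOTIFY was sent: the pending transaction keeps its own reference. */
void on_notify_sent(struct session *s)
{
    s->refcount++;
}

/* BYE received: mark the session terminated and drop the dialog's
 * reference. The memory stays valid while NOTIFYs are still pending. */
void on_bye(struct session *s)
{
    s->terminated = 1;
    session_unref(s);
}

/* OK for a NOTIFY received: returns 1 if it arrived after the BYE,
 * 0 otherwise, then drops the transaction's reference. */
int on_notify_response(struct session *s)
{
    int late = s->terminated;

    session_unref(s);
    return late;
}
```

Without the extra reference taken in on_notify_sent(), the BYE handler would free the session and the late OK handlers would touch freed memory, which is the kind of race the description asks about.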


