[asterisk-dev] res_fax_spandsp segfaults during fax detection

Michal Rybárik michal at rybarik.sk
Mon Jan 27 17:26:42 CST 2014


Hi Pavel,

thank you for an answer - it inspired me a lot, and we're now much 
closer to the resolution (I hope). It seems that there is something 
wrong with memory allocation for RTP frames (probably 
res_rtp_asterisk.c). I explain details below, and I hope that one of 
Asterisk gurus will help us.

First I have to correct something I wrote before. The frame with src=RTP
that caused the segfault didn't come from DAHDI; it came from the IP network
(SIP). I also verified this by dropping UDP RTP packets on the network:
the RTP frames in the V.21 detection function then disappeared too. I'm not
sure, but it seems that frames from the network are stored in memory by
the res_rtp_asterisk.c module (or something closely related to it), and
that is probably where our bug lives.

You were right when you wrote that something is wrong with datalen.
As far as I know, a linear sample decoded from a-law is a 13-bit integer,
usually stored in a 16-bit integer for easier manipulation. We cannot
store a 13-bit integer in an 8-bit integer without losing information.
Also, libspandsp expects 16-bit samples for V.21 detection. The Asterisk
module res_fax_spandsp calls the spandsp function modem_connect_tones_rx(),
which is declared as:
    int modem_connect_tones_rx(modem_connect_tones_rx_state_t *s, const int16_t amp[], int len)
where "amp" is an array of 16-bit integers (samples) and "len" is the number
of samples (not the number of bytes!), as you can see from the
modem_connect_tones_rx() source code. When Asterisk passes the "amp" pointer
to modem_connect_tones_rx() together with "len" = 160, libspandsp will
read a 16-bit integer 160 times, starting from the pointer address, so it
will read 320 bytes.

Let's look again at the ast_frame that caused the segfault:
- frametype = 2
- datalen = 160
- samples = 160
- mallocd = 1
- mallocd_hdr_len = 562
- offset = 64
- src = RTP
- flags = 1
- ts = 9140
- len = 20
- seqno = 1489
- data.ptr = 0xb4ef4f30

I am not sure about mallocd_hdr_len and the other values, but I think that
160 bytes of space (datalen) is _definitely_ not enough for 160 samples in
16-bit (slin) form, which is what libspandsp reads.

As far as I know, a segfault happens when an application tries to access
memory that doesn't belong to it. If 160 bytes are allocated and we try to
read 320 bytes from that memory, we will also read whatever happens to
follow it, which we didn't expect. If this buffer sits at the border of
the application's memory region, we could be trying to read memory that
does not belong to this application, and this will cause a segfault.

So now it seems that the problem is neither in res_fax_spandsp nor in
libspandsp, but somewhere in Asterisk where memory for RTP frames
(coming from the IP network) is allocated.

--
Michal Rybarik


On 01/27/2014 06:53 PM, Pavel Troller wrote:
> Hello Michal,
>    I'm afraid that I can't help, but I'm observing exactly the same crashes
> as you are. They are very rare here, about one crash per 2-3 million
> successful calls (but of course most of the calls are voice ones), so
> debugging is very problematic. However, it's crashing in exactly the same
> place in the spandsp code.
>    Please note that not only mallocd_hdr_len is changing; primarily,
> datalen is too! If you subtract those mallocd_hdr_len values (722 - 562),
> you will get 160, which is exactly the difference between the datalen
> values. I think that the primary cause is the different datalen, and the
> size of the allocated memory just reflects this. However, another value,
> samples, is the same in both cases: 160. Isn't that suspicious? Why would
> I need twice as much data length for the same number of samples? Oh,
> possibly because they are in linear format, thus 16 bits wide (because
> the alaw2lin conversion produces 13-bit samples), while in the second
> case they are in some other format (see src=alawtolin in the "good" case
> and src=RTP in the "wrong" one). But which one is it? Native a-law?
> Possibly... But could it also be u-law? How does the routine learn which
> codec the samples are actually in?
>    So, we dug up at least some information about the crash, but in my
> case I have almost no theoretical background in V.21 detection, so I
> can't find more. Maybe this is enough for some more skilled person, like
> the res_fax author, to judge what the primary cause of this problem is?
>    With regards,
>      Pavel
>
>> Hello,
>>
>> I have a problem with random Asterisk segfaults on a machine that I use
>> as a T.38 gateway between DAHDI and SIP. I would like to kindly ask
>> somebody to take a look at it and help me find what's wrong... The
>> Asterisk version is 11 from SVN, r382022 (I'm using this because of other
>> dependencies - I compared the relevant sources to current v11 SVN and
>> they are almost unchanged).
>>
>> The segfault happens on voice calls, during detection of the fax preamble.
>> Segfaults happen randomly - sometimes there is a segfault after 50,000
>> calls, sometimes after 5 calls. In the core dumps I see that the segfault
>> happens in libspandsp2.so (version 0.0.6pre21, and the latest snapshot too).
>>
>> I asked Steve Underwood (the spandsp author) about this, and he pointed me
>> to the application itself - probably something is wrong with "amp" (the
>> pointer to the audio sample data), because this pointer is first used in
>> the function fsk_rx(), where the segfault happens. So I looked deeper into
>> this and added some debug output to res_fax_spandsp.c, in the function
>> spandsp_v21_detect(), just before calling modem_connect_tones_rx() (the
>> function which later calls fsk_rx()). Now I can see the contents of the
>> frame that caused the segfault, and also the "amp" pointer (in Asterisk it
>> is f->data.ptr), but I'm not sure what's wrong with it.
>>
>> [Jan 27 14:00:22] VERBOSE[30694][C-000006cb] app_dial.c:     -- Called
>> DAHDI/G2/123456789
>> [Jan 27 14:00:27] VERBOSE[30694][C-000006cb] app_dial.c:     -- DAHDI/57-1
>> is proceeding passing it to SIP/mypbx-00000729
>> [Jan 27 14:00:27] VERBOSE[30694][C-000006cb] app_dial.c:     -- DAHDI/57-1
>> is ringing
>> [Jan 27 14:00:32] VERBOSE[30694][C-000006cb] app_dial.c:     -- DAHDI/57-1
>> answered SIP/mypbx-00000729
>> [Jan 27 14:00:32] NOTICE[30694][C-000006cb] res_fax_spandsp.c: frame={
>> frametype=2, datalen=320, samples=160, mallocd=1, mallocd_hdr_len=722,
>> offset=64, src=alawtolin, flags=0, ts=0, len=0, seqno=0,
>> data.ptr=0xb50c91b8  }
>> [Jan 27 14:00:32] NOTICE[30694][C-000006cb] res_fax_spandsp.c: frame={
>> frametype=2, datalen=160, samples=160, mallocd=1, mallocd_hdr_len=562,
>> offset=64, src=RTP, flags=1, ts=9140, len=20, seqno=1489,
>> data.ptr=0xb4ef4f30  }
>>   (... segfault now ...)
>>
>> Core was generated by `/usr/sbin/asterisk -f -p -U asterisk -vvvg -c'.
>> Program terminated with signal 11, Segmentation fault.
>> #0  fsk_rx (s=0x83ea7e8, amp=0xb4ef4f30, len=160) at fsk.c:381
>> 381                 s->window[j][buf_ptr].re = (ph.re*amp[i])>>
>> s->scaling_shift;
>>
>> The last line of the Asterisk log shows the contents of the ast_frame
>> struct *f that caused the segfault. I see that the segfault was caused by
>> the first frame that arrived from DAHDI (src=RTP) and was passed to
>> spandsp_v21_detect(), then to modem_connect_tones_rx(), and then to fsk_rx().
>>
>> The only unusual thing I see in this frame is that
>> f->mallocd_hdr_len=562. Many other frames have this set to 722 (if
>> f->mallocd==1) or to 0 (if f->mallocd==0). But in a few cases, I saw frames
>> with mallocd_hdr_len set to other values, and these frames didn't cause
>> a segfault.
>>
>> Is there anybody who can help?
>> Many thanks.
>>
>> --
>> Michal Rybarik
>>
>>
>> -- 
>> _____________________________________________________________________
>> -- Bandwidth and Colocation Provided by http://www.api-digital.com --
>>
>> asterisk-dev mailing list
>> To UNSUBSCRIBE or update options visit:
>>    http://lists.digium.com/mailman/listinfo/asterisk-dev



