[asterisk-dev] res_fax_spandsp segfaults during fax detection
Michal Rybárik
michal at rybarik.sk
Mon Jan 27 17:26:42 CST 2014
Hi Pavel,
thank you for your answer - it inspired me a lot, and we're now much
closer to a resolution (I hope). It seems that there is something
wrong with memory allocation for RTP frames (probably in
res_rtp_asterisk.c). I explain the details below, and I hope that one of
the Asterisk gurus will help us.
First I have to correct something I wrote before. The frame with src=RTP
that caused the segfault didn't come from DAHDI; it came from the IP
network (SIP). I also verified this by dropping UDP RTP packets on the
network - the RTP frames in the V.21 detection function then disappeared
too. I'm not sure, but it seems that frames from the network are stored
in memory by the res_rtp_asterisk.c module (or something closely related
to it), and that is probably where our bug lives.
You were right when you wrote that there's something wrong with datalen.
As far as I know, an a-law sample decodes to a 13-bit integer, which is
usually stored in a 16-bit integer for easier manipulation. We cannot
store a 13-bit integer in an 8-bit integer without losing information.
Also, libspandsp expects 16-bit samples for V.21 detection. The Asterisk
module res_fax_spandsp calls the spandsp function
modem_connect_tones_rx(), which is declared as:
int modem_connect_tones_rx(modem_connect_tones_rx_state_t *s,
                           const int16_t amp[], int len)
where "amp" is an array of 16-bit integers (samples) and "len" is the
number of samples (not the number of bytes!), as you can see from the
modem_connect_tones_rx() source code. When Asterisk passes the "amp"
pointer to modem_connect_tones_rx() together with "len" = 160, libspandsp
will read a 16-bit integer 160 times, starting from the pointer address,
so it will read 320 bytes.
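The arithmetic can be sketched in a few lines of C. This is only an
illustration: the helper name bytes_read_for_samples() is hypothetical
and is not part of Asterisk or spandsp. It assumes nothing more than
what the declaration above says, namely that the consumer reads "len"
values of type int16_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an Asterisk or spandsp function): number of
 * bytes a consumer touches when it reads `len` 16-bit samples from a
 * `const int16_t amp[]` buffer. Negative lengths are treated as zero. */
static size_t bytes_read_for_samples(int len)
{
    if (len < 0) {
        return 0;
    }
    return (size_t) len * sizeof(int16_t);
}
```

With len = 160 the helper returns 320, which is twice the 160 bytes
reported in datalen for the frame with src=RTP.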
Let's look again at the ast_frame which caused the segfault:
- frametype = 2
- datalen = 160
- samples = 160
- mallocd = 1
- mallocd_hdr_len = 562
- offset = 64
- src = RTP
- flags = 1
- ts = 9140
- len = 20
- seqno = 1489
- data.ptr = 0xb4ef4f30
I am not sure about mallocd_hdr_len and the other values, but I think
that 160 bytes of space (datalen) is _definitely_ not enough for 160
alaw/slin samples.
As far as I know, a segfault happens when an application tries to access
memory which doesn't belong to it. If we have 160 bytes allocated and we
try to read 320 bytes from this memory, we'll probably also read
something else that we didn't expect. If this memory is on the border of
the application's memory region, we could be trying to read from memory
which does not belong to this application - and this will cause a
segfault. Definitely.
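A defensive check along these lines could catch such a frame before it
reaches libspandsp. This is a minimal sketch under stated assumptions:
struct frame_view is a hypothetical, cut-down stand-in for the fields
printed in the log (it is not the real struct ast_frame), and
frame_holds_16bit_samples() is an invented name, not an existing
Asterisk function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the few ast_frame fields shown in the log.
 * This is NOT the real struct ast_frame definition. */
struct frame_view {
    int datalen;      /* payload size in bytes */
    int samples;      /* number of audio samples in the payload */
    const void *ptr;  /* payload start (data.ptr in the log) */
};

/* Returns 1 when the payload is large enough to be read as `samples`
 * 16-bit values, 0 otherwise. */
static int frame_holds_16bit_samples(const struct frame_view *f)
{
    if (f == NULL || f->ptr == NULL || f->datalen < 0 || f->samples < 0) {
        return 0;
    }
    return (size_t) f->datalen >= (size_t) f->samples * sizeof(int16_t);
}
```

For the two frames in the log, the check would accept datalen=320 with
samples=160 and reject datalen=160 with samples=160.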
So now it seems that the problem is neither in res_fax_spandsp nor in
libspandsp, but somewhere in Asterisk, where memory for RTP frames
(coming from the IP network) is allocated.
--
Michal Rybarik
On 01/27/2014 06:53 PM, Pavel Troller wrote:
> Hello Michal,
> I'm afraid that I can't help, but I'm observing exactly the same crashes
> as you are. They are very rare here, about one crash per 2 - 3 million
> successful calls (but of course most of the calls are voice ones), so
> debugging is very problematic. However, it's crashing at exactly the same
> place in the spandsp code.
> Please note that not only mallocd_hdr_len is changing, but primarily
> datalen is too! If you subtract those mallocd_hdr_len values (722 - 562),
> you will get 160, which is exactly the difference between the datalen
> values. I think that the primary cause is the different datalen, and the
> size of allocated memory just reflects this. However, another value,
> samples, is the same in both cases: 160. Isn't that suspicious? Why do I
> need twice as much data length for the same number of samples? Oh,
> possibly because they are in linear format, thus 16 bits wide (because
> the alaw2lin conversion produces 13-bit samples), while in the second
> case they are in some other format (see src=alawtolin in the "good" case
> and src=RTP in the "wrong" one). But which one is it? Native a-law?
> Possibly... But could it also be u-law? How does the routine get the
> actual codec that the samples are in?
> So, we dug up at least some information about the crash, but in my case,
> my theoretical background in V.21 detection is almost none, so I can't
> find more. Maybe this is enough for some more skilled person, like the
> res_fax author, to judge what the primary cause of this problem could be?
> With regards,
> Pavel
>
>> Hello,
>>
>> I have a problem with random Asterisk segfaults on a machine which I use
>> as a T.38 gateway between DAHDI and SIP. I would like to kindly ask
>> somebody to take a look at it and help me find what's wrong... Asterisk
>> is version 11 from SVN, r382022 (I'm using this because of other
>> dependencies - I compared the relevant sources to current v11 SVN and
>> they are almost unchanged).
>>
>> The segfault happens on voice calls, during detection of the fax
>> preamble. Segfaults happen randomly - sometimes there is a segfault after
>> 50.000 calls, sometimes after 5 calls. In the coredumps I see that the
>> segfault happens in libspandsp2.so (version 0.06-pre21, and the latest
>> snapshot too).
>>
>> I asked Steve Underwood (spandsp author) about this, and he pointed me to
>> the application itself - probably there is something wrong with "amp"
>> (pointer to the audio sample data), because this pointer is first used in
>> the function fsk_rx(), where the segfault happens. So I looked deeper
>> into this and added some debug info to the res_fax_spandsp.c source, in
>> the function spandsp_v21_detect(), just before the call to
>> modem_connect_tones_rx() (the function which calls fsk_rx() later). Now I
>> see the contents of the frame which caused the segfault, and also the
>> "amp" pointer (in Asterisk it is f->data.ptr), but I'm not sure what's
>> wrong with it.
>>
>> [Jan 27 14:00:22] VERBOSE[30694][C-000006cb] app_dial.c: -- Called
>> DAHDI/G2/123456789
>> [Jan 27 14:00:27] VERBOSE[30694][C-000006cb] app_dial.c: -- DAHDI/57-1
>> is proceeding passing it to SIP/mypbx-00000729
>> [Jan 27 14:00:27] VERBOSE[30694][C-000006cb] app_dial.c: -- DAHDI/57-1
>> is ringing
>> [Jan 27 14:00:32] VERBOSE[30694][C-000006cb] app_dial.c: -- DAHDI/57-1
>> answered SIP/mypbx-00000729
>> [Jan 27 14:00:32] NOTICE[30694][C-000006cb] res_fax_spandsp.c: frame={
>> frametype=2, datalen=320, samples=160, mallocd=1, mallocd_hdr_len=722,
>> offset=64, src=alawtolin, flags=0, ts=0, len=0, seqno=0,
>> data.ptr=0xb50c91b8 }
>> [Jan 27 14:00:32] NOTICE[30694][C-000006cb] res_fax_spandsp.c: frame={
>> frametype=2, datalen=160, samples=160, mallocd=1, mallocd_hdr_len=562,
>> offset=64, src=RTP, flags=1, ts=9140, len=20, seqno=1489,
>> data.ptr=0xb4ef4f30 }
>> (... segfault now ...)
>>
>> Core was generated by `/usr/sbin/asterisk -f -p -U asterisk -vvvg -c'.
>> Program terminated with signal 11, Segmentation fault.
>> #0 fsk_rx (s=0x83ea7e8, amp=0xb4ef4f30, len=160) at fsk.c:381
>> 381 s->window[j][buf_ptr].re = (ph.re*amp[i]) >> s->scaling_shift;
>>
>> The last line from the Asterisk log shows the contents of the ast_frame
>> struct *f which caused the segfault. I see that the segfault was caused
>> by the first frame which arrived from DAHDI (src=RTP) and which was
>> passed to spandsp_v21_detect(), and then to modem_connect_tones_rx(),
>> and then to fsk_rx().
>>
>> The only unusual thing I see in this frame is that
>> f->mallocd_hdr_len=562. Many other frames have this set to 722 (if
>> f->mallocd==1) or to 0 (if f->mallocd==0). But in a few cases I saw
>> frames with mallocd_hdr_len set to different values, and these frames
>> didn't cause a segfault.
>>
>> Is there anybody who can help?
>> Many thanks..
>>
>> --
>> Michal Rybarik