[asterisk-bugs] [JIRA] Issue Comment Edited: (ASTERISK-20167) UTF-8 cyrillic characters in voicemail email subject cause subject corruption

Tue Jul 24 17:15:21 CDT 2012

    [ https://issues.asterisk.org/jira/browse/ASTERISK-20167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=195155#comment-195155 ] 

Richard Mudgett edited comment on ASTERISK-20167 at 7/24/12 5:13 PM:
---------------------------------------------------------------------

The transport is whatever Asterisk uses by default (I assume "sendmail -t"). The code responsible for Q-encoding (including line breaks) is in Asterisk, not in the transport. The call chain is:

app_voicemail.c -> make_email_file -> ast_str_encode_mime(&str2, 0, ast_str_buffer(str1), strlen("Subject: "), 0)

I suspect there is a bug somewhere in this section:
{code}
    if ((first_section && need_encoding && preamble + ast_str_strlen(tmp) > 70) ||
        (first_section && !need_encoding && preamble + ast_str_strlen(tmp) > 72) ||
        (!first_section && need_encoding && ast_str_strlen(tmp) > 70) ||
        (!first_section && !need_encoding && ast_str_strlen(tmp) > 72)) {
        /* Start new line */
        ast_str_append(end, maxlen, "%s%s?=", first_section ? "" : " ", ast_str_buffer(tmp));
        ast_str_set(&tmp, -1, "=?%s?Q?", charset);
        first_section = 0;
    }
{code}
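
One thing the length checks above do not look at is whether the section boundary falls between the bytes of a single multi-byte UTF-8 character. The raw header in the report ends its first encoded-word with a lone =D0 and starts the next one with =BD, which is consistent with that. A minimal sketch of a boundary check follows. The helper names utf8_is_continuation and utf8_safe_split are hypothetical and are not part of Asterisk, and the sketch assumes the input is valid UTF-8.

```c
#include <stddef.h>

/* Hypothetical helper, not an Asterisk API: true for UTF-8 continuation
 * bytes, which always have the bit pattern 10xxxxxx. */
static int utf8_is_continuation(unsigned char c)
{
    return (c & 0xC0) == 0x80;
}

/* Hypothetical helper: given a proposed split offset into buf (len bytes),
 * move it backwards until it no longer lands inside a multi-byte character.
 * Assumes buf holds valid UTF-8. */
static size_t utf8_safe_split(const char *buf, size_t len, size_t split)
{
    if (split >= len) {
        return len;
    }
    while (split > 0 && utf8_is_continuation((unsigned char) buf[split])) {
        split--;
    }
    return split;
}
```

An encoder could call something like utf8_safe_split() on the pending bytes before closing a section with "?=", so that each encoded-word carries only whole characters.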

====

On a side note, wouldn't it be more prudent to use B-encoding (Base64) whenever a multi-byte encoding (UTF-8, UTF-16LE/BE, UTF-32) is requested? Base64 emits 4 bytes for every 3 bytes encoded (133% of the input size), whereas Q-encoding emits 3 bytes for every byte it has to encode (300%). In fact, unless the text consists overwhelmingly of the Latin1 subset that Q-encoding can represent as unencoded atoms, Base64 always makes more sense.
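
To put numbers on the size comparison, here is a rough sketch that computes the payload length each scheme would produce for a given byte string. The function names q_encoded_len and b_encoded_len are hypothetical, and the set of bytes treated as "safe" for Q-encoding is a deliberately conservative assumption (ASCII letters, digits, and space mapped to "_"); the set Asterisk actually uses may differ.

```c
#include <stddef.h>

/* Hypothetical helper: length of the Q-encoded payload for n input bytes.
 * Assumption: only ASCII letters, digits, and space (emitted as '_') cost
 * one byte; every other byte is emitted as "=XX" and costs three. */
static size_t q_encoded_len(const unsigned char *s, size_t n)
{
    size_t out = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        unsigned char c = s[i];
        int literal = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                      (c >= '0' && c <= '9') || c == ' ';
        out += literal ? 1 : 3;
    }
    return out;
}

/* Hypothetical helper: length of the Base64 payload for n input bytes,
 * including '=' padding (4 output bytes per group of up to 3 input bytes). */
static size_t b_encoded_len(size_t n)
{
    return 4 * ((n + 2) / 3);
}
```

For the 9 Cyrillic letters of "Сообщение" (18 bytes in UTF-8), this gives a Q-encoded payload of 54 bytes against a Base64 payload of 24 bytes, not counting the "=?UTF-8?Q?" or "=?UTF-8?B?" framing.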

      was (Author: arcivanov):
    The transport is whatever Asterisk uses by default (I assume "sendmail -t"). The code responsible for Q-encoding (including line breaks) is in Asterisk, not in the transport. The call chain is:

app_voicemail.c -> make_email_file -> ast_str_encode_mime(&str2, 0, ast_str_buffer(str1), strlen("Subject: "), 0)

I suspect there is a bug somewhere in this section:

    if ((first_section && need_encoding && preamble + ast_str_strlen(tmp) > 70) ||
        (first_section && !need_encoding && preamble + ast_str_strlen(tmp) > 72) ||
        (!first_section && need_encoding && ast_str_strlen(tmp) > 70) ||
        (!first_section && !need_encoding && ast_str_strlen(tmp) > 72)) {
        /* Start new line */
        ast_str_append(end, maxlen, "%s%s?=", first_section ? "" : " ", ast_str_buffer(tmp));
        ast_str_set(&tmp, -1, "=?%s?Q?", charset);
        first_section = 0;
    }

====

On a side note, wouldn't it be more prudent to use B-encoding (Base64) whenever a multi-byte encoding (UTF-8, UTF-16LE/BE, UTF-32) is requested? Base64 emits 4 bytes for every 3 bytes encoded (133% of the input size), whereas Q-encoding emits 3 bytes for every byte it has to encode (300%). In fact, unless the text consists overwhelmingly of the Latin1 subset that Q-encoding can represent as unencoded atoms, Base64 always makes more sense.

> UTF-8 cyrillic characters in voicemail email subject cause subject corruption
> -----------------------------------------------------------------------------
>
>                 Key: ASTERISK-20167
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-20167
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Applications/app_voicemail
>    Affects Versions: 1.8.8.2
>         Environment: Linux myhost.mydomain 2.6.18-308.11.1.el5 #1 SMP Tue Jul 10 08:49:28 EDT 2012 i686 i686 i386 GNU/Linux
> Cent-OS 5.8
>            Reporter: Arcadiy Ivanov
>
> This has been happening ever since 1.4.x.
> ========
> In voicemail.conf:
> emailsubject=[PBX]: Сообщение от ${VM_CALLERID} в ${VM_DATE}
> ========
> The emails arrive with the following subject:
> [PBX]: Сообще�в Monday, July 23, 2012 at 11:45:46 PM
> ========
> The subject should appear as follows:
> [PBX]: Сообщение от "anonymous" <anonymous> в Monday, July 23, 2012 at 11:45:46 PM
> ========
> The raw subject header as it appears in the email message is:
> Subject: =?UTF-8?Q?=5BPBX=5D=3A_=D0=A1=D0=BE=D0=BE=D0=B1=D1=89=D0=B5=D0?=
>  =?UTF-8?Q?=BD=D0=B8=D0=B5_=D0=BE=D1=82_=22anonymous=22_=3Canonymous=3E_?=
>  =?UTF-8?Q?=D0=B2_Monday=2C_July_23=2C_2012_at_11=3A45=3A46_PM?=
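
The raw header above can be checked mechanically: decoding each encoded-word on its own and testing whether its bytes end in the middle of a UTF-8 sequence shows where the subject breaks. The sketch below is only an illustration. The names hex_val, q_decode, and utf8_ends_truncated are hypothetical, the decoder handles just the "=XX" and "_" forms seen in this header, and the truncation test does not validate UTF-8 in general.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: value of one hex digit; the caller has already
 * checked the character with isxdigit(). */
static int hex_val(char c)
{
    if (c >= '0' && c <= '9') {
        return c - '0';
    }
    if (c >= 'a' && c <= 'f') {
        return c - 'a' + 10;
    }
    return c - 'A' + 10;
}

/* Hypothetical helper: decode the payload of one Q encoded-word (the text
 * between "?Q?" and "?=") into out. Returns the number of bytes written.
 * A '=' not followed by two hex digits is copied through unchanged. */
static size_t q_decode(const char *in, unsigned char *out, size_t outsz)
{
    size_t n = 0;

    while (*in && n < outsz) {
        if (*in == '=' && isxdigit((unsigned char) in[1]) &&
            isxdigit((unsigned char) in[2])) {
            out[n++] = (unsigned char) (hex_val(in[1]) * 16 + hex_val(in[2]));
            in += 3;
        } else {
            out[n++] = (*in == '_') ? (unsigned char) ' ' : (unsigned char) *in;
            in++;
        }
    }
    return n;
}

/* Hypothetical helper: returns 1 if buf ends with an incomplete UTF-8
 * sequence (a lead byte without all of its continuation bytes). */
static int utf8_ends_truncated(const unsigned char *buf, size_t n)
{
    size_t i = n;
    size_t have, need;
    unsigned char lead;

    /* Walk back over trailing continuation bytes (10xxxxxx). */
    while (i > 0 && (buf[i - 1] & 0xC0) == 0x80) {
        i--;
    }
    if (i == 0) {
        return n > 0; /* nothing but continuation bytes: no lead byte */
    }
    lead = buf[i - 1];
    have = n - i + 1; /* lead byte plus the continuation bytes after it */
    if (lead < 0x80) {
        need = 1;
    } else if ((lead & 0xE0) == 0xC0) {
        need = 2;
    } else if ((lead & 0xF0) == 0xE0) {
        need = 3;
    } else if ((lead & 0xF8) == 0xF0) {
        need = 4;
    } else {
        return 1; /* not a valid lead byte */
    }
    return have < need;
}
```

Applied to the first encoded-word of the header, the decoded bytes end with 0xD0, a two-byte lead with no continuation byte, so the check reports a truncated sequence. The matching 0xBD opens the second encoded-word, so neither word holds the complete character on its own.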
