[asterisk-bugs] [JIRA] Issue Comment Edited: (ASTERISK-20167) UTF-8 cyrillic characters in voicemail email subject cause subject corruption
Arcadiy Ivanov (JIRA)
noreply at issues.asterisk.org
Tue Jul 24 19:09:21 CDT 2012
[ https://issues.asterisk.org/jira/browse/ASTERISK-20167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=195155#comment-195155 ]
Arcadiy Ivanov edited comment on ASTERISK-20167 at 7/24/12 7:08 PM:
--------------------------------------------------------------------
The transport is whatever Asterisk uses by default (I assume "sendmail -t"). The code responsible for Q-encoding (including the line breaks) lives in Asterisk, not in the transport. The call chain is:
app_voicemail.c -> make_email_file -> ast_str_encode_mime(&str2, 0, ast_str_buffer(str1), strlen("Subject: "), 0)
I suspect there is a bug somewhere in this section:
{code}
if ((first_section && need_encoding && preamble + ast_str_strlen(tmp) > 70) ||
	(first_section && !need_encoding && preamble + ast_str_strlen(tmp) > 72) ||
	(!first_section && need_encoding && ast_str_strlen(tmp) > 70) ||
	(!first_section && !need_encoding && ast_str_strlen(tmp) > 72)) {
	/* Start new line */
	ast_str_append(end, maxlen, "%s%s?=", first_section ? "" : " ", ast_str_buffer(tmp));
	ast_str_set(&tmp, -1, "=?%s?Q?", charset);
	first_section = 0;
}
{code}
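Since the subject is UTF-8, one property worth checking around this branch is whether a line break can land between the bytes of a single character. The condition above only looks at lengths. The following is a minimal sketch, not Asterisk code, of how a break offset could be moved back to a character boundary; the helper names utf8_is_continuation and utf8_safe_break are hypothetical.

```c
#include <stddef.h>

/* Returns nonzero if b is a UTF-8 continuation byte (bit pattern 10xxxxxx). */
static int utf8_is_continuation(unsigned char b)
{
	return (b & 0xC0) == 0x80;
}

/*
 * Hypothetical helper, not part of Asterisk: given a UTF-8 buffer of len
 * bytes and a proposed break offset, move the break back until it points
 * at the first byte of a character, so no multi-byte sequence is split.
 */
static size_t utf8_safe_break(const char *buf, size_t len, size_t want)
{
	if (want >= len) {
		return len;
	}
	while (want > 0 && utf8_is_continuation((unsigned char) buf[want])) {
		want--;
	}
	return want;
}
```

With something like this, the encoder would close the current encoded-word at the returned offset instead of at the raw length limit. This is only a sketch of the idea; where exactly it would hook into ast_str_encode_mime is not something I have verified.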
As a side note, wouldn't it be more prudent to use B-encoding (Base64) whenever a multi-byte encoding (UTF-8, UTF-16LE/BE, UTF-32) is requested? Base64 emits 4 bytes for every 3 bytes encoded (133% of the original size), while Q-encoding emits 3 bytes for every byte it has to encode (300%). In fact, unless the text consists overwhelmingly of the Latin-1 subset that Q-encoding can leave as unencoded atoms, Base64 always makes more sense.
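To put rough numbers on that comparison, here is a small sketch that computes the payload size of both encodings for a given byte string. It assumes that Q-encoding leaves only ASCII letters and digits as-is, writes a space as a single '_', and turns every other byte into a three-character "=XX" triplet, which matches what the raw header in the issue shows. The function names are hypothetical, and the fixed "=?charset?X?" and "?=" wrapper is not counted because it is the same for both.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Q-encoded payload length under the assumption stated above:
 * ASCII alphanumerics and space cost 1 byte, everything else costs 3.
 */
static size_t q_payload_len(const unsigned char *s, size_t n)
{
	size_t out = 0;

	for (size_t i = 0; i < n; i++) {
		if (s[i] < 0x80 && (isalnum(s[i]) || s[i] == ' ')) {
			out += 1;
		} else {
			out += 3;
		}
	}
	return out;
}

/* Base64 payload length: 4 output bytes per started group of 3 input bytes. */
static size_t b64_payload_len(size_t n)
{
	return 4 * ((n + 2) / 3);
}
```

For the 18 bytes of "Сообщение" in UTF-8 this gives 54 bytes with Q-encoding against 24 with Base64, while for a plain ASCII fragment such as "July 23" (7 bytes) Q-encoding stays at 7 and Base64 needs 12.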
> UTF-8 cyrillic characters in voicemail email subject cause subject corruption
> -----------------------------------------------------------------------------
>
> Key: ASTERISK-20167
> URL: https://issues.asterisk.org/jira/browse/ASTERISK-20167
> Project: Asterisk
> Issue Type: Bug
> Security Level: None
> Components: Applications/app_voicemail
> Affects Versions: 1.8.8.2
> Environment: Linux myhost.mydomain 2.6.18-308.11.1.el5 #1 SMP Tue Jul 10 08:49:28 EDT 2012 i686 i686 i386 GNU/Linux
> Cent-OS 5.8
> Reporter: Arcadiy Ivanov
>
> This has been happening ever since 1.4.x.
> ========
> In voicemail.conf:
> emailsubject=[PBX]: Сообщение от ${VM_CALLERID} в ${VM_DATE}
> ========
> The emails arrive with the following subject:
> [PBX]: Сообще�в Monday, July 23, 2012 at 11:45:46 PM
> ========
> The subject should appear as follows:
> [PBX]: Сообщение от "anonymous" <anonymous> в Monday, July 23, 2012 at 11:45:46 PM
> ========
> The raw subject header as it appears in the email message is:
> Subject: =?UTF-8?Q?=5BPBX=5D=3A_=D0=A1=D0=BE=D0=BE=D0=B1=D1=89=D0=B5=D0?=
> =?UTF-8?Q?=BD=D0=B8=D0=B5_=D0=BE=D1=82_=22anonymous=22_=3Canonymous=3E_?=
> =?UTF-8?Q?=D0=B2_Monday=2C_July_23=2C_2012_at_11=3A45=3A46_PM?=
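Note that in the raw header above the first encoded-word ends with "=D0" and the second begins with "=BD", i.e. the two bytes of one Cyrillic letter sit in different encoded-words. RFC 2047 requires each encoded-word to represent an integral number of characters, so a client that decodes each word on its own is entitled to choke here. The sketch below checks one Q-encoded payload (the part between "?Q?" and "?=") for this; all function names are hypothetical, and the UTF-8 check is structural only (it does not reject overlong forms or surrogates).

```c
#include <stddef.h>

static int hex_val(char c)
{
	if (c >= '0' && c <= '9') return c - '0';
	if (c >= 'A' && c <= 'F') return c - 'A' + 10;
	if (c >= 'a' && c <= 'f') return c - 'a' + 10;
	return -1;
}

/* Decode a Q payload into out; returns the byte count, or -1 on error. */
static int q_decode(const char *payload, unsigned char *out, size_t cap)
{
	size_t n = 0;

	for (size_t i = 0; payload[i] != '\0'; i++) {
		unsigned char b;

		if (payload[i] == '=') {
			int hi = hex_val(payload[i + 1]);
			int lo = hi < 0 ? -1 : hex_val(payload[i + 2]);
			if (lo < 0) return -1;
			b = (unsigned char) (hi * 16 + lo);
			i += 2;
		} else if (payload[i] == '_') {
			b = ' ';
		} else {
			b = (unsigned char) payload[i];
		}
		if (n >= cap) return -1;
		out[n++] = b;
	}
	return (int) n;
}

/* Returns 1 if s holds only complete UTF-8 sequences (structure only). */
static int utf8_is_whole(const unsigned char *s, size_t n)
{
	size_t i = 0;

	while (i < n) {
		size_t need;

		if (s[i] < 0x80) need = 0;
		else if ((s[i] & 0xE0) == 0xC0) need = 1;
		else if ((s[i] & 0xF0) == 0xE0) need = 2;
		else if ((s[i] & 0xF8) == 0xF0) need = 3;
		else return 0;                    /* stray continuation byte */
		if (need > n - i - 1) return 0;   /* sequence cut off at the end */
		for (size_t k = 1; k <= need; k++) {
			if ((s[i + k] & 0xC0) != 0x80) return 0;
		}
		i += need + 1;
	}
	return 1;
}

/* Returns 1 if the payload decodes to whole UTF-8 characters only. */
static int q_word_is_whole_utf8(const char *payload)
{
	unsigned char buf[256];
	int n = q_decode(payload, buf, sizeof(buf));

	return n >= 0 && utf8_is_whole(buf, (size_t) n);
}
```

Fed the three payloads from the header above, q_word_is_whole_utf8 returns 0 for the first (it ends on a lone lead byte) and for the second (it starts with a continuation byte), and 1 for the third.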