[asterisk-bugs] [JIRA] (ASTERISK-20167) UTF-8 cyrillic characters in voicemail email subject cause subject corruption

Wed Nov 14 08:17:45 CST 2012

    [ https://issues.asterisk.org/jira/browse/ASTERISK-20167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=199713#comment-199713 ] 

Arcadiy Ivanov edited comment on ASTERISK-20167 at 11/14/12 8:17 AM:
---------------------------------------------------------------------

But a Base64 payload would first be reassembled and then decoded as a whole, whereas with Q-encoding you parse it segment by segment. Parsing Q-encoded text the way you're describing makes perfect sense, but B-encoded text is monolithic and only limited to 75 chars per line (including the encoding specs), so that would seem to be simply an invalid approach to parsing it.

http://www.ietf.org/rfc/rfc2047.txt

{quote}
8. Examples

   The following are examples of message headers containing 'encoded-
   word's:

   From: =?US-ASCII?Q?Keith_Moore?= <moore at cs.utk.edu>
   To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld at dkuug.dk>
   CC: =?ISO-8859-1?Q?Andr=E9?= Pirard <PIRARD at vm1.ulg.ac.be>
   Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=
    =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=

      Note: In the first 'encoded-word' of the Subject field above, the
      last "=" at the end of the 'encoded-text' is necessary because each
      'encoded-word' must be self-contained (the "=" character completes a
      group of 4 base64 characters representing 2 octets).  ****An additional
      octet could have been encoded in the first 'encoded-word' (so that
      the encoded-word would contain an exact multiple of 3 encoded
      octets), except that the second 'encoded-word' uses a different
      'charset' than the first one.****
{quote}

See the section with added emphasis - if the charset didn't switch, it would've been perfectly valid to carry over a hanging byte.

{quote}
  The 'encoded-text' in an 'encoded-word' must be self-contained;
   'encoded-text' MUST NOT be continued from one 'encoded-word' to
   another.  This implies that the 'encoded-text' portion of a "B"
   'encoded-word' will be a multiple of 4 characters long; for a "Q"
   'encoded-word', any "=" character that appears in the 'encoded-text'
   portion will be followed by two hexadecimal characters.
{quote}
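
To make the rule quoted above concrete, here is a minimal Python sketch. It is not Asterisk code, and the names ENCODED_WORD, is_self_contained and decode_b_words are made up for illustration. It checks the two syntactic properties the RFC text implies, then decodes each "B" encoded-word of the RFC's own Subject example separately, without reassembling the Base64 text first.

```python
import base64
import re

# One RFC 2047 encoded-word: =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([bBqQ])\?([^?]*)\?=")


def is_self_contained(encoding: str, text: str) -> bool:
    """Check the syntactic consequence of self-containment for one word."""
    if encoding in "bB":
        # "B": the encoded-text must be a multiple of 4 characters long.
        return len(text) % 4 == 0
    # "Q": every "=" must be followed by two hexadecimal characters.
    return re.fullmatch(r"(?:[^=]|=[0-9A-Fa-f]{2})*", text) is not None


def decode_b_words(header: str) -> list:
    """Decode every "B" encoded-word independently of its neighbours."""
    parts = []
    for charset, encoding, text in ENCODED_WORD.findall(header):
        if encoding not in "bB" or not is_self_contained(encoding, text):
            raise ValueError("not a self-contained B encoded-word: " + text)
        parts.append(base64.b64decode(text).decode(charset))
    return parts


# The Subject example from RFC 2047, section 8.
SUBJECT = (
    "=?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=\n"
    " =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?="
)

print(decode_b_words(SUBJECT))
# -> ['If you can read this yo', 'u understand the example.']
```

The first word carries 23 octets, hence the single "=" of padding that the RFC note mentions. Each word decodes by itself.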

and

{quote}
   Each 'encoded-word' MUST represent an integral number of characters.
   A multi-octet character may not be split across adjacent 'encoded-
   word's.
{quote}
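
The "integral number of characters" rule can be checked mechanically. The following Python sketch is only an illustration, not the attached patch, and decode_q and check_words are hypothetical helper names. It decodes each "Q" encoded-word of the raw Subject header from the issue description on its own and reports whether its octets form complete characters in the declared charset.

```python
import re

# One RFC 2047 encoded-word: =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([qQ])\?([^?]*)\?=")


def decode_q(text: str) -> bytes:
    """Decode "Q" encoded-text: "_" is a space, "=XX" is one octet."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "_":
            out.append(0x20)
            i += 1
        elif text[i] == "=":
            out.append(int(text[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(ord(text[i]))
            i += 1
    return bytes(out)


def check_words(header: str) -> list:
    """Return (octets, ok) per word; ok is False if a character is split."""
    results = []
    for charset, _encoding, text in ENCODED_WORD.findall(header):
        octets = decode_q(text)
        try:
            octets.decode(charset)
            ok = True
        except UnicodeDecodeError:
            ok = False
        results.append((octets, ok))
    return results


# Raw Subject header as reported in the issue description.
RAW_SUBJECT = (
    "=?UTF-8?Q?=5BPBX=5D=3A_=D0=A1=D0=BE=D0=BE=D0=B1=D1=89=D0=B5=D0?=\n"
    " =?UTF-8?Q?=BD=D0=B8=D0=B5_=D0=BE=D1=82_=22anonymous=22_=3Canonymous=3E_?=\n"
    " =?UTF-8?Q?=D0=B2_Monday=2C_July_23=2C_2012_at_11=3A45=3A46_PM?="
)

for octets, ok in check_words(RAW_SUBJECT):
    print(ok, octets)
```

The first word ends with a lone =D0 and the second begins with =BD, so the two octets of one Cyrillic letter land in different encoded-words and neither word decodes on its own. Only the third word is valid in isolation.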


> UTF-8 cyrillic characters in voicemail email subject cause subject corruption
> -----------------------------------------------------------------------------
>
>                 Key: ASTERISK-20167
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-20167
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Applications/app_voicemail
>    Affects Versions: 1.8.8.2
>         Environment: Linux myhost.mydomain 2.6.18-308.11.1.el5 #1 SMP Tue Jul 10 08:49:28 EDT 2012 i686 i686 i386 GNU/Linux
> Cent-OS 5.8
>            Reporter: Arcadiy Ivanov
>         Attachments: issueA20167_break_early_for_q_encoding.patch
>
>
> This has been happening ever since 1.4.x.
> ========
> In voicemail.conf:
> emailsubject=[PBX]: Сообщение от ${VM_CALLERID} в ${VM_DATE}
> ========
> The emails arrive with the following subject:
> [PBX]: Сообще�в Monday, July 23, 2012 at 11:45:46 PM
> ========
> The subject should appear as follows:
> [PBX]: Сообщение от "anonymous" <anonymous> в Monday, July 23, 2012 at 11:45:46 PM
> ========
> The raw subject header as it appears in the email message is:
> Subject: =?UTF-8?Q?=5BPBX=5D=3A_=D0=A1=D0=BE=D0=BE=D0=B1=D1=89=D0=B5=D0?=
>  =?UTF-8?Q?=BD=D0=B8=D0=B5_=D0=BE=D1=82_=22anonymous=22_=3Canonymous=3E_?=
>  =?UTF-8?Q?=D0=B2_Monday=2C_July_23=2C_2012_at_11=3A45=3A46_PM?=
