[asterisk-bugs] [JIRA] (ASTERISK-30500) Caller name corruption in encodings other than UTF-8

Basil Mi (JIRA) noreply at issues.asterisk.org
Tue Apr 25 12:59:03 CDT 2023


    [ https://issues.asterisk.org/jira/browse/ASTERISK-30500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=261837#comment-261837 ] 

Basil Mi commented on ASTERISK-30500:
-------------------------------------

It might not be the best idea to forcibly replace characters in the caller name.
The name may contain the correct characters, just not in the correct encoding (not UTF-8).
Some thoughts:
Call "ast_utf8_replace_invalid_chars" only if the string does not match any of the common/known encodings. If the caller name is valid in one of those encodings (win-1251, koi-8 and so on), do not correct it and leave further conversion to the user.
Or briefly: if we find that the caller name is valid win-1251, then do not call "ast_utf8_replace_invalid_chars". :-)
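
A minimal C sketch of that decision, not actual Asterisk code. The names is_valid_utf8, looks_like_cp1251 and classify_caller_name are hypothetical, and the win-1251 check is a deliberately crude assumption: it accepts only ASCII plus the bytes that win-1251 uses for Russian letters. It only illustrates where the choice to call "ast_utf8_replace_invalid_chars" could be made.

```c
#include <stdbool.h>
#include <stddef.h>

/* Smallest code point that may legally use 1, 2 or 3 continuation bytes. */
static const unsigned int min_cp[] = { 0, 0x80, 0x800, 0x10000 };

/* Return true if the NUL-terminated string is well-formed UTF-8. */
static bool is_valid_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    while (*p) {
        unsigned int cp;
        int extra;

        if (*p < 0x80) { p++; continue; }                 /* plain ASCII */
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return false;                                /* invalid lead byte */
        p++;

        for (int i = 0; i < extra; i++, p++) {
            if ((*p & 0xC0) != 0x80)                      /* also catches a truncated string */
                return false;
            cp = (cp << 6) | (*p & 0x3F);
        }

        /* Reject overlong forms, surrogates and values above U+10FFFF. */
        if (cp < min_cp[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
    }
    return true;
}

/* Crude heuristic (assumption): every non-ASCII byte is a Russian letter in win-1251. */
static bool looks_like_cp1251(const char *s)
{
    bool saw_high = false;

    for (const unsigned char *p = (const unsigned char *)s; *p; p++) {
        if (*p < 0x80)
            continue;
        if (*p >= 0xC0 || *p == 0xA8 || *p == 0xB8) {     /* A..ya, Yo, yo */
            saw_high = true;
            continue;
        }
        return false;
    }
    return saw_high;
}

enum name_action {
    NAME_KEEP_UTF8,        /* already valid UTF-8, nothing to do */
    NAME_KEEP_LEGACY,      /* leave the bytes alone, the dialplan may convert them */
    NAME_REPLACE_INVALID   /* fall back to replacing invalid sequences */
};

static enum name_action classify_caller_name(const char *name)
{
    if (is_valid_utf8(name))
        return NAME_KEEP_UTF8;
    if (looks_like_cp1251(name))
        return NAME_KEEP_LEGACY;
    return NAME_REPLACE_INVALID;
}
```

A single-byte heuristic like this can misclassify garbage as win-1251, so a real implementation would probably still need a configuration switch.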

When a call goes to the "broken device", we use the inverse transformation UTF-8->WIN-1251: {code} Set(CALLERID(name)=${ICONV(UTF-8,WINDOWS-1251,${CALLERID(name)})});{code}

Ticket to Panasonic Corp.?  :-) I think it would take them years to solve the problem. This is a large family of hardware PBXs and proprietary telephone sets for them (for example https://www.kx-td.com/telephone-systems/).


> Caller name corruption in encodings other than UTF-8
> ----------------------------------------------------
>
>                 Key: ASTERISK-30500
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-30500
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Resources/res_pjsip
>    Affects Versions: 18.17.0
>         Environment: FreeBSD 13.2
>            Reporter: Basil Mi
>            Assignee: Unassigned
>            Severity: Major
>         Attachments: sip-capture.txt, win-1251_text_example_1.txt
>
>
> After this change:   ASTERISK-27830
> ===================================
> {quote}
> 2023-02-16 10:05 +0000 [1ddfb7551a]  George Joseph <gjoseph at sangoma.com>
>         * res_pjsip: Replace invalid UTF-8 sequences in callerid name
>            * Added a new function ast_utf8_replace_invalid_chars() to
>             utf8.c that copies a string replacing any invalid UTF-8
>             sequences with the Unicode specified U+FFFD replacement
>             character.  For example:  "abc\xffdef" becomes "abc\uFFFDdef".
>             Any UTF-8 compliant implementation will show that character
>             as a � character.
>            * Updated res_pjsip:set_id_from_hdr() to use
>             ast_utf8_replace_invalid_chars and print a warning if any
>             invalid sequences were found during the copy.
>            * Updated stasis_channels:ast_channel_publish_varset to use
>             ast_utf8_replace_invalid_chars and print a warning if any
>             invalid sequences were found during the copy.
>            ASTERISK-27830
> {quote}
> ===================================
> Some legacy devices transmit the caller name in encodings other than UTF-8. For example, the Panasonic KX-TDE600 PBX uses WINDOWS-1251, and this is not configurable.
> In this case we use the ICONV function in the dialplan to convert the caller name to UTF-8 for incoming calls, and back again for outgoing calls.
> The new function {color:red} "ast_utf8_replace_invalid_chars" {color} turns the caller name into "�" characters before it can be converted to UTF-8 in the dialplan.
> Users see “�������” on their devices instead of the valid caller name.
> This logic worked for almost 10 years and broke in 18.17.0.
> We need to be able to turn off the replacement of invalid UTF-8 sequences (e.g. from the config), or to be able to run ICONV before the replacement (before the call to {color:red} "ast_utf8_replace_invalid_chars" {color}).
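
To illustrate why the order of operations matters, here is a small C sketch of the WINDOWS-1251 to UTF-8 mapping for Russian letters, roughly what the ICONV call in the dialplan relies on. It is an assumption-laden illustration, not Asterisk or iconv code: cp1251_ru_to_utf8 is a hypothetical name, and only ASCII and the Russian letters are handled (other bytes become '?'). The conversion needs the original bytes, so it cannot work once they have already been replaced with U+FFFD.

```c
#include <stddef.h>

/*
 * Convert a NUL-terminated win-1251 string to UTF-8 (Russian letters only).
 * Returns the number of bytes written, excluding the NUL terminator,
 * or (size_t)-1 if dst is too small.
 */
static size_t cp1251_ru_to_utf8(const char *src, char *dst, size_t dst_size)
{
    size_t out = 0;

    if (dst_size == 0)
        return (size_t)-1;

    for (const unsigned char *p = (const unsigned char *)src; *p; p++) {
        unsigned int cp;

        if (*p < 0x80)
            cp = *p;                        /* ASCII is unchanged */
        else if (*p >= 0xC0)
            cp = 0x0410 + (*p - 0xC0);      /* 0xC0..0xFF -> U+0410..U+044F */
        else if (*p == 0xA8)
            cp = 0x0401;                    /* capital Yo */
        else if (*p == 0xB8)
            cp = 0x0451;                    /* small yo */
        else
            cp = '?';                       /* not covered by this sketch */

        if (cp < 0x80) {
            if (out + 1 >= dst_size)        /* keep room for the NUL */
                return (size_t)-1;
            dst[out++] = (char)cp;
        } else {
            /* Every code point produced above fits in two UTF-8 bytes. */
            if (out + 2 >= dst_size)
                return (size_t)-1;
            dst[out++] = (char)(0xC0 | (cp >> 6));
            dst[out++] = (char)(0x80 | (cp & 0x3F));
        }
    }

    dst[out] = '\0';
    return out;
}
```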


