[Asterisk-code-review] func_odbc: acf_odbc_read() and cli_odbc_read() unicode support (...asterisk[master])

Tue Sep 3 09:52:01 CDT 2019

Alexei Gradinari has posted comments on this change. ( https://gerrit.asterisk.org/c/asterisk/+/12812 )

Change subject: func_odbc:  acf_odbc_read() and cli_odbc_read() unicode support
......................................................................

Patch Set 4:

> Patch Set 4:
> 
> > Patch Set 4:
> > 
> > > Patch Set 4:
> > > 
> > > > Patch Set 3:
> > > > 
> > > > > Patch Set 3:
> > > > > 
> > > > > It's incorrect. SQLColAttribute() with SQL_DESC_DISPLAY_SIZE returns the maximum or actual CHARACTER length of a character string instead of the required byte size. And the DISPLAY SIZE for any character type is "the defined (for fixed types) or maximum (for variable types) number of characters needed to display the data in character form".
> > > > > E.g., I use an NVARCHAR(30) field which contains text like 'Подменный сист админ' (20 characters, but 41 bytes). In that case SQLColAttribute() with SQL_DESC_DISPLAY_SIZE returns a displaysize of 30 bytes (instead of the 42 given by my patch).
> > > > 
> > > > Can you try SQLColAttribute with SQL_DESC_OCTET_LENGTH?
> > > 
> > > I ran into a problem trying SQLColAttribute() with SQL_DESC_OCTET_LENGTH. Microsoft SQL Server stores Unicode characters as UTF-16 (2 bytes per BMP char). But Asterisk, phones, logs, and terminals commonly use UTF-8 (1-3 bytes per BMP char) for handling Unicode characters. Your approach works only for Cyrillic characters, not for CJK.
> > > 
> > > E.g., I use an NVARCHAR(15) field which contains text like 'いろはにほへとちりぬるを' (12 characters, but 37 bytes in UTF-8). In that case SQLColAttribute() with SQL_DESC_OCTET_LENGTH returns an octet length of 30 bytes.
> > > 
> > > So I suggest expanding the buffer for Unicode characters to 3 bytes per char.
> > 
> > I don't think a predefined multiplier is a good idea.
> > The ODBC Driver Manager should resolve this issue.
> > There are 2 encodings: server (driver) and client (application).
> > If the server encoding is UTF16 and the client encoding is UTF8, then the Driver Manager should convert UTF16 to UTF8 and vice versa. Look at SQL_ATTR_APP_UNICODE_TYPE and SQL_ATTR_DRIVER_UNICODE_TYPE.
> 
> Okay, it's theoretically possible to get the server and client encodings. But how do you propose to convert a buffer length for UTF16 (the server encoding commonly used on Windows) to a buffer length for UTF8 (the encoding commonly used by Asterisk, the CLI, and SIP phones)? AFAIK, there's no algorithm for that, even in theory.
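
On the length question itself: if the source is known to be UTF-16, a worst-case UTF-8 length can be computed, since one UTF-16 code unit never needs more than 3 UTF-8 bytes (a surrogate pair is 2 units and 4 bytes). A minimal sketch in C; the function names are hypothetical and none of this is in the patch:

```c
#include <stddef.h>
#include <stdint.h>

/* Worst case: every UTF-16 code unit expands to 3 UTF-8 bytes. */
static size_t utf8_max_bytes_for_utf16_units(size_t units)
{
	return units * 3;
}

/* Exact UTF-8 byte count (without terminator) for n UTF-16 code units. */
static size_t utf8_bytes_for_utf16(const uint16_t *s, size_t n)
{
	size_t bytes = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint16_t u = s[i];

		if (u < 0x80) {
			bytes += 1;          /* ASCII */
		} else if (u < 0x800) {
			bytes += 2;          /* e.g. Cyrillic */
		} else if (u >= 0xD800 && u <= 0xDBFF && i + 1 < n
			&& s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
			bytes += 4;          /* surrogate pair: 2 units, 4 bytes */
			i++;
		} else {
			/* Rest of the BMP (e.g. kana, CJK). An unpaired surrogate is
			 * also counted as 3 bytes, the size of a U+FFFD replacement. */
			bytes += 3;
		}
	}
	return bytes;
}
```

For the NVARCHAR(15) example this gives at most 45 bytes plus a terminator. It only holds when the source encoding is known to be UTF-16, so it does not replace asking the driver.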

I just did a test with MySQL.

CREATE TABLE test_utf8 (
    test_fld varchar(200)
) DEFAULT CHARACTER SET utf8
  DEFAULT COLLATE utf8_general_ci;

CREATE TABLE test_utf16 (
    test_fld varchar(200)
) DEFAULT CHARACTER SET utf16
  DEFAULT COLLATE utf16_general_ci;

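SQLDescribeCol returns collen=200 for both tables.
SQLColAttribute with SQL_DESC_OCTET_LENGTH returns:
test_utf8  - 600 (200 chars x 3 bytes, the maximum for MySQL utf8)
test_utf16 - 800 (200 chars x 4 bytes, the maximum for MySQL utf16)

So the driver already accounts for the column's character set.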
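
A rough sketch of how the read buffer could be sized from these values. The helper name, its parameters, and the fallback multiplier are hypothetical, not what func_odbc.c does; the caller is assumed to have already obtained the octet length from SQLColAttribute() and collen from SQLDescribeCol(). As far as I know the reported octet length does not include the null terminator, so one byte is added:

```c
#include <stddef.h>

/*
 * Hypothetical helper: choose a buffer size in bytes for one column.
 *   octet_length            value reported for SQL_DESC_OCTET_LENGTH,
 *                           <= 0 if the driver gave nothing usable
 *   collen                  column size in characters (SQLDescribeCol)
 *   fallback_bytes_per_char assumed multiplier, used only as a fallback
 */
static size_t column_buffer_size(long octet_length, size_t collen,
	size_t fallback_bytes_per_char)
{
	size_t bytes;

	if (octet_length > 0) {
		bytes = (size_t) octet_length;            /* trust the driver */
	} else {
		bytes = collen * fallback_bytes_per_char; /* guess */
	}

	return bytes + 1; /* room for the terminating NUL */
}
```

With the MySQL results above this yields 601 bytes for test_utf8 and 801 bytes for test_utf16, with the multiplier only used when the driver reports nothing.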

-- 
To view, visit https://gerrit.asterisk.org/c/asterisk/+/12812
To unsubscribe, or for help writing mail filters, visit https://gerrit.asterisk.org/settings

Gerrit-Project: asterisk
Gerrit-Branch: master
Gerrit-Change-Id: I50e86c8a277996f13d4a4b9b318ece0d60b279bf
Gerrit-Change-Number: 12812
Gerrit-PatchSet: 4
Gerrit-Owner: Boris P. Korzun <drtr0jan at yandex.ru>
Gerrit-Reviewer: Alexei Gradinari <alex2grad at gmail.com>
Gerrit-Reviewer: Boris P. Korzun <drtr0jan at yandex.ru>
Gerrit-Reviewer: Friendly Automation
Gerrit-Reviewer: George Joseph <gjoseph at digium.com>
Gerrit-Comment-Date: Tue, 03 Sep 2019 14:52:01 +0000
Gerrit-HasComments: No
Gerrit-Has-Labels: No
Gerrit-MessageType: comment