<p>George Joseph has uploaded this change for <strong>review</strong>.</p><p><a href="https://gerrit.asterisk.org/c/asterisk/+/19898">View Change</a></p><pre style="font-family: monospace,monospace; white-space: pre-wrap;">res_pjsip: Mask invalid UTF-8 sequences from callerid name<br><br>* Added a new function ast_utf8_mask_invalid_chars() to<br> json.c that copies a string replacing any invalid UTF-8<br> sequences with a specified mask character. For example:<br> "abc\xffdef" becomes "abc?def".<br><br>* Updated res_pjsip:set_id_from_hdr() to use<br> ast_utf8_mask_invalid_chars and print a warning if any<br> invalid sequences were found during the copy.<br><br>* Updated stasis_channels:ast_channel_publish_varset to use<br> ast_utf8_mask_invalid_chars and print a warning if any<br> invalid sequences were found during the copy.<br><br>ASTERISK-27830<br><br>Change-Id: I4ffbdb19c80bf0efc675d40078a3ca4f85c567d8<br>---<br>M include/asterisk/json.h<br>M include/asterisk/utf8.h<br>M main/json.c<br>M main/stasis_channels.c<br>M res/res_pjsip.c<br>M tests/test_json.c<br>6 files changed, 162 insertions(+), 7 deletions(-)<br><br></pre><pre style="font-family: monospace,monospace; white-space: pre-wrap;">git pull ssh://gerrit.asterisk.org:29418/asterisk refs/changes/98/19898/1</pre><pre style="font-family: monospace,monospace; white-space: pre-wrap;"><span>diff --git a/include/asterisk/json.h b/include/asterisk/json.h</span><br><span>index 5edc3a9..5f41e69 100644</span><br><span>--- a/include/asterisk/json.h</span><br><span>+++ b/include/asterisk/json.h</span><br><span>@@ -20,6 +20,7 @@</span><br><span> #define _ASTERISK_JSON_H</span><br><span> </span><br><span> #include "asterisk/netsock2.h"</span><br><span style="color: hsl(120, 100%, 40%);">+#include "utf8.h"</span><br><span> </span><br><span> /*! 
\file</span><br><span> *</span><br><span>@@ -193,6 +194,26 @@</span><br><span> /*!@{*/</span><br><span> </span><br><span> /*!</span><br><span style="color: hsl(120, 100%, 40%);">+ * \brief Copy a string safely, masking any invalid UTF-8 sequences</span><br><span style="color: hsl(120, 100%, 40%);">+ *</span><br><span style="color: hsl(120, 100%, 40%);">+ * This is similar to \ref ast_copy_string, but it will only copy valid UTF-8</span><br><span style="color: hsl(120, 100%, 40%);">+ * sequences from the source string into the destination buffer. Unlike</span><br><span style="color: hsl(120, 100%, 40%);">+ * \ref ast_utf8_copy_string, however, if an invalid sequence is encountered,</span><br><span style="color: hsl(120, 100%, 40%);">+ * it's masked with the supplied character and copying continues.</span><br><span style="color: hsl(120, 100%, 40%);">+ *</span><br><span style="color: hsl(120, 100%, 40%);">+ * \param dst The destination buffer.</span><br><span style="color: hsl(120, 100%, 40%);">+ * \param dst_size The size of the dst buffer including space for the NUL terminator</span><br><span style="color: hsl(120, 100%, 40%);">+ * \param src The source string</span><br><span style="color: hsl(120, 100%, 40%);">+ * \param src_len The number of bytes to copy from src</span><br><span style="color: hsl(120, 100%, 40%);">+ * \param mask The character to use to mask the invalid ones</span><br><span style="color: hsl(120, 100%, 40%);">+ *</span><br><span style="color: hsl(120, 100%, 40%);">+ * \return The \ref ast_utf8_validation_result indicating whether there</span><br><span style="color: hsl(120, 100%, 40%);">+ * were any invalid characters in the string.</span><br><span style="color: hsl(120, 100%, 40%);">+ */</span><br><span style="color: hsl(120, 100%, 40%);">+enum ast_utf8_validation_result ast_utf8_mask_invalid_chars(char *dst,</span><br><span style="color: hsl(120, 100%, 40%);">+ size_t dst_size, const char *src, size_t src_len, const char mask);</span><br><span 
style="color: hsl(120, 100%, 40%);">+</span><br><span style="color: hsl(120, 100%, 40%);">+/*!</span><br><span> * \brief Check the string of the given length for UTF-8 format.</span><br><span> * \since 13.12.0</span><br><span> *</span><br><span>diff --git a/include/asterisk/utf8.h b/include/asterisk/utf8.h</span><br><span>index 02ec800..7638637 100644</span><br><span>--- a/include/asterisk/utf8.h</span><br><span>+++ b/include/asterisk/utf8.h</span><br><span>@@ -93,6 +93,14 @@</span><br><span> * to feed into the validator the UTF-8 sequence is invalid.</span><br><span> */</span><br><span> AST_UTF8_UNKNOWN,</span><br><span style="color: hsl(120, 100%, 40%);">+</span><br><span style="color: hsl(120, 100%, 40%);">+ /*! \brief Outright failure</span><br><span style="color: hsl(120, 100%, 40%);">+ *</span><br><span style="color: hsl(120, 100%, 40%);">+ * Some condition prevented the validator or copy function</span><br><span style="color: hsl(120, 100%, 40%);">+ * from operating at all. For instance, it was passed a NULL</span><br><span style="color: hsl(120, 100%, 40%);">+ * pointer or the output buffer was too small.</span><br><span style="color: hsl(120, 100%, 40%);">+ */</span><br><span style="color: hsl(120, 100%, 40%);">+ AST_UTF8_FAIL,</span><br><span> };</span><br><span> </span><br><span> /*!</span><br><span>diff --git a/main/json.c b/main/json.c</span><br><span>index 616b12e..66676a2 100644</span><br><span>--- a/main/json.c</span><br><span>+++ b/main/json.c</span><br><span>@@ -230,6 +230,46 @@</span><br><span> return str ? 
ast_json_utf8_check_len(str, strlen(str)) : 0;</span><br><span> }</span><br><span> </span><br><span style="color: hsl(120, 100%, 40%);">+enum ast_utf8_validation_result ast_utf8_mask_invalid_chars(char *dst,</span><br><span style="color: hsl(120, 100%, 40%);">+ size_t dst_size, const char *src, size_t src_count, const char mask)</span><br><span style="color: hsl(120, 100%, 40%);">+{</span><br><span style="color: hsl(120, 100%, 40%);">+ size_t pos;</span><br><span style="color: hsl(120, 100%, 40%);">+ size_t count;</span><br><span style="color: hsl(120, 100%, 40%);">+ enum ast_utf8_validation_result result = AST_UTF8_VALID;</span><br><span style="color: hsl(120, 100%, 40%);">+</span><br><span style="color: hsl(120, 100%, 40%);">+ if (!src || !dst || dst_size < src_count + 1) {</span><br><span style="color: hsl(120, 100%, 40%);">+ return AST_UTF8_FAIL;</span><br><span style="color: hsl(120, 100%, 40%);">+ }</span><br><span style="color: hsl(120, 100%, 40%);">+</span><br><span style="color: hsl(120, 100%, 40%);">+ for (pos = 0; pos < src_count; ) {</span><br><span style="color: hsl(120, 100%, 40%);">+ count = json_utf8_check_first(src[pos]);</span><br><span style="color: hsl(120, 100%, 40%);">+ ast_debug(9, "first pos: %2zu count: %2zu char: '%c' 0x%02x\n", pos, count, src[pos], src[pos] & 0xFF);</span><br><span style="color: hsl(120, 100%, 40%);">+ if (count == 0) {</span><br><span style="color: hsl(120, 100%, 40%);">+ dst[pos] = mask;</span><br><span style="color: hsl(120, 100%, 40%);">+ ast_debug(9, " not good dst: '%c' 0x%02x\n", dst[pos], dst[pos]);</span><br><span style="color: hsl(120, 100%, 40%);">+ pos++;</span><br><span style="color: hsl(120, 100%, 40%);">+ result = AST_UTF8_INVALID;</span><br><span style="color: hsl(120, 100%, 40%);">+ } else if (count > 1) {</span><br><span style="color: hsl(120, 100%, 40%);">+ if (pos + count > src_count || !json_utf8_check_full(&src[pos], count)) {</span><br><span style="color: hsl(120, 100%, 40%);">+ dst[pos] = mask;</span><br><span style="color: 
hsl(120, 100%, 40%);">+ ast_debug(9, " fail full dst: '%c' 0x%02x\n", dst[pos], dst[pos]);</span><br><span style="color: hsl(120, 100%, 40%);">+ pos++;</span><br><span style="color: hsl(120, 100%, 40%);">+ result = AST_UTF8_INVALID;</span><br><span style="color: hsl(120, 100%, 40%);">+ } else {</span><br><span style="color: hsl(120, 100%, 40%);">+ strncpy(&dst[pos], &src[pos], count);</span><br><span style="color: hsl(120, 100%, 40%);">+ ast_debug(9, " good seq dst: '%.*s'\n", (int)count, &dst[pos]);</span><br><span style="color: hsl(120, 100%, 40%);">+ pos+=count;</span><br><span style="color: hsl(120, 100%, 40%);">+ }</span><br><span style="color: hsl(120, 100%, 40%);">+ } else {</span><br><span style="color: hsl(120, 100%, 40%);">+ dst[pos] = src[pos];</span><br><span style="color: hsl(120, 100%, 40%);">+ ast_debug(9, " good char dst: '%c' 0x%02x\n", dst[pos], dst[pos]);</span><br><span style="color: hsl(120, 100%, 40%);">+ pos++;</span><br><span style="color: hsl(120, 100%, 40%);">+ }</span><br><span style="color: hsl(120, 100%, 40%);">+ }</span><br><span style="color: hsl(120, 100%, 40%);">+ dst[pos] = '\0';</span><br><span style="color: hsl(120, 100%, 40%);">+ return result;</span><br><span style="color: hsl(120, 100%, 40%);">+}</span><br><span style="color: hsl(120, 100%, 40%);">+</span><br><span> struct ast_json *ast_json_true(void)</span><br><span> {</span><br><span> return (struct ast_json *)json_true();</span><br><span>@@ -637,7 +677,7 @@</span><br><span> "exten", exten,</span><br><span> "priority", priority != -1 ? 
ast_json_integer_create(priority) : ast_json_null(),</span><br><span> "app_name", app_name,</span><br><span style="color: hsl(0, 100%, 40%);">- "app_data", app_data</span><br><span style="color: hsl(120, 100%, 40%);">+ "app_data", AST_JSON_UTF8_VALIDATE(app_data)</span><br><span> );</span><br><span> }</span><br><span> </span><br><span>diff --git a/main/stasis_channels.c b/main/stasis_channels.c</span><br><span>index d373f6a..696e68f 100644</span><br><span>--- a/main/stasis_channels.c</span><br><span>+++ b/main/stasis_channels.c</span><br><span>@@ -1154,13 +1154,23 @@</span><br><span> void ast_channel_publish_varset(struct ast_channel *chan, const char *name, const char *value)</span><br><span> {</span><br><span> struct ast_json *blob;</span><br><span style="color: hsl(120, 100%, 40%);">+ enum ast_utf8_validation_result result;</span><br><span style="color: hsl(120, 100%, 40%);">+ char *new_value;</span><br><span> </span><br><span> ast_assert(name != NULL);</span><br><span> ast_assert(value != NULL);</span><br><span> </span><br><span style="color: hsl(120, 100%, 40%);">+ new_value = ast_strdupa(value);</span><br><span style="color: hsl(120, 100%, 40%);">+ result = ast_utf8_mask_invalid_chars(new_value, strlen(new_value) + 1,</span><br><span style="color: hsl(120, 100%, 40%);">+ value, strlen(value), '?');</span><br><span style="color: hsl(120, 100%, 40%);">+</span><br><span style="color: hsl(120, 100%, 40%);">+ if (result != AST_UTF8_VALID) {</span><br><span style="color: hsl(120, 100%, 40%);">+ ast_log(LOG_WARNING, "%s: Variable '%s' has invalid UTF-8 value '%s'. "</span><br><span style="color: hsl(120, 100%, 40%);">+ "Replacing with '%s'\n", ast_channel_name(chan), name, value, new_value);</span><br><span style="color: hsl(120, 100%, 40%);">+ }</span><br><span> blob = ast_json_pack("{s: s, s: s}",</span><br><span> "variable", name,</span><br><span style="color: hsl(0, 100%, 40%);">- "value", value);</span><br><span style="color: hsl(120, 100%, 40%);">+ "value", new_value);</span><br><span> if (!blob) {</span><br><span> ast_log(LOG_ERROR, "Error creating message\n");</span><br><span> return;</span><br><span>diff --git a/res/res_pjsip.c b/res/res_pjsip.c</span><br><span>index 8273847..7374a4e 100644</span><br><span>--- a/res/res_pjsip.c</span><br><span>+++ b/res/res_pjsip.c</span><br><span>@@ -47,6 +47,7 @@</span><br><span> #include "asterisk/test.h"</span><br><span> #include "asterisk/res_pjsip_presence_xml.h"</span><br><span> #include "asterisk/res_pjproject.h"</span><br><span style="color: hsl(120, 100%, 40%);">+#include "asterisk/json.h"</span><br><span> </span><br><span> /*** MODULEINFO</span><br><span> <depend>pjproject</depend></span><br><span>@@ -2463,10 +2464,9 @@</span><br><span> char cid_num[AST_CHANNEL_NAME];</span><br><span> pjsip_name_addr *id_name_addr = (pjsip_name_addr *) hdr->uri;</span><br><span> char *semi;</span><br><span style="color: hsl(120, 100%, 40%);">+ enum ast_utf8_validation_result result;</span><br><span> </span><br><span style="color: hsl(0, 100%, 40%);">- ast_copy_pj_str(cid_name, &id_name_addr->display, sizeof(cid_name));</span><br><span> ast_copy_pj_str(cid_num, ast_sip_pjsip_uri_get_username(hdr->uri), sizeof(cid_num));</span><br><span style="color: hsl(0, 100%, 40%);">-</span><br><span> /* Always truncate caller-id number at a semicolon. 
*/</span><br><span> semi = strchr(cid_num, ';');</span><br><span> if (semi) {</span><br><span>@@ -2484,6 +2484,16 @@</span><br><span> *semi = '\0';</span><br><span> }</span><br><span> </span><br><span style="color: hsl(120, 100%, 40%);">+ result = ast_utf8_mask_invalid_chars(cid_name, sizeof(cid_name),</span><br><span style="color: hsl(120, 100%, 40%);">+ id_name_addr->display.ptr, id_name_addr->display.slen, '?');</span><br><span style="color: hsl(120, 100%, 40%);">+</span><br><span style="color: hsl(120, 100%, 40%);">+ if (result != AST_UTF8_VALID) {</span><br><span style="color: hsl(120, 100%, 40%);">+ ast_log(LOG_WARNING, "CallerID Name '" PJSTR_PRINTF_SPEC "' for number '%s' has invalid UTF-8 characters. "</span><br><span style="color: hsl(120, 100%, 40%);">+ "Replaced with '%s'\n",</span><br><span style="color: hsl(120, 100%, 40%);">+ PJSTR_PRINTF_VAR(id_name_addr->display), cid_num,</span><br><span style="color: hsl(120, 100%, 40%);">+ cid_name);</span><br><span style="color: hsl(120, 100%, 40%);">+ }</span><br><span style="color: hsl(120, 100%, 40%);">+</span><br><span> ast_free(id->name.str);</span><br><span> id->name.str = ast_strdup(cid_name);</span><br><span> if (!ast_strlen_zero(cid_name)) {</span><br><span>diff --git a/tests/test_json.c b/tests/test_json.c</span><br><span>index e1fc0ba..297bc7c 100644</span><br><span>--- a/tests/test_json.c</span><br><span>+++ b/tests/test_json.c</span><br><span>@@ -1718,14 +1718,14 @@</span><br><span> break;</span><br><span> }</span><br><span> </span><br><span style="color: hsl(0, 100%, 40%);">- expected = ast_json_pack("{s: o, s: o, s: o, s: o, s: o}",</span><br><span style="color: hsl(120, 100%, 40%);">+ expected = ast_json_pack("{s: o, s: o, s: o, s: o, s: s}",</span><br><span> "context", ast_json_null(),</span><br><span> "exten", ast_json_null(),</span><br><span> "priority", ast_json_null(),</span><br><span> "app_name", ast_json_null(),</span><br><span style="color: hsl(0, 100%, 40%);">- "app_data", 
ast_json_null()</span><br><span style="color: hsl(120, 100%, 40%);">+ "app_data", ""</span><br><span> );</span><br><span style="color: hsl(0, 100%, 40%);">- uut = ast_json_dialplan_cep_app(NULL, NULL, -1, NULL, NULL);</span><br><span style="color: hsl(120, 100%, 40%);">+ uut = ast_json_dialplan_cep_app(NULL, NULL, -1, NULL, "");</span><br><span> ast_test_validate(test, ast_json_equal(expected, uut));</span><br><span> </span><br><span> ast_json_unref(expected);</span><br><span>@@ -1743,6 +1743,46 @@</span><br><span> return AST_TEST_PASS;</span><br><span> }</span><br><span> </span><br><span style="color: hsl(120, 100%, 40%);">+static int test_copy_and_mask(const char *src, const char *cmp)</span><br><span style="color: hsl(120, 100%, 40%);">+{</span><br><span style="color: hsl(120, 100%, 40%);">+ char *dst = ast_strdupa(src);</span><br><span style="color: hsl(120, 100%, 40%);">+ ast_utf8_mask_invalid_chars(dst, strlen(dst) + 1,</span><br><span style="color: hsl(120, 100%, 40%);">+ src, strlen(src), '?');</span><br><span style="color: hsl(120, 100%, 40%);">+ if (strcmp(dst, cmp) != 0) {</span><br><span style="color: hsl(120, 100%, 40%);">+ ast_log(LOG_ERROR, "Invalid result. 
In: '%s', Out: '%s', Expected: '%s'\n",</span><br><span style="color: hsl(120, 100%, 40%);">+ src, dst, cmp);</span><br><span style="color: hsl(120, 100%, 40%);">+ return 0;</span><br><span style="color: hsl(120, 100%, 40%);">+ }</span><br><span style="color: hsl(120, 100%, 40%);">+ return 1;</span><br><span style="color: hsl(120, 100%, 40%);">+}</span><br><span style="color: hsl(120, 100%, 40%);">+</span><br><span style="color: hsl(120, 100%, 40%);">+AST_TEST_DEFINE(test_utf8_mask_invalid_chars)</span><br><span style="color: hsl(120, 100%, 40%);">+{</span><br><span style="color: hsl(120, 100%, 40%);">+ switch (cmd) {</span><br><span style="color: hsl(120, 100%, 40%);">+ case TEST_INIT:</span><br><span style="color: hsl(120, 100%, 40%);">+ info->name = "mask_string";</span><br><span style="color: hsl(120, 100%, 40%);">+ info->category = CATEGORY;</span><br><span style="color: hsl(120, 100%, 40%);">+ info->summary = "Test ast_utf8_mask_invalid_chars";</span><br><span style="color: hsl(120, 100%, 40%);">+ info->description =</span><br><span style="color: hsl(120, 100%, 40%);">+ "Tests UTF-8 string copying/masking code.";</span><br><span style="color: hsl(120, 100%, 40%);">+ return AST_TEST_NOT_RUN;</span><br><span style="color: hsl(120, 100%, 40%);">+ case TEST_EXECUTE:</span><br><span style="color: hsl(120, 100%, 40%);">+ break;</span><br><span style="color: hsl(120, 100%, 40%);">+ }</span><br><span style="color: hsl(120, 100%, 40%);">+</span><br><span style="color: hsl(120, 100%, 40%);">+ ast_test_validate(test, test_copy_and_mask("Asterisk", "Asterisk"));</span><br><span style="color: hsl(120, 100%, 40%);">+ ast_test_validate(test, test_copy_and_mask("Asterisk \xc2", "Asterisk ?"));</span><br><span style="color: hsl(120, 100%, 40%);">+ ast_test_validate(test, test_copy_and_mask("Asterisk \xc2\xae", "Asterisk \xc2\xae"));</span><br><span style="color: hsl(120, 100%, 40%);">+ ast_test_validate(test, test_copy_and_mask("Asterisk \xc0\x8a", "Asterisk 
??"));</span><br><span style="color: hsl(120, 100%, 40%);">+ ast_test_validate(test, test_copy_and_mask("\xce\xbb xyz", "\xce\xbb xyz"));</span><br><span style="color: hsl(120, 100%, 40%);">+ ast_test_validate(test, test_copy_and_mask("\xe0\xc2\xb0xyz", "?\xc2\xb0xyz"));</span><br><span style="color: hsl(120, 100%, 40%);">+ ast_test_validate(test, test_copy_and_mask("\xe0\xc2\xf4\xb0xyz", "????xyz"));</span><br><span style="color: hsl(120, 100%, 40%);">+ ast_test_validate(test, test_copy_and_mask("\xe0\xc2\xb0xyz\xc2", "?\xc2\xb0xyz?"));</span><br><span style="color: hsl(120, 100%, 40%);">+</span><br><span style="color: hsl(120, 100%, 40%);">+ return AST_TEST_PASS;</span><br><span style="color: hsl(120, 100%, 40%);">+}</span><br><span style="color: hsl(120, 100%, 40%);">+</span><br><span style="color: hsl(120, 100%, 40%);">+</span><br><span> static int unload_module(void)</span><br><span> {</span><br><span> AST_TEST_UNREGISTER(json_test_false);</span><br><span>@@ -1797,6 +1837,7 @@</span><br><span> AST_TEST_UNREGISTER(json_test_name_number);</span><br><span> AST_TEST_UNREGISTER(json_test_timeval);</span><br><span> AST_TEST_UNREGISTER(json_test_cep);</span><br><span style="color: hsl(120, 100%, 40%);">+ AST_TEST_UNREGISTER(test_utf8_mask_invalid_chars);</span><br><span> return 0;</span><br><span> }</span><br><span> </span><br><span>@@ -1854,6 +1895,7 @@</span><br><span> AST_TEST_REGISTER(json_test_name_number);</span><br><span> AST_TEST_REGISTER(json_test_timeval);</span><br><span> AST_TEST_REGISTER(json_test_cep);</span><br><span style="color: hsl(120, 100%, 40%);">+ AST_TEST_REGISTER(test_utf8_mask_invalid_chars);</span><br><span> </span><br><span> ast_test_register_init(CATEGORY, json_test_init);</span><br><span> ast_test_register_cleanup(CATEGORY, json_test_cleanup);</span><br><span></span><br></pre><p>To view, visit <a href="https://gerrit.asterisk.org/c/asterisk/+/19898">change 19898</a>. 
</p>
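<p>For readers who want to try the masking behaviour described in the commit message outside of Asterisk, the following is a minimal standalone sketch, not the code from this change. The names <code>mask_invalid_utf8</code>, <code>utf8_seq_len</code> and <code>enum mask_result</code> are hypothetical and exist only for this illustration; the validator is a plain RFC 3629 check written for the example, assumed (not confirmed here) to behave like the <code>json_utf8_check_first</code>/<code>json_utf8_check_full</code> helpers used in the patch. Unlike patch set 1, it also refuses a multi-byte sequence that would run past <code>src_len</code>.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for this sketch; not Asterisk's enum. */
enum mask_result { MASK_VALID, MASK_INVALID, MASK_FAIL };

/*
 * Return the length (1..4) of the well-formed UTF-8 sequence that starts
 * at s, or 0 if the bytes within the remaining `avail` bytes do not form one.
 */
static size_t utf8_seq_len(const unsigned char *s, size_t avail)
{
	size_t len, i;
	unsigned long cp;

	if (avail == 0) {
		return 0;
	}
	if (s[0] < 0x80) {
		return 1; /* plain ASCII */
	} else if (s[0] >= 0xC2 && s[0] <= 0xDF) {
		len = 2; cp = s[0] & 0x1F;
	} else if ((s[0] & 0xF0) == 0xE0) {
		len = 3; cp = s[0] & 0x0F;
	} else if (s[0] >= 0xF0 && s[0] <= 0xF4) {
		len = 4; cp = s[0] & 0x07;
	} else {
		return 0; /* stray continuation byte, 0xC0/0xC1, or 0xF5..0xFF */
	}
	if (len > avail) {
		return 0; /* sequence would run past the end of the input */
	}
	for (i = 1; i < len; i++) {
		if ((s[i] & 0xC0) != 0x80) {
			return 0; /* not a continuation byte */
		}
		cp = (cp << 6) | (s[i] & 0x3F);
	}
	if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000)) {
		return 0; /* overlong encoding */
	}
	if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
		return 0; /* out of range or UTF-16 surrogate */
	}
	return len;
}

/*
 * Copy src_len bytes from src to dst, replacing every byte that does not
 * begin a valid UTF-8 sequence with `mask`. dst must have room for
 * src_len bytes plus the terminating NUL.
 */
enum mask_result mask_invalid_utf8(char *dst, size_t dst_size,
	const char *src, size_t src_len, char mask)
{
	size_t pos = 0;
	enum mask_result result = MASK_VALID;

	if (!dst || !src || dst_size <= src_len) {
		return MASK_FAIL;
	}
	while (pos < src_len) {
		size_t len = utf8_seq_len((const unsigned char *) src + pos, src_len - pos);

		if (len == 0) {
			dst[pos++] = mask; /* mask one byte, then resynchronise */
			result = MASK_INVALID;
		} else {
			memcpy(dst + pos, src + pos, len);
			pos += len;
		}
	}
	assert(pos == src_len); /* holds because len never exceeds the remaining input */
	dst[pos] = '\0';
	return result;
}
```

<p>With a sufficiently large <code>buf</code>, calling <code>mask_invalid_utf8(buf, sizeof(buf), "abc\xff" "def", 7, '?')</code> leaves <code>"abc?def"</code> in <code>buf</code> and returns <code>MASK_INVALID</code>. The string literal is split because C would otherwise read <code>\xffdef</code> as a single, out-of-range hex escape.</p>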
<div style="display:none"> Gerrit-Project: asterisk </div>
<div style="display:none"> Gerrit-Branch: 18 </div>
<div style="display:none"> Gerrit-Change-Id: I4ffbdb19c80bf0efc675d40078a3ca4f85c567d8 </div>
<div style="display:none"> Gerrit-Change-Number: 19898 </div>
<div style="display:none"> Gerrit-PatchSet: 1 </div>
<div style="display:none"> Gerrit-Owner: George Joseph &lt;gjoseph@digium.com&gt; </div>
<div style="display:none"> Gerrit-MessageType: newchange </div>