Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

string comparison ignores NUL character by default #7851

Open
jycor opened this issue May 13, 2024 · 0 comments
Open

string comparison ignores NUL character by default #7851

jycor opened this issue May 13, 2024 · 0 comments
Labels
correctness We don't return the same result as MySQL good repro Easily reproducible bugs

Comments

@jycor
Copy link
Contributor

jycor commented May 13, 2024

In MySQL, string comparisons just ignore the NUL character ('\0')

mysql> select '123' = '123\0';
+-----------------+
| '123' = '123\0' |
+-----------------+
|               1 |
+-----------------+
1 row in set (0.0003 sec)
mysql> select '123' = '\01\02\03\0';
+-----------------------+
| '123' = '\01\02\03\0' |
+-----------------------+
|                     1 |
+-----------------------+
1 row in set (0.0004 sec)

The NUL characters still impact the length of the string

mysql> select length('\01\02\03\0');
+-----------------------+
| length('\01\02\03\0') |
+-----------------------+
|                     7 |
+-----------------------+
1 row in set (0.0004 sec)

It's likely that this is a default collation issue.
We see this behavior when using utf8mb4_0900_ai_ci and utf8mb4_unicode_ci, but not utf8mb4_0900_bin.

mysql> select '123' collate utf8mb4_0900_ai_ci = '123\0' collate utf8mb4_0900_ai_ci;
+-----------------------------------------------------------------------+
| '123' collate utf8mb4_0900_ai_ci = '123\0' collate utf8mb4_0900_ai_ci |
+-----------------------------------------------------------------------+
|                                                                     1 |
+-----------------------------------------------------------------------+
1 row in set (0.0004 sec)

mysql> select '123' collate utf8mb4_unicode_ci = '123\0' collate utf8mb4_unicode_ci;
+-----------------------------------------------------------------------+
| '123' collate utf8mb4_unicode_ci = '123\0' collate utf8mb4_unicode_ci |
+-----------------------------------------------------------------------+
|                                                                     1 |
+-----------------------------------------------------------------------+
1 row in set (0.0004 sec)
mysql> select '123' collate utf8mb4_0900_bin = '123\0' collate utf8mb4_0900_bin;
+-------------------------------------------------------------------+
| '123' collate utf8mb4_0900_bin = '123\0' collate utf8mb4_0900_bin |
+-------------------------------------------------------------------+
|                                                                 0 |
+-------------------------------------------------------------------+
1 row in set (0.0004 sec)

This behavior extends to tables

mysql> create table t (c char(10) primary key) collate utf8mb4_0900_ai_ci;
Query OK, 0 rows affected (0.0126 sec)

mysql> insert into t values ('123');
Query OK, 1 row affected (0.0028 sec)

mysql> insert into t values ('\01\02\03\0');
ERROR: 1062: Duplicate entry '' for key 't.PRIMARY'

mysql> select * from t where c = '\01\02\03\0';
+-----+
| c   |
+-----+
| 123 |
+-----+
1 row in set (0.0005 sec)
mysql> create table t (c char(10) primary key) collate utf8mb4_0900_bin;
Query OK, 0 rows affected (0.0125 sec)

mysql> insert into t values ('123');
Query OK, 1 row affected (0.0027 sec)

mysql> insert into t values ('\01\02\03\0');
Query OK, 1 row affected (0.0019 sec)

mysql> select * from t where c = '\01\02\03\0';
+---------+
| c       |
+---------+
|  1 2 3  |
+---------+
1 row in set (0.0005 sec)

The utf8mb4_0900_ai_ci and utfmb4_unicode_ci collations do not work as expected in dolt.

Note: By default, PostgreSQL does not ignore NUL character

@jycor jycor added good repro Easily reproducible bugs correctness We don't return the same result as MySQL labels May 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
correctness We don't return the same result as MySQL good repro Easily reproducible bugs
Projects
None yet
Development

No branches or pull requests

1 participant