MDEV-25454 - UBSAN fixes and use clang when building with UBSAN in GitLab CI #3262

anson1014 · 2024-05-16T04:52:27Z

The Jira issue number for this PR is: MDEV-25454

Description

This PR addresses two common UBSAN violations found when running the MTR main suite with -DWITH_UBSAN=YES on clang-16. Currently running the MTR with UBSAN using gcc will not output any errors. The aim is to bring us closer to the eventual goal of this same result on both compilers.

Attempting to offset a null pointer by zero

Adding zero as an offset to a null pointer is considered undefined behaviour and will be reported by UBSAN as such with error: "runtime error: applying zero offset to null pointer". Simply refactor the offending cases to ignore the offset when it's value is zero.

Unrepresentable NaN values

UBSAN will report cases where NaN is casted to a type as undefined behaviour (example: "runtime error: -nan is outside the range of representable values of type 'unsigned long long'"). Check for NaN values for the offending cases and handle them accordingly.

Furthermore, edits were made to the .gitlab-ci.yml file to further support sanitizer testing.

As stated above, many sanitizer violations that are outputted are compiler specific. Running the MTR with sanitizers enabled when compiling with clang:latest will still output violations that are not experienced when building with gcc. This causes an ever-failing job which makes it harder to detect new genuine sanitizer related regressions. Refactor GitLab sanitizer jobs to use gcc instead.
Additionally, allow for the override of default MTR parameters. Useful for jobs like TSAN where the MTR does not complete without violations and the default max-test-fails is 10. This prevents the test running from exitting early so it can still provide us with info of the failures for all tests in the suite.

Release Notes

N/A

How can this PR be tested?

./mtr --suite=main successfully passing and the targeted UBSAN violations are no longer occurring when building/running with -DWITH_UBSAN=YES on clang.

PR quality check

I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

All new code of the whole pull request, including one or several files that are either new files or modified ones, are contributed under the BSD-new license. I am contributing on behalf of my employer Amazon Web Services, Inc.

Many sanitizer violations that are outputted are compiler specific. Running the MTR with sanitizers enabled when compiling with clang:latest will still output violations that are not experienced when building with gcc. This causes an ever-failing build which makes it harder to detect new genuine sanitizer related regressions. Refactor GitLab sanitizer jobs to use gcc instead. Additionally, the GitLab MTR job is subject to many MTR default parameters. One of them being the max test failure amount of 10. Refactor the MTR definition to allow for job specific overrides. In doing the above we allow for the sanitizer jobs to run with failures uncapped to output all violations for the entire MTR main suite as jobs like TSAN currently do not run the MTR without violations and are allowed to fail anyways. All new code of the whole pull request, including one or several files that are either new files or modified ones, are contributed under the BSD-new license. I am contributing on behalf of my employer Amazon Web Services, Inc.

Fix undefined behaviour of adding zero offset to null pointer - Adding zero as an offset to a null pointer is considered undefined behaviour and will be reported by UBSAN as such with error: "runtime error: applying zero offset to null pointer". Simply refactor the offending cases to ignore the offset when it's value is zero. Fix undefined behaviour of NaN being unrepresentable - UBSAN will report cases where NaN is casted to a type as undefined behaviour (example: "runtime error: -nan is outside the range of representable values of type 'unsigned long long'"). This commit checks for NaN values for the offending cases and handles them accordingly. Edit MTRs to not run with UBSAN - Tests main.execution_constants and main.lotofstack are currently disabled when running with ASAN. However, they do not pass when running with UBSAN either. Change the tests to reflect this. One likely reason this could have been missed prior is the common convention to run UBSAN and ASAN in tandem. All new code of the whole pull request, including one or several files that are either new files or modified ones, are contributed under the BSD-new license. I am contributing on behalf of my employer Amazon Web Services, Inc.

CLAassistant · 2024-05-16T04:52:33Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

LinuxJedi

Thanks for the contribution, and for finding these issues.

I've reviewed some of this, but it my comments likely apply to many of the other places. I get the need to fix these cases, but I don't think a length check just in case it is a NULL ptr when calculating the end in this way is the correct fix.

Also, please take care to look over the coding standards document in the source. I flagged a few things, but there are a few more.

LinuxJedi · 2024-05-17T12:07:33Z

mysys/charset.c

@@ -1107,7 +1107,7 @@ size_t escape_string_for_mysql(CHARSET_INFO *charset_info,
  const char *to_start= to;
  const char *end, *to_end=to_start + (to_length ? to_length-1 : 2*length);
  my_bool overflow= FALSE;
-  for (end= from + length; from < end; from++)
+  for (end= (length == 0)? from : from + length; from < end; from++)


Is this supposed to fix "Attempting to offset a null pointer by zero"? It would be better to check if from is NULL or length is 0 and return surely?

LinuxJedi · 2024-05-17T12:09:41Z

sql/sql_select.cc

@@ -8050,7 +8050,7 @@ best_access_path(JOIN      *join,

            /* Limit the number of matched rows */
            tmp= cost_for_index_read(thd, table, key, (ha_rows) records,
-                                     (ha_rows) s->worst_seeks);
+                                     isnan(s->worst_seeks)? (ha_rows) UINT_MAX : (ha_rows) s->worst_seeks);


This is quite a bit over 80 chars, also needs a space before the question mark.

LinuxJedi · 2024-05-17T12:10:00Z

sql/sql_statistics.h

@@ -476,6 +476,7 @@ class Index_statistics

  void set_avg_frequency(uint i, double val)
  {
+    if(isnan(val)) return;


Space after if.

LinuxJedi · 2024-05-17T12:14:42Z

storage/maria/ma_unique.c

@@ -142,7 +142,7 @@ ha_checksum _ma_unique_hash(MARIA_UNIQUEDEF *def, const uchar *record)
      if (!length || length > tmp_length)
 	length=tmp_length;			/* The whole blob */
    }
-    end= pos+length;
+    end= (length == 0)? pos : pos + length;


Did UBSAN really flag here? We would have crashed long before here if pos was NULL.

LinuxJedi · 2024-05-17T12:17:48Z

storage/maria/ma_unique.c

@@ -259,7 +259,7 @@ my_bool _ma_unique_comp(MARIA_UNIQUEDEF *def, const uchar *a, const uchar *b,
    {
      if (a_length != b_length)
        return 1;
-      end= pos_a+a_length;
+      end= (a_length == 0)? pos_a : pos_a+a_length;


It would probably be better to return if a or b is NULL?

LinuxJedi · 2024-05-17T12:19:59Z

strings/ctype-bin.c


-  end= a + (length= MY_MIN(a_length, b_length));
+  end= (length == 0)? a : a + length;


If we hit this, something very bad happened. Again, return might be better.

LinuxJedi · 2024-05-17T12:20:57Z

strings/ctype-bin.c

@@ -291,7 +291,7 @@ int my_wc_mb_bin(CHARSET_INFO *cs __attribute__((unused)),
 void my_hash_sort_bin(CHARSET_INFO *cs __attribute__((unused)),
                      const uchar *key, size_t len,ulong *nr1, ulong *nr2)
 {
-  const uchar *end = key + len;
+  const uchar *end = (len == 0)? key : key + len;


Return if len is 0 or key is NULL.

anson1014 added 2 commits May 16, 2024 04:46

LinuxJedi requested changes May 17, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MDEV-25454 - UBSAN fixes and use clang when building with UBSAN in GitLab CI #3262

MDEV-25454 - UBSAN fixes and use clang when building with UBSAN in GitLab CI #3262

anson1014 commented May 16, 2024

CLAassistant commented May 16, 2024

LinuxJedi left a comment

LinuxJedi May 17, 2024

LinuxJedi May 17, 2024

LinuxJedi May 17, 2024

LinuxJedi May 17, 2024

LinuxJedi May 17, 2024

LinuxJedi May 17, 2024

LinuxJedi May 17, 2024


		end= a + (length= MY_MIN(a_length, b_length));
		end= (length == 0)? a : a + length;

MDEV-25454 - UBSAN fixes and use clang when building with UBSAN in GitLab CI #3262

Are you sure you want to change the base?

MDEV-25454 - UBSAN fixes and use clang when building with UBSAN in GitLab CI #3262

Conversation

anson1014 commented May 16, 2024

Description

Release Notes

How can this PR be tested?

PR quality check

CLAassistant commented May 16, 2024

LinuxJedi left a comment

Choose a reason for hiding this comment

LinuxJedi May 17, 2024

Choose a reason for hiding this comment

LinuxJedi May 17, 2024

Choose a reason for hiding this comment

LinuxJedi May 17, 2024

Choose a reason for hiding this comment

LinuxJedi May 17, 2024

Choose a reason for hiding this comment

LinuxJedi May 17, 2024

Choose a reason for hiding this comment

LinuxJedi May 17, 2024

Choose a reason for hiding this comment

LinuxJedi May 17, 2024

Choose a reason for hiding this comment