Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MDEV-25454 - UBSAN fixes and use clang when building with UBSAN in GitLab CI #3262

Open
wants to merge 2 commits into
base: 10.5
Choose a base branch
from

Conversation

anson1014
Copy link
Contributor

  • The Jira issue number for this PR is: MDEV-25454

Description

This PR addresses two common UBSAN violations found when running the MTR main suite with -DWITH_UBSAN=YES on clang-16. Currently running the MTR with UBSAN using gcc will not output any errors. The aim is to bring us closer to the eventual goal of this same result on both compilers.

Attempting to offset a null pointer by zero

  • Adding zero as an offset to a null pointer is considered undefined behaviour and will be reported by UBSAN as such with error: "runtime error: applying zero offset to null pointer". Simply refactor the offending cases to ignore the offset when it's value is zero.

Unrepresentable NaN values

  • UBSAN will report cases where NaN is casted to a type as undefined behaviour (example: "runtime error: -nan is outside the range of representable values of type 'unsigned long long'"). Check for NaN values for the offending cases and handle them accordingly.

Furthermore, edits were made to the .gitlab-ci.yml file to further support sanitizer testing.

  • As stated above, many sanitizer violations that are outputted are compiler specific. Running the MTR with sanitizers enabled when compiling with clang:latest will still output violations that are not experienced when building with gcc. This causes an ever-failing job which makes it harder to detect new genuine sanitizer related regressions. Refactor GitLab sanitizer jobs to use gcc instead.
  • Additionally, allow for the override of default MTR parameters. Useful for jobs like TSAN where the MTR does not complete without violations and the default max-test-fails is 10. This prevents the test running from exitting early so it can still provide us with info of the failures for all tests in the suite.

Release Notes

N/A

How can this PR be tested?

./mtr --suite=main successfully passing and the targeted UBSAN violations are no longer occurring when building/running with -DWITH_UBSAN=YES on clang.

PR quality check

  • I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

All new code of the whole pull request, including one or several files that are either new files or modified ones, are contributed under the BSD-new license. I am contributing on behalf of my employer Amazon Web Services, Inc.

Many sanitizer violations that are outputted are compiler
specific. Running the MTR with sanitizers enabled when compiling
with clang:latest will still output violations that are not
experienced when building with gcc. This causes an ever-failing
build which makes it harder to detect new genuine sanitizer
related regressions. Refactor GitLab sanitizer jobs to use gcc
instead.

Additionally, the GitLab MTR job is subject to many MTR default
parameters. One of them being the max test failure amount of 10.
Refactor the MTR definition to allow for job specific overrides.
In doing the above we allow for the sanitizer jobs to run with
failures uncapped to output all violations for the entire MTR
main suite as jobs like TSAN currently do not run the
MTR without violations and are allowed to fail anyways.

All new code of the whole pull request, including one or several
files that are either new files or modified ones, are contributed
under the BSD-new license. I am contributing on behalf of my
employer Amazon Web Services, Inc.
Fix undefined behaviour of adding zero offset to null pointer

- Adding zero as an offset to a null pointer is considered undefined
behaviour and will be reported by UBSAN as such with error:
"runtime error: applying zero offset to null pointer".
Simply refactor the offending cases to ignore the offset
when it's value is zero.

Fix undefined behaviour of NaN being unrepresentable

- UBSAN will report cases where NaN is casted to a type as
undefined behaviour (example: "runtime error: -nan is outside
the range of representable values of type 'unsigned long long'").
This commit checks for NaN values for the offending cases
and handles them accordingly.

Edit MTRs to not run with UBSAN

- Tests main.execution_constants and main.lotofstack
are currently disabled when running with ASAN.
However, they do not pass when running with UBSAN
either. Change the tests to reflect this. One likely
reason this could have been missed prior is the common
convention to run UBSAN and ASAN in tandem.

All new code of the whole pull request, including one or several
files that are either new files or modified ones, are contributed
under the BSD-new license. I am contributing on behalf of my
employer Amazon Web Services, Inc.
@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Copy link
Contributor

@LinuxJedi LinuxJedi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution, and for finding these issues.

I've reviewed some of this, but it my comments likely apply to many of the other places. I get the need to fix these cases, but I don't think a length check just in case it is a NULL ptr when calculating the end in this way is the correct fix.

Also, please take care to look over the coding standards document in the source. I flagged a few things, but there are a few more.

@@ -1107,7 +1107,7 @@ size_t escape_string_for_mysql(CHARSET_INFO *charset_info,
const char *to_start= to;
const char *end, *to_end=to_start + (to_length ? to_length-1 : 2*length);
my_bool overflow= FALSE;
for (end= from + length; from < end; from++)
for (end= (length == 0)? from : from + length; from < end; from++)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this supposed to fix "Attempting to offset a null pointer by zero"? It would be better to check if from is NULL or length is 0 and return surely?

@@ -8050,7 +8050,7 @@ best_access_path(JOIN *join,

/* Limit the number of matched rows */
tmp= cost_for_index_read(thd, table, key, (ha_rows) records,
(ha_rows) s->worst_seeks);
isnan(s->worst_seeks)? (ha_rows) UINT_MAX : (ha_rows) s->worst_seeks);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite a bit over 80 chars, also needs a space before the question mark.

@@ -476,6 +476,7 @@ class Index_statistics

void set_avg_frequency(uint i, double val)
{
if(isnan(val)) return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Space after if.

@@ -142,7 +142,7 @@ ha_checksum _ma_unique_hash(MARIA_UNIQUEDEF *def, const uchar *record)
if (!length || length > tmp_length)
length=tmp_length; /* The whole blob */
}
end= pos+length;
end= (length == 0)? pos : pos + length;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did UBSAN really flag here? We would have crashed long before here if pos was NULL.

@@ -259,7 +259,7 @@ my_bool _ma_unique_comp(MARIA_UNIQUEDEF *def, const uchar *a, const uchar *b,
{
if (a_length != b_length)
return 1;
end= pos_a+a_length;
end= (a_length == 0)? pos_a : pos_a+a_length;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would probably be better to return if a or b is NULL?


end= a + (length= MY_MIN(a_length, b_length));
end= (length == 0)? a : a + length;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we hit this, something very bad happened. Again, return might be better.

@@ -291,7 +291,7 @@ int my_wc_mb_bin(CHARSET_INFO *cs __attribute__((unused)),
void my_hash_sort_bin(CHARSET_INFO *cs __attribute__((unused)),
const uchar *key, size_t len,ulong *nr1, ulong *nr2)
{
const uchar *end = key + len;
const uchar *end = (len == 0)? key : key + len;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return if len is 0 or key is NULL.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
3 participants