Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reactor: default to oneline backtraces in signal handlers #2052

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

fruch
Copy link

@fruch fruch commented Jan 22, 2024

align the handlers with the same logic that is used in the reactor stalls backtrace prints

Ref: scylladb/scylladb#16922

@fruch fruch requested a review from nyh January 22, 2024 21:34
@fruch fruch force-pushed the signal_handler_default_oneline_backtrace branch 4 times, most recently from 008aa90 to ddda81c Compare January 23, 2024 10:25
@fruch
Copy link
Author

fruch commented Jan 23, 2024

I gave up trying to pass the reactor_options into the signal handlers

I think of possible two direction

  1. pass way is via global variables
  2. duplicate the handlers - multiline handlers, oneline handlers

@fruch fruch force-pushed the signal_handler_default_oneline_backtrace branch from ddda81c to 1174443 Compare January 23, 2024 16:08
@nyh
Copy link
Contributor

nyh commented Jan 23, 2024

Some of the CI runs failed because of compiler warnings on unused functions and such. Please look into these failures.

@nyh
Copy link
Contributor

nyh commented Jan 23, 2024

align the handlers with the same logic that is used in the reactor stalls backtrace prints

Ref: scylladb/scylladb#16922

I am missing some context about this patch:

  1. What's wrong with multi-line backtraces, especially in the case of a one-off abort. I see some comments in the ScyllaDB issue about some ScyllaDB-specific testing tools that don't like the multi-line format, but it's not a very convincing reason.
  2. Why did we reach the current situation that we have both formats? The commit that introduced the one-line backtrace for reactor stalls, 4006d78, by @bhalevy doesn't explain, or why he also wanted to keep the old format. I want to understand that the current mishmash - stalls printed in one format and crashes in another format - was never intentional.
  3. Do our tools, especially Seastar's scripts/addr2line.py and @bhalevy 's backtracing website, support the one-line format? (maybe they already do, but it should be tested, and mentioned).

@fruch
Copy link
Author

fruch commented Jan 23, 2024

align the handlers with the same logic that is used in the reactor stalls backtrace prints

Ref: scylladb/scylladb#16922

I am missing some context about this patch:

  1. What's wrong with multi-line backtraces, especially in the case of a one-off abort. I see some comments in the ScyllaDB issue about some ScyllaDB-specific testing tools that don't like the multi-line format, but it's not a very convincing reason.

It was agreed 4 years ago, you can see the rest of the discussion here:
scylladb/scylladb#5464

There was agreement almost from side to side that one line backtraces are better

And evens idea and suggestions on how to compress it even further

It's not that the tool doesn't like it, it adds complexity which is slowing those tools down, and they need to handle cases of interlaced backtraces as one example (yes it was relevert to stalls more the signal handlers), but again it is something that was agreed almost 4 years ago

  1. Why did we reach the current situation that we have both formats? The commit that introduced the one-line backtrace for reactor stalls, 4006d78, by @bhalevy doesn't explain, or why he also wanted to keep the old format. I want to understand that the current mishmash - stalls printed in one format and crashes in another format - was never intentional.

I don't know if it was intentional or not, I think @bhalevy cared more about reactor stalls in the context of performance testing and analysis

  1. Do our tools, especially Seastar's scripts/addr2line.py and @bhalevy 's backtracing website, support the one-line format? (maybe they already do, but it should be tested, and mentioned).

Again @bhalevy probably know the answer to this one, I assume the answer is yes they do.

@@ -4236,6 +4246,7 @@ unsigned smp::adjust_max_networking_aio_io_control_blocks(unsigned network_iocbs
void smp::configure(const smp_options& smp_opts, const reactor_options& reactor_opts)
{
bool use_transparent_hugepages = !reactor_opts.overprovisioned;
bool oneline = reactor_opts.blocked_reactor_report_format_oneline.get_value();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of this option, "blocked_reactor_report_format_oneline", is no longer a very good name after this patch :-(

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you think we should change the name ? or create a new flag ?

if (oneline) {
install_oneshot_signal_handler<SIGSEGV, sigsegv_action_oneline_backtrace>();
} else {
install_oneshot_signal_handler<SIGSEGV, sigsegv_action>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The common way to get rid of such duplicate functions in modern C++ is to use lambdas. I.e., have a single function

static void sigsegv_action(bool oneline) noexcept {
    print_with_backtrace("Segmentation fault", oneline);
    reraise_signal(SIGSEGV);
}

and then install it as:

install_oneshot_signal_handler<SIGSEGV, [oneline]() { sigsegv_action(oneline); }>();

(I didn't test this, I don't know why this install function is a template, by the way)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went in few circle to figure how to do that, I'll give this option a try.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried using this,

but the temple part isn't happy with it:

/home/fruch/projects/scylla/seastar/src/core/reactor.cc:4266:5: error: no matching function for call to 'install_oneshot_signal_handler'
    install_oneshot_signal_handler<SIGSEGV, [oneline]() { sigsegv_action(oneline); }>();
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/fruch/projects/scylla/seastar/src/core/reactor.cc:4029:6: note: candidate template ignored: invalid explicitly-specified argument for template parameter 'Func'
void install_oneshot_signal_handler() {
     ^
/home/fruch/projects/scylla/seastar/src/core/reactor.cc:4270:5: error: no matching function for call to 'install_oneshot_signal_handler'
    install_oneshot_signal_handler<SIGABRT, [oneline]() { sigabrt_action(oneline); }>();
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/fruch/projects/scylla/seastar/src/core/reactor.cc:4029:6: note: candidate template ignored: invalid explicitly-specified argument for template parameter 'Func'
void install_oneshot_signal_handler() {
     ^
2 errors generated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lambda needs to go inside the parentheses.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like that ?

install_oneshot_signal_handler<SIGABRT, >([oneline]() { sigabrt_action(oneline); });

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without random commas,

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

regardless of the comma, this is not how the function is defined, nor how the template is defined.

so basically to get rid of the template all together ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the question.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the function signature:

template<int Signal, void(*Func)()>
void install_oneshot_signal_handler() {

it receives the signal number and callback function via template, and not via the functions arguments.

I don't have a clue why it defined like that

align the handlers with the same logic that is used in
the reactor stalls backtrace prints

Ref: scylladb/scylladb#16922
@fruch fruch force-pushed the signal_handler_default_oneline_backtrace branch from 1174443 to 845d704 Compare January 25, 2024 09:59
@fruch
Copy link
Author

fruch commented Jan 25, 2024

at least now the compilers are happy, but still didn't found a way to get the template working, to avoid the duplication.

@fruch
Copy link
Author

fruch commented Jun 2, 2024

@scylladb/seastar-maint

can someone give a hand with this one ? the only approach I've been able to conjure here is to duplicate the handlers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants