AddressSanitizer:DEADLYSIGNAL while starting proxy #18661

Open
xemul opened this issue May 14, 2024 · 4 comments
Labels
symptom/ci stability Issues that failed in ScyllaDB CI - tests and framework


@xemul
Contributor

xemul commented May 14, 2024

Found when validating a test change #18644

https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/8753/artifact/testlog/x86_64/debug/scylla-5089.log

Scylla version 5.5.0~dev-0.20240513.15ff1082e87a with build-id 2ee3a42e365bfea5eb7beb28b12f3fa397b52bb0 starting ...
command used: "/jenkins/workspace/scylla-master/scylla-ci/scylla/build/debug/scylla --smp 2 -m 1G --collectd 0 --overprovisioned --max-networking-io-control-blocks 1000 --unsafe-bypass-fsync 1 --kernel-page-cache 1 --commitlog-use-o-dsync 0 --abort-on-lsa-bad-alloc 1 --abort-on-seastar-bad-alloc --abort-on-internal-error 1 --abort-on-ebadf 1 --logger-log-level raft_topology=debug --logger-log-level query_processor=debug"
pid: 441091
parsed command line options: [smp, (positional) 2, -m, (positional) 1G, collectd, (positional) 0, overprovisioned, max-networking-io-control-blocks, (positional) 1000, unsafe-bypass-fsync, (positional) 1, kernel-page-cache, (positional) 1, commitlog-use-o-dsync: 0, abort-on-lsa-bad-alloc: 1, abort-on-seastar-bad-alloc, abort-on-internal-error: 1, abort-on-ebadf: 1, logger-log-level, (positional) raft_topology=debug, logger-log-level, (positional) query_processor=debug]
WARNING: debug mode. Not for benchmarking or production
WARN  2024-05-13 22:09:48,005 seastar - Seastar compiled with default allocator, --memory option won't take effect
WARN  2024-05-13 22:09:48,035 seastar - Could not read cgroups v2 file (memory.max).
WARN  2024-05-13 22:09:48,037 seastar - Seastar compiled with default allocator, will not abort on bad_alloc
INFO  2024-05-13 22:09:48,037 seastar - Reactor backend: linux-aio
INFO  2024-05-13 22:09:48,039 seastar - Perf-based stall detector creation failed (EACCESS), try setting /proc/sys/kernel/perf_event_paranoid to 1 or less to enable kernel backtraces: falling back to posix timer.
==441091==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
INFO  2024-05-13 22:09:48,062 [shard 0:main] seastar - updated: blocked-reactor-notify-ms=36000000000
INFO  2024-05-13 22:09:48,062 [shard 1:main] seastar - updated: blocked-reactor-notify-ms=36000000000
INFO  2024-05-13 22:09:48,079 [shard 0:main] init - installing SIGHUP handler
INFO  2024-05-13 22:09:48,847 [shard 0:main] init - Scylla version 5.5.0~dev-0.20240513.15ff1082e87a with build-id 2ee3a42e365bfea5eb7beb28b12f3fa397b52bb0 starting ...

WARN  2024-05-13 22:09:48,853 [shard 0:main] init - I/O Scheduler is not properly configured! This is a non-supported setup, and performance is expected to be unpredictably bad.
 Reason found: none of --io-properties and --io-properties-file are set.
To properly configure the I/O Scheduler, run the scylla_io_setup utility shipped with Scylla.

INFO  2024-05-13 22:09:48,863 [shard 0:main] init - starting API server
INFO  2024-05-13 22:09:48,869 [shard 0:main] init - starting prometheus API server
INFO  2024-05-13 22:09:48,874 [shard 0:main] init - creating snitch
INFO  2024-05-13 22:09:48,875 [shard 0:main] init - starting tokens manager
INFO  2024-05-13 22:09:48,877 [shard 0:main] init - starting effective_replication_map factory
INFO  2024-05-13 22:09:48,877 [shard 0:main] init - starting migration manager notifier
INFO  2024-05-13 22:09:48,878 [shard 0:main] init - starting per-shard database core
INFO  2024-05-13 22:09:48,879 [shard 0:main] init - creating and verifying directories
INFO  2024-05-13 22:09:48,974 [shard 0:main] init - starting compaction_manager
INFO  2024-05-13 22:09:48,974 [shard 0:main] task_manager - Registered module compaction
INFO  2024-05-13 22:09:48,981 [shard 1:main] task_manager - Registered module compaction
INFO  2024-05-13 22:09:48,986 [shard 0:main] compaction_manager - Set unlimited compaction bandwidth
INFO  2024-05-13 22:09:48,988 [shard 0:main] init - starting database
INFO  2024-05-13 22:09:49,052 [shard 0:main] seastar - updated: blocked-reactor-notify-ms=25
INFO  2024-05-13 22:09:49,052 [shard 1:main] seastar - updated: blocked-reactor-notify-ms=25
INFO  2024-05-13 22:09:49,053 [shard 0:main] init - starting storage proxy
AddressSanitizer:DEADLYSIGNAL
=================================================================
==441091==ERROR: AddressSanitizer: SEGV on unknown address (pc 0x7fb0c72189d6 bp 0x7ffd126ae340 sp 0x7ffd126ae2f0 T0)
Reactor stalled for 33 ms on shard 1. Backtrace: 0xd6ad90a 0x48b2a14 0x48b1f1c 0x46ecb92 0x46e770f 0x46e71b7 0x46e7a78 0x46edfaa 0x3dbaf 0xd676896 0x43eaef0 0x442c58b 0x442c076 0x442bbca 0x443d1d3 0x443cbbf 0x443c6ea 0x443c462 0x43d8f92 0x43bff14 0x43bf062 0x43c2057 0x43b2df9 0x110de194 0xd7e4a1e 0x485be5b 0x474b960 0x481df3b 0x481d95f 0x481d8af 0x481d3e3 0x481cfc7 0x481fe0f 0x481d10b 0x474d97d 0x474d705 0x482d815 0x482c3c3 0x4830463 0x470f65e 0x4717b80 0x471bba2 0x481b015 0x48191b0 0x48190a0 0x481880c 0x44be108 0x8c946 0x11296f
==441091==The signal is caused by a READ memory access.
==441091==Hint: this fault was caused by a dereference of a high value address (see register values below).  Disassemble the provided pc to learn which register was used.
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer: nested bug in the same thread, aborting.
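For triage, the register values in the ASan header line can be pulled out with a short script before disassembling the pc, as the hint in the log suggests. This is a minimal sketch that assumes the line format shown in the log above; the function name `parse_asan_segv` and the regular expression are hypothetical helpers, not part of Scylla or ASan tooling.

```python
import re

# Assumed format, taken from the log line above. The faulting address is
# treated as optional because this log prints "unknown address (pc ...".
ASAN_SEGV_RE = re.compile(
    r"==(?P<pid>\d+)==ERROR: AddressSanitizer: SEGV on unknown address "
    r"(?:0x[0-9a-f]+ )?"
    r"\(pc (?P<pc>0x[0-9a-f]+) bp (?P<bp>0x[0-9a-f]+) "
    r"sp (?P<sp>0x[0-9a-f]+) (?P<thread>T\d+)\)"
)


def parse_asan_segv(line):
    """Return pid, pc, bp, sp and thread from an ASan SEGV header, or None."""
    m = ASAN_SEGV_RE.search(line)
    if m is None:
        return None
    return {
        "pid": int(m["pid"]),
        "pc": int(m["pc"], 16),
        "bp": int(m["bp"], 16),
        "sp": int(m["sp"], 16),
        "thread": m["thread"],
    }


if __name__ == "__main__":
    header = (
        "==441091==ERROR: AddressSanitizer: SEGV on unknown address "
        "(pc 0x7fb0c72189d6 bp 0x7ffd126ae340 sp 0x7ffd126ae2f0 T0)"
    )
    info = parse_asan_segv(header)
    print(hex(info["pc"]), info["thread"])
```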
@xemul xemul added the symptom/ci stability Issues that failed in ScyllaDB CI - tests and framework label May 14, 2024
@xemul
Contributor Author

xemul commented May 29, 2024

Such a nice catch! This time it also got stalled in all groups prior to failure

Reactor stalled for 34 ms on shard 0, in scheduling group streaming. Backtrace: 0xd629c0a 0x4935a44 0x4934f4c 0x476fad5 0x476a40f 0x4769eb7 0x476a778 0x4770f4a 0x3dbaf 0x446141f 0x4443179 0x4441bb2 0x4444ba7 0x4435949 0x110728d4 0xd760fce 0x48dedfb 0x47ce900 0x48a1ea8 0x48a2c80 0x48a2c20 0x48a2bc4 0x48a2b5a 0x48a2a79 0x48a29d7 0x48a2873 0x48a265c 0x47925fe 0x479ab20 0x479eb42 0x479c763 0x42c3c0a 0x42c11aa 0xd6c37ea 0xd6c05e8 0x27b89 0x27c4a 0xd5e44e4
Reactor stalled for 34 ms on shard 1, in scheduling group atexit. Backtrace: 0xd629c0a 0x4935a44 0x4934f4c 0x476fad5 0x476a40f 0x4769eb7 0x476a778 0x4770f4a 0x3dbaf 0x45315d0 0x4530e69 0x4464d98 0x4443602 0x4441bb2 0x4444ba7 0x4435949 0x110728d4 0xd760fce 0x48dedfb 0x47ce900 0x48a0edb 0x48a08ff 0x48a084f 0x48a0383 0x489ff67 0x48a2daf 0x48a00ab 0x47d091d 0x47d06a5 0x48b07b5 0x48af363 0x48b3403 0x47925fe 0x479ab20 0x479eb42 0x489dfb5 0x489c150 0x489c040 0x489b7ac 0x4540c58 0x8c946 0x11296f
Reactor stalled for 33 ms on shard 0, in scheduling group mem_compaction. Backtrace: 0xd629c0a 0x4935a44 0x4934f4c 0x476fad5 0x476a40f 0x4769eb7 0x476a778 0x4770f4a 0x3dbaf 0xd67e60a 0x403663a 0x40365dc 0x403659c 0x40364e5 0x4036361 0x4030fb5 0x403f54a 0x403f48c 0x403f27b 0x403f02d 0x403ef89 0x403eb5b 0x403dc7b 0x403d8c3 0x403d30b 0x403c94e 0x40398b4 0x44aaebb 0x452b871 0x452b5c0 0x4463273 0x44423c8 0x4441bb2 0x4444ba7 0x4435949 0x110728d4 0xd760fce 0x48dedfb 0x47ce900 0x48a1ea8 0x48a2c80 0x48a2c20 0x48a2bc4 0x48a2b5a 0x48a2a79 0x48a29d7 0x48a2873 0x48a265c 0x47925fe 0x479ab20 0x479eb42 0x479c763 0x42c3c0a 0x42c11aa 0xd6c37ea 0xd6c05e8 0x27b89 0x27c4a 0xd5e44e4
Reactor stalled for 34 ms on shard 1, in scheduling group streaming. Backtrace: 0xd629c0a 0x4935a44 0x4934f4c 0x476fad5 0x476a40f 0x4769eb7 0x476a778 0x4770f4a 0x3dbaf 0x40230f0 0x4022868 0x40226d8 0x4022358 0x4441600 0x4444ba7 0x4435949 0x110728d4 0xd760fce 0x48dedfb 0x47ce900 0x48a1ea8 0x48a2c80 0x48a2c20 0x48a2bc4 0x48a2b5a 0x48a2a79 0x48a29d7 0x48a2873 0x48a265c 0x47925fe 0x479ab20 0x479eb42 0x489dfb5 0x489c150 0x489c040 0x489b7ac 0x4540c58 0x8c946 0x11296f
Reactor stalled for 33 ms on shard 1, in scheduling group mem_compaction. Backtrace: 0xd629c0a 0x4935a44 0x4934f4c 0x476fad5 0x476a40f 0x4769eb7 0x476a778 0x4770f4a 0x3dbaf 0x45062f5 0x4458d36 0x4434fd2 0x4434e68 0x1105325f 0x11064eac 0xd760fe4 0x48dedfb 0x47ce900 0x48a1ea8 0x48a2c80 0x48a2c20 0x48a2bc4 0x48a2b5a 0x48a2a79 0x48a29d7 0x48a2873 0x48a265c 0x47925fe 0x479ab20 0x479eb42 0x489dfb5 0x489c150 0x489c040 0x489b7ac 0x4540c58 0x8c946 0x11296f
Reactor stalled for 33 ms on shard 0, in scheduling group memtable. Backtrace: 0xd629c0a 0x4935a44 0x4934f4c 0x476fad5 0x476a40f 0x4769eb7 0x476a778 0x4770f4a 0x3dbaf 0xd5e7f72 0xd5e5f37 0xd67f192 0x4030ad6 0x4457b6c 0x44555e4 0x443d89c 0x452b9ad 0x452b5c0 0x4463273 0x44423c8 0x4441bb2 0x4444ba7 0x4435949 0x11079287 0xd760fce 0x48dedfb 0x47ce900 0x48a1ea8 0x48a2c80 0x48a2c20 0x48a2bc4 0x48a2b5a 0x48a2a79 0x48a29d7 0x48a2873 0x48a265c 0x47925fe 0x479ab20 0x479eb42 0x479c763 0x42c3c0a 0x42c11aa 0xd6c37ea 0xd6c05e8 0x27b89 0x27c4a 0xd5e44e4
Reactor stalled for 33 ms on shard 1, in scheduling group memtable. Backtrace: 0xd629c0a 0x4935a44 0x4934f4c 0x476fad5 0x476a40f 0x4769eb7 0x476a778 0x4770f4a 0x3dbaf 0xd6a9c97 0xd689f13 0xd67eedb 0x1107a3d2 0xd760fce 0x48dedfb 0x47ce900 0x48a1ea8 0x48a2c80 0x48a2c20 0x48a2bc4 0x48a2b5a 0x48a2a79 0x48a29d7 0x48a2873 0x48a265c 0x47925fe 0x479ab20 0x479eb42 0x489dfb5 0x489c150 0x489c040 0x489b7ac 0x4540c58 0x8c946 0x11296f
Reactor stalled for 33 ms on shard 0, in scheduling group gossip. Backtrace: 0xd629c0a 0x4935a44 0x4934f4c 0x476fad5 0x476a40f 0x4769eb7 0x476a778 0x4770f4a 0x3dbaf 0x42db31f 0x443edd5 0xe797483 0xe798d0e 0x11078c14 0xd760fce 0x48dedfb 0x47ce900 0x48a1ea8 0x48a2c80 0x48a2c20 0x48a2bc4 0x48a2b5a 0x48a2a79 0x48a29d7 0x48a2873 0x48a265c 0x47925fe 0x479ab20 0x479eb42 0x479c763 0x42c3c0a 0x42c11aa 0xd6c37ea 0xd6c05e8 0x27b89 0x27c4a 0xd5e44e4

I'd explain this as follows: some code entered an erroneous endless loop somewhere, eventually overran some boundary, and crashed.
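One way to check the stall reports against this hypothesis is to group them by shard and scheduling group and see how many leading frames they share. This is a rough sketch that assumes the "Reactor stalled" line format shown above (the scheduling group part is treated as optional, since the first log omits it); `parse_stalls` and `common_prefix_len` are hypothetical helper names.

```python
import re

# Assumed line format, taken from the stall reports quoted in this issue.
STALL_RE = re.compile(
    r"Reactor stalled for (?P<ms>\d+) ms on shard (?P<shard>\d+)"
    r"(?:, in scheduling group (?P<group>\w+))?"
    r"\. Backtrace: (?P<frames>(?:0x[0-9a-f]+ ?)+)"
)


def parse_stalls(log_text):
    """Extract every stall report from the log as a list of dicts."""
    stalls = []
    for m in STALL_RE.finditer(log_text):
        stalls.append({
            "ms": int(m["ms"]),
            "shard": int(m["shard"]),
            "group": m["group"],  # None when the log omits the group
            "frames": m["frames"].split(),
        })
    return stalls


def common_prefix_len(stalls):
    """Count the leading frames that are identical in all stall reports."""
    n = 0
    for column in zip(*(s["frames"] for s in stalls)):
        if len(set(column)) != 1:
            break
        n += 1
    return n


if __name__ == "__main__":
    import sys
    found = parse_stalls(sys.stdin.read())
    for s in found:
        print(s["shard"], s["group"], s["ms"], len(s["frames"]))
    print("shared leading frames:", common_prefix_len(found))
```

Feeding the log on stdin prints one line per stall plus the number of shared leading frames. Frames that are identical at the top of every report most likely belong to the stall detector itself, so the frames right after the shared prefix are the ones worth decoding.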

@raphaelsc
Member

Reactor stalled for 33 ms on shard 0, in scheduling group gossip. Backtrace:
[Backtrace #0]
__interceptor_backtrace at ??:?
?? ??:0
?? ??:0
?? ??:0
?? ??:0
?? ??:0
?? ??:0
?? ??:0
?? ??:0
?? ??:0
?? ??:0
seastar::metrics::impl::metric_definition_impl seastar::metrics::make_counter<unsigned long&>(seastar::basic_sstring<char, unsigned int, 15u, true>, unsigned long&, seastar::metrics::description, std::vector<seastar::metrics::label_instance, std::allocator<seastar::metrics::label_instance> >) at ././seastar/include/seastar/core/metrics.hh:531
seastar::metrics::impl::metric_definition_impl seastar::metrics::make_total_operations<unsigned long&>(seastar::basic_sstring<char, unsigned int, 15u, true>, unsigned long&, seastar::metrics::description, std::vector<seastar::metrics::label_instance, std::allocator<seastar::metrics::label_instance> >, seastar::basic_sstring<char, unsigned int, 15u, true>) at ././seastar/include/seastar/core/metrics.hh:678
service::storage_proxy_stats::stats::register_stats() at ./service/storage_proxy.cc:2773
operator() at ./main.cc:1122
 (inlined by) void std::__invoke_impl<void, scylla_main(int, char**)::$_0::operator()() const::{lambda()#2}::operator()() const::{lambda(void*)#1}&, {lambda()#2}>(std::__invoke_other, scylla_main(int, char**)::$_0::operator()() const::{lambda()#2}::operator()() const::{lambda(void*)#1}&, {lambda()#2}&&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/invoke.h:61
 (inlined by) std::enable_if<is_invocable_r_v<void, scylla_main(int, char**)::$_0::operator()() const::{lambda()#2}::operator()() const::{lambda(void*)#1}&, {lambda()#2}>, scylla_main(int, char**)::$_0::operator()() const::{lambda()#2}::operator()() const::{lambda(void*)#1}&>::type std::__invoke_r<void, scylla_main(int, char**)::$_0::operator()() const::{lambda()#2}::operator()() const::{lambda(void*)#1}&, {lambda()#2}>(std::enable_if&&, (void&&)...) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/invoke.h:111
 (inlined by) std::_Function_handler<void (void*), scylla_main(int, char**)::$_0::operator()() const::{lambda()#2}::operator()() const::{lambda(void*)#1}>::_M_invoke(std::_Any_data const&, void*&&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/std_function.h:290

maybe some regression in metric layer?

@amnonh does it ring a bell for you?

@michoecho
Contributor

> Such a nice catch! This time it also got stalled in all groups prior to failure

> maybe some regression in metric layer?

It's just a normal stall. Nothing to see here.
