TestUpdateClusterLayout.test_change_node_ip fails (when getting host-id from nodetool) #18693

Open

xemul opened this issue May 15, 2024 · 9 comments

Labels: area/nodetool, good first issue, P3 Medium Priority, symptom/ci stability (Issues that failed in ScyllaDB CI - tests and framework)

@xemul commented May 15, 2024

https://jenkins.scylladb.com/job/scylla-master/job/gating-dtest-release-with-consistent-topology-changes/1589/testReport/junit/update_cluster_layout_tests/TestUpdateClusterLayout/FullDtest___full_split001___test_change_node_ip/

16:51:47,577 747     ccm                            DEBUG    cluster.py          :754  | test_change_node_ip: node2: checking 
16:51:47,577 747     update_cluster_layout_tests    DEBUG    update_cluster_layout_tests.py:2459 | test_change_node_ip: nodetool status from node2: {'Datacenter': 'datacenter1', 'nodes': [{'status': 'UN', 'address': '127.0.74.1', 'load': '?', 'tokens': '256', 'owns': '?', 'host id': '38c2e454-2ce0-4557-868b-67efde6e1c22', 'rack': 'rack1'}, {'status': 'UN', 'address': '127.0.74.2', 'load': '407.32 KB', 'tokens': '256', 'owns': '?', 'host id': '22bce9b5-9c00-4367-a132-e624593a2de0', 'rack': 'rack1'}, {'status': 'UN', 'address': '127.0.74.33', 'load': '?', 'tokens': '256', 'owns': '?', 'host id': '?', 'rack': 'rack1'}]}
16:51:47,799 747     errors                         ERROR    conftest.py         :202  | test_change_node_ip: test failed: 
        for node in cluster.nodelist():
            status = nodetool_status(node)
            logger.debug(f"nodetool status from {node.name}: {status}")
            for n in status["nodes"]:
                if n["address"] == ip3:
>                   assert n["host id"] == hostid3
E                   AssertionError: assert '?' == 'c4dc72ab-76c9-4941-a82c-2d63ac5e167b'
E                     - c4dc72ab-76c9-4941-a82c-2d63ac5e167b
E                     + ?

The test was re-enabled recently by https://github.com/bhalevy/scylla-dtest/commit/73fd1b731a74ad4a6c00ef51a17a50b08464dbce (see https://github.com/scylladb/scylla-dtest/issues/4273).


EDIT from @kbr-scylla: explanation here: #18693 (comment)

@xemul added the symptom/ci stability (Issues that failed in ScyllaDB CI - tests and framework) label on May 15, 2024
@kbr-scylla commented

Could it be another manifestation of #17857?

@kbr-scylla commented

#17857 sounds very plausible (cc @bhalevy). In this test we have a node which changes its IP address. Once it restarts, other nodes see it in gossip, but until they perform storage_service::handle_state_normal, they don't put it in token_metadata. So nodetool status, which uses token_metadata::get_endpoint_to_host_id_map_for_reading() to assign a host ID to every IP in the status list, does not find the host ID for this IP for some time.

Idea to reproduce: start two nodes A and B, then inject a sleep at the beginning of handle_state_normal on A, then restart B with a different IP address. During the sleep, nodetool status called on A should show a missing host ID for B for some time.
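
A rough sketch of that reproduction (assuming Scylla's error-injection REST API is enabled on node A; the injection name "handle_state_normal_sleep" is hypothetical, and such an injection point would have to be added to Scylla):

    # Hypothetical reproduction sketch; the injection name below is made up.
    import subprocess
    import time

    import requests

    NODE_A_API = "http://127.0.74.1:10000"

    # Enable the (hypothetical) sleep injection on node A.
    requests.post(f"{NODE_A_API}/v2/error_injection/injection/handle_state_normal_sleep")

    # ... restart node B with a different IP address here (e.g. via ccm) ...

    # While node A sleeps in handle_state_normal, its status output should
    # show '?' as the host ID for node B's new address.
    for _ in range(10):
        out = subprocess.check_output(
            ["nodetool", "-h", "127.0.74.1", "-p", "10000", "status"], text=True)
        print(out)
        time.sleep(1)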

@kbr-scylla commented

Actually, it cannot be #17857, and what I wrote above doesn't make sense.

#17857 was observed in a test run #16902 in gossip-topology mode.

This issue was observed in a dtest run with raft-topology, where we don't call storage_service::handle_state_normal.

In raft-topology mode, the function responsible for reacting to new endpoints appearing in gossip is storage_service::raft_ip_address_updater::on_endpoint_change. So I suspect a bug could be somewhere in this function.

However, I analyzed the code and I don't see how this bug could happen. The function does nothing if it sees that the host ID is missing:

    future<>
    on_endpoint_change(gms::inet_address endpoint, gms::endpoint_state_ptr ep_state, gms::permit_id permit_id, const char* ev) {
        auto app_state_ptr = ep_state->get_application_state_ptr(gms::application_state::HOST_ID);
        if (!app_state_ptr) {
            co_return;
        }
        raft::server_id id(utils::UUID(app_state_ptr->value()));

and when it updates token_metadata, it does so in an isolated way using storage_service::mutate_token_metadata / storage_service::replicate_to_all_cores (which atomically swaps a pointer from the old version to the new one), so the REST API endpoint /storage_service/host_id should not be able to observe a "partial" token_metadata with missing host IDs. It should see either the full old token_metadata with the host ID assigned to the old IP, or the full new token_metadata with the host ID assigned to the new IP. (And the same is true for handle_state_normal.)
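
To illustrate the pattern, here is a minimal Python sketch of that copy-and-swap scheme (the names are illustrative, not Scylla's actual API): readers always grab a complete snapshot, and writers publish a new complete snapshot with a single atomic reference swap.

    import threading

    class TokenMetadataHolder:
        """Copy-and-swap: readers see either the full old or the full new version."""
        def __init__(self):
            self._current = {}                 # immutable snapshot: IP -> host ID
            self._write_lock = threading.Lock()

        def get_for_reading(self):
            return self._current               # grab the current snapshot

        def mutate(self, fn):
            with self._write_lock:
                new = dict(self._current)      # copy ...
                fn(new)                        # ... mutate the copy ...
                self._current = new            # ... then swap atomically

    tm = TokenMetadataHolder()
    tm.mutate(lambda m: m.update({"127.0.74.3": "c4dc72ab-..."}))

    def change_ip(m):
        m["127.0.74.33"] = m.pop("127.0.74.3")  # old IP out, new IP in, one swap

    tm.mutate(change_ip)
    print(tm.get_for_reading())                # {'127.0.74.33': 'c4dc72ab-...'}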

Something fishy is going on here. Perhaps we still have some other place which is incorrectly updating token_metadata?

@kbr-scylla commented

This is what the nodes saw when the test executed nodetool:

16:51:47,000 747     update_cluster_layout_tests    DEBUG    update_cluster_layout_tests.py:2454 | test_change_node_ip: Node3 is now up
16:51:47,001 747     urllib3.connectionpool         DEBUG    connectionpool.py   :246  | test_change_node_ip: Starting new HTTP connection (1): 127.0.74.33:10000
16:51:47,001 747     urllib3.connectionpool         DEBUG    connectionpool.py   :474  | test_change_node_ip: http://127.0.74.33:10000 "GET /storage_service/hostid/local HTTP/1.1" 200 38
16:51:47,113 747     ccm                            DEBUG    cluster.py          :754  | test_change_node_ip: node1: nodetool cmd=['/jenkins/workspace/scylla-master/gating-dtest-release-with-consistent-topology-changes/scylla/.dtest/dtest-vql32m_i/test/node1/bin/scylla', 'nodetool', '-h', '127.0.74.1', '-p', '10000', 'status'] wait=True timeout=None
16:51:47,315 747     ccm                            DEBUG    cluster.py          :754  | test_change_node_ip: node1: checking 
16:51:47,316 747     update_cluster_layout_tests    DEBUG    update_cluster_layout_tests.py:2459 | test_change_node_ip: nodetool status from node1: {'Datacenter': 'datacenter1', 'nodes': [{'status': 'UN', 'address': '127.0.74.1', 'load': '253.86 KB', 'tokens': '256', 'owns': '?', 'host id': '38c2e454-2ce0-4557-868b-67efde6e1c22', 'rack': 'rack1'}, {'status': 'UN', 'address': '127.0.74.2', 'load': '407.32 KB', 'tokens': '256', 'owns': '?', 'host id': '22bce9b5-9c00-4367-a132-e624593a2de0', 'rack': 'rack1'}, {'status': 'DN', 'address': '127.0.74.3', 'load': '497.95 KB', 'tokens': '256', 'owns': '?', 'host id': 'c4dc72ab-76c9-4941-a82c-2d63ac5e167b', 'rack': 'rack1'}]}
16:51:47,421 747     ccm                            DEBUG    cluster.py          :754  | test_change_node_ip: node2: nodetool cmd=['/jenkins/workspace/scylla-master/gating-dtest-release-with-consistent-topology-changes/scylla/.dtest/dtest-vql32m_i/test/node2/bin/scylla', 'nodetool', '-h', '127.0.74.2', '-p', '10000', 'status'] wait=True timeout=None
16:51:47,577 747     ccm                            DEBUG    cluster.py          :754  | test_change_node_ip: node2: checking 
16:51:47,577 747     update_cluster_layout_tests    DEBUG    update_cluster_layout_tests.py:2459 | test_change_node_ip: nodetool status from node2: {'Datacenter': 'datacenter1', 'nodes': [{'status': 'UN', 'address': '127.0.74.1', 'load': '?', 'tokens': '256', 'owns': '?', 'host id': '38c2e454-2ce0-4557-868b-67efde6e1c22', 'rack': 'rack1'}, {'status': 'UN', 'address': '127.0.74.2', 'load': '407.32 KB', 'tokens': '256', 'owns': '?', 'host id': '22bce9b5-9c00-4367-a132-e624593a2de0', 'rack': 'rack1'}, {'status': 'UN', 'address': '127.0.74.33', 'load': '?', 'tokens': '256', 'owns': '?', 'host id': '?', 'rack': 'rack1'}]}

@kbr-scylla commented

BTW, there is a slight difference between how the Scylla nodetool and the Cassandra nodetool behave: if a host ID is missing from the /storage_service/host_id endpoint, the old nodetool will print null:

        hostID = hostIDMap.get(endpoint);

(get returns null for a missing key)

The new nodetool will print ?:

    endpoint_host_id.contains(ep) ? endpoint_host_id.at(ep) : "?"

cc @denesb

(probably doesn't matter)

@denesb commented May 22, 2024

Yes, I know about this difference, and I thought "?" makes more sense than "null". Note that other columns in the status output also use "?" when there is no data available (e.g. tokens, load).

@kbr-scylla commented

OK, I see how the issue can happen.

It's just an artifact of how nodetool is implemented. It performs two REST API calls: one to get the "token to endpoint map", another to get the "host ID map". The IP change notification inside Scylla can be handled in between those calls.

The new nodetool gets the host ID map first and the token-to-endpoint map second:

    const auto endpoint_host_id = rjson_to_map<sstring>(client.get("/storage_service/host_id"));
...
    const auto tokens_endpoint_res = client.get("/storage_service/tokens_endpoint", std::move(tokens_endpoint_params));

then it iterates over the endpoints found in tokens_endpoint_res and, for each one, looks up the host ID in endpoint_host_id to print it in the status output (and if it doesn't find the ID, it prints ?).

So if the IP change notification is handled in between, tokens_endpoint_res will contain the new IP, but endpoint_host_id will not contain a host ID for that IP.
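
A toy Python model of this race (all names here are illustrative): the server state changes between the two calls, so the new IP shows up in the token-to-endpoint map but not in the previously fetched host ID map.

    # Server-side state, as the two REST endpoints would report it.
    server = {
        "host_id": {"127.0.74.3": "c4dc72ab-76c9-4941-a82c-2d63ac5e167b"},
        "tokens_endpoint": ["127.0.74.3"],
    }

    def handle_ip_change(old_ip, new_ip):
        # The IP change is applied atomically to the server state ...
        server["host_id"][new_ip] = server["host_id"].pop(old_ip)
        server["tokens_endpoint"] = [new_ip if ip == old_ip else ip
                                     for ip in server["tokens_endpoint"]]

    # ... but nodetool reads it in two separate calls:
    endpoint_host_id = dict(server["host_id"])       # call 1: host ID map
    handle_ip_change("127.0.74.3", "127.0.74.33")    # IP change lands in between
    tokens_endpoint_res = server["tokens_endpoint"]  # call 2: token map

    for ep in tokens_endpoint_res:
        print(ep, endpoint_host_id.get(ep, "?"))     # -> 127.0.74.33 ?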

Curiously, it seems that the old nodetool has the reverse problem:

        Map<String, String> tokensToEndpoints = probe.getTokenToEndpointMap();
        liveNodes = probe.getLiveNodes();
        unreachableNodes = probe.getUnreachableNodes();
        hostIDMap = probe.getHostIdMap();

so it would sometimes observe the old IP without a host ID assigned. But the way the test logic is written, it won't catch this scenario, because it only looks for the new IP, and only if the new IP is found does it verify the ID:

        hostid3 = node3.hostid()
        # Verify all nodes in the cluster see node3 is using the new ip address
        for node in cluster.nodelist():
            status = nodetool_status(node)
            logger.debug(f"nodetool status from {node.name}: {status}")
            for n in status["nodes"]:
                if n["address"] == ip3:
                    assert n["host id"] == hostid3

Summarizing: this failure is a small artifact of the new nodetool implementation, but the old nodetool had a "reverse" version of this bug; the test just wouldn't catch it due to how it's written.
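
Reusing handle_ip_change from the sketch above with the two calls flipped shows the old nodetool's "reverse" race; since the test only asserts on the new IP, this variant goes unnoticed.

    # Old nodetool's order (token map first, host ID map second), starting
    # again from a server that still holds the old IP.
    server = {
        "host_id": {"127.0.74.3": "c4dc72ab-76c9-4941-a82c-2d63ac5e167b"},
        "tokens_endpoint": ["127.0.74.3"],
    }

    tokens_endpoint_res = list(server["tokens_endpoint"])  # call 1: old IP
    handle_ip_change("127.0.74.3", "127.0.74.33")          # change lands here
    endpoint_host_id = dict(server["host_id"])             # call 2: new IP only

    for ep in tokens_endpoint_res:
        print(ep, endpoint_host_id.get(ep, "?"))           # -> 127.0.74.3 ?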

@kbr-scylla added the P4 Low Priority and P3 Medium Priority labels and removed the P4 Low Priority label on May 22, 2024
@kbr-scylla commented

This can be easily "fixed" (in reality, masked, just to shut the test up) by reversing the order of the REST endpoint calls so that it matches the old nodetool. @denesb
