mv: gossip the same backlog if a different backlog was sent in a response #18663
base: master
Conversation
This patch only concerns the last commit. It is based on #18646 to avoid later conflicts.
🔴 CI State: FAILURE ❌ - Build
Rebased on the updated #18646 and fixed a compilation error.
🔴 CI State: FAILURE ❌ - Build
6c91864 to 33f4a75
🟢 CI State: SUCCESS ✅ - Build
db/view/node_view_update_backlog.hh
Outdated
@@ -37,16 +39,20 @@ class node_update_backlog {
    std::chrono::milliseconds _interval;
    std::atomic<clock::time_point> _last_update;
    std::atomic<update_backlog> _max;
    std::vector<std::atomic<need_publishing>> _need_publishing;
These need_publishing flags will be placed very close together. Even though each one will be touched 99% of the time by a single shard, they will most likely land in a single cache line, so performance will suffer from false sharing.
It would be much better to put this into per_shard_backlog, right after the backlog field. The alignment on backlog will prevent the false sharing issues.
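For illustration, a minimal sketch of that layout, assuming seastar::cache_line_size from <seastar/core/cacheline.hh> and simplified member types; this is not the exact code in the patch:
struct alignas(seastar::cache_line_size) per_shard_backlog {
    // Only the owning shard writes these two fields; aligning the whole entry to the
    // cache line size keeps each shard's data on its own cache line, avoiding false sharing.
    std::atomic<update_backlog> backlog;
    std::atomic<need_publishing> need_publishing;  // placed right after backlog
};
std::vector<per_shard_backlog> _backlogs;  // one entry per shard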
Moreover, I'm not sure these have to be atomics. You could change add_fetch_for_gossip to perform invoke_on_all to gather the values from all shards. It only runs once a second, so it's fine.
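For example, a rough sketch of gathering plain (non-atomic) per-shard values from shard 0 once per gossip round; the helper name local_backlog and the overall shape are assumptions, not the patch itself:
future<db::view::update_backlog> gather_max_backlog(node_update_backlog& nb) {
    // Visit every shard, read its own entry on that shard (no atomics needed, since
    // only the owning shard ever writes it), and reduce the results to the maximum.
    return seastar::map_reduce(smp::all_cpus(), [&nb] (unsigned shard) {
        return smp::submit_to(shard, [&nb] {
            return nb.local_backlog();  // hypothetical accessor for this shard's value
        });
    }, db::view::update_backlog::no_backlog(), [] (db::view::update_backlog a, db::view::update_backlog b) {
        return std::max(a, b);
    });
}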
Added
service/storage_proxy.cc
Outdated
std::optional<db::view::update_backlog> storage_proxy::get_view_update_backlog_for_gossip(std::optional<db::view::update_backlog> last_backlog) const {
    return _max_view_update_backlog.add_fetch_for_gossip(this_shard_id(), get_db().local().get_view_update_backlog(), std::move(last_backlog));
}
Gossiper runs on shard 0, so the shard parameter is unneeded. It would make more sense to check the shard and do on_internal_error if it is not 0.
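Something along these lines, assuming storage_proxy.cc's slogger and Seastar's on_internal_error, with the shard argument already dropped from add_fetch_for_gossip (a sketch, not the exact patch):
std::optional<db::view::update_backlog> storage_proxy::get_view_update_backlog_for_gossip(std::optional<db::view::update_backlog> last_backlog) const {
    if (this_shard_id() != 0) {
        // Gossip only runs on shard 0; being called from any other shard is a bug.
        on_internal_error(slogger, "get_view_update_backlog_for_gossip called outside shard 0");
    }
    return _max_view_update_backlog.add_fetch_for_gossip(get_db().local().get_view_update_backlog(), std::move(last_backlog));
}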
Added
db/view/view.cc
Outdated
    }
    need_publishing = np.exchange(need_publishing::no, std::memory_order_relaxed);
}
auto max = add_fetch(shard, backlog);
I know that the previous code used to do this, but do we really have to? Wouldn't it be sufficient for the gossiper fiber to only gather data and not update the local shard's data?
Makes sense
service/storage_proxy.hh
Outdated
db::view::update_backlog get_view_update_backlog() const;
db::view::update_backlog get_view_update_backlog_for_response() const;

std::optional<db::view::update_backlog> get_view_update_backlog_for_gossip(std::optional<db::view::update_backlog> last_backlog) const;
I don't like these names because they don't really tell much about what those functions do. It's not obvious what the requirements of "response" or "gossip" are.
I changed the names a bit
db/view/view.cc
Outdated
if (!need_publishing && last_backlog && *last_backlog == max) {
    return std::nullopt;
} else {
    return max;
}
}
Do we have to check against the last backlog at all? If the value has changed then one of the need_publishing flags must have been updated, no?

Your description is very information-dense and thus difficult to parse. In such cases, an example usually helps in understanding the bug. The example that you already provided in the description of the issue was, in my opinion, quite clear, so you could write an example here based on that.
I've mostly copied the example, hopefully that's good enough. Also I rebased to fix the merge conflicts. At the same time, I added a commit for better readability to this PR, so the patch is now the last 2 commits, not only the last one.
🔴 CI State: FAILURE ❌ - Build
🔴 CI State: FAILURE ❌ - Build
The description looks better, thanks.
I reviewed the last commit. I left some comments. Please fix the nits, the compilation error and, if possible, rebase on master so that this commit does not require #18646.
Otherwise, looks good.
db/view/view.cc
Outdated
});

auto max = fetch();
if (!np && last_backlog && *last_backlog == max) {
I asked about this before, but the comment was resolved.
Isn't it sufficient to just check if (!np)?
Previously, we used to compare the last backlog vs the current one in order to decide whether it needs publishing. Now, we have the need_publishing flag. Is the last_backlog && *last_backlog == max still necessary?
The whole if..else could be written as a ternary operator:
return np ? max : std::nullopt;
I didn't try to compile it; there is a chance that you'll get a type error because both branches don't have the same type, but it would be nice to try it.
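If the branch types do clash, wrapping the value side in an optional should be enough, for example:
return np ? std::optional(max) : std::nullopt;
std::nullopt then converts to the deduced std::optional type, so the conditional operator has a single common type.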
After #18804 is merged and the need_publishing flag is also updated in that context, the last_backlog && *last_backlog == max check won't be needed anymore. Without #18804 the need_publishing flag is not set when a backlog drops, so we have no other way of detecting that than comparing with the last sent backlog.
I used this in the last rebase
db/view/view.cc
Outdated
if (_backlogs[shard].need_publishing) {
    _backlogs[shard].need_publishing = need_publishing::no;
    return need_publishing::yes;
}
return need_publishing::no;
Nit: you can use std::exchange to make the body of the lambda nice and short:
return std::exchange(_backlogs[shard].need_publishing, need_publishing::no);
Ok
service/storage_proxy.hh
Outdated
@@ -235,6 +235,8 @@ public:
    // Get information about this node's view update backlog. It combines information from all local shards.
    db::view::update_backlog get_view_update_backlog();

    future<std::optional<db::view::update_backlog>> get_view_update_backlog_if_changed(std::optional<db::view::update_backlog> last_backlog);
Nit: you require that this function is called on shard 0 only. Please put this information in a comment near the signature so that future users are not surprised.
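For instance, something like this next to the declaration (the comment wording is just a suggestion):
// Must only be called on shard 0 (intended for the view update backlog gossiper).
future<std::optional<db::view::update_backlog>> get_view_update_backlog_if_changed(std::optional<db::view::update_backlog> last_backlog);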
Added
db/view/view.cc
Outdated
    return a || b;
});

auto max = fetch();
This will be prone to races. Consider the following scenario:
- _backlog is updated on shard 1 and the need_publishing flag is set,
- _last_update is recalculated, so the next recalculation will happen 10ms from now or later,
- 5ms pass, fetch_if_changed is called and clears the need_publishing flag on shard 1, but uses fetch to read the old value,
- there are no MV writes anymore and we didn't publish the updated backlog on shard 1.
You should collect the backlogs in the submit_to lambda and calculate the maximum in the reducer lambda (you should still collect the np flag in addition to this).
Added in the rebase. This method now updates the max backlog unconditionally (disregarding the 10ms interval - it's only run once every second anyway), and returns the new max only if the need_publishing flag was set somewhere.
094aff0 to 266606f
I rebased on master and addressed the last few comments in the latest rebases.
🔴 CI State: FAILURE ❌ - Build
🟢 CI State: SUCCESS ✅ - Build
Currently, we only update the backlogs in node_update_backlog at the same time as we fetch them. This is done through storage_proxy's get_view_update_backlog method, which is confusing because it's a getter with side effects. Additionally, we don't always want to update the backlog when we're reading it (as in gossip, which only runs on shard 0), and we don't always want to read it when we're updating it (when we're not handling any writes but the backlog drops because background work finished). This patch splits both node_view_backlog::add_fetch and storage_proxy::get_view_update_backlog into two methods: one for updating and one for reading the backlog. This patch only replaces the places where we currently use the view backlog getter; more situations where we should get or update the backlog should be considered in a following patch.
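Roughly, the split looks like this at the interface level; fetch() appears in the review diffs above, while the name of the updating method below is illustrative, not necessarily what the patch uses:
// Before: one call that both records this shard's backlog and returns the node-wide maximum.
//   update_backlog add_fetch(unsigned shard, update_backlog backlog);
// After: recording and reading are separate operations.
void add(unsigned shard, update_backlog backlog);   // only updates this shard's entry
update_backlog fetch() const;                        // only reads the current node-wide maximum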
mv: gossip the same backlog if a different backlog was sent in a response

Currently, there are 2 ways of sharing a backlog with other nodes: through a gossip mechanism, and with responses to replica writes. In gossip, we check each second if the backlog changed, and if it did, we update other nodes with it. However, if the backlog for this node changed on another node through a write response, the gossiped backlog is currently not updated, so if after the response the backlog goes back to the value from the previous gossip round, it will not get sent and the other node will stay with an outdated backlog. This patch changes this by notifying the gossip that the backlog changed since the last gossip round and that a different backlog could therefore have been sent through the response piggyback mechanism. With that information, gossip will send an otherwise unchanged backlog to other nodes in the following gossip round.

Fixes: scylladb#18461
Currently, there are 2 ways of sharing a backlog with other nodes: through a gossip mechanism, and with responses to replica writes. In gossip, we check each second if the backlog changed, and if it did we update other nodes with it. However, if the backlog for this node changed on another node through a write response, the gossiped backlog is currently not updated, so if after the response the backlog goes back to the value from the previous gossip round, it will not get sent and the other node will stay with an outdated backlog. This can be observed in the following scenario:
- view_update_backlog_broker (the backlog gossiper) on node N performs an iteration of its backlog update loop, sees no change (the backlog has been empty since the start), and schedules the next iteration after 1s
- the backlog on N grows and the increased value is sent to the coordinator in a write response; the view updates are then processed and the backlog drops back to empty
- the next iteration of view_update_backlog_broker runs on N; the backlog is empty, just as it was in the previous iteration, so no change is seen and no update is sent due to the check

After this scenario happens, the coordinator keeps information about an increased view update backlog on N even though it's actually already empty.
This patch fixes the issue by notifying the gossip that a different backlog was sent in a response, causing it to send an unchanged backlog to other nodes in the following gossip round.

Fixes: #18461
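Conceptually, the response path only has to mark that the gossiped value may now be stale; a hedged sketch with a hypothetical helper name (mark_need_publishing), not the exact code in the patch:
// When a backlog is about to be piggybacked on a write response, remember that the
// next gossip round must publish the backlog again even if it looks unchanged by then.
auto backlog = _max_view_update_backlog.fetch();
_max_view_update_backlog.mark_need_publishing(this_shard_id());  // hypothetical helper
return backlog;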
Similarly to #18646, without admission control (#18334) this patch doesn't have much effect, so I'm marking it as backport/none.
Tests: manual. Currently this patch only affects the length of the MV flow control delay, which is not reliable enough to base a test on. A proper test will be added when MV admission control is added, so we'll be able to base the test on rejected requests.