-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
main: introduce schema commitlog scheduling group #18715
Conversation
Currently, we do not explicitly set a scheduling group for the schema commitlog which causes it to run in the default scheduling group (called "main"). However: - It is important and significant enough that it should run in a scheduling group that is separate from the main one, - It should not run in the existing "commitlog" group as user writes may sometimes need to wait for schema commitlog writes (e.g. read barrier done to learn the schema necessary to interpret the user write) and we want to avoid priority inversion issues. Therefore, introduce a new scheduling group dedicated to the schema commitlog. Fixes: scylladb#15566
I ran the following script: from cassandra.cluster import Cluster
c = Cluster()
s = c.connect()
while True:
s.execute("CREATE KEYSPACE IF NOT EXISTS ks WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}")
s.execute("DROP KEYSPACE ks") and was looking whether the I added an error injection which aborts if seastar tries to process a write in the |
@@ -1024,6 +1024,7 @@ To start the scylla server proper, simply invoke as: scylla server (or just scyl | |||
dbcfg.memtable_to_cache_scheduling_group = make_sched_group("memtable_to_cache", "mt2c", 200); | |||
dbcfg.gossip_scheduling_group = make_sched_group("gossip", "gms", 1000); | |||
dbcfg.commitlog_scheduling_group = make_sched_group("commitlog", "clog", 1000); | |||
dbcfg.schema_commitlog_scheduling_group = make_sched_group("schema_commitlog", "sclg", 1000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sclog or schema_clog perhaps for consistency with the prevoius name (stretching my intelligence trying to contribute to this patch to the extreme)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd love to use "sclog", but there is a hard-coded 4-character limit on the size of the short name of the scheduling group.
We don't force-flush the sstables any more when changing schema tables. Are there any flushes in the log? |
I think it's too minor to have its own scheduling group. What about gossip? It's for "related" work. |
🟢 CI State: SUCCESS✅ - Build Build Details:
|
@avikivity the goal here is to avoid priority inversion since schema commit log is used during raft read barrier, which is in turn used when performing fencing on the data write path. |
It doesn't look like we are doing explicit flushes. Flushes don't happen every time keyspace is created or added, but around once in 10 iterations of the script loop. They seem to be triggered by the |
Perhaps the settings of hte default dirty memory manager, which is used for system tables, could be tweaked now to increasae the limits, since we no longer need to flush the memtables at every update to the system tables, so don't need to make the settings so tight. system.paxos is already using user memory (check out maybe_write_in_user_memory()), but that's another extreme. |
I don't object to moving schema commitlog to another scheduling group. It's a good thing to do. I want to see if we can reuse another low-throughput group instead of adding a new one. |
@avikivity I must admit that I am a bit confused on how we choose scheduling groups for particular workloads. We have named groups for dedicated workloads: "gossip" for gossip, "commitlog" for the (non-schema) commitlog, etc. I know that we put background work into "streaming", and we also have "main" for unclassified stuff. Why would schema commitlog belong to gossip? How should one think about choosing the scheduling group? |
gossip is administrative work that is as important as user workloads. Hence gossip sg also has 1000 shares (just like the statement sg). |
If we want to put schema commitlog and gossip to the same group then the group should not be named "gossip" but perhaps something like "administrative" or "internal" or whatever. Anyway I already queued the patch. I think it is better than leaving schema commitlog in "main" group. If we want to merge schema group with gossip group then it can be done in follow-up patch. |
We don't have a secret procedure for assigning scheduling groups. I agree that placing the schema commitlog in gossip is incongruous. We could rename it to "metadata", and move all of group0 there. After all, it's wrong that DDL statements or topology movements compete with repair. We may want to keep the actual tablet load balancer calculations in maintenance, since they can be heavy and nothing depends on them. |
Thank you for reminding me that queries done by group 0 state machine should be moved out of "statement" scheduling group, it should be a GH issue but I keep forgetting to open it |
Statement verbs support tenants -- it can switch between a predefined set of scheduling groups. Currently there is 2: statement and default (system). There is an open PR for a third: streaming. |
Currently, we do not explicitly set a scheduling group for the schema commitlog which causes it to run in the default scheduling group (called "main"). However:
Therefore, introduce a new scheduling group dedicated to the schema commitlog.
Fixes: #15566