
Use a dedicated scheduling group for schema commitlog #15566

Closed
kbr-scylla opened this issue Sep 27, 2023 · 13 comments

@kbr-scylla
Contributor

Schema commitlog uses the default scheduling group, main.
Maybe we should use a dedicated scheduling group for schema commitlog, with appropriately adjusted number of shares?

Also, what other tasks are running on main? (Maybe main is already exclusive to schema commitlog...)

cc @xemul @kostja @gleb-cloudius @tgrabiec @avikivity
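
For reference, a minimal sketch of what such a dedicated group with its own shares could look like with Seastar's scheduling API; the group name and share count below are illustrative assumptions, not values from the ScyllaDB tree:

```cpp
#include <seastar/core/future.hh>
#include <seastar/core/scheduling.hh>

// Create a dedicated scheduling group for schema commitlog writes.
// "schema_commitlog" and the 100 shares are placeholders; the real share
// count would have to be tuned against the other groups (statement,
// memtable, compaction, ...).
seastar::future<seastar::scheduling_group> make_schema_commitlog_sg() {
    return seastar::create_scheduling_group("schema_commitlog", 100);
}
```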

@avikivity
Member

main should be left alone so things we forget have somewhere to live.

I'm not thrilled about adding a new scheduling group but it's the right thing to do.

Better would be to have nested groups (so commitlog/data and commitlog/schema).

@xemul
Contributor

xemul commented Sep 27, 2023

nested sched groups: scylladb/seastar#1070

@xemul
Contributor

xemul commented Sep 27, 2023

> Also, what other tasks are running on main?

First, the extremely important start, stop and config-reread code all runs in the main sg :)
Second, recent logs contain the current sched group "name" next to the shard id, so it can be found out by grepping for it

@kbr-scylla
Contributor Author

@kostja please decide who should work on this -- storage group or your group?

@mykaul mykaul added this to the 6.0 milestone Oct 16, 2023
@kostja
Contributor

kostja commented Oct 17, 2023

@kbr-scylla depends on how you suggest to fix it. Nested scheduling groups?

@kbr-scylla
Contributor Author

Not necessarily, if it's too much effort we could simply have a separate normal scheduling group.

@kbr-scylla
Contributor Author

> @kostja please decide who should work on this -- storage group or your group?

@kostja ping

I think this should belong to Storage.

@kostja kostja assigned eliransin and unassigned kbr-scylla Dec 7, 2023
@kostja
Contributor

kostja commented Dec 13, 2023

So this issue actually breaks down into 3 or 4 related items.

Let me first write down the goal of this issue: we want to make sure that there is no priority inversion between the data path and the schema path. E.g., when a node needs to perform a read barrier to fetch the latest schema, and this barrier needs to write schema mutations to the schema commit log, these writes should not be queued behind data-path writes to the data commit log or behind compaction. At the end of the day it manifests as latency spikes in clusters that alter tables under heavy load (todo: we need a link to the original issue here).

In order to fix that we need to do multiple things:

  1. Change the scheduling group for the commit log; this seems to be trivial and can be done when creating the commit log in system_tables.cc (a sketch follows below).
  2. Make sure the schema commit log uses a different IO priority class than the rest of the writes. This is not implemented, neither in the commit log API nor at the instantiation place.
  3. Retest to see if it helps.
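
As a rough illustration of item 1, once such a dedicated group exists, the schema commitlog's work can be attributed to it via seastar::with_scheduling_group. Everything here (the schema_cl_sg parameter, the do_write callable, and where this would be hooked in) is a hedged sketch, not the actual instantiation in system_tables.cc:

```cpp
#include <seastar/core/future-util.hh>   // with_scheduling_group (exact header varies by Seastar version)
#include <seastar/core/scheduling.hh>
#include <functional>

// Run a schema-commitlog write under the dedicated group so it is not
// scheduled in "main" and does not queue behind the data commitlog group.
seastar::future<> write_schema_entry(seastar::scheduling_group schema_cl_sg,
                                     std::function<seastar::future<>()> do_write) {
    return seastar::with_scheduling_group(schema_cl_sg, std::move(do_write));
}
```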

@kbr-scylla
Contributor Author

> At the end of the day it manifests as latency spikes in clusters that alter tables under heavy load (todo: we need a link to the original issue here).

#15622

The issue is fixed.

The schema commit log is already using a separate scheduling group from the regular commitlog -- the main scheduling group -- but as @avikivity said, "main should be left alone so things we forget have somewhere to live."

The above issue was caused by a priority inversion as you described, but the problem was not the commitlog as initially suspected; it was the use of the user-data memory pool for Raft tables -- so schema writes were waiting for user memtable flushes.

I think there is no longer a concept of IO priority classes; they were merged with scheduling groups (?)

There really is nothing else to be done other than creating a separate scheduling group for the schema commitlog, I think.

@eliransin
Contributor

> I think there is no longer a concept of IO priority classes; they were merged with scheduling groups (?)

This is correct.

@kbr-scylla if the issue is fixed, why don't we close this one?

@kbr-scylla
Contributor Author

Sorry for the confusion @eliransin -- I meant that #15622 is fixed (which was the cause of the latency spikes that Kostja mentioned).

The current issue is not fixed: the schema commitlog is in the main group while (I believe) it should have its own.

@mykaul
Contributor

mykaul commented Mar 10, 2024

Will it make it to 6.0?

@kostja kostja assigned piodul and unassigned eliransin Apr 7, 2024
@kostja kostja modified the milestones: 6.0, 6.1 Apr 7, 2024
@kostja kostja added the area/schema changes and area/commitlog labels May 2, 2024
@kostja
Contributor

kostja commented May 2, 2024

See also #1926

piodul added a commit to piodul/scylla that referenced this issue May 16, 2024
Currently, we do not explicitly set a scheduling group for the schema
commitlog, which causes it to run in the default scheduling group (called
"main"). However:

- It is important and significant enough that it should run in a
  scheduling group that is separate from the main one,
- It should not run in the existing "commitlog" group as user writes may
  sometimes need to wait for schema commitlog writes (e.g. read barrier
  done to learn the schema necessary to interpret the user write) and we
  want to avoid priority inversion issues.

Therefore, introduce a new scheduling group dedicated to the schema
commitlog.

Fixes: scylladb#15566
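
A minimal sketch of the shape of the change described above, assuming hypothetical names (commitlog_groups, make_commitlog_groups) and placeholder share values; it only shows the part the commit message is about -- a group for the schema commitlog that is distinct from both "main" and the data commitlog group:

```cpp
#include <seastar/core/coroutine.hh>
#include <seastar/core/future.hh>
#include <seastar/core/scheduling.hh>

// Hypothetical container for the two commitlog-related groups.
struct commitlog_groups {
    seastar::scheduling_group data;    // existing group for the data commitlog
    seastar::scheduling_group schema;  // new, dedicated group for the schema commitlog
};

seastar::future<commitlog_groups> make_commitlog_groups() {
    commitlog_groups g;
    g.data = co_await seastar::create_scheduling_group("commitlog", 1000);
    // Separate group: schema writes neither live in "main" nor queue behind
    // user writes, avoiding the priority inversion described above.
    g.schema = co_await seastar::create_scheduling_group("schema_commitlog", 1000);
    co_return g;
}
```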