Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sched/tcb: use shared group for kthreads #12320

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Conversation

yf13
Copy link
Contributor

@yf13 yf13 commented May 10, 2024

Summary

Current task group data is embedded in the task TCB, this is also used for kernel threads, which seems can share the group as they all live in kernel space.

This patch uses tcb_s for kthreads with a shared group instance. Thus the overhead is reduced when more kthreads are in use.

Memory usage of group instance, i.e. sizeof(task_group_s), on rv-virt device:

config size
nsh 720
nsh64 1160
knsh32 248
knsh64 440

Kernel memory footprints collected from attached logs:

config kthreads # upstream patched saved
nsh 1 7044 7044 0
nsh64 1 10344 10344 0
knsh32 2 10348 9660 588
knsh64 2 13304 12168 1136

The savings are beyond sizeof(task_group_s) due to subordinate data of groups.

Impact

This may affect all configs.

Testing

  • QEMU 6.2 on Ubuntu 22.04 with configs nsh, nsh64, knsh32 and knsh64. See logs-12320.tar.gz for more details:
    • the embedded folder has logs from upstream version.
    • the kthreads folder has logs from patched version.
  • Github CI checks

sched/group/group_create.c Outdated Show resolved Hide resolved
sched/group/group_create.c Outdated Show resolved Hide resolved
@yf13
Copy link
Contributor Author

yf13 commented May 10, 2024

@anchao, let's discuss here as these comments are visible in gh tool as well.

Can you teach your opinion about the MM_TLSF_MANAGER? It seems having O(1) alloc() and works in flat build mode. I added *.tlsf.log files for your review.

@anchao
Copy link
Contributor

anchao commented May 11, 2024

@anchao, let's discuss here as these comments are visible in gh tool as well.

Can you teach your opinion about the MM_TLSF_MANAGER? It seems having O(1) alloc() and works in flat build mode. I added *.tlsf.log files for your review.

Although TLSF has more advantages than Best fit in terms of time complexity O(1), it will cause serious memory fragmentation issues in aging and persistence tests.

I have added an attachment containing the test results of different allocators (Sheet from @XuNeo Thanks), You can generate a chart and observe the memory trend (This chart may not show the worst case scenario of TLSF)
20221110_mem_test.xlsx

@yf13
Copy link
Contributor Author

yf13 commented May 11, 2024

@anchao, thanks for sharing the details.

Here I am trying to keep task_tcb_s for userspace tasks but use tcb_s for kthreads as suggested by @xiaoxiang781216 and it looks working for nsh64, I still check other configs. Is this approach fine for you?

@anchao
Copy link
Contributor

anchao commented May 11, 2024

@anchao, thanks for sharing the details.

Here I am trying to keep task_tcb_s for userspace tasks but use tcb_s for kthreads as suggested by @xiaoxiang781216 and it looks working for nsh64, I still check other configs. Is this approach fine for you?

Sounds good, could you update this PR, I'd like to review the code directly

@yf13 yf13 changed the title sched/tcb: drop task_tcb_s.group sched/tcb: reduce kthread overhead May 11, 2024
@yf13 yf13 force-pushed the sched branch 3 times, most recently from e93ddd2 to f443ffc Compare May 11, 2024 05:23
@yf13 yf13 force-pushed the sched branch 5 times, most recently from 1461cbe to c9a6669 Compare May 11, 2024 10:14
@acassis
Copy link
Contributor

acassis commented May 11, 2024

Please fix this failure:

In function 'nxthread_create',
    inlined from 'kthread_create_with_stack' at task/task_create.c:254:10:
Error: task/task_create.c:97:18: error: array subscript 'struct task_tcb_s[0]' is partly outside array bounds of 'unsigned char[368]' [-Werror=array-bounds]
   97 |   tcb->cmn.flags = ttype | TCB_FLAG_FREE_TCB;
      |                  ^
In file included from task/task_create.c:33:
task/task_create.c:88:9: note: object of size 368 allocated by 'zalloc'
   88 |   tcb = kmm_zalloc(ret);
      |         ^~~~~~~~~~
Error: task/task_create.c:111:7: error: array subscript 'struct task_tcb_s[0]' is partly outside array bounds of 'unsigned char[368]' [-Werror=array-bounds]
  111 |   pid = tcb->cmn.pid;
      |   ~~~~^~~~~~~~~~~~~~
task/task_create.c:88:9: note: object of size 368 allocated by 'zalloc'
   88 |   tcb = kmm_zalloc(ret);
      |         ^~~~~~~~~~
In function 'nxthread_create',
    inlined from 'kthread_create_with_stack' at task/task_create.c:254:10,
    inlined from 'kthread_create' at task/task_create.c:285:10:
Error: task/task_create.c:97:18: error: array subscript 'struct task_tcb_s[0]' is partly outside array bounds of 'unsigned char[368]' [-Werror=array-bounds]
   97 |   tcb->cmn.flags = ttype | TCB_FLAG_FREE_TCB;
      |                  ^
task/task_create.c:88:9: note: object of size 368 allocated by 'zalloc'
   88 |   tcb = kmm_zalloc(ret);
      |         ^~~~~~~~~~
Error: task/task_create.c:111:7: error: array subscript 'struct task_tcb_s[0]' is partly outside array bounds of 'unsigned char[368]' [-Werror=array-bounds]
  111 |   pid = tcb->cmn.pid;
      |   ~~~~^~~~~~~~~~~~~~
task/task_create.c:88:9: note: object of size 368 allocated by 'zalloc'
   88 |   tcb = kmm_zalloc(ret);
      |         ^~~~~~~~~~
cc1: all warnings being treated as errors
make[1]: *** [Makefile:61: task_create.o] Error 1
make[1]: Target 'libsched.a' not remade because of errors.

@yf13
Copy link
Contributor Author

yf13 commented May 11, 2024

@acassis I am wondering that is this GCC 12 issue, just pushed a version using $progma to see if it can be fixed. Please teach if there are better fixes.

@yf13 yf13 force-pushed the sched branch 2 times, most recently from 229d4d6 to 69ba63d Compare May 11, 2024 12:20
@acassis
Copy link
Contributor

acassis commented May 11, 2024

@yf13 adding #pragma in the generic code is not a good idea, maybe you can try to fix the structure to avoid the issue, please see here: micropython/micropython#7064

He create a union to storage the right size

@yf13
Copy link
Contributor Author

yf13 commented May 11, 2024

@yf13 adding #pragma in the generic code is not a good idea, maybe you can try to fix the structure to avoid the issue, please see here: micropython/micropython#7064

@acassis thanks, will remove #pragma later after passing all checks.

Now case test_ltp_interfaces_lio_listio_2_1 still fails, need more investigation about it, hints are highly welcomed.

@yf13
Copy link
Contributor Author

yf13 commented May 22, 2024

@xiaoxiang781216 now the remaining issues are in this log:

Configuration/Tool: rv-virt/citest64
usrsocktest_basic_connect.c: In function 'teardown':
Error: usrsocktest_basic_connect.c:112:7: error: variable 'ret' set but not used [-Werror=unused-but-set-variable]

and

Configuration/Tool: esp32c3-generic/twai esp32c6-devkitm/twai esp32c6-devkitc/twai  esp32h2-devkit/twai
Error: common/espressif/esp_twai.c:242:7: error: variable 'ret' set but not used [-Werror=unused-but-set-variable]

patch #12387 has been created for the latter.

@xiaoxiang781216
Copy link
Contributor

@xiaoxiang781216 now the remaining issues are in this log:

Configuration/Tool: rv-virt/citest64
usrsocktest_basic_connect.c: In function 'teardown':
Error: usrsocktest_basic_connect.c:112:7: error: variable 'ret' set but not used [-Werror=unused-but-set-variable]

and

Configuration/Tool: esp32c3-generic/twai esp32c6-devkitm/twai esp32c6-devkitc/twai  esp32h2-devkit/twai
Error: common/espressif/esp_twai.c:242:7: error: variable 'ret' set but not used [-Werror=unused-but-set-variable]

patch #12387 has been created for the latter.

@yf13 thanks, see the comment in the pr.

@xiaoxiang781216
Copy link
Contributor

@xiaoxiang781216 now the remaining issues are in this log:

Configuration/Tool: rv-virt/citest64
usrsocktest_basic_connect.c: In function 'teardown':
Error: usrsocktest_basic_connect.c:112:7: error: variable 'ret' set but not used [-Werror=unused-but-set-variable]

Fix here: apache/nuttx-apps#2403

@yf13
Copy link
Contributor Author

yf13 commented May 22, 2024

Removed temporary fixes from this patch

@yf13 yf13 changed the title sched/tcb: share group for kthreads sched/tcb: use shared group for kthreads May 22, 2024
Thread args have already been saved to stack after the TLS section by
nxtask_setup_stackargs. This is to retrieve it for use.

Signed-off-by: Yanfeng Liu <yfliu2008@qq.com>
@yf13 yf13 force-pushed the sched branch 3 times, most recently from f4ff342 to 75cafed Compare May 23, 2024 05:18
@yf13
Copy link
Contributor Author

yf13 commented May 23, 2024

@xiaoxiang781216 it looks that the LTP timeouts in sim/citest still happens, so I disabled them again to see if there are remaining issues. Then a few CI attempts failed due to connectivity issues encountered during CI. Then the usrsocktest issues come back again in my last run.

So looks like that I have to wait until CI is normal to trigger another check.

@xiaoxiang781216
Copy link
Contributor

@xiaoxiang781216 it looks that the LTP timeouts in sim/citest still happens, so I disabled them again to see if there are remaining issues. Then a few CI attempts failed due to connectivity issues encountered during CI. Then the usrsocktest issues come back again in my last run.

So looks like that I have to wait until CI is normal to trigger another check.

@Donny9 is looking this problem.

@yf13 yf13 force-pushed the sched branch 2 times, most recently from a7e0136 to 9868fb6 Compare May 29, 2024 11:07
@yf13
Copy link
Contributor Author

yf13 commented May 30, 2024

@anchao, @xiaoxiang781216 looks like the CI system is okay now. Please give this patch a review when free and let me know if there is anything I can do.

include/nuttx/tls.h Show resolved Hide resolved
sched/group/group_create.c Outdated Show resolved Hide resolved
sched/group/group_create.c Outdated Show resolved Hide resolved
sched/group/group_create.c Show resolved Hide resolved
sched/init/nx_start.c Outdated Show resolved Hide resolved
sched/task/task_create.c Outdated Show resolved Hide resolved
sched/task/task_create.c Outdated Show resolved Hide resolved
sched/task/task_start.c Outdated Show resolved Hide resolved
sched/task/task_setup.c Outdated Show resolved Hide resolved
sched/group/group_create.c Outdated Show resolved Hide resolved
sched/task/task_create.c Outdated Show resolved Hide resolved
@yf13 yf13 force-pushed the sched branch 2 times, most recently from 0ef90ca to 84d791e Compare June 5, 2024 00:28
Kthreads can share the group data so that to reduce overheads.
This implements shared kthread group via:

- use `tcb_s` instead of `task_tcb_s` for kthreads
- use `g_kthread_group` when creating kthreads
- use stackargs to start tasks and kthreads

see pull/12320 for test logs.

Signed-off-by: Yanfeng Liu <yfliu2008@qq.com>
@yf13
Copy link
Contributor Author

yf13 commented Jun 5, 2024

@xiaoxiang781216 looks like our CI got connectivity issue:

Configuration/Tool: ucans32k146/se05x,CONFIG_ARM_TOOLCHAIN_GNU_EABI
ccuurrll::  ((2288))  FFaaiilleedd  ttoo  cocnonnencet ctto  tgoi tghiutbh.ucbo.mc opmo rpto r4t4 3 4a4f3t earf t1e3r5 110365 2m4s4:  mCso:n nCeocntinoenc ttiimoend  toiumte

I forgot somewhere I saw other repositories using sub-module instead of download, not sure if it can reduce this kind of connectivity issues?

Looks like the Linux (sim-xxx) checks passed so our compilation warning fixes shall work now. I will trigger another check soon.

- replaces `ta_argv` with `stackargs`
- drops `ta_argv` from `task_info_s`
- drops `g_idleargv` and checks idle accordingly

Signed-off-by: Yanfeng Liu <yfliu2008@qq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants