sched: Introduce Bound Multi-Processing (BMP) into NuttX #12020

anchao · 2024-03-28T11:14:43Z

Summary

sched: Introduce Bound Multi-Processing (BMP) into NuttX

Bound multiprocessing provides the scheduling control of an asymmetric
multiprocessing model, while preserving the hardware abstraction and
management of symmetric multiprocessing.

BMP is similar to SMP, but you can specify which processors a thread
can run on. You can use both SMP and BMP on the same system, allowing
some threads to migrate from one processor to another, while other
threads are restricted to one or more processors.

As with SMP, a single copy of the OS maintains an overall view of all
system resources, allowing them to be dynamically allocated and shared
among applications. But, during application initialization, a setting
determined by the system designer forces all of an application's threads
to execute only on a specified CPU.

Compared to full, floating SMP operation, this approach offers several
advantages:

1. It eliminates the cache thrashing that can reduce performance in an SMP
   system by allowing applications that share the same data set to run
   exclusively on the same CPU.
2. It offers simpler application debugging than SMP since all execution
   threads within an application run on a single CPU.
3. It helps legacy applications that use poor techniques for synchronizing
   shared data to run correctly, again by letting them run on a single CPU.
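
For comparison, here is a minimal sketch of restricting the calling thread to a single CPU with the generic affinity calls `sched_getaffinity()` and `sched_setaffinity()` as provided by Linux/glibc. It is not the BMP mechanism added by this PR; the helper names `first_allowed_cpu`, `bind_self_to_cpu` and `allowed_cpu_count` are hypothetical and exist only for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* needed for cpu_set_t and the CPU_* macros */
#endif
#include <assert.h>
#include <sched.h>

/* Return the lowest-numbered CPU the calling thread may run on, or -1. */
static int first_allowed_cpu(void)
{
  cpu_set_t set;

  CPU_ZERO(&set);
  if (sched_getaffinity(0, sizeof(set), &set) != 0)
    {
      return -1;
    }

  for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
    {
      if (CPU_ISSET(cpu, &set))
        {
          return cpu;
        }
    }

  return -1;
}

/* Restrict the calling thread to exactly one CPU; 0 on success. */
static int bind_self_to_cpu(int cpu)
{
  cpu_set_t set;

  assert(cpu >= 0 && cpu < CPU_SETSIZE);
  CPU_ZERO(&set);
  CPU_SET(cpu, &set);
  return sched_setaffinity(0, sizeof(set), &set);
}

/* Number of CPUs in the calling thread's affinity mask, or -1 on error. */
static int allowed_cpu_count(void)
{
  cpu_set_t set;

  CPU_ZERO(&set);
  if (sched_getaffinity(0, sizeof(set), &set) != 0)
    {
      return -1;
    }

  return CPU_COUNT(&set);
}
```

Calling `bind_self_to_cpu(first_allowed_cpu())` pins the caller to one CPU it is already allowed to use. With this per-thread style, every thread has to be bound explicitly, whereas the summary above describes BMP as a designer-chosen setting that covers all of an application's threads.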

Bound Multi-Processing (BMP):

---------------------------------------------
|   APP 0  |  APP 1   |  APP 2   |  APP 3   |  <- Application bound to CPU
---------------------------------------------
|  Data[0] |  Data[1] |  Data[2] |  Data[3] |  <- NuttX Kernel Data supports multiple CPU instances
---------------------------------------------
|                Share Code                 |  <- NuttX kernel code shared for all CPUs
---------------------------------------------
|   UART 0 |   SPI 0  |   SPI 1  |   I2C 0  |  <- A driver is registered only on the CPU whose application needs it
---------------------------------------------
|  TIME 0  |  TIME 1  |  TIME 2  |  TIME 3  |  <- Core/CPU timers
---------------------------------------------
|   CPU0   |   CPU1   |   CPU2   |   CPU3   |  <- CPUs run independently
---------------------------------------------

Some subsystem data does not need to be duplicated, especially for components bound to a single application.
Shared hardware devices are protected by a spinlock to avoid race conditions between cores.
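
The split between per-CPU data and spinlock-protected shared devices can be sketched as follows. This is an illustrative sketch using C11 atomics, not code from the PR: `NCPUS`, `struct cpu_data`, `cpu_tick` and `shared_uart_write` are hypothetical names, and the "UART" is just a memory buffer standing in for a shared log device.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NCPUS 4                /* assumption: four CPUs, as in the diagrams */

/* Per-CPU data: each CPU touches only its own slot, so no lock is needed. */

struct cpu_data
{
  unsigned long ticks;
};

static struct cpu_data g_cpu_data[NCPUS];

/* Shared device state (stand-in for a log UART), guarded by a spinlock. */

static atomic_flag g_uart_lock = ATOMIC_FLAG_INIT;
static char g_uart_buf[256];
static size_t g_uart_len;

static void uart_lock(void)
{
  /* Spin until the flag was previously clear, i.e. we now own the lock. */

  while (atomic_flag_test_and_set_explicit(&g_uart_lock,
                                           memory_order_acquire))
    {
    }
}

static void uart_unlock(void)
{
  atomic_flag_clear_explicit(&g_uart_lock, memory_order_release);
}

/* Account one timer tick on the given CPU (lock-free, per-CPU slot). */

static void cpu_tick(int cpu)
{
  assert(cpu >= 0 && cpu < NCPUS);
  g_cpu_data[cpu].ticks++;
}

/* Append a message to the shared buffer; returns the bytes copied. */

static size_t shared_uart_write(const char *msg)
{
  size_t n = 0;

  uart_lock();
  while (msg[n] != '\0' && g_uart_len < sizeof(g_uart_buf) - 1)
    {
      g_uart_buf[g_uart_len++] = msg[n++];
    }

  g_uart_buf[g_uart_len] = '\0';
  uart_unlock();
  return n;
}
```

Only the shared buffer pays the cost of the lock; the per-CPU counters stay contention-free because each CPU indexes its own slot. A real kernel driver would typically also mask local interrupts while holding the lock, which this sketch omits.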

---------------------------------------------
|   APP 0  |  APP 1   |  APP 2   |  APP 3   |
---------------------------------------------
| NetStack |  BTStack |  AUDIO   |   ...    |  <- Components bound to the application; their data need not be duplicated.
---------------------------------------------
|                Share Code                 |
---------------------------------------------
|      Share UART (Protected by Spinlock)   |  <- A driver shared by all CPUs is protected by a spinlock (e.g. for printing logs)
---------------------------------------------
|   CPU0   |   CPU1   |   CPU2   |   CPU3   |
---------------------------------------------

Reference:
https://www.ghs.com/products/safety_critical/integrity_178_multicore.html
https://www.qnx.com/developers/docs/7.1/#com.qnx.doc.neutrino.sys_arch/topic/smp_BMP.html
https://www.nxp.com.cn/docs/en/brochure/PWRARBYNDBITSRAS.pdf

Signed-off-by: chao an <anchao@lixiang.com>

Impact

Depends on: apache/nuttx-apps#2342

N/A

Testing

qemu-armv7a/bmp ostest on single core

nuttx$ qemu-system-arm -cpu cortex-a7 -nographic -machine virt,virtualization=off,gic-version=2 -net none -chardev stdio,id=con,mux=on -serial chardev:con -mon chardev=con,mode=readline -kernel ./nuttx -smp 4

NuttShell (N
NuttS
Nutt
NH) NutShellShetSutlltX- (NSH) NuttX-10.4.0
4t(NheSH) NuttX-ns0.h> 1ll (NSH) Nu0.4.0
nstX-1h> 
nsh> ps
  PID GROUP PRI POLICY   TYPE    NPX STATE    EVENT     SIGMASK           STACK   USED  FILLED COMMAND
    0     0   0 FIFO     Kthread   - Ready              0000000000000000 004080 000536  13.1%  CPU0 IDLE
    1     1 192 RR       Kthread   - Waiting  Semaphore 0000000000000000 004032 000296   7.3%  hpwork 0x4013f51c 0x4013f530
    2     2 100 RR       Task      - Running            0000000000000000 004056 001168  28.7%  nsh_main
nsh> irqaff 33 1
nsh> ps
  PID GROUP PRI POLICY   TYPE    NPX STATE    EVENT     SIGMASK           STACK   USED  FILLED COMMAND
    0     0   0 FIFO     Kthread   - Ready              0000000000000000 004080 000736  18.0%  CPU1 IDLE
    1     1 192 RR       Kthread   - Waiting  Semaphore 0000000000000000 004032 000296   7.3%  hpwork 0x4013f544 0x4013f558
    2     2 100 RR       Task      - Running            0000000000000000 004056 001288  31.7%  nsh_main
nsh> irqaff 33 2
nsh> ps
  PID GROUP PRI POLICY   TYPE    NPX STATE    EVENT     SIGMASK           STACK   USED  FILLED COMMAND
    0     0   0 FIFO     Kthread   - Ready              0000000000000000 004080 000736  18.0%  CPU2 IDLE
    1     1 192 RR       Kthread   - Waiting  Semaphore 0000000000000000 004032 000296   7.3%  hpwork 0x4013f56c 0x4013f580
    2     2 100 RR       Task      - Running            0000000000000000 004056 001168  28.7%  nsh_main
nsh> irqaff 33 3
nsh> ps
  PID GROUP PRI POLICY   TYPE    NPX STATE    EVENT     SIGMASK           STACK   USED  FILLED COMMAND
    0     0   0 FIFO     Kthread   - Ready              0000000000000000 004080 000736  18.0%  CPU3 IDLE
    1     1 192 RR       Kthread   - Waiting  Semaphore 0000000000000000 004032 000296   7.3%  hpwork 0x4013f594 0x4013f5a8
    2     2 100 RR       Task      - Running            0000000000000000 004056 001168  28.7%  nsh_main

xiaoxiang781216 · 2024-03-29T05:29:59Z

@anchao could you split the irqaff change into a new PR, so that the change spanning apps/nuttx can be merged first? Since the remaining change touches many files, it's better to ensure it can pass CI on its own.

acassis

Please move the detailed commit message to an entry at https://nuttx.apache.org/docs/latest/components/index.html

anjiahao1 · 2024-04-02T04:15:10Z

Can BMP ensure that if an application bound to a separate processor crashes, it will not affect other processors?

anchao · 2024-04-02T07:33:07Z

> Can BMP ensure that if an application bound to a separate processor crashes, it will not affect other processors?

Of course. This is just the initial BMP pull request; MPU protection and optimizations related to the assertion chain will be added in the future.

anjiahao1 · 2024-04-02T07:35:30Z

> > Can BMP ensure that if an application bound to a separate processor crashes, it will not affect other processors?
>
> Of course, this is just the initial pull request of BMP. MPU protection and assertion chain related optimization will be added in the future.

Great!

PetervdPerk-NXP · 2024-04-02T21:42:26Z

Cool work. I'm curious: would this work on an asymmetrical system where each core has its own caches, e.g. Cortex-M7 and Cortex-M4, but without hardware cache coherency?

anchao · 2024-04-03T00:48:12Z

> Cool work. I'm curious: would this work on an asymmetrical system where each core has its own caches, e.g. Cortex-M7 and Cortex-M4, but without hardware cache coherency?

Yes, but the implementation requires more customized modifications. On platforms without hardware cache coherency, all data must be placed in cache-line-aligned sections, which depends on labeling specific data/bss sections in the linker script.
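
As a rough illustration of the alignment requirement, the sketch below keeps data owned by different cores on separate cache lines using C11 `alignas`. The names `CACHE_LINE_SIZE`, `struct shared_mailbox` and `same_cache_line` are hypothetical, and the 32-byte line size is an assumption that must be checked against the actual core. Section placement in the linker script and the cache clean/invalidate operations that a non-coherent system also needs are not shown.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 32     /* assumption: check the core's data cache */

/* Each direction of the mailbox starts on its own cache line, so a cache
 * clean or invalidate by one core never covers data owned by the other.
 */

struct shared_mailbox
{
  alignas(CACHE_LINE_SIZE) uint32_t to_m4[4];   /* written by the M7 side */
  alignas(CACHE_LINE_SIZE) uint32_t to_m7[4];   /* written by the M4 side */
};

static struct shared_mailbox g_mailbox;

/* Compile-time checks that the layout really is line-granular. */

static_assert(offsetof(struct shared_mailbox, to_m7) % CACHE_LINE_SIZE == 0,
              "to_m7 must start on a cache-line boundary");
static_assert(sizeof(struct shared_mailbox) % CACHE_LINE_SIZE == 0,
              "mailbox must occupy whole cache lines");

/* Return nonzero if two addresses fall into the same cache line. */

static int same_cache_line(const void *a, const void *b)
{
  return (uintptr_t)a / CACHE_LINE_SIZE == (uintptr_t)b / CACHE_LINE_SIZE;
}
```

The padding costs some memory per field, but it rules out false sharing between the two directions of the mailbox.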

zouboan · 2024-04-06T03:57:52Z

What is the difference between this approach and the pthread_setaffinity_np function implemented by NuttX? With this approach, are threads spawned by a task bound to a specific processor also automatically bound to that processor?

anchao · 2024-04-09T01:29:22Z

> What is the difference between this approach and the pthread_setaffinity_np function implemented by NuttX? With this approach, are threads spawned by a task bound to a specific processor also automatically bound to that processor?

Please refer to the PR summary. Compared with SMP, BMP can provide better performance, stability, and isolation.

zouboan · 2024-04-09T12:18:02Z

> > What is the difference between this approach and the pthread_setaffinity_np function implemented by NuttX? With this approach, are threads spawned by a task bound to a specific processor also automatically bound to that processor?
>
> Please refer to the PR summary. Compared with SMP, BMP can provide better performance, stability, and isolation.

I see. It seems that BMP can resolve two problems of SMP processor affinity: constraining threads in third-party code, and constraining dynamically created threads.

anchao mentioned this pull request Mar 28, 2024

nshlib/irqaff: add irq affinity command apache/nuttx-apps#2342

Merged

anchao force-pushed the 24032802 branch 4 times, most recently from e1ddca8 to 0802894 Compare March 28, 2024 11:32

xiaoxiang781216 reviewed Mar 28, 2024

anchao force-pushed the 24032802 branch 9 times, most recently from 38887e3 to 016201d Compare March 29, 2024 04:53

xiaoxiang781216 requested review from patacongo, pkarashchenko, acassis and davids5 March 29, 2024 05:23

anchao force-pushed the 24032802 branch from 016201d to 1a0bf56 Compare March 29, 2024 05:46

anchao marked this pull request as draft March 29, 2024 07:55

anchao force-pushed the 24032802 branch from 1a0bf56 to 45cde4c Compare March 29, 2024 10:53

anchao marked this pull request as ready for review March 29, 2024 10:54

anchao force-pushed the 24032802 branch 4 times, most recently from eb003ee to cde829d Compare March 29, 2024 11:34

acassis reviewed Mar 29, 2024

xiaoxiang781216 reviewed Mar 31, 2024

hartmannathan reviewed Mar 31, 2024

anchao force-pushed the 24032802 branch 6 times, most recently from 27e2459 to a64a1e7 Compare April 2, 2024 03:01

anchao force-pushed the 24032802 branch from a64a1e7 to 370b779 Compare April 2, 2024 08:30

acassis mentioned this pull request Apr 6, 2024

arch/tricore: add Bound Multi-Processing (BMP) support #12032

Draft

anchao force-pushed the 24032802 branch from 370b779 to 19a9c8e Compare April 11, 2024 04:42

anchao added 2 commits April 11, 2024 14:15

armv7-a/qemu: add Bound Multi-Processing (BMP) support

7a0b2de

Signed-off-by: chao an <anchao@lixiang.com>

anchao force-pushed the 24032802 branch from 19a9c8e to 7a0b2de Compare April 11, 2024 06:15
