sched: Introduce Bound Multi-Processing (BMP) into NuttX #12020

Open
anchao wants to merge 2 commits into base: master

anchao
Contributor

@anchao anchao commented Mar 28, 2024

Summary

sched: Introduce Bound Multi-Processing (BMP) into NuttX

Bound multiprocessing provides the scheduling control of an asymmetric
multiprocessing model, while preserving the hardware abstraction and
management of symmetric multiprocessing.

BMP is similar to SMP, but you can specify which processors a thread
can run on. You can use both SMP and BMP on the same system, allowing
some threads to migrate from one processor to another, while other
threads are restricted to one or more processors.

As with SMP, a single copy of the OS maintains an overall view of all
system resources, allowing them to be dynamically allocated and shared
among applications. But, during application initialization, a setting
determined by the system designer forces all of an application's threads
to execute only on a specified CPU.

Compared to full, floating SMP operation, this approach offers several
advantages:

  1. It eliminates the cache thrashing that can reduce performance in an SMP
    system by allowing applications that share the same data set to run
    exclusively on the same CPU.
  2. It offers simpler application debugging than SMP since all execution
    threads within an application run on a single CPU.
  3. It helps legacy applications that use poor techniques for synchronizing
    shared data to run correctly, again by letting them run on a single CPU.
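
The binding described above can be pictured as a per-task CPU mask. The following is only a toy model, not code from this PR: `struct task`, `task_bound_to`, `task_floating`, `task_spawn_thread` and `task_can_run_on` are hypothetical names, and the CPU count of 4 is assumed to match the diagrams in this description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NCPUS    4                       /* assumed CPU count */
#define ALL_CPUS ((1u << NCPUS) - 1u)    /* SMP-style mask: may run anywhere */

/* Hypothetical task record for illustration; this is not the NuttX TCB. */

struct task
{
  uint32_t affinity;                     /* bit n set => may run on CPU n */
};

/* BMP-style: the task is bound to exactly one CPU. */

static struct task task_bound_to(unsigned cpu)
{
  struct task t;

  assert(cpu < NCPUS);
  t.affinity = 1u << cpu;
  return t;
}

/* SMP-style: the task may migrate between all CPUs. */

static struct task task_floating(void)
{
  struct task t;

  t.affinity = ALL_CPUS;
  return t;
}

/* A thread created by a task inherits the parent's mask, so every thread
 * of a bound application stays on the application's CPU.
 */

static struct task task_spawn_thread(const struct task *parent)
{
  struct task t;

  t.affinity = parent->affinity;
  return t;
}

/* Check that a scheduler would make before placing the task on a CPU. */

static bool task_can_run_on(const struct task *t, unsigned cpu)
{
  return cpu < NCPUS && (t->affinity & (1u << cpu)) != 0;
}
```

In this model a thread spawned by an application bound to CPU 2 can run only on CPU 2, while a floating task can run on any of the four CPUs.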

Bound Multi-Processing (BMP):

---------------------------------------------
|   APP 0  |  APP 1   |  APP 2   |  APP 3   |  <- Application bound to CPU
---------------------------------------------
|  Data[0] |  Data[1] |  Data[2] |  Data[3] |  <- NuttX Kernel Data supports multiple CPU instances
---------------------------------------------
|                Share Code                 |  <- NuttX kernel code shared for all CPUs
---------------------------------------------
|   UART 0 |   SPI 0  |   SPI 1  |   I2C 0  |  <- Drivers are registered only on the CPUs whose applications need them
---------------------------------------------
|  TIME 0  |  TIME 1  |  TIME 2  |  TIME 3  |  <- Core/CPU timers
---------------------------------------------
|   CPU0   |   CPU1   |   CPU2   |   CPU3   |  <- CPUs run independently
---------------------------------------------

Some subsystem data does not need to be duplicated, especially for components bound to a single application.
For shared hardware devices, use a spinlock to avoid race conditions between cores.

---------------------------------------------
|   APP 0  |  APP 1   |  APP 2   |  APP 3   |
---------------------------------------------
| NetStack |  BTStack |  AUDIO   |   ...    |  <- Components bound to one application; their data need not be duplicated
---------------------------------------------
|                Share Code                 |
---------------------------------------------
|      Share UART (Protected by Spinlock)   |  <- Drivers shared by all CPUs are protected by a spinlock (e.g. for log output)
---------------------------------------------
|   CPU0   |   CPU1   |   CPU2   |   CPU3   |
---------------------------------------------
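
The two layouts above can be sketched in portable C11. This is an illustration under stated assumptions, not the implementation in this PR: `g_data`, `percpu_tick`, `demo_spin_lock`, `demo_spin_unlock` and `shared_uart_write` are hypothetical names, and the "UART" is just a memory buffer. Per-CPU data is indexed by CPU number and needs no lock as long as each CPU touches only its own slot; the shared device is guarded by a spinlock built on `atomic_flag`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NCPUS 4                          /* assumed CPU count */

/* Per-CPU kernel data: one instance per CPU, like Data[0..3] above. */

struct percpu_data
{
  unsigned long ticks;
};

static struct percpu_data g_data[NCPUS];

/* State of a device shared by all CPUs (a buffer standing in for a UART). */

static atomic_flag g_uart_lock = ATOMIC_FLAG_INIT;
static char g_uart_buf[128];
static size_t g_uart_len;

/* Minimal test-and-set spinlock (hypothetical helpers, not the NuttX API). */

static void demo_spin_lock(atomic_flag *lock)
{
  while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
    {
      /* Busy-wait until the holder clears the flag. */
    }
}

static void demo_spin_unlock(atomic_flag *lock)
{
  atomic_flag_clear_explicit(lock, memory_order_release);
}

/* Each CPU updates only its own slot, so no locking is required here. */

static void percpu_tick(unsigned cpu)
{
  assert(cpu < NCPUS);
  g_data[cpu].ticks++;
}

/* Any CPU may write to the shared device; the spinlock serializes access.
 * Output that does not fit in the buffer is silently dropped.
 */

static void shared_uart_write(const char *s)
{
  demo_spin_lock(&g_uart_lock);
  while (*s != '\0' && g_uart_len < sizeof(g_uart_buf) - 1)
    {
      g_uart_buf[g_uart_len++] = *s++;
    }

  g_uart_buf[g_uart_len] = '\0';
  demo_spin_unlock(&g_uart_lock);
}
```

The point of the split is that only the shared path pays for synchronization; the per-CPU path stays lock-free.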

Reference:
https://www.ghs.com/products/safety_critical/integrity_178_multicore.html
https://www.qnx.com/developers/docs/7.1/#com.qnx.doc.neutrino.sys_arch/topic/smp_BMP.html
https://www.nxp.com.cn/docs/en/brochure/PWRARBYNDBITSRAS.pdf

Signed-off-by: chao an <anchao@lixiang.com>

Impact

Depends on: apache/nuttx-apps#2342

N/A

Testing

qemu-armv7a/bmp ostest on single core

nuttx$ qemu-system-arm -cpu cortex-a7 -nographic -machine virt,virtualization=off,gic-version=2 -net none -chardev stdio,id=con,mux=on -serial chardev:con -mon chardev=con,mode=readline -kernel ./nuttx -smp 4

NuttShell (N
NuttS
Nutt
NH) NutShellShetSutlltX- (NSH) NuttX-10.4.0
4t(NheSH) NuttX-ns0.h> 1ll (NSH) Nu0.4.0
nstX-1h> 
nsh> ps
  PID GROUP PRI POLICY   TYPE    NPX STATE    EVENT     SIGMASK           STACK   USED  FILLED COMMAND
    0     0   0 FIFO     Kthread   - Ready              0000000000000000 004080 000536  13.1%  CPU0 IDLE
    1     1 192 RR       Kthread   - Waiting  Semaphore 0000000000000000 004032 000296   7.3%  hpwork 0x4013f51c 0x4013f530
    2     2 100 RR       Task      - Running            0000000000000000 004056 001168  28.7%  nsh_main
nsh> irqaff 33 1
nsh> ps
  PID GROUP PRI POLICY   TYPE    NPX STATE    EVENT     SIGMASK           STACK   USED  FILLED COMMAND
    0     0   0 FIFO     Kthread   - Ready              0000000000000000 004080 000736  18.0%  CPU1 IDLE
    1     1 192 RR       Kthread   - Waiting  Semaphore 0000000000000000 004032 000296   7.3%  hpwork 0x4013f544 0x4013f558
    2     2 100 RR       Task      - Running            0000000000000000 004056 001288  31.7%  nsh_main
nsh> irqaff 33 2
nsh> ps
  PID GROUP PRI POLICY   TYPE    NPX STATE    EVENT     SIGMASK           STACK   USED  FILLED COMMAND
    0     0   0 FIFO     Kthread   - Ready              0000000000000000 004080 000736  18.0%  CPU2 IDLE
    1     1 192 RR       Kthread   - Waiting  Semaphore 0000000000000000 004032 000296   7.3%  hpwork 0x4013f56c 0x4013f580
    2     2 100 RR       Task      - Running            0000000000000000 004056 001168  28.7%  nsh_main
nsh> irqaff 33 3
nsh> ps
  PID GROUP PRI POLICY   TYPE    NPX STATE    EVENT     SIGMASK           STACK   USED  FILLED COMMAND
    0     0   0 FIFO     Kthread   - Ready              0000000000000000 004080 000736  18.0%  CPU3 IDLE
    1     1 192 RR       Kthread   - Waiting  Semaphore 0000000000000000 004032 000296   7.3%  hpwork 0x4013f594 0x4013f5a8
    2     2 100 RR       Task      - Running            0000000000000000 004056 001168  28.7%  nsh_main

@xiaoxiang781216
Contributor

@anchao could you split the irqaff change into a new PR, so that the change spanning apps/nuttx can be merged first? Since the remaining changes touch many files, it's better to ensure they can pass CI standalone.

@anchao anchao marked this pull request as draft March 29, 2024 07:55
@anchao anchao marked this pull request as ready for review March 29, 2024 10:54
@anchao anchao force-pushed the 24032802 branch 4 times, most recently from eb003ee to cde829d Compare March 29, 2024 11:34
Contributor

@acassis acassis left a comment

Please move the detailed commit message to an entry at https://nuttx.apache.org/docs/latest/components/index.html

@anchao anchao force-pushed the 24032802 branch 6 times, most recently from 27e2459 to a64a1e7 Compare April 2, 2024 03:01
@anjiahao1
Contributor

Can BMP ensure that if an application bound to a separate processor crashes, it will not affect other processors?

@anchao
Contributor Author

anchao commented Apr 2, 2024

> Can BMP ensure that if an application bound to a separate processor crashes, it will not affect other processors?

Of course, this is just the initial pull request for BMP. MPU protection and optimizations related to the assertion chain will be added in the future.

@anjiahao1
Contributor

> > Can BMP ensure that if an application bound to a separate processor crashes, it will not affect other processors?
>
> Of course, this is just the initial pull request for BMP. MPU protection and optimizations related to the assertion chain will be added in the future.

Great!

@PetervdPerk-NXP
Contributor

Cool work. I'm curious: would this work on an asymmetrical system in which each core has its own caches, e.g. Cortex-M7 and Cortex-M4, but without hardware cache coherency?

@anchao
Contributor Author

anchao commented Apr 3, 2024

> Cool work. I'm curious: would this work on an asymmetrical system in which each core has its own caches, e.g. Cortex-M7 and Cortex-M4, but without hardware cache coherency?

Yes, but the implementation requires more customized modifications. On platforms without hardware cache coherency, all data must be placed in cache-line-aligned sections, which depends on labeling specific data/bss regions in the linker script.
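
As a rough illustration of the alignment requirement (not code from this PR), per-core data can be padded so that no two cores ever share a cache line. The 32-byte line size and the names `struct percore_state`, `g_state` and `on_own_cache_line` are assumptions for this sketch; the real value must come from the target hardware, and placing such objects in a dedicated linker-script section is a separate, platform-specific step that is not shown here.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 32                    /* assumed line size in bytes */

/* Per-core state aligned to a cache line.  Aligning the first member
 * raises the alignment of the whole struct, and the compiler pads the
 * struct size up to a multiple of that alignment.
 */

struct percore_state
{
  alignas(CACHE_LINE) uint32_t counter;
  uint32_t flags;
};

/* Each array element therefore starts on its own cache line. */

static struct percore_state g_state[2];

_Static_assert(sizeof(struct percore_state) % CACHE_LINE == 0,
               "per-core state must fill whole cache lines");

/* Returns nonzero if the object starts on a cache-line boundary. */

static int on_own_cache_line(const void *p)
{
  return ((uintptr_t)p % CACHE_LINE) == 0;
}
```

With this layout, cache maintenance on one core's slot (clean or invalidate) cannot disturb data owned by the other core.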

@zouboan
Contributor

zouboan commented Apr 6, 2024

What is the difference between this approach and the pthread_setaffinity_np functions implemented by NuttX? Are threads spawned by a task bound to a specific processor also automatically bound to that processor with this approach?

@anchao
Contributor Author

anchao commented Apr 9, 2024

> What is the difference between this approach and the pthread_setaffinity_np functions implemented by NuttX? Are threads spawned by a task bound to a specific processor also automatically bound to that processor with this approach?

Please refer to the PR summary. Compared with SMP, BMP can provide better performance, stability and isolation.

@zouboan
Contributor

zouboan commented Apr 9, 2024

> > What is the difference between this approach and the pthread_setaffinity_np functions implemented by NuttX? Are threads spawned by a task bound to a specific processor also automatically bound to that processor with this approach?
>
> Please refer to the PR summary. Compared with SMP, BMP can provide better performance, stability and isolation.

I see. It seems that BMP can resolve two problems of SMP processor affinity: constraining threads in third-party code, and constraining dynamically created threads.

7 participants