arch/rv32i: separate kernel and app trap handlers; specify trap handler "interface" #3864

lschuermann · 2024-02-17T23:01:33Z

Pull Request Overview

This is a second, slightly less elegant but working(?) stab at #3847.

Tock's core kernel design permits process implementations with system call interfaces other than the ones shipped upstream in the arch/cortex-m and arch/rv32i crates. However, in practice, the UserspaceKernelBoundary implementation for RV32I has always been tightly coupled to the generic kernel RISC-V trap handler. This has made it difficult to build an alternative process implementation (e.g., as part of Encapsulated Functions), while reusing much of Tock's RISC-V implementation.

By defining an "interface" that the global trap handler exposes (i.e., specifying how a custom trap handler can be registered, and which registers get clobbered), we allow foreign process implementations to hook into the rv32i arch crate's assembly, and can move all process-specific assembly into SysCall::switch_to_process. All process-specific logic is now contained in one assembly block that can be read top-to-bottom. The global trap handler no longer relies on the stack layout of that function, or other data structures in syscall.rs.

Because the kernel trap handler no longer contains any application logic and expects the app to set its own trap handler, implementing a new syscall ABI becomes as simple as temporarily swapping out the trap handler.
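The "interface" can be sketched as a small host-side Rust model. This is a hypothetical simulation, not Tock code: `TrapRouter`, `install`, `uninstall` and `on_trap` are illustrative names, and a `HashMap` stands in for code living at an address. The model assumes that mscratch == 0 selects the kernel trap handler and that any other value is the address of a custom handler.

```rust
use std::collections::HashMap;

/// Stand-in for code at an address; returns which handler ran.
pub type Handler = fn() -> &'static str;

fn kernel_trap_handler() -> &'static str {
    "kernel"
}

/// Hypothetical model of the proposed trap routing: mscratch == 0 selects the
/// kernel trap handler, any other value is treated as a handler address.
pub struct TrapRouter {
    pub mscratch: usize,
    handlers: HashMap<usize, Handler>, // simulated "code at address"
}

impl TrapRouter {
    pub fn new() -> Self {
        TrapRouter { mscratch: 0, handlers: HashMap::new() }
    }

    /// A process implementation installs its handler before entering
    /// userspace by writing the handler's address into mscratch.
    pub fn install(&mut self, addr: usize, handler: Handler) {
        assert_ne!(addr, 0, "address 0 is reserved for kernel-mode traps");
        self.handlers.insert(addr, handler);
        self.mscratch = addr;
    }

    /// Switching back to kernel trap handling only requires mscratch = 0.
    pub fn uninstall(&mut self) {
        self.mscratch = 0;
    }

    /// The global trap handler: branch on mscratch, otherwise "jr" to it.
    pub fn on_trap(&self) -> &'static str {
        match self.mscratch {
            0 => kernel_trap_handler(),
            addr => (self.handlers[&addr])(),
        }
    }
}
```

Swapping in a different syscall ABI then amounts to calling `install` with another handler address before the switch and `uninstall` afterwards.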

In-depth Walkthrough

Currently, context switches on RISC-V work as follows:

0. For traps arriving in kernel mode, the kernel trap handler does an intricate dance to compare the mscratch value to 0 without clobbering any registers, and then takes a _from_kernel branch in _start_trap.
1. When switching to an app, we disable interrupts and then swap the current kernel stack pointer into mscratch, making it non-zero. We re-enable interrupts when switching into user mode.
2. When a trap arrives during app execution, we perform the same dance as in step 0, but then branch into _from_app.
3. The _from_app branch saves all of the application state through two levels of pointer indirection:
  - reading the stack pointer from mscratch,
  - dereferencing a pointer on said stack to a struct compatible in layout with Riscv32iStoredState.
  We then proceed to dump all registers into this state struct, and other core-local state (CSRs) onto known offsets from the stack pointer.

  This hard-codes many assumptions about Tock's process model, and may preclude implementations that want additional state stored, or do not want to buy into storing the whole register file (e.g., for single function invocations as with Encapsulated Functions, or cooperative userspace apps).
4. _from_app proceeds to switch to Machine mode.

  This may be undesirable for process models which defer interrupt handling, e.g., for cooperative apps.
5. _from_app returns from the interrupt handler to an address stored at a well-known offset on the stack.

As is obvious from the above, using this trap handler requires a significant amount of "buy-in" concerning both the process execution semantics and the layout of the stack and of the required Riscv32iStoredState.
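To make the two levels of indirection in step 3 concrete, here is a hypothetical host-side model with memory simulated as a word array. The names are illustrative, and the assumption that the state pointer sits at offset 0 of the kernel stack is for illustration only; the PR only says it is at a known offset.

```rust
/// Word-addressed memory model; addresses are byte addresses.
pub struct Machine {
    pub mem: Vec<u32>,
    pub mscratch: u32,
    pub regs: [u32; 31], // x1..x31 (x0 is hard-wired to zero)
}

impl Machine {
    fn load(&self, addr: u32) -> u32 {
        self.mem[(addr / 4) as usize]
    }

    fn store(&mut self, addr: u32, val: u32) {
        self.mem[(addr / 4) as usize] = val;
    }

    /// Mirrors the current _from_app save path (assumed offset 0 for the
    /// state pointer on the kernel stack).
    pub fn save_app_state(&mut self) -> u32 {
        let kernel_sp = self.mscratch; // level 1: mscratch -> kernel stack
        let state_ptr = self.load(kernel_sp); // level 2: stack slot -> state
        let regs = self.regs;
        for (i, value) in regs.iter().enumerate() {
            self.store(state_ptr + 4 * i as u32, *value);
        }
        state_ptr
    }
}
```

Every process implementation reusing this handler has to reproduce both the stack slot and the struct layout, which is the coupling the proposal removes.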

I propose changing the routing of traps caught while executing processes by using mscratch not to branch between "kernel" and "app" trap handlers, but between "kernel" and "custom" trap handlers. If mscratch != 0, the global trap handler simply jumps to the address contained in this CSR. This is a minor architectural change and mirrors the semantics of our current approach, but has some advantages:

- straightforward, top-down context-switch assembly, with everything except step 2 in one block:
  1. prepare the CPU for the context switch by
    a. storing clobbered registers on the stack,
    b. storing the stack pointer in a static variable,
    c. disabling interrupts,
    d. registering the app trap handler in mscratch,
    e. switching to userspace, re-enabling interrupts;
  2. on trap,
    a. swap mscratch and t0,
    b. check if t0 == 0; in that case, execute the kernel trap handler,
    c. if not, jump to the address in t0 (the _start_app_trap entry);
  3. _start_app_trap stores the CPU state and register file in the Riscv32iStoredState and stack layout defined in the very same file / assembly block,
    a. if this trap was caused by an interrupt, disable it;
  4. return from the trap handler by loading the address of the _return_to_kernel symbol directly below via an immediate, and jumping to it via mret;
  5. restore clobbered registers.
- no hard-coded assumptions about process semantics, stored state, stack layout, etc.
- a couple fewer memory loads / stores, replaced by immediate loads (la, expanding to auipc or lui and addi)
- potential for more optimizations: the linear execution through the assembly allows more efficient register clobbering / reuse
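The step sequence above can be traced with a small hypothetical model. `KERNEL_SP`, `APP_TRAP_HANDLER` and `Cpu` are illustrative names, not Tock identifiers. The model assumes the app trap handler is what resets mscratch to 0, and it shows why the kernel stack pointer must now live in a static: sp holds the app's value while userspace runs.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical static holding the kernel sp while an app runs.
static KERNEL_SP: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical address of the _start_app_trap entry point.
pub const APP_TRAP_HANDLER: usize = 0x2000_0000;

#[derive(Default)]
pub struct Cpu {
    pub sp: usize,
    pub mscratch: usize,
    pub interrupts_enabled: bool,
    pub user_mode: bool,
    pub trace: Vec<&'static str>,
}

impl Cpu {
    /// Step 1: prepare the CPU and enter userspace.
    pub fn switch_to_process(&mut self, app_sp: usize) {
        self.trace.push("1a: save clobbered registers on the kernel stack");
        KERNEL_SP.store(self.sp, Ordering::Relaxed);
        self.trace.push("1b: store kernel sp in a static");
        self.interrupts_enabled = false;
        self.trace.push("1c: disable interrupts");
        self.mscratch = APP_TRAP_HANDLER;
        self.trace.push("1d: register app trap handler in mscratch");
        self.sp = app_sp;
        self.user_mode = true;
        self.interrupts_enabled = true;
        self.trace.push("1e: mret into user mode, interrupts re-enabled");
    }

    /// Steps 2-5: a trap arrives while the app is running.
    pub fn trap(&mut self) {
        self.user_mode = false; // traps are taken in machine mode
        self.interrupts_enabled = false; // hardware clears mstatus.MIE on entry
        assert_eq!(self.mscratch, APP_TRAP_HANDLER, "expected app handler");
        self.trace.push("2: mscratch != 0, jump to _start_app_trap");
        self.mscratch = 0; // assumed: app handler restores kernel-handler mode
        self.trace.push("3: save register file and CSRs to stored state");
        self.trace.push("4: mret to _return_to_kernel");
        self.sp = KERNEL_SP.load(Ordering::Relaxed);
        self.trace.push("5: restore kernel sp and clobbered registers");
    }
}
```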

Drawbacks:

- we can no longer keep the kernel stack pointer in mscratch and need to save it to a static variable. This does not change the net number of loads/stores required, as previously we'd store and load the address for continuing the switch_to_process assembly after the trap handler on the stack.

In short, this change increases cohesion and reduces coupling in the rv32i crate, and allows foreign process implementations.

Testing Strategy

This pull request was tested by CI and needs careful review.

This change set modifies more parts of the switch_to_process assembly than I'd initially hoped. For instance, it changes the registers into which some assembly arguments are loaded, to reduce the number of clobbers and/or moves that we have. I tried splitting these changes into multiple, individually working commits, but that turned out to be very tricky.

I performed an initial, very unscientific analysis of how the old assembly compares to the new one in terms of instruction count. This covers an end-to-end context switch, starting and ending with the inline assembly in switch_to_process, excluding all user-mode instructions:

new:
- Integer / CSR / always skipped branch instructions: 26
- Loads / Stores: 79
- Branching behavior:
  1. Jump to userspace mepc as part of mret
  2. Jump to global trap handler on trap
  3. Jump to _start_app_trap handler
  4. Either branch if trap not caused by interrupt, or jump to interrupt disable routine
  5. Jump back to switch_to_process when exiting trap handler with mret
current:
- Integer / CSR / always skipped branch instructions: 25
- Loads / Stores: 81
- Branching behavior:
  1. Jump to userspace mepc as part of mret
  2. Jump to global trap handler on trap
  3. Branch to app-part of global trap handler
  4. Either branch if trap not caused by interrupt, or jump to interrupt disable routine
  5. Jump back to switch_to_process when exiting trap handler with mret
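Tallying the numbers listed above (a worked check on the reported counts, not a new measurement): the new path executes one fewer straight-line instruction, trading two loads/stores for one extra integer instruction, and both take five control-flow steps. The listed branching steps include hardware trap entry, so this sketch does not add them to the instruction total.

```rust
/// Counts as listed above for one end-to-end context switch.
pub struct Counts {
    pub int_csr_branch: u32, // integer / CSR / always-skipped branch instructions
    pub loads_stores: u32,
    pub control_flow_steps: u32, // listed "branching behavior" steps
}

impl Counts {
    /// Straight-line instructions only (excludes the control-flow steps).
    pub fn straight_line(&self) -> u32 {
        self.int_csr_branch + self.loads_stores
    }
}

pub const NEW: Counts = Counts { int_csr_branch: 26, loads_stores: 79, control_flow_steps: 5 };
pub const CURRENT: Counts = Counts { int_csr_branch: 25, loads_stores: 81, control_flow_steps: 5 };
```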

TODO or Help Wanted

One issue on RISC-V is the presence of trap and interrupt handlers for different privilege modes. A RISC-V platform with supervisor- and/or user-mode will feature the stvec and utvec CSRs. However, these will not be consulted for disabled interrupts / trap routing, and many more of Tock's assumptions, at least around the SysCall implementation in rv32i, would break if these were used by enabling interrupts in the sie / uie CSRs.

Because their semantics are unclear, I documented that, for now, we expect all traps to be routed to the global trap handler. I removed the PermissionMode argument of rv32i::configure_trap_handler, as only the Machine branch was ever used.

Looking forward to others' opinions on this!

Documentation Updated

Updated the relevant files in /docs, or no updates are required.

Formatting

Ran make prepush.

bradjc

Looks good, just comments.

I'm not fond of comments before lines of assembly, as there is no way to mark which assembly instructions pertain to that comment. Commenting on the same line removes that potential confusion.

bradjc · 2024-02-27T23:47:40Z

arch/rv32i/src/lib.rs

+            csrrw t0, mscratch, t0
+
+            // If mscratch contained 0, invoke the kernel trap handler.
+            beq   t0, x0, _start_kernel_trap


Suggested change

beq t0, x0, _start_kernel_trap

beqz t0, _start_kernel_trap // If t0 == 0 then we faulted in kernel.

bradjc · 2024-02-27T23:47:58Z

arch/rv32i/src/lib.rs

-            // trap into machine mode. Therefore, this can only happen
-            // when causing an exception in the trap handler itself.
+            // No registers other than t0 and the mscratch CSR are to be
+            // clobberd before contiuing execution at the address loaded into


Suggested change

// clobberd before contiuing execution at the address loaded into

// clobbered before continuing execution at the address loaded into

bradjc · 2024-02-27T23:49:34Z

arch/rv32i/src/lib.rs

+            // stated above.
+
+            // Atomically swap t0 and mscratch:
+            csrrw t0, mscratch, t0


Suggested change

csrrw t0, mscratch, t0

csrrw t0, mscratch, t0 // mscratch=t0; t0=mscratch

bradjc · 2024-02-27T23:51:11Z

arch/rv32i/src/lib.rs


+            // Else, invoke the trap handler at the address that was loaded in
+            // the mscratch CSR.
+            jr    t0


Suggested change

jr t0

jr t0 // Jump to address in t0.

bradjc · 2024-02-27T23:52:14Z

arch/rv32i/src/lib.rs

-            // tracking that the kernel is executing).
-            csrrw t0, 0x340, zero // t0=mscratch, mscratch=0
+            // Restore t0. We reset mscratch to 0 (kernel trap handler mode)
+            csrrw t0, mscratch, 0


Suggested change

csrrw t0, mscratch, 0

csrrw t0, mscratch, 0 // t0=mscratch, mscratch=0
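The csrrw round trip in these suggestions can be checked with a register-level model. `csrrw` and `trap_entry` here are hypothetical Rust stand-ins for the instructions, restricted to t0 and mscratch, to show that the kernel path restores t0 and resets mscratch to 0 without touching any other register.

```rust
/// Model of `csrrw rd, csr, rs1`: writes rs1 into the CSR and returns the
/// CSR's previous value (which the instruction writes to rd).
pub fn csrrw(csr: &mut u32, rs1: u32) -> u32 {
    std::mem::replace(csr, rs1)
}

/// Global trap handler entry, modelled on t0 and mscratch only.
/// Returns Some(handler address) for a custom handler, or None for the
/// kernel path, in which case t0 is restored and mscratch is 0 again.
pub fn trap_entry(t0: &mut u32, mscratch: &mut u32) -> Option<u32> {
    *t0 = csrrw(mscratch, *t0); // csrrw t0, mscratch, t0
    if *t0 == 0 {
        // beqz t0, _start_kernel_trap
        *t0 = csrrw(mscratch, 0); // csrrw t0, mscratch, 0 (restore t0)
        None
    } else {
        Some(*t0) // jr t0; the app's t0 now sits in mscratch
    }
}
```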

bradjc · 2024-02-28T00:06:24Z

arch/rv32i/src/syscall.rs

+          // to use a temporary register to get that address. So, we save `a1`
+          // to the kernel stack (in t0) before we can move it to the proper
+          // spot in the per-process stored state.
+          sw   a1, 0*4(t0)


Suggested change

sw a1, 0*4(t0)

sw a1, 0*4(t0) // Save a1 on kernel stack.

bradjc · 2024-02-28T00:09:43Z

arch/rv32i/src/syscall.rs

+          //
+          // Save the PC to the stored state struct
+          csrr  t1, mepc
+          sw    t1, 31*4(a1)


Suggested change

sw t1, 31*4(a1)

sw t1, 31*4(a1) // Save the PC to the stored state struct

bradjc · 2024-02-28T00:10:02Z

arch/rv32i/src/syscall.rs

+          //
+          // Save mtval to the stored state struct
+          csrr  t1, mtval
+          sw    t1, 33*4(a1)


Suggested change

sw t1, 33*4(a1)

sw t1, 33*4(a1) // Save mtval to the stored state struct

bradjc · 2024-02-28T00:10:20Z

arch/rv32i/src/syscall.rs

+          // Save mcause and leave it loaded into a0, as we call a function
+          // with it below:
+          csrr  a0, mcause
+          sw    a0, 32*4(a1)


Suggested change

sw a0, 32*4(a1)

sw a0, 32*4(a1) // Save mcause to the stored state struct
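The store offsets used in these suggestions (pc at 31*4, mcause at 32*4, mtval at 33*4) imply a layout like the one below. `StoredState` is a hypothetical `#[repr(C)]` stand-in inferred from those offsets, not a copy of Riscv32iStoredState. The check confirms that the offsets match.

```rust
use std::mem::size_of;
use std::ptr::addr_of;

/// Layout inferred from the offsets in the suggested stores (hypothetical).
#[repr(C)]
#[derive(Default)]
pub struct StoredState {
    pub regs: [u32; 31], // x1..x31
    pub pc: u32,         // saved from mepc
    pub mcause: u32,
    pub mtval: u32,
}

/// Byte offsets of pc, mcause and mtval from the start of the struct.
pub fn field_offsets() -> (usize, usize, usize) {
    let s = StoredState::default();
    let base = addr_of!(s) as usize;
    (
        addr_of!(s.pc) as usize - base,
        addr_of!(s.mcause) as usize - base,
        addr_of!(s.mtval) as usize - base,
    )
}

pub const STATE_SIZE: usize = size_of::<StoredState>();
```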

bradjc · 2024-02-28T00:11:58Z

arch/rv32i/src/syscall.rs

+          // trap handler so that it does not fire again. If mcause is greater
+          // than or equal to zero this was not an interrupt (i.e. the most
+          // significant bit is not 1).
+          bge  a0, zero, 200f


Suggested change

bge a0, zero, 200f

bgez a0, 200f // if a0>=0 branch to _start_app_trap_continue
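Why a signed comparison works: on RV32, bit 31 of mcause is the Interrupt bit, so reading mcause as a signed integer makes interrupts negative. `bgez` (equivalent to `bge a0, zero`) therefore skips the disable path for every exception, including mcause == 0. The `disable_in_mie` helper below is a hypothetical illustration that assumes "disable it" means clearing the interrupt's enable bit in mie.

```rust
/// On RV32, bit 31 of mcause flags an interrupt; the rest is the cause code.
const INTERRUPT_BIT: u32 = 1 << 31;

/// Mirrors `bgez a0, 200f`: a set MSB makes the signed value negative.
pub fn is_interrupt(mcause: u32) -> bool {
    (mcause as i32) < 0
}

pub fn cause_code(mcause: u32) -> u32 {
    mcause & !INTERRUPT_BIT
}

/// Hypothetical disable step: clear the interrupt's enable bit in `mie`
/// so it does not fire again once interrupts are re-enabled.
pub fn disable_in_mie(mie: u32, mcause: u32) -> u32 {
    if !is_interrupt(mcause) {
        return mie; // exceptions need no disabling
    }
    let bit = 1u32.checked_shl(cause_code(mcause)).unwrap_or(0);
    mie & !bit
}
```

For example, a machine timer interrupt (mcause 0x8000_0007) clears mie bit 7, while an ecall from U-mode (mcause 8) leaves mie untouched.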

alevy

Looks good other than a lingering TODO comment

alevy · 2024-02-28T04:03:53Z

arch/rv32i/src/lib.rs

-    /// need to. If the trap happens while and application was executing, we have to
-    /// save the application state and then resume the `switch_to()` function to
-    /// correctly return back to the kernel.
+    /// TODO


What is the TODO here? Is this a comment to yourself to write documentation before converting from a draft to non-draft PR?

lschuermann · 2024-02-28T14:11:59Z

I appreciate the reviews and agree with @bradjc's suggestions around ASM syntax. This is still a draft because this new version is mostly untested (and, per the LiteX CI, currently seems to be broken). I opened it in its current state because I'm convinced that what we want to achieve here is possible with the general approach proposed, and as a forcing function for me to test and carefully review it again, especially a couple of days after writing the initial version.

alevy · 2024-02-28T19:21:29Z

K, @lschuermann I'm triaging this as waiting-for-author. If you need/want help thinking through this or testing, you know where to find me.

lschuermann · 2024-05-28T22:29:02Z

Superseded by #4009

github-actions bot added the risc-v (RISC-V architecture) and WG-OpenTitan (In the purview of the OpenTitan working group.) labels Feb 17, 2024

lschuermann mentioned this pull request Feb 17, 2024 in #3847, "arch/rv32i: separate kernel and app trap handlers to simply assembly flow and allow foreign process implementations" (Closed)

Commit 93f7e72: arch/rv32i: separate kernel and app trap handlers

lschuermann force-pushed the dev/riscv-alt-process-impl-3 branch from f86592b to 93f7e72 February 17, 2024 23:04

lschuermann marked this pull request as draft February 18, 2024 02:06

github-actions bot assigned alevy Feb 26, 2024

bradjc reviewed Feb 28, 2024

alevy requested changes Feb 28, 2024

alevy added the waiting-on-author label Feb 28, 2024

lschuermann mentioned this pull request May 28, 2024 in #4009, "arch/rv32i: separate kernel and app trap handlers" (Open)

lschuermann closed this May 28, 2024
