Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

arch/rv32i: separate kernel and app trap handlers; specify trap handler "interface" #3864

Closed

Conversation

lschuermann
Copy link
Member

Pull Request Overview

This is a second, slightly less elegant but working(?) stab at #3847.

Tock's core kernel design permits process implementations with system call interfaces other than the ones shipped upstream in the arch/cortex-m and arch/rv32i crates. However, in practice, the UserspaceKernelBoundary implementation for RV32I has always been tightly coupled to the generic kernel RISC-V trap handler. This has made it difficult to build an alternative process implementation (e.g., as part of Encapsulated Functions), while reusing much of Tock's RISC-V implementation.

By defining an "interface" that the global trap handler exposes (i.e., specifying how a custom trap handler can be registered, and which registers get clobbered), we allow foreign process implementations to hook into the rv32i arch crate's assembly, and can move all process-specific assembly into SysCall::switch_to_process. All process-specific logic is now contained in one assembly block that can be read top-to-bottom. The global trap handler no longer relies on the stack layout of that function, or other data structures in syscall.rs.

Because the kernel trap handler no longer contains any application logic and expects the app to set its own trap handler, implementing a new syscall ABI becomes as simple as temporarily swapping out the trap handler.

In-depth Walkthrough

Currently context switches on RISC-V work as follows:

  1. For traps arriving in kernel mode, the kernel trap handler does an intricate dance to compare the mscratch value to 0 without clobbering any registers, and then takes a _from_kernel branch in _start_trap.

  2. When switching to an app we disable interrupts and then swap the current kernel stack pointer into mscratch, making it non-zero. We re-enable interrupts when switching into user-mode.

  3. When a trap arrives during app execution we perform the same dance as in 0., but then branch into _from_app.

  4. The _from_app branch saves all of the application state through two levels of pointer indirection:

    • reading the stack pointer from mscratch
    • dereferencing a pointer on said stack to a struct compatible in layout to the Riscv32iStoredState

    We then proceed to dump all registers into this state struct, and other core-local state (CSRs) onto known offsets from the stack pointer

    This hard-codes many assumptions about Tock's process model, and may preclude implementations that want additional state stored, or do not want to buy into storing the whole register file (e.g., for single function invocations as with Encapsulated Functions, or cooperative userspace apps).

  5. _from_app proceeds to switch to Machine mode.

    This may be undesirable for process models which defer interrupt handling, e.g., for cooperative apps.

  6. _from_app returns from the interrupt handler to an address stored at a well-known offset on the stack.

As is obvious from the above, using this trap handler has a significant amount of "buy in" concerning both the process execution semantics and the layout of stack pointer and the necessary Riscv32iStoredState.

I propose changing the routing of traps caught while executing processes by using mscratch not to branch between "kernel" and "app" trap handlers, but between "kernel" and "custom" trap handlers. If mscratch != 0, the global trap handler simply jumps to the address contained in this CSR. This is a minor architectural change and mirrors the semantics of our current approach, but has some advantages:

  • straightforward and top-down context-switch assembly, with everything except 2. in one block:
    1. prepare the CPU for context switch by
      a. storing clobbered registers on the stack,
      b. storing the stack pointer in a static variable,
      b. disabling interrupts,
      d. registering the app trap handler in mscratch
      f. switching to userspace, re-enabling interrupts
    2. on trap,
      a. swap mscratch and t0,
      b. check if t0 == 0, in that case execute the kernel trap handler,
      c. if not, jump to the address in t0 (_start_app_trap) entry.
    3. _start_app_trap stores the CPU state and register file in the Riscv32iStoredState and stack layout defined in the very same file / assembly block.
      a. if this trap was caused by an interrupt, disable it.
    4. return from the trap handler, by loading the _return_to_kernel symbol directly below via an immediate, and jumping to it via mret
    5. restoring clobbered registers
  • no hard-coded assumptions about process semantics, stored state, stack layout, etc.
  • a couple less memory loads / stored, replaced through immediate loads (la, expanding to auipc or lui and addi)
  • potential for more optimizations: the linear execution through the assembly allows more efficient register clobbering / reuse

Drawbacks:

  • we can no longer keep the kernel stack pointer in mscratch and need to save it to a static variable. This does not change the net load/stores requires, at previously we'd store and load the address for continuing the switch_to_process assembly after the trap handler on the stack.

In short, this change increases cohesion and reduces coupling in the rv32i crate, and allows foreign process implementations.

Testing Strategy

This pull request was tested by CI and needs some careful reviews.

This change-set modifies more parts about the switch_to_process assembly than I'd initially hoped for. For instance, it changes the registers that some assembly arguments are loaded in, to reduce the number of clobbers and/or moves that we have. I tried picking these changes apart into multiple, individually working commits, but that turned out to be very tricky.

I performed an initial, very unscientific analysis of how the old assembly compares to the new one in terms of instruction count. This is going through an end-to-end context switch, starting and ending with the inline assembly in switch_to_process, excluding all user-mode instructions:

  • new:
    • Integer / CSR / always skipped branch instructions: 26
    • Loads / Stores: 79
    • Branching behavior:
      1. Jump to userspace mepc as part of mret
      2. Jump to global trap handler on trap
      3. Jump to _start_app_trap handler
      4. Either branch if trap not caused by interrupt, or jump to interrupt disable routine
      5. Jump back to switch_to_process when exiting trap handler with mret
  • current:
    • Integer / CSR / always skipped branch instructions: 25
    • Loads / Stores: 81
    • Branching behavior:
      1. Jump to userspace mepc as part of mret
      2. Jump to global trap handler on trap
      3. Branch to app-part of global trap handler
      4. Either branch if trap not caused by interrupt, or jump to interrupt disable routine
      5. Jump back to switch_to_process when exiting trap handler with mret

TODO or Help Wanted

One issue on RISC-V is the presence of trap & interrupt handlers for different modes. A RISC-V platform with supervisor- and/or user-mode will feature the stvec and utvec CSRs. These will not be consulted for disabled interrupts / trap routing, however, and many more of Tock's assumptions, at least around the SysCall implementation in rv32i, would break if these were used by enabling interrupts in the sie / uie CSRs.

Because we have unclear semantics around them, I documented that right now, we expect all traps to be routed to the global trap handler. I removed the PermissionMode argument for rv32i::configure_trap_handler, as only the Machine branch was ever used.

Looking forward to other's opinions on this!

Documentation Updated

  • Updated the relevant files in /docs, or no updates are required.

Formatting

  • Ran make prepush.

@lschuermann lschuermann marked this pull request as draft February 18, 2024 02:06
Copy link
Contributor

@bradjc bradjc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just comments.

I'm not fond of comments before lines of assembly, as there is no way to mark which assembly instructions pertain to that comment. Commenting on the same line removes that potential confusion.

csrrw t0, mscratch, t0

// If mscratch contained 0, invoke the kernel trap handler.
beq t0, x0, _start_kernel_trap
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
beq t0, x0, _start_kernel_trap
beqz t0, _start_kernel_trap // If t0 == 0 then we faulted in kernel.

// trap into machine mode. Therefore, this can only happen
// when causing an exception in the trap handler itself.
// No registers other than t0 and the mscratch CSR are to be
// clobberd before contiuing execution at the address loaded into
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// clobberd before contiuing execution at the address loaded into
// clobbered before continuing execution at the address loaded into

// stated above.

// Atomically swap t0 and mscratch:
csrrw t0, mscratch, t0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
csrrw t0, mscratch, t0
csrrw t0, mscratch, t0 // mscratch=t0; t0=mscratch


// Else, invoke the trap handler at the address that was loaded in
// the mscratch CSR.
jr t0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
jr t0
jr t0 // Jump to address in t0.

// tracking that the kernel is executing).
csrrw t0, 0x340, zero // t0=mscratch, mscratch=0
// Restore t0. We reset mscratch to 0 (kernel trap handler mode)
csrrw t0, mscratch, 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
csrrw t0, mscratch, 0
csrrw t0, mscratch, 0 // t0=mscratch, mscratch=0

// to use a temporary register to get that address. So, we save `a1`
// to the kernel stack (in t0) before we can move it to the proper
// spot in the per-process stored state.
sw a1, 0*4(t0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sw a1, 0*4(t0)
sw a1, 0*4(t0) // Save a1 on kernel stack.

//
// Save the PC to the stored state struct
csrr t1, mepc
sw t1, 31*4(a1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sw t1, 31*4(a1)
sw t1, 31*4(a1) // Save the PC to the stored state struct

//
// Save mtval to the stored state struct
csrr t1, mtval
sw t1, 33*4(a1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sw t1, 33*4(a1)
sw t1, 33*4(a1) // Save mtval to the stored state struct

// Save mcause and leave it loaded into a0, as we call a function
// with it below:
csrr a0, mcause
sw a0, 32*4(a1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sw a0, 32*4(a1)
sw a0, 32*4(a1) // Save mcause to the stored state struct

// trap handler so that it does not fire again. If mcause is greater
// than or equal to zero this was not an interrupt (i.e. the most
// significant bit is not 1).
bge a0, zero, 200f
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bge a0, zero, 200f
bgez a0, 200f // if a0>0 branch to _start_app_trap_continue

Copy link
Member

@alevy alevy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good other than a lingering TODO comment

/// need to. If the trap happens while and application was executing, we have to
/// save the application state and then resume the `switch_to()` function to
/// correctly return back to the kernel.
/// TODO
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the TODO here? Is this a comment to yourself to write documentation before converting from a draft to non-draft PR?

@lschuermann
Copy link
Member Author

lschuermann commented Feb 28, 2024

I appreciate the reviews and agree with @bradjc's suggestions around ASM syntax. The reason this is draft still is that this new version is mostly untested yet (and per LiteX CI seems to be broken currently). The current state of this (why its open now) is because I'm convinced what we want to achieve here is possible through the general approach proposed here, and as a forcing function for me to test and carefully review it again, especially a couple days after I wrote the initial version.

@alevy
Copy link
Member

alevy commented Feb 28, 2024

K, @lschuermann I'm traiging this as waiting-for-author. If you need/want help thinking through this or testing, you know where to find me.

@lschuermann
Copy link
Member Author

Superseded by #4009

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
risc-v RISC-V architecture waiting-on-author WG-OpenTitan In the purview of the OpenTitan working group.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants