Presently, sending unique runtime args (RTAs) per core/processor requires the dispatcher to iterate over every core and processor, resulting in 64 cores * 3 processors = 192 NOC transactions. This dispatch is RISC-V limited, not data limited, and performs below expectations. To fix this:
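A back-of-the-envelope model of the transaction counts, written as a sketch rather than dispatcher code. It assumes the 64 worker cores form an 8 x 8 grid (8 cores per row), that B/N/T are the three processors each core receives args for, and that the NOC moves 32 bytes per cycle (the rate implied by "1K bytes ... 32 noc cycles" in the row-packing proposal). `Grid`, `transactions` and `row_payload` are hypothetical names for illustration only.

```python
from dataclasses import dataclass
from typing import Tuple

WORD_BYTES = 4            # runtime args are 32-bit words
NOC_BYTES_PER_CYCLE = 32  # assumption: implied by "1K bytes ... 32 noc cycles"


@dataclass(frozen=True)
class Grid:
    rows: int = 8           # assumption: 64 worker cores laid out as 8 x 8
    cores_per_row: int = 8
    processors: int = 3     # B/N/T


def transactions(grid: Grid, pack_rows: bool = False, pack_procs: bool = False) -> int:
    """NOC writes issued by the dispatcher for one set of unique RTAs."""
    # Destinations: every core individually, or one multicast group per row.
    groups = grid.rows if pack_rows else grid.rows * grid.cores_per_row
    # B/N/T args either go out as separate writes or share one contiguous buffer.
    return groups if pack_procs else groups * grid.processors


def row_payload(grid: Grid, num_args: int) -> Tuple[int, int]:
    """Bytes and NOC cycles for one processor's args replicated across a row."""
    nbytes = grid.cores_per_row * num_args * WORD_BYTES
    cycles = (nbytes + NOC_BYTES_PER_CYCLE - 1) // NOC_BYTES_PER_CYCLE  # round up
    return nbytes, cycles


g = Grid()
print(transactions(g))                                   # 192 (today)
print(transactions(g, pack_rows=True))                   # 24  (~8x)
print(transactions(g, pack_rows=True, pack_procs=True))  # 8   (~24x)
print(row_payload(g, 32))                                # (1024, 32)
```

Under these assumptions the ~8x and ~24x figures fall out directly as 192 / 24 and 192 / 8; the payload per row stays small, which is why the cost is expected to remain dominated by RISC-V overhead rather than bandwidth.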
1) Reduce the max number of unique args from 256 to… 32?
2) Pack args on a row. That is, processors on a row will receive all the unique runtime args for all processors on that row. This should give a ~8x gain. For 32 unique args, we'd send 8 * 32 * 4 = 1K bytes, which is likely still RISC-V dominated (and takes just 32 NOC cycles).
2a) We can experiment with packing by just 4x (half row) by masking off the upper bit of noc_x to reduce data replication. This could be done dynamically based on the number of RTAs.
3) Use a dynamic memory map for RTAs (this can be done now, before the ring buffer). This allows packing B/N/T args tightly so they are sent together as one NOC transaction, which is another ~3x gain for ~24x total.
To do this we need to:
- Move RTAs to a dynamic memory map, with the origin of the args specified in the launch message.
- Change RTA indexing in the kernel (get_runtime_arg) to use the NOC X index to retrieve the right arg.
- Change host-side fast dispatch to pack all of this appropriately. The device side already has the capabilities to handle this.
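The packing and lookup steps above can be modeled as a minimal Python sketch of the data layout (not device or dispatcher code). Everything here is hypothetical: `pack_group`, `rta_word_offset` and `model_get_runtime_arg` are illustrative names (the last is a stand-in for the kernel-side accessor, not its real signature), the buffer order (processor-major, then core slot, then arg) is one possible choice, and `logical_x` assumes the physical `noc_x` has already been mapped to a 0-based column index within the row. `rta_origin_word` plays the role of the origin carried in the launch message. Setting `slots` to 4 instead of 8 models the 2a) half-row variant, where masking with `slots - 1` drops the upper bit of the column index.

```python
from typing import List


def pack_group(rtas: List[List[List[int]]]) -> List[int]:
    """Pack unique RTAs for one multicast group (a full row or half row).

    rtas[proc][slot][i] is arg i for processor `proc` (B/N/T) on the core in
    position `slot` of the group. All processors share one contiguous buffer,
    so the whole group can be written with a single NOC transaction.
    """
    num_args = len(rtas[0][0])
    packed: List[int] = []
    for proc_args in rtas:
        for core_args in proc_args:
            assert len(core_args) == num_args, "every core must carry the same arg count"
            packed.extend(core_args)
    return packed


def rta_word_offset(proc: int, logical_x: int, idx: int, slots: int, num_args: int) -> int:
    """Word offset of arg `idx` for (proc, core) inside a packed group buffer."""
    assert (slots & (slots - 1)) == 0, "slots must be a power of two for masking"
    slot = logical_x & (slots - 1)  # with 4 slots, drops the upper bit of a 0..7 index
    return (proc * slots + slot) * num_args + idx


def model_get_runtime_arg(l1_words: List[int], rta_origin_word: int, proc: int,
                          logical_x: int, idx: int, slots: int, num_args: int) -> int:
    """Kernel-side lookup: origin from the launch message plus the computed offset."""
    return l1_words[rta_origin_word + rta_word_offset(proc, logical_x, idx, slots, num_args)]


# Usage: 3 processors, full row of 8 cores, 2 args each; value encodes (proc, core, arg).
row = [[[p * 100 + s * 10 + i for i in range(2)] for s in range(8)] for p in range(3)]
l1 = [0] * 16 + pack_group(row)  # group buffer placed at word 16
print(model_get_runtime_arg(l1, 16, proc=1, logical_x=5, idx=1, slots=8, num_args=2))  # 151
```

Because every core in a group receives the identical buffer, the only per-core work left on the device is the offset computation, which is what lets the host issue one write per group instead of one per core and processor.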