Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix FSPXI:ReadFileSHA256 error when reading exactly one block from NoCrypto gamecard NCCH ExeFS files #1829

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

Popax21
Copy link

@Popax21 Popax21 commented Jan 11, 2023

Process9's implementation of FSPXI:ReadFileSHA256 uses an auxiliary buffer object when exactly one block of data is being read, which reads the data and hashes it at the same time. This object's vtable[11] / Write implementation is stubbed (returning error 0xe0c046f8), but can still end up being called when reading from a gamecard NCCH ExeFS file with the NoCrypto flag set.
This patch addresses this by replacing the stubbed implementation with a custom working one.

Solves #1827

More details

I've split the exact details & patch details into multiple sections, as it is a quite involved issue involving a lot of specific details and interactions between different parts of the code (in fact it took me about a week of reverse engineering process9 to come up with this patch)

NOTE: All process9 offsets / addresses are for the latest FW version at the time of writing (11.16.0-49). All names are my own, and might not line up with existing reverse engineering efforts.

Background / DataBuffers

Process9's code frequently utilizes a data structure I named DataBuffer. This structure is an interface consisting of just a vtable, and is used to represent an abstract store of data which can be read from / written to. Different derived types are used to e.g. implement byte array buffers (vtable at 0x08090d88), buffers encapsulating type 2 IPC data buffers received from IPC requests (vtable at 0x08091180), as well as many others.

There are two primary ways to interacting with DataBuffer instances:

  • the Read / Write functions, which take a pointer to a buffer, an offset into the DataBuffer and a size. If a buffer is e.g. read-only, its Write function would return an error.
  • DataBufferChains, which manage a list of DataBuffers. When DataBufferChain::ProcessBytes is called, it creates a set of queues of physical memory segments. It then employs process9's own threading system to run each DataBuffers ProcessDataQueue method in a separate thread, and enqueues a constant supply of empty memory segments into the first buffer's queue. Implementations of this method will then continuously dequeue memory segments, process the data in these segments, then enqueue them into the next buffer's queue once they've finished doing so. Alternatively, DataBuffer::ProcessDataSegment can be implemented instead, which will process a single segment at a time - in this case ProcessDataQueue simply consists of a call to a helper method which performs the queue management and passes the dequeued segments to this method.

Generally, DataBufferChains seem to be the more "optimal" method - most implementations will implement ProcessDataQueue/ProcessDataSegment using DMA if possible.

The relevant DataBuffer vtable method signatures are:

  • vtable[0]: int ProcessDataSegment(this, uint off, DataBufferChain::DataSegment* seg)
  • vtable[4]: int ProcessDataQueue(this, uint off, uint size, DataBufferChain::DataQueue* queue)
  • vtable[8]: int Read(this, uint off, void* data, uint size)
  • vtable[11]: int Write(this, uint off, void* data, uint size)
  • vtable[12]: void AddToChain(this, DataBufferChain* chain, uint off)

The issue

FSPXI:ReadFileSHA256's implementation (0x08074738) also utilizes DataBuffers. It checks if the number of bytes to read is exactly one block, and if that's the case, sets up an instance of an auxiliary DataBuffer subtype (let's call it FaultyBuf) on the stack. It then simply forwards to FSPXI:ReadFile, which reads its data into this buffer. FaultyBuf not only writes the data into the IPC response buffer, but also hashes it along the way (presumably this edge cast exists to increase performance). Once FSPXI:ReadFile returns, the method then proceeds to compare the resulting hash with the one from the hash table, and if they match, returns successfully.

However, while FaultyBuf implements ProcessDataQueue, it does not implement Write, which is instead stubbed to return error 0xe0c046f8. To see why this is an issue, one needs to look at the method which eventually handles the read request, NCCH::ReadEncrypted (0x0805c1f4):

if(key->crypto_type == 0) { //NoCrypto
    return this->ReadBytes(buf, ...);
} else {
    ...
    return this->ReadThroughNCCHBuf(..., buf, ...);
}

NCCH::ReadBytes (0x0805e1f0) eventually ends up calling DataBuffer::Write on the buffer it was given (at least for gamecards, its behavior might differ for installed / other titles). This results in an error when the NoCrypto flag is set, as in this case the FaultyBuf instance is passed through directly, which does not implement Write.

NCCH::ReadThroughNCCHBuf (0x080334a0) on the other hand passes its own DataBuffer implementation to ReadBytes instead (which I named NCCHEncryptedDataBuffer, vtable at 0x08092620), which does implement Write. After it decrypted the raw data, it then accesses the original data buffer through the DataBufferChain mechanism, which is properly implemented by FaultyBuf.

The amount of specific details / requirements involved in this issue lead me to believe that this is not an intentional security check, but a bug in Nintendo's firmware.

The patch

The correct way to fix this issue can be found in NCCHEncryptedDataBuffer's Write implementation (0x08070610):

if ((this->force_blocks != 0) && (((off & 0x1ff) != 0 || (size != 0x200)))) {
    panic();
}

DataChainProcessor proc;
proc.Init(data, size);
ret proc.ProcessBytes(this, off, 0, size);

NCCHEncryptedDataBuffer only implements its logic in ProcessDataQueue/ProcessDataSegment, same as FaultyBuf. However, in its Write implementation, it uses a helper class called DataChainProcessor to complete the request instead of returning an error code. The helper class sets up a temporary DataBufferChain, which is then used to transfer the source buffer's contents into the given DataBuffer. As such, this issue can be fixed by replicating that same behavior for FaultyBuf::Write as well.

However, it's not possible to just redirect FaultyBuf::Write to NCCHEncryptedDataBuffer::Write, as their memory layouts aren't compatible (in particular, the first if check will read garbage). Instead, it's necessary to craft a custom implementation of this method in either free space or ITCM, and patch the vtable to point to this method instead (this PR places the function in ITCM as I could not find any other suitable spots).

The patched method needs to reference three other functions: DataChainProcessor::DataChainProcessor, DataChainProcesssor::Init and DataChainProcessor::ProcessBytes. The location of all three methods can be determined by iterating over all functions called by NCCHEncryptedDataBuffer::Write (by decoding all bl instructions in the method), and checking the first 4 bytes of the method's code as a sort of "method signature" to distinguish between the different methods (this approach is implemented in this PR).

Results

The following screenshot was taken on a version of Luma3DS with this patch applied, by inserting a Sky3DS+ flashcart loaded with a CCI created from FBI.cia using makerom (note that it might be required to fixup the ROM's title key using a script like this, or the flashcart might label the ROM as bad):

top
bot

Without this PR, the 3DS wouldn't pick up the cartridge at all, as the home menu's call to FSUSER:ReadFileDirectly to read the cartridge's icon file would have failed.

@TuxSH
Copy link
Member

TuxSH commented Jan 15, 2023

Hi, and thank you for taking your time to look at this. However, I'm not sure I want to merge this (I plan to reimpl p9 eventually), and I see a few issues:

I couldn't reproduce the issue at all. All installed titles I have (system, games and homebrew) and all retail gamecards I have have the NoCrypto flag set. I've tried reproducing the issue like this, and couldn't trigger the issue (unless I put totalSize == blockHashSize and misaligned read buffer), snippet here.

What I believe is happening is that they all have NoCrypto set because they're put on an encrypted partition/folder, and double-encryption would result in an insane loss of performance (you can chain multiple NDMA-capable devices, including SD/MMC, AES and SHA at very little cost). This results in the faulty codepath you keep mentioning never being triggered, because there will always be a layer of AesBuffer in this scenario.

Moreover (not sure though - haven't checked latest p9), there are two objects that you need to patch, not just one (case where everything is aligned and size != blockHashSize).

I think encrypting your NCSD partition(s) would solve this.

@Popax21
Copy link
Author

Popax21 commented Jan 16, 2023

Thanks for the response!

All installed titles I have (system, games and homebrew) and all retail gamecards I have have the NoCrypto flag set.

Installed titles aren't affected by this afaik (probably because they already undergo a round of decryption before reaching the faulty code, like how you described it), but all my retail gamecards (dumped using gm9) do have their NCCH exheader/ExeFS/etc. encrypted. In fact, I created a "Frankenstein Version" of one of them, which are completly identical except one has the encryption removed - this one will not load when put on a Sky3DS+ for testing, even though the original one did (if you want to, I can send you both ROM files or write a script which does this for abritrary files).

unless I put totalSize == blockHashSize and misaligned read buffer

The first part of that is what I've been refering to as "exactly one block of data", sorry if that was unclear. The misaligned read buffer might be needed as well (there was some code at the start of ReadFileSHA256 which seems to have checked for some sort of alignment, I'll check that out shortly), but I don't think so, as the original faulty read was a read of 0x200 bytes from offset 0 of the icon file (if I remember correctly).

What I believe is happening is that they all have NoCrypto set because they're put on an encrypted partition/folder

From what I know of the 3DS, while communication with the game card itself is encrypted after initialization, the actual data read from it isn't - it's a CCI where the NCCH files might use exheader/ExeFS encryption (which is where the issue is).

you can chain multiple NDMA-capable devices, including SD/MMC, AES and SHA at very little cost

This is what I believe to be the deal with this special case in the first place - when it can, Process9 tries to directly DMA to the user IPC buffer at the same time as to the SHA engine, to skip the additional layer of buffering which would otherwise be required (and in fact is done by the other code path).

there are two objects that you need to patch, not just one

The second code path "totalSize != blockHashSize" seems to be not affected, as it seperates out the reading and hashing of data into two separate steps (the buffers used here are standard buffer implementations from what I remember).

I think encrypting your NCSD partition(s) would solve this

I didn't know you could encrypt NCSD partitions themselves (maybe I am misunderstanding something though). Even then, all retail 3DS ROMs I tested don't seem to utilize this encryption. In case you're talking about the NCCH encryption though, in this case one must use the fixed developer key method, which would fix it, but given how this isn't required anywhere else I still decided to write this patch instead of going for the "workaround solution".

@Popax21
Copy link
Author

Popax21 commented Jan 16, 2023

UPDATE: (put this into a separate message for better visibility)

I just checked out the code again, and you're right - there is a second object which needs patching (it's in a code path for when offset + size is block_size aligned, and the read buffer address is 4 byte aligned) - this should be easy though, as the same patched method can be used for both, so only the vtable address needs to be replaced. I couldn't confirm if the earlier object requires the offset to be aligned though as well - it's well possible that the code in front of the total_size == block_size check does check this, however Ghidra has trouble decompiling that part, so I can't tell for certain.

Additionally, I've dug through some of my old files, and found a dump of the faulty PXI IPC communication - this might help you reproduce it (the file handle was obtained from FSUSER:OpenFileDirectly with archive ID 0x2345678A, name icon):

---------- PXI ERROR: 2 004d01c4 -> e0c046f8 ---------
IN:
0x106f40:	0x004d01c4	0x00000014	0x0000002b	0x00000000
0x106f50:	0x00000000	0x000036c0	0x000036c0	0x00000020
0x106f60:	0x00002006	0x2ffb2000	0x0036c014	0x2ffb3000
0x106f70:	0x2ff9b000	0x00000000	0x00000000	0x00000000

OUT:
0x11d260:	0x004d0081	0xe0c046f8	0xff91f000	0x00000004
0x11d270:	0x00000000	0x000036c0	0x000036c0	0x00000020
0x11d280:	0x00002006	0x2ffb2000	0x0036c014	0x2ffb3000
0x11d290:	0x2ff9b000	0x00000000	0x00000000	0x00000000

(to my knowledge, this should correspond to the following arguments: file handle 0x0000002b00000014, offset 0x0000000000000000, read buffer size 0x000036c0, block size 0x000036c0, size of hashtable 0x00000020)

@TuxSH
Copy link
Member

TuxSH commented Feb 28, 2023

Hey, sorry for the very late answer.

It looks like you're right and that I was fooled by gm9.

Also, ignore my comments about hw crypto acceleration: it happens at the partition (block) level, which doesn't apply to the SD card. This means that for installed games, it does: sd -> FCRAM -> aes -> FCRAM -> aes -> FCRAM which is extremely slow (and although I need to benchmark it properly, explains why the loading times of games are so bad).

I'll probably leave this PR open and as it is useful for people who want to mess with NAND titles as well.

(for SD games, the error/bug is avoided because of the layer of crypto the SD folder contents have)

@Popax21
Copy link
Author

Popax21 commented Feb 28, 2023

Hey, sorry for the very late answer.

It looks like you're right and that I was fooled by gm9.

Also, ignore my comments about hw crypto acceleration: it happens at the partition (block) level, which doesn't apply to the SD card. This means that for installed games, it does: sd -> FCRAM -> aes -> FCRAM -> aes -> FCRAM which is extremely slow (and although I need to benchmark it properly, explains why the loading times of games are so bad).

I'll probably leave this PR open and as it is useful for people who want to mess with NAND titles as well.

(for SD games, the error/bug is avoided because of the layer of crypto the SD folder contents have)

OK, thanks for the response - no need to worry about the delay by the way.
I've merged the upstream branch back into mine so that it's up to date again, in case anyone else wants to use it as well / you want to merge it as a temporary fix before the pm9 reimplementation you mentioned.

…Crypto NCCH ExeFS files

Process9's implementation of FSPXI:ReadFileSHA256 uses an auxiliary
buffer object when exactly one block of data is being read, which reads
the data and hashes it at the same time. This object's vtable[11] is
stubbed (returning error 0xe0c046f8), but can still end up being called
when reading from an NCCH ExeFS file with the NoCrypto flag set.

This patch addresses this by replacing the stubbed implementation with
a custom working one (see issue LumaTeam#1827 + related PR for more details)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants