Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial auto-sync LoongArch support #2349

Open
wants to merge 1 commit into
base: next
Choose a base branch
from

Conversation

jiegec
Copy link
Contributor

@jiegec jiegec commented May 3, 2024

Your checklist for this pull request

  • I've documented or updated the documentation of every API function and struct this PR changes.
  • I've added tests that prove my fix is effective or that my feature works (if possible)

Detailed description

Add loongarch support to auto-sync. Co-authored by @FurryAcetylCoA.

See capstone-engine/llvm-capstone#47 for llvm changes. The generated code have small changes after generated via:

./src/autosync/ASUpdater.py -a LoongArch -s IncGen Translate
  • MC Tests are generated from llvm
  • Instruction groups are implemented
  • Register accesses are implemented
  • Memory operands are handled for memory instructions
  • Code are formatted using clang-format of LLVM 17

Test plan

...

Closing issues

...

@jiegec jiegec marked this pull request as draft May 3, 2024 07:36
@jiegec
Copy link
Contributor Author

jiegec commented May 3, 2024

Not finished yet, but hope to see some suggestions from @Rot127. The pr can already disassemble some instructions:

$ test_loongarch
****************
Platform: loongarch32
Code:0x0c 0x00 0x08 0x14 0x8c 0xfd 0xbf 0x02
Disasm:
0x1000: lu12i.w $t0, 0x4000
        op_count: 2
                operands[0].type: REG = t0
                operands[1].type: IMM = 0x4000

0x1004: addi.w  $t0, $t0, -1
        op_count: 3
                operands[0].type: REG = t0
                operands[1].type: REG = t0
                operands[2].type: IMM = 0xffffffffffffffff

0x1008:

****************
Platform: loongarch64
Code:0x80 0x80 0x00 0x40 0x63 0x80 0xff 0x02 0x78 0x20 0xc0 0x29 0x00 0x84 0x00 0x01 0x00 0xa4 0x14 0x01
Disasm:
0x1000: beqz    $a0, 0x80
        op_count: 2
                operands[0].type: REG = a0
                operands[1].type: IMM = 0x80

0x1004: addi.d  $sp, $sp, -0x20
        op_count: 3
                operands[0].type: REG = sp
                operands[1].type: REG = sp
                operands[2].type: IMM = 0xffffffffffffffe0

0x1008: st.d    $s1, $sp, 8
        op_count: 3
                operands[0].type: REG = (null)
                operands[1].type: REG = sp
                operands[2].type: IMM = 0x8

0x100c: fadd.s  $fa0, $fa0, $fa1
        op_count: 3
                operands[0].type: REG = fa0
                operands[1].type: REG = fa0
                operands[2].type: REG = fa1

0x1010: movgr2fr.w      $fa0, $zero
        op_count: 2
                operands[0].type: REG = fa0
                operands[1].type: REG = zero

0x1014:

The memory operand need to be handled, and more tests are required.

Copy link
Collaborator

@Rot127 Rot127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing job! Thanks a lot to you and @FurryAcetylCoA.

An advice if you are uncertain how to do some of the things below. You can look at the ARM or PPC module. Don't check AArch64 (too complex), Alpha, TriCore (does not always follow the auto-sync design for good reasons) or the other modules (they are heavily outdated).

Now, the biggest things missing are:

  • The details should be filled in by an add_cs_detail() function in LoongArchMapping.c (see comments).
    • The access information is still missing. Again, you can take a look at the ARM or PPC module, how it is done there.
    • Instruction groups are still missing as well if I didn't overlooked it.
  • The generated enums in the header file (register and instructions) should not be copied by hand, but via the HeaderPatcher.py. Just place the comments into the header file and run the script on it.
  • Delete unused inc files.
  • Regarding testing:
    • Disassembler tests are taken from llvm/test/MC/ and llvm/test/MC/Disassembler/, mostly copied by hand into the capstone repo (suite/MC/). I write a script currently to extract them automatically. I'll notify you when it's done.
    • Testing the details (luckily not difficult for LoongArch), can be done in your case by adding test cases to suite/cstest/issues.cs. Just ensure that every possible execution path for register, immediate and memory operand details are taken.

Once these things are done we are almost there. But we should do a second round of review then.

suite/auto-sync/src/autosync/cpptranslator/Differ.py Outdated Show resolved Hide resolved
CMakeLists.txt Outdated Show resolved Hide resolved
arch/LoongArch/LoongArchDisassembler.c Outdated Show resolved Hide resolved
arch/LoongArch/LoongArchGenCSFeatureEnum.inc Outdated Show resolved Hide resolved
suite/auto-sync/src/autosync/cpptranslator/patches/Size.py Outdated Show resolved Hide resolved
utils.h Outdated Show resolved Hide resolved
@jiegec jiegec force-pushed the next branch 7 times, most recently from c02fcbd to d17db4f Compare May 3, 2024 10:54
@Rot127
Copy link
Collaborator

Rot127 commented May 3, 2024

Regarding the CI:

  • The auto-sync job can fail. Since we have not yet merged the llvm-18 update
  • The clang-tidy errors should be fixed if it complains about changes in the LoongArch files. But if you have the time, it would be of course awesome if you could fix the one or two errors in cs.c as well.

@jiegec jiegec force-pushed the next branch 5 times, most recently from ce9dd99 to 1d37d04 Compare May 3, 2024 14:43
@jiegec
Copy link
Contributor Author

jiegec commented May 3, 2024

Features added:

  • Import tests from LLVM MC
  • Collect operand type and access
  • Collect registers read/modified

However, in LoongArch assembly, memory operands are not special i.e. they are normal register/immediate operands. I am unsure how to create MEM operands.

@Rot127
Copy link
Collaborator

Rot127 commented May 4, 2024

However, in LoongArch assembly, memory operands are not special i.e. they are normal register/immediate operands. I am unsure how to create MEM operands.

Indeed a little tricky. I checked the td files and the ISA quickly and it seems that all LOAD and STOREs are either of the instruction format 3R, 2RI12 or 2RI14.
In the PrinterCapstone you can now emit these formats (and all the others) as additional information. This is also done for PPC (see Mapping.h and the generated values in the PPCGenCSMappingInsn.inc file).

Here is the code where we generate it for PPC. In your case you can do it just like for PPC there. All LoongArch instructions seem to be derived from the class LAInst. The first inheritance child of this class, is the format class. In the PPC case we search for the class I and then emit the first child of it. Which is the format class. You can do it the same way for the LoongArch instructions. Get the LAInst class and emit the name of the first child. Which is the format.
Additionally to this, you should emit if the instruction loads or stores memory. You can check the CGI->mayLoad and CGI->mayStore flags in PrinterCapstone to figure this out.

In the end it should look something like this:

In loongarch.h

typedef struct {
	loongarch_insn_form form;
	loongarch_mem_access maccess; // LOONGARCH_MEM_LOAD, LOONGARCH_MEM_STORE, LOONGARCH_MEM_NONE
} loongarch_suppl_info; // add this to the union in Mapping.h

A generated entry in LoongArchGenCSMappingInsn.inc should look something like this:

{
  /* <mnemonic> */
  LOONGARCH_LOAD.... /* 337 */, LOONGARCH_INS_LOAD...,
  #ifndef CAPSTONE_DIET
    { 0 }, { 0 }, { 0 }, 0, 0, {{ LOONGARCH_INSN_FORM_2RI12, LOONGARCH_MEM_LOAD }}
  #endif
},

Now, when you add the details about the operands in LoongArchMapping.c you can check the instruction format and the memory access flags you just generated. If the instruction loads memory and you know the format, you also know which operand is the base register, disponent, offset register etc.

Would this work?

@jiegec
Copy link
Contributor Author

jiegec commented May 4, 2024

Would this work?

Thanks, I have implemented the logic, hopefully I didn't get corner cases wrong.

Copy link
Contributor

@XVilka XVilka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update also the README.md

Copy link
Collaborator

@Rot127 Rot127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very nice so far!
I think the only thing left are the details of the memory operands and the instruction groups?

Afterwards we only need to run clang-format and fuzz it.

arch/LoongArch/LoongArchInstPrinter.c Outdated Show resolved Hide resolved
arch/LoongArch/LoongArchInstPrinter.c Outdated Show resolved Hide resolved
arch/LoongArch/LoongArchInstPrinter.c Outdated Show resolved Hide resolved
arch/LoongArch/LoongArchInstPrinter.c Outdated Show resolved Hide resolved
arch/LoongArch/LoongArchInstPrinter.c Outdated Show resolved Hide resolved
arch/LoongArch/LoongArchInstPrinter.h Outdated Show resolved Hide resolved
arch/LoongArch/LoongArchInstPrinter.h Outdated Show resolved Hide resolved
@jiegec
Copy link
Contributor Author

jiegec commented May 12, 2024

Looks very nice so far! I think the only thing left are the details of the memory operands and the instruction groups?

Afterwards we only need to run clang-format and fuzz it.

Thanks! I have added code for memory operands fixup and instruction group detection.

Copy link
Collaborator

@Rot127 Rot127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late reply. It looks really good.

I would propose the following:

  • Add LoongArch to the .github/labeler.yml
  • Please write a summary of the PR into the description. What capabilities the module has (supported operands, ISA version used and a link to it and on what version of LLVM it is on, and whatever details you would like). We use them for the squash merge later.
  • Last actual thing is fuzzing. We need to be sure the code doesn't segfault as it is. Follow point 3 from the comment here: Add HPPA(PA-RISC) architecture #2265 (review)
  • Lastly please run clang-format on all LoongArch files, if you haven't done yet. There is a .clang-format file in the repos root dir you can use.

Then I would consider it done. I will merge the llvm-capstone update after AArch64 is ready. Then we can rebase this one onto the newest next. Afterwards the Auto-Sync CI job should also succeed.

If everything is green, we can merge.

@XVilka @kabeor You might want to review this as well. Since everything is implemented which we need.

@Rot127
Copy link
Collaborator

Rot127 commented May 15, 2024

@jiegec @FuzzySecurity Before I forget. Do you have any feedback working with Auto-Sync? I would appreciate any comments and feedback on it. Especially, what to make better, what was difficult to work with and where to improve.

@Rot127
Copy link
Collaborator

Rot127 commented May 15, 2024

Due to capstone-engine/llvm-capstone@ee2e109 please generate the Disassembler tables again or just add the one line by hand.

@Rot127
Copy link
Collaborator

Rot127 commented May 28, 2024

@jiegec Did you already find time to run the fuzzing? The PR is almost done.

@jiegec
Copy link
Contributor Author

jiegec commented May 29, 2024

@jiegec Did you already find time to run the fuzzing? The PR is almost done.

Thanks, I have run the following fuzzing tests without crashes:

./cstool loongarch32 0 0xffffffff
./cstool -d loongarch32 0 0xffffffff
./cstool loongarch64 0 0xffffffff
./cstool -d loongarch64 0 0xffffffff

Copy link
Collaborator

@Rot127 Rot127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix the typo. Besides that, I approve it.
Thank you! Awesome job!

@kabeor Please review. I would finish the AArch64 PR, update llvm-capstone, retrigger the CI here to ensure everything is green and then we can merge this one.

README.md Outdated Show resolved Hide resolved
- Accompanied llvm changes: capstone-engine/llvm-capstone#45
- MC Tests are generated from llvm
- Instruction groups are implemented
- Register accesses are implemented
- Memory operands are handled for memory instructions
- Code are formatted using clang-format of LLVM 17

Co-authored-by: CoA <1109673069@qq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

None yet

4 participants