Ability to Generate Bitmasks for Specific Instruction Components? #2338

Open
pinwhell opened this issue Apr 26, 2024 · 3 comments
@pinwhell

Hey Capstone Team,

Hope you're all doing well! I've been digging into Capstone for a while now, and it's been a great tool for dissecting machine instructions. But there's one thing that keeps popping up in my workflow: the need to generate bitmasks for specific instruction components like operands, opcodes, immediates, and registers.

Right now, I'm cobbling together these bitmasks manually based on the instruction formats of different architectures. It works, but it's a bit cumbersome and error-prone.

So, here's my thought: would it be possible to build a feature into Capstone that could generate these bitmasks automatically? It would be a game-changer for me and, I imagine, for others too.

Imagine being able to just call a function in the Capstone API and get a bitmask that perfectly aligns with the structure of the instruction. It would make tasks like dissecting instructions at the binary level or building custom analysis tools so much easier and more accurate.

For example, I could use these bitmasks to quickly extract and analyze specific components of an instruction, which would be incredibly useful for reverse engineering and vulnerability analysis projects.

I really think this feature would add a lot of value to Capstone and make it even more indispensable for low-level analysis tasks. So, if it's something you guys could consider for a future release, I'd be thrilled!

Thanks for taking the time to read my suggestion. If you need any more details or have questions, feel free to reach out.

Cheers...

@Rot127
Collaborator

Rot127 commented Apr 26, 2024

Yes, we already considered that, but unfortunately it turned out that this can't be implemented properly. The instruction definitions in LLVM (from which we generate our code) are not detailed enough. For example, in ARM we cannot distinguish between the bits of the memory base register and those of the memory displacement, because LLVM doesn't make that distinction at the definition layer (in the TableGen files).
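The ARM case can be sketched roughly as follows in Python. This is an illustration under assumptions, not Capstone or LLVM code: it assumes that a generator only sees the memory operand of A32 `LDR (immediate)` as one combined field, and all names (`MEM_OPERAND_MASK`, `MEM_SUBFIELDS`, `split_is_consistent`) are hypothetical.

```python
# Assumption for illustration: the generated tables describe the whole memory
# operand of A32 "LDR (immediate)" as ONE field covering the add/subtract
# bit (23), the base register (19:16) and the 12-bit displacement (11:0).
MEM_OPERAND_MASK = 0x008F0FFF

# The finer split is not derivable from the combined mask alone. It has to be
# supplied from elsewhere, e.g. written by hand from the ISA manual.
MEM_SUBFIELDS = {
    "add_bit":  0x00800000,  # U flag, bit 23
    "base_reg": 0x000F0000,  # Rn, bits 19:16
    "disp":     0x00000FFF,  # imm12, bits 11:0
}


def split_is_consistent(combined: int, parts: dict) -> bool:
    """Check that `parts` are pairwise disjoint and exactly cover `combined`."""
    seen = 0
    for mask in parts.values():
        if seen & mask:  # two sub-fields claim the same bit
            return False
        seen |= mask
    return seen == combined


ok = split_is_consistent(MEM_OPERAND_MASK, MEM_SUBFIELDS)
```

The check can confirm that a hand-written split fits the combined operand, but it cannot produce the split. Many different partitions would pass, so the base/displacement boundary has to come from a more precise description of the ISA.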

There was an attempt to implement it for ARM (#2045). Unfortunately, I could not merge it in good conscience (see the justification for more details: #2045 (comment)).

LLVM architectures are just not defined well enough. Most of them are hand-written and contain errors or don't align with the ISA. Hexagon is the one exception I am aware of; it is maintained by Qualcomm and hence very accurate. But if we want to add such a feature, it must be easy to generate for all architectures, and the LLVM information is simply not good enough for this.

We are looking into adding a module generator that consumes SAIL definitions. They seem more precise. Once we have this, we can try to add this feature again.

@Rot127
Collaborator

Rot127 commented Apr 26, 2024

Also see:
#2031
#1970

@pinwhell
Author

100% understood. Those are the challenges and limitations we are currently facing.
