Compiler: parallel codegen with MT #14227

Draft
wants to merge 6 commits into base: master


ysbaddaden
Contributor

@ysbaddaden ysbaddaden commented Jan 13, 2024

This implements parallel codegen of object files when MT is enabled in the compiler, which brings some performance improvements.

For example, compiling Crystal itself on my laptop with empty caches: codegen takes ~20s with fork and only ~14s with MT (~35s without this patch). With a filled cache, it takes ~9s with fork and only ~4s with MT (~5s without this patch) 🚀

The biggest obstacle is LLVM's thread safety. Most prominently, an LLVMContext can't be shared across threads, yet the C library creates a global context; from my tests, an LLVMTargetMachine can't be shared across threads either. I saw no issues with the LLVM optimization passes, with LLVM 16 at least; it might be different with LLVM 12 and earlier, which use a different API (and reuse LLVM objects).

The good: I applied the patch on top of Crystal 1.11.1 and I could build and rebuild the compiler in non-release mode, -O1, -O2, -O3 or --single-module.
The bad: Sadly, a compiler built with --single-module -O1 or --release will segfault during codegen (which means other modes could probably also segfault, just not as often) 😭

We only use the bitcode for cache purposes; for MT safety we must parse
the bitcode into a _new_ LLVM module in a new LLVM context for each
compilation unit.

We also can't share an LLVM target machine, and must create one for each
compilation unit... but maybe we could share one per thread?
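
The "one per thread" idea can be sketched independently of LLVM. What follows is a minimal Python illustration, not the compiler's code: `TargetMachine` is a hypothetical stand-in for any object that must not be shared across threads, and `compile_unit` is a placeholder for real codegen. Each worker thread lazily creates its own instance through `threading.local` and reuses it for every unit it handles, so the number of instances is bounded by the number of threads rather than the number of compilation units.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class TargetMachine:
    """Hypothetical stand-in for an object that is unsafe to share across threads."""

    created = 0
    _lock = threading.Lock()

    def __init__(self):
        with TargetMachine._lock:
            TargetMachine.created += 1
        self.owner = threading.get_ident()

    def emit(self, unit):
        # Fail loudly if a thread other than the creator uses this instance.
        assert threading.get_ident() == self.owner, "shared across threads"
        return f"{unit}.o"


_per_thread = threading.local()


def thread_target_machine():
    """Return the calling thread's instance, creating it on first use."""
    machine = getattr(_per_thread, "machine", None)
    if machine is None:
        machine = _per_thread.machine = TargetMachine()
    return machine


def compile_unit(unit):
    # Placeholder for real codegen: only touches the thread-owned instance.
    return thread_target_machine().emit(unit)


def compile_all(units, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compile_unit, units))
```

Calling `compile_all([f"unit{i}" for i in range(32)], workers=4)` processes all 32 units while creating at most four `TargetMachine` instances. Whether an LLVM target machine tolerates this kind of reuse within one thread is exactly the open question raised above.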
@ysbaddaden
Contributor Author

The segfault when the compiler is built in release mode:

Thread 11 "crystal" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe3fff700 (LWP 231004)]
0x00007ffff1f0892f in ?? () from /lib/x86_64-linux-gnu/libLLVM-16.so.1
(gdb) bt
#0  0x00007ffff1f0892f in ?? () from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#1  0x00007ffff1f088d4 in llvm::TargetLoweringObjectFileELF::getExplicitSectionGlobal(llvm::GlobalObject const*, llvm::SectionKind, llvm::TargetMachine const&) const () from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#2  0x00007ffff2139248 in llvm::SelectionDAG::computeKnownBits(llvm::SDValue, llvm::APInt const&, unsigned int) const () from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#3  0x00007ffff21943c4 in llvm::TargetLowering::SimplifyDemandedBits(llvm::SDValue, llvm::APInt const&, llvm::APInt const&, llvm::KnownBits&, llvm::TargetLowering::TargetLoweringOpt&, unsigned int, bool) const ()
   from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#4  0x00007ffff21942a9 in llvm::TargetLowering::SimplifyDemandedBits(llvm::SDValue, llvm::APInt const&, llvm::APInt const&, llvm::KnownBits&, llvm::TargetLowering::TargetLoweringOpt&, unsigned int, bool) const ()
   from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#5  0x00007ffff2191dab in llvm::TargetLowering::SimplifyDemandedBits(llvm::SDValue, llvm::APInt const&, llvm::KnownBits&, llvm::TargetLowering::TargetLoweringOpt&, unsigned int, bool) const ()
   from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#6  0x00007ffff1fd395f in ?? () from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#7  0x00007ffff1fd1457 in ?? () from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#8  0x00007ffff1fce425 in ?? () from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#9  0x00007ffff1f8275c in ?? () from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#10 0x00007ffff1f80d4a in ?? () from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#11 0x00007ffff1f7de93 in llvm::SelectionDAG::Combine(llvm::CombineLevel, llvm::AAResults*, llvm::CodeGenOpt::Level) () from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#12 0x00007ffff2177182 in llvm::SelectionDAGISel::CodeGenAndEmitDAG() () from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#13 0x00007ffff2176a97 in llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) () from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#14 0x00007ffff2174acf in llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) () from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#15 0x00007ffff42eeedf in ?? () from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#16 0x00007ffff1d16fdb in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) () from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#17 0x00007ffff1acab6d in llvm::FPPassManager::runOnFunction(llvm::Function&) () from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#18 0x00007ffff1ad07b3 in llvm::FPPassManager::runOnModule(llvm::Module&) () from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#19 0x00007ffff1acb225 in llvm::legacy::PassManagerImpl::run(llvm::Module&) () from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#20 0x00007ffff345a4eb in ?? () from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#21 0x00007ffff345a2f2 in LLVMTargetMachineEmitToFile () from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#22 0x0000555555ccf715 in emit_to_file () at /home/julien/src/crystal-1.11.1/src/llvm/target_machine.cr:36
#23 emit_obj_to_file () at /home/julien/src/crystal-1.11.1/src/llvm/target_machine.cr:23
#24 compile_to_object () at /home/julien/src/crystal-1.11.1/src/compiler/crystal/compiler.cr:914
#25 compile () at /home/julien/src/crystal-1.11.1/src/compiler/crystal/compiler.cr:860
#26 0x0000555555ccff26 in -> () at /home/julien/src/crystal-1.11.1/src/compiler/crystal/compiler.cr:539
#27 0x00005555555ccd2b in run () at /home/julien/src/crystal-1.11.1/src/fiber.cr:146
#28 0x0000000000000000 in ?? ()

@ysbaddaden
Contributor Author

The segfault likely appears in release mode because the optimized compiler spends less time in Crystal code and more time in LLVM, which gives thread-unsafe LLVM code more opportunities to run in parallel and corrupt memory (leading to the segfault).

I noticed that we can check whether LLVM was compiled with multithreading support (and this patch should check for it), but my LLVM library was, so that's not the culprit (damn).

More runs through gdb would help pinpoint where it fails. It would also be interesting to see whether changing the ISel (instruction selection) mode has any effect.

NOTE: to speed up reproduction and simplify gdb sessions, we could write a simple program that loads the existing .bc files from the cache and tries to compile them in multiple threads.
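
The shape of such a reproducer could look like the Python outline below. This is only a sketch under stated assumptions: `compile_bitcode` is a hypothetical placeholder that merely checks the four-byte raw bitcode magic (`BC` followed by `0xC0DE`), where a real reproducer would parse the file into a fresh LLVM context and emit an object file. CPython also serializes pure-Python threads, so an actual stress test would have to call LLVM from native threads (for instance from Crystal or C); the outline only shows the fan-out over cached `.bc` files and the repeated rounds.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

BITCODE_MAGIC = b"BC\xc0\xde"  # first four bytes of a raw LLVM bitcode file


def compile_bitcode(path):
    """Hypothetical placeholder for the real work (parse + emit object file).

    It only checks the bitcode magic so the harness runs on its own.
    """
    with open(path, "rb") as f:
        return f.read(4) == BITCODE_MAGIC


def stress(cache_dir, workers=8, rounds=1):
    """Run compile_bitcode over every .bc file under cache_dir on a thread pool.

    The whole file set is processed `rounds` times; paths that failed are returned.
    """
    files = sorted(Path(cache_dir).rglob("*.bc"))
    failed = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(rounds):
            for path, ok in zip(files, pool.map(compile_bitcode, files)):
                if not ok:
                    failed.append(path)
    return failed
```

Running several rounds over the same files matters because a data race may only show up occasionally; a native version would simply crash under gdb instead of returning a list of failures.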

@kostya
Contributor

kostya commented Jan 15, 2024

When you build with --single-module, there is only 1 module, so how can MT help here? Maybe enable it only without --single-module.

@ysbaddaden
Contributor Author

ysbaddaden commented Jan 15, 2024

@kostya no, this is when the compiler itself has been built with -Dpreview_mt --single-module -O1 (or another optimization level), and that compiler is then used to compile something (e.g. Crystal again) without --single-module. That triggers the parallel codegen with MT, which segfaults.

Said differently: there are no issues with the generated binary; the issue is in the parallel codegen.

@crysbot

crysbot commented Mar 4, 2024

This pull request has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/choosing-cpu-for-fastest-compilation/6665/2

@ysbaddaden
Contributor Author

For future reference:

Each module has its own LLVMContext but each module also has a reference to the main module's LLVMContext.

I'm not sure it explains the segfault in this pull request, since we dump/parse the LLVM IR from the main thread to the codegen threads into a new context, but it likely explains why the dump/parse is required.

@ysbaddaden
Contributor Author

I think the segfault in this PR is related to an LLVM pass, for example GlobalISel (Global Instruction Selector).
