
Comms #1097

Merged · 18 commits merged from the comms branch into main on May 24, 2024
Conversation

angeloskath (Member):

This is the beginning of the communication namespace (perhaps it should be named comms instead of dist). It is mostly here to get feedback while I implement the rest of the primitives and figure out how to package this in the distribution.

Interesting bits:

  • mlx::core::dist defines a bunch of functions that are optionally implemented by a communication backend; currently that backend is MPI.
  • It defines a Stream communication_stream(), and all communication operations go on that CPU stream.
  • Primitives have their transformations defined as expected, which means we can write model-parallel code with minimal fuss: whenever something needs to be communicated, just communicate, and gradients will flow accordingly. (I still have to fix the gradient for all-reduce sum, but once everything is done it should be easy to use.) A usage sketch follows this list.
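For a sense of the intended usage, here is a minimal sketch of the kind of model-parallel code this points at. The names dist::init and dist::all_sum are assumptions based on the description above, not a final API.

#include <iostream>
#include "mlx/mlx.h"

using namespace mlx::core;

int main() {
  // Illustrative only: assumes dist::init() returns the global group.
  auto group = dist::init();

  // Each rank computes a partial result on its shard of the model...
  array partial = full({4}, group.rank());

  // ...and an all-reduce sums the partials across ranks. Because the
  // primitive defines its transformations, gradients flow through this
  // op like any other.
  array total = dist::all_sum(partial);

  std::cout << total << std::endl;  // identical on every rank
  return 0;
}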

angeloskath changed the title from Comms to [WIP] Comms on May 9, 2024
awni (Member) commented on May 10, 2024:

I might prefer the name comm or comms over dist.

I also think distributed is fine, and perhaps better since it is what every other package uses. One can always make a short name.

mlx/dist/mpi/mpi.cpp (outdated; review thread resolved)
Comment on lines 20 to 28
auto ensure_row_contiguous = [](const array& arr) {
  if (arr.flags().row_contiguous) {
    return arr;
  } else {
    // Eagerly make a row-contiguous copy on the CPU before communicating.
    array arr_copy(arr.shape(), arr.dtype(), nullptr, {});
    copy(arr, arr_copy, CopyType::General);
    return arr_copy;
  }
};
Member:

I wonder if there is a better strategy for this. The CPU copy could be kind of slow; it might be better to use a GPU copy prior to the comm.

I wonder if we should consider adding an ensure_contiguous(inputs) op (meant for internal use only) that actually puts the copy in the graph if it's needed.
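Roughly, the suggestion might look like the hypothetical sketch below; copy_as_contiguous stands in for a graph-level copy op and does not exist in the codebase.

// Hypothetical sketch (names invented here). In practice the contiguity
// check may need to happen at eval time, once the input's layout is known;
// this only illustrates putting the copy in the graph rather than doing it
// eagerly on the CPU.
array ensure_contiguous(const array& x, StreamOrDevice s = {}) {
  if (x.flags().row_contiguous) {
    return x;  // already row-contiguous: nothing added to the graph
  }
  // Record the copy as a regular graph node so it is scheduled like any
  // other op (and can run on the GPU prior to the communication).
  return copy_as_contiguous(x, s);  // hypothetical graph-level copy op
}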

mlx/dist/dist.h (outdated; review thread resolved)

mlx/dist/dist.h (outdated)
Comment on lines 22 to 26
struct Group {
virtual int rank() = 0;
virtual int size() = 0;
virtual std::shared_ptr<Group> split(int n) = 0;
};
awni (Member), May 10, 2024:

I don't think we plan to compile with multiple communication backends simultaneously, right?

In that case, it might be cleaner from a user perspective to make Group non-virtual and give it a payload holding the implementation-specific bit, kind of like how Event / Buffer are implemented.

Just a thought, to keep the shared pointers out of the interface.

angeloskath (Member Author):

Not sure why that way would be better 🤷‍♂️ but I implemented it. It is pretty much the same thing, as it just forces us to write the MPIGroup in mpi.cpp (as we would anyway) and hide it behind a std::shared_ptr<void>. It is still pretty clean, if a little more cryptic and harder to follow, and it also forces us to have only one type of group at any given point.
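For reference, a rough sketch of the payload version being described (signatures illustrative, modeled on the Event / Buffer pattern mentioned above):

#include <memory>

// Group is a plain, non-virtual value type; the backend-specific state
// hides behind a type-erased pointer.
struct Group {
  explicit Group(std::shared_ptr<void> group) : group_(std::move(group)) {}

  int rank();          // forwards to the active backend, e.g. an MPIGroup
  int size();          // forwards to the active backend
  Group split(int n);  // forwards to the active backend

 private:
  // e.g. points at an MPIGroup defined in mpi.cpp
  std::shared_ptr<void> group_;
};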

Member:

I guess I'm not a big fan of using virtualization to change behavior at compile time; it feels like the wrong tool for the job. Maybe there is another option that is cleaner than the payload (which is indeed a bit cryptic).

Also, it's meant as a suggestion: if you think the previous version is a lot more readable, feel free to revert it.

awni (Member) commented on May 10, 2024:

This is very nice and simple! Looks great!

fishelegs:
Is there any example? How can it be used to train, finetune, or run inference? Thanks!

angeloskath force-pushed the comms branch 5 times, most recently from 30501e0 to e5f0e46, on May 21, 2024
angeloskath changed the title from [WIP] Comms to Comms on May 23, 2024
angeloskath marked this pull request as ready for review on May 23, 2024
.gitignore (outdated)
Comment on lines 86 to 87
# Negate mlx/dist
!mlx/dist
Member:

Nit: this isn't needed anymore.

@@ -167,6 +167,11 @@ else()
set(MLX_BUILD_ACCELERATE OFF)
endif()

find_package(MPI)
Member:

Is this still needed now that you do it dynamically?

angeloskath (Member Author):

Well, it depends :-) We can skip it, but then I'd have to add an mpi.h defining all the functions that I am using. It is basically just finding the header.

angeloskath (Member Author):

Actually, the functions are fine; it's the types that need defining.

Member:

I see, let's leave it then! It should still build without MPI installed, which is good.
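For context, the run-time linking mentioned above might look roughly like this sketch (the MPI symbol and type names are standard MPI; the wrapper itself is illustrative). It also shows why the header is still needed: the functions are resolved with dlsym, but the types in their signatures come from mpi.h.

#include <dlfcn.h>
#include <mpi.h>  // still needed: provides MPI_Comm and friends

// Illustrative sketch of loading MPI at run time instead of linking it.
struct MPIWrapper {
  MPIWrapper() {
    handle_ = dlopen("libmpi.dylib", RTLD_NOW | RTLD_GLOBAL);
    if (handle_ == nullptr) {
      return;  // MPI not installed; distributed ops stay disabled
    }
    // Function pointers are resolved dynamically, but their signatures
    // use types that only mpi.h defines.
    init_ = reinterpret_cast<int (*)(int*, char***)>(
        dlsym(handle_, "MPI_Init"));
    comm_rank_ = reinterpret_cast<int (*)(MPI_Comm, int*)>(
        dlsym(handle_, "MPI_Comm_rank"));
  }

  bool available() const { return handle_ != nullptr; }

  int (*init_)(int*, char***) = nullptr;
  int (*comm_rank_)(MPI_Comm, int*) = nullptr;

 private:
  void* handle_ = nullptr;
};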

Comment on lines 8 to 16
if (MPI_FOUND)
add_subdirectory(${CMAKE_CURRENT_SOURCE_DIR}/mpi)
else()
target_sources(
mlx
PRIVATE
${CMAKE_CURRENT_SOURCE_DIR}/no_distributed.cpp
)
endif()
Member:

I'm a little confused by the build cases. If MPI is not available we build without it, but we also do the linking at run time. Does it make sense to have a no-distributed option in that case?

Another idea: maybe we should build no_distributed when MLX_BUILD_CPU=OFF (rather than throwing in the copy function). It does not seem too odd to me to disable MPI if the CPU is disabled.

angeloskath (Member Author):

Yeah! Very good point. I will do that and move the copy inside the mpi implementation where it belongs.

angeloskath (Member Author):

Man, that was a very nice suggestion. It feels so much better now with no_cpu.cpp removed and the copy moved into the distributed implementation.

awni (Member) left a review:

🚀

angeloskath merged commit 50dfb66 into main on May 24, 2024 · 3 checks passed
angeloskath deleted the comms branch on May 24, 2024
lin72h commented on May 25, 2024:

This is huge. I wish someone would write a tutorial on how to connect two Macs using MLX.

awni (Member) commented on May 25, 2024:

Usage docs coming soon!

sck-at-ucy commented on May 25, 2024:

I can't wait to try this out!!

altaic commented on May 25, 2024:

> Usage docs coming soon!

Awesome work, so excited for this! Any idea how much throughput will be necessary for various use cases? Also, can MPI aggregate Thunderbolt links?

jkaercher pushed a commit to jkaercher/mlx referencing this pull request on May 30, 2024:
* Start the communications branch using MPI
* Add ops and primitives
* Add python bindings for distributed