
Optimization Plans for Conv2D CPU Execution #1130

Open
abdussamettrkr opened this issue May 16, 2024 · 3 comments

@abdussamettrkr (Contributor)

Hello, I wonder whether there are any plans to optimize Conv2D CPU execution. I assume MLX currently uses a naive implementation?

@awni (Member) commented May 20, 2024

We don't have an immediate plan for this. I'm sure there is potential to make CPU convolutions faster. I'm curious, though: why not use the GPU instead?

@briancpark

Has there been any attempt or discussion to port the BNNS conv into MLX? It's listed as a TODO.
I've looked into it personally, but I'm noticing some limitations in bridging the BNNS API and MLX. For one, they prefer different data layouts: I believe BNNS prefers NCHW inputs and OIHW weights, whereas MLX uses NHWC and OHWI. I assume the latter were chosen because implementing the Metal variant was the priority, and those layouts have better performance properties on the GPU.
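
For illustration, here is a minimal sketch (the shapes are made up) of the layout round-trip such a bridge would need on the MLX side; the BNNS call itself is elided:

```python
import mlx.core as mx

# MLX convention: inputs are NHWC, conv weights are OHWI.
x_nhwc = mx.random.normal((1, 32, 32, 3))   # (N, H, W, C), hypothetical shape
w_ohwi = mx.random.normal((16, 3, 3, 3))    # (O, H, W, I), hypothetical shape

# BNNS prefers NCHW inputs and OIHW weights, so a bridge would
# transpose on the way in...
x_nchw = mx.transpose(x_nhwc, (0, 3, 1, 2))  # (N, C, H, W)
w_oihw = mx.transpose(w_ohwi, (0, 3, 1, 2))  # (O, I, H, W)

# ... run the BNNS convolution here (elided) ...

# ... and transpose the NCHW result back to MLX's channels-last layout:
# y_nhwc = mx.transpose(y_nchw, (0, 2, 3, 1))
```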

For me, the answer to "why not use the GPU instead" is that the CPU could be more efficient. I've seen cases in CoreML where CNNs compiled for GPU-only perform the same as, or only marginally better than, CNNs compiled for CPU-only. The CPU might also be more energy efficient, though I never confirmed that.

@awni (Member) commented May 22, 2024

I don't think we've benchmarked the convs in BNNS, but it would be interesting to see how they perform. There would have to be a copy in and out, since everything in MLX assumes channels-last, so that might hinder performance.

Regarding efficiency, the CPU will likely be faster (with a good implementation) for smaller models, and the GPU for larger models. As for power efficiency, I don't know how they stack up or how that changes with scale; a very good question.
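
For anyone who wants to probe that small-vs-large crossover on their own machine, here is a rough timing sketch using MLX streams (the problem sizes are arbitrary, and it measures speed only, not power):

```python
import time
import mlx.core as mx

def bench_conv(device, n=1, hw=64, c_in=64, c_out=64, iters=50):
    # NHWC input and OHWI weights, per MLX's conventions.
    x = mx.random.normal((n, hw, hw, c_in))
    w = mx.random.normal((c_out, 3, 3, c_in))
    mx.eval(x, w)  # materialize inputs before timing
    tic = time.perf_counter()
    for _ in range(iters):
        y = mx.conv2d(x, w, stream=device)
        mx.eval(y)  # force MLX's lazy evaluation
    return (time.perf_counter() - tic) / iters

# Try a small and a larger problem on each device.
for hw in (32, 256):
    cpu_t = bench_conv(mx.cpu, hw=hw)
    gpu_t = bench_conv(mx.gpu, hw=hw)
    print(f"hw={hw}: cpu {cpu_t * 1e3:.2f} ms, gpu {gpu_t * 1e3:.2f} ms")
```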
