
Android OpenCL question #5621

Open
anthonyliot opened this issue Feb 21, 2024 · 8 comments

Comments


anthonyliot commented Feb 21, 2024

Hi,

I was able to build a version of llama.cpp with CLBlast on Android. I am using the models ggml-model-q4_0.gguf and ggml-model-f32.gguf.

When running, it seems to work, even though the output looks weird and does not match the question; at least I get some response. I may well have errors in my default params ;)

My issue is that even though I can see OpenCL being initialized, inference always runs on the CPU. How can I force it to run on the GPU / OpenCL backend?

Also, I see in the code that FP16 support is commented out for OpenCL. Any reason why?

Thanks

Jeximo (Contributor) commented Feb 21, 2024

Hi,

I am using the models ggml-model-q4_0.gguf and ggml-model-f32.gguf.

Unclear, but this doesn't seem to be the focus of your question.

My issue is that even though I can see OpenCL being initialized, inference always runs on the CPU. How can I force it to run on the GPU / OpenCL backend?

Offloading to the GPU requires the -ngl parameter, e.g. ./main -m ~/model.gguf -ngl 50. See the README.
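For reference, the usual recipe at the time was to build with the CLBlast OpenCL backend and then pass -ngl at run time. The model path, prompt, and layer count below are placeholders, not values from this thread:

```shell
# Build llama.cpp with the (now historical) CLBlast OpenCL backend,
# then offload up to 50 layers to the GPU at inference time.
make LLAMA_CLBLAST=1
./main -m ~/model.gguf -ngl 50 -p "Hello"
```

If -ngl is 0 (or never reaches the backend), everything stays on the CPU even when OpenCL initializes successfully.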

Also, I see in the code that FP16 support is commented out for OpenCL. Any reason why?

I think FP16 is not supported on Android; maybe someone can confirm.

anthonyliot (Author) commented Feb 21, 2024

@Jeximo thanks for your response. Yes, I do pass the n-gpu-layers parameter to use the GPU; I used 64 in my test.

For FP16, the extension does exist on my device: cl_khr_fp16 (KHR_FP16) is in the extension list, but the code that detects the extension is commented out on the main branch.

On Android, using a debug build, I can confirm that the OpenCL init function is called, and after uncommenting the extension check, the fp16 bool flag is true.

But the backend remains on the CPU.

@anthonyliot (Author)

I finally found my problem: the ngl parameter was not passed correctly in the JNI code and ended up as 0, which is why the device used the CPU. Now I have a SIGBUS error, but there is progress :)

Thanks for your help.


Jimskns commented Mar 4, 2024

How does OpenCL acceleration compare with the CPU backend on Android? @anthonyliot
Thanks.

@anthonyliot (Author)

Hi @Jimskns

So the performance is not that great with OpenCL on the Android devices I tested. All of them were using the Qualcomm OpenCL driver. Also, I made a mistake earlier: FP16 is not supported on the devices I tried.

I tried 1B / 3B / 7B models on these devices, and every time the CPU backend performed better. I also played with mixing CPU/GPU as well as full GPU (not for the 7B), but in all my tests so far OpenCL is slower.

I am still looking into whether there is a way to get better performance on the GPU.


Jimskns commented Mar 5, 2024

Good job, @anthonyliot!


qtyandhasee commented Apr 5, 2024

@anthonyliot Hello! I have also been trying to use OpenCL on Android recently and ran into the same problem as you: when running inference with ggml-model-q4_0.gguf on an Android device (Qualcomm) GPU, I get confusing output. I was wondering how you finally solved this problem? Looking forward to your reply, thank you very much.

gustrd (Contributor) commented May 11, 2024

Pointing to another thread discussing this topic: #7016
