Android OpenCL question #5621
Comments
Unclear, but this doesn't seem to be the focus of your question.
Offloading to GPU requires the
I think f16 is not supported on Android; maybe someone will confirm.
@Jeximo thanks for your response. Yes, I use the n-gpu-layers parameter to use the GPU; I used 64 in my test. The FP16 extension does exist on my device. On Android, using a debug build, I can confirm that opencl_init is called, and after removing the comment on the extension, the fp16 bool flag is true, but the backend remains on CPU.
I finally found my problem: the ngl parameter was not passed correctly in the JNI code and stayed at 0, so the device used the CPU. Now I have a SIGBUS error, but there is progress :) Thanks for your help.
How does the OpenCL backend's performance compare with the CPU backend on Android? @anthonyliot
Hi @Jimskns. The performance is not that great with OpenCL on the Android devices I tested; all of them were using the Qualcomm OpenCL driver. Also, I made a mistake earlier: FP16 is not supported on the device I tried. I tried 1B / 3B / 7B models on several devices, and every time the CPU backend performed better. I also played with mixing CPU/GPU as well as full GPU (not for the 7B), but in all my tests so far OpenCL is slower. I am still looking into whether there is a way to get an improvement on the GPU.
Good job @anthonyliot
@anthonyliot Hello! I have also been trying to use OpenCL on Android recently, and I ran into the same problem as you: when running inference with ggml-model-q4_0.gguf on an Android device (Qualcomm) GPU, I get confusing output. I was wondering how you finally solved this problem? Looking forward to your reply, thank you very much.
Pointing to another thread discussing this topic: #7016
Hi,
I was able to build a version of llama.cpp with CLBlast on Android. I am using the models ggml-model-q4_0.gguf and ggml-model-f32.gguf.
When running, it seems to be working: even if the output looks weird and doesn't match the question, at least I get some response. I may well have some errors in my default params ;)
My issue is that even though I can detect that OpenCL is being initialized, it always runs on the CPU. How can I force it to run on the GPU / OpenCL backend?
Also, I see in the code that FP16 support is commented out for OpenCL. Any reason?
Thanks