Make in Termux (Android) #247

Closed
ghost opened this issue Jun 17, 2023 · 14 comments

Comments

@ghost

ghost commented Jun 17, 2023

Hi,

I'm trying to build koboldcpp (concedo branch) with make LLAMA_OPENBLAS=1 LLAMA_CLBLAST=1, but it fails.

Details

u0_a1282@localhost ~> cd koboldcpp/
u0_a1282@localhost ~/koboldcpp (concedo)> make LLAMA_OPENBLAS=1 LLAMA_CLBLAST=1
I llama.cpp build info:
I UNAME_S:  Linux
I UNAME_P:  unknown
I UNAME_M:  aarch64
I CFLAGS:   -I. -I./include -I./include/CL -I./otherarch -I./otherarch/tools -Ofast -DNDEBUG -std=c11 -fPIC -DGGML_USE_K_QUANTS -pthread -s -pthread
I CXXFLAGS: -I. -I./examples -I./include -I./include/CL -I./otherarch -I./otherarch/tools -O3 -DNDEBUG -std=c++11 -fPIC -pthread -s -Wno-multichar -Wno-write-strings -pthread
I LDFLAGS:
I CC:       clang version 16.0.6
I CXX:      clang version 16.0.6

aarch64-linux-android-clang++ -I. -I./examples -I./include -I./include/CL -I./otherarch -I./otherarch/tools -O3 -DNDEBUG -std=c++11 -fPIC -pthread -s -Wno-multichar -Wno-write-strings -pthread ggml.o ggml_v2.o ggml_v1.o expose.o common.o gpttype_adapter.o k_quants.o -shared -o koboldcpp.so
aarch64-linux-android-clang++ -I. -I./examples -I./include -I./include/CL -I./otherarch -I./otherarch/tools -O3 -DNDEBUG -std=c++11 -fPIC -pthread -s -Wno-multichar -Wno-write-strings -pthread ggml_failsafe.o ggml_v2_failsafe.o ggml_v1_failsafe.o expose.o common.o gpttype_adapter_failsafe.o k_quants_failsafe.o -shared -o koboldcpp_failsafe.so
aarch64-linux-android-clang++ -I. -I./examples -I./include -I./include/CL -I./otherarch -I./otherarch/tools -O3 -DNDEBUG -std=c++11 -fPIC -pthread -s -Wno-multichar -Wno-write-strings -pthread ggml_openblas.o ggml_v2_openblas.o ggml_v1.o expose.o common.o gpttype_adapter.o k_quants.o -lopenblas -shared -o koboldcpp_openblas.so
aarch64-linux-android-clang++ -I. -I./examples -I./include -I./include/CL -I./otherarch -I./otherarch/tools -O3 -DNDEBUG -std=c++11 -fPIC -pthread -s -Wno-multichar -Wno-write-strings -pthread ggml_openblas_noavx2.o ggml_v2_openblas_noavx2.o ggml_v1_failsafe.o expose.o common.o gpttype_adapter.o k_quants_noavx2.o -lopenblas -shared -o koboldcpp_openblas_noavx2.so
aarch64-linux-android-clang++ -I. -I./examples -I./include -I./include/CL -I./otherarch -I./otherarch/tools -O3 -DNDEBUG -std=c++11 -fPIC -pthread -s -Wno-multichar -Wno-write-strings -pthread ggml_clblast.o ggml_v2_clblast.o ggml_v1.o expose.o common.o gpttype_adapter_clblast.o ggml-opencl.o ggml_v2-opencl.o ggml_v2-opencl-legacy.o k_quants.o -lclblast -lOpenCL -lopenblas -shared -o koboldcpp_clblast.so
ld.lld: error: relocation R_AARCH64_ADR_PREL_PG_HI21 cannot be used against symbol 'vtable for std::__ndk1::__shared_ptr_pointer<_cl_command_queue**, std::__ndk1::default_delete<_cl_command_queue*>, std::__ndk1::allocator<_cl_command_queue*>>'; recompile with -fPIC

defined in /data/data/com.termux/files/usr/lib/libclblast.a(clblast.cpp.o)
referenced by clblast.cpp
clblast.cpp.o:(clblast::StatusCode clblast::Swap(unsigned long, _cl_mem*, unsigned long, unsigned long, _cl_mem*, unsigned long, unsigned long, _cl_command_queue**, _cl_event**)) in archive /data/data/com.termux/files/usr/lib/libclblast.a

ld.lld: error: relocation R_AARCH64_ADD_ABS_LO12_NC cannot be used against symbol 'vtable for std::__ndk1::__shared_ptr_pointer<_cl_command_queue**, std::__ndk1::default_delete<_cl_command_queue*>, std::__ndk1::allocator<_cl_command_queue*>>'; recompile with -fPIC

defined in /data/data/com.termux/files/usr/lib/libclblast.a(clblast.cpp.o)
referenced by clblast.cpp
clblast.cpp.o:(clblast::StatusCode clblast::Swap(unsigned long, _cl_mem*, unsigned long, unsigned long, _cl_mem*, unsigned long, unsigned long, _cl_command_queue**, _cl_event**)) in archive /data/data/com.termux/files/usr/lib/libclblast.a

ld.lld: error: relocation R_AARCH64_ADR_PREL_PG_HI21 cannot be used against symbol 'vtable for std::__ndk1::__shared_ptr_pointer<_cl_mem**, std::__ndk1::default_delete<_cl_mem*>, std::__ndk1::allocator<_cl_mem*>>'; recompile with -fPIC

defined in /data/data/com.termux/files/usr/lib/libclblast.a(clblast.cpp.o)
referenced by clblast.cpp
clblast.cpp.o:(clblast::StatusCode clblast::Swap(unsigned long, _cl_mem*, unsigned long, unsigned long, _cl_mem*, unsigned long, unsigned long, _cl_command_queue**, _cl_event**)) in archive /data/data/com.termux/files/usr/lib/libclblast.a

ld.lld: error: relocation R_AARCH64_ADD_ABS_LO12_NC cannot be used against symbol 'vtable for std::__ndk1::__shared_ptr_pointer<_cl_mem**, std::__ndk1::default_delete<_cl_mem*>, std::__ndk1::allocator<_cl_mem*>>'; recompile with -fPIC

defined in /data/data/com.termux/files/usr/lib/libclblast.a(clblast.cpp.o)
referenced by clblast.cpp
clblast.cpp.o:(clblast::StatusCode clblast::Swap(unsigned long, _cl_mem*, unsigned long, unsigned long, _cl_mem*, unsigned long, unsigned long, _cl_command_queue**, _cl_event**)) in archive /data/data/com.termux/files/usr/lib/libclblast.a

ld.lld: error: relocation R_AARCH64_ADR_PREL_PG_HI21 cannot be used against symbol 'vtable for std::__ndk1::__shared_ptr_pointer<_cl_command_queue**, std::__ndk1::default_delete<_cl_command_queue*>, std::__ndk1::allocator<_cl_command_queue*>>'; recompile with -fPIC

defined in /data/data/com.termux/files/usr/lib/libclblast.a(clblast.cpp.o)
referenced by clblast.cpp
clblast.cpp.o:(clblast::StatusCode clblast::Swap(unsigned long, _cl_mem*, unsigned long, unsigned long, _cl_mem*, unsigned long, unsigned long, _cl_command_queue**, _cl_event**)) in archive /data/data/com.termux/files/usr/lib/libclblast.a

ld.lld: error: relocation R_AARCH64_ADD_ABS_LO12_NC cannot be used against symbol 'vtable for std::__ndk1::__shared_ptr_pointer<_cl_command_queue**, std::__ndk1::default_delete<_cl_command_queue*>, std::__ndk1::allocator<_cl_command_queue*>>'; recompile with -fPIC

defined in /data/data/com.termux/files/usr/lib/libclblast.a(clblast.cpp.o)
referenced by clblast.cpp
clblast.cpp.o:(clblast::StatusCode clblast::Swap(unsigned long, _cl_mem*, unsigned long, unsigned long, _cl_mem*, unsigned long, unsigned long, _cl_command_queue**, _cl_event**)) in archive /data/data/com.termux/files/usr/lib/libclblast.a

ld.lld: error: relocation R_AARCH64_ADR_PREL_PG_HI21 cannot be used against symbol 'vtable for std::__ndk1::__shared_ptr_pointer<_cl_mem**, std::__ndk1::default_delete<_cl_mem*>, std::__ndk1::allocator<_cl_mem*>>'; recompile with -fPIC

defined in /data/data/com.termux/files/usr/lib/libclblast.a(clblast.cpp.o)
referenced by clblast.cpp
clblast.cpp.o:(clblast::StatusCode clblast::Swap(unsigned long, _cl_mem*, unsigned long, unsigned long, _cl_mem*, unsigned long, unsigned long, _cl_command_queue**, _cl_event**)) in archive /data/data/com.termux/files/usr/lib/libclblast.a

ld.lld: error: relocation R_AARCH64_ADD_ABS_LO12_NC cannot be used against symbol 'vtable for std::__ndk1::__shared_ptr_pointer<_cl_mem**, std::__ndk1::default_delete<_cl_mem*>, std::__ndk1::allocator<_cl_mem*>>'; recompile with -fPIC

defined in /data/data/com.termux/files/usr/lib/libclblast.a(clblast.cpp.o)
referenced by clblast.cpp
clblast.cpp.o:(clblast::StatusCode clblast::Swap(unsigned long, _cl_mem*, unsigned long, unsigned long, _cl_mem*, unsigned long, unsigned long, _cl_command_queue**, _cl_event**)) in archive /data/data/com.termux/files/usr/lib/libclblast.a

ld.lld: error: relocation R_AARCH64_ADR_PREL_PG_HI21 cannot be used against symbol 'vtable for std::__ndk1::__shared_ptr_pointer<_cl_command_queue**, std::__ndk1::default_delete<_cl_command_queue*>, std::__ndk1::allocator<_cl_command_queue*>>'; recompile with -fPIC

defined in /data/data/com.termux/files/usr/lib/libclblast.a(clblast.cpp.o)
referenced by clblast.cpp
clblast.cpp.o:(clblast::StatusCode clblast::Swap<std::__ndk1::complex>(unsigned long, _cl_mem*, unsigned long, unsigned long, _cl_mem*, unsigned long, unsigned long, _cl_command_queue**, _cl_event**)) in archive /data/data/com.termux/files/usr/lib/libclblast.a

ld.lld: error: relocation R_AARCH64_ADD_ABS_LO12_NC cannot be used against symbol 'vtable for std::__ndk1::__shared_ptr_pointer<_cl_command_queue**, std::__ndk1::default_delete<_cl_command_queue*>, std::__ndk1::allocator<_cl_command_queue*>>'; recompile with -fPIC

defined in /data/data/com.termux/files/usr/lib/libclblast.a(clblast.cpp.o)
referenced by clblast.cpp
clblast.cpp.o:(clblast::StatusCode clblast::Swap<std::__ndk1::complex>(unsigned long, _cl_mem*, unsigned long, unsigned long, _cl_mem*, unsigned long, unsigned long, _cl_command_queue**, _cl_event**)) in archive /data/data/com.termux/files/usr/lib/libclblast.a

ld.lld: error: relocation R_AARCH64_ADR_PREL_PG_HI21 cannot be used against symbol 'vtable for std::__ndk1::__shared_ptr_pointer<_cl_mem**, std::__ndk1::default_delete<_cl_mem*>, std::__ndk1::allocator<_cl_mem*>>'; recompile with -fPIC

defined in /data/data/com.termux/files/usr/lib/libclblast.a(clblast.cpp.o)
referenced by clblast.cpp
clblast.cpp.o:(clblast::StatusCode clblast::Swap<std::__ndk1::complex>(unsigned long, _cl_mem*, unsigned long, unsigned long, _cl_mem*, unsigned long, unsigned long, _cl_command_queue**, _cl_event**)) in archive /data/data/com.termux/files/usr/lib/libclblast.a

ld.lld: error: relocation R_AARCH64_ADD_ABS_LO12_NC cannot be used against symbol 'vtable for std::__ndk1::__shared_ptr_pointer<_cl_mem**, std::__ndk1::default_delete<_cl_mem*>, std::__ndk1::allocator<_cl_mem*>>'; recompile with -fPIC

defined in /data/data/com.termux/files/usr/lib/libclblast.a(clblast.cpp.o)
referenced by clblast.cpp
clblast.cpp.o:(clblast::StatusCode clblast::Swap<std::__ndk1::complex>(unsigned long, _cl_mem*, unsigned long, unsigned long, _cl_mem*, unsigned long, unsigned long, _cl_command_queue**, _cl_event**)) in archive /data/data/com.termux/files/usr/lib/libclblast.a

ld.lld: error: relocation R_AARCH64_ADR_PREL_PG_HI21 cannot be used against symbol 'vtable for std::__ndk1::__shared_ptr_pointer<_cl_command_queue**, std::__ndk1::default_delete<_cl_command_queue*>, std::__ndk1::allocator<_cl_command_queue*>>'; recompile with -fPIC

defined in /data/data/com.termux/files/usr/lib/libclblast.a(clblast.cpp.o)
referenced by clblast.cpp
clblast.cpp.o:(clblast::StatusCode clblast::Swap<std::__ndk1::complex>(unsigned long, _cl_mem*, unsigned long, unsigned long, _cl_mem*, unsigned long, unsigned long, _cl_command_queue**, _cl_event**)) in archive /data/data/com.termux/files/usr/lib/libclblast.a

ld.lld: error: relocation R_AARCH64_ADD_ABS_LO12_NC cannot be used against symbol 'vtable for std::__ndk1::__shared_ptr_pointer<_cl_command_queue**, std::__ndk1::default_delete<_cl_command_queue*>, std::__ndk1::allocator<_cl_command_queue*>>'; recompile with -fPIC

defined in /data/data/com.termux/files/usr/lib/libclblast.a(clblast.cpp.o)
referenced by clblast.cpp
clblast.cpp.o:(clblast::StatusCode clblast::Swap<std::__ndk1::complex>(unsigned long, _cl_mem*, unsigned long, unsigned long, _cl_mem*, unsigned long, unsigned long, _cl_command_queue**, _cl_event**)) in archive /data/data/com.termux/files/usr/lib/libclblast.a

ld.lld: error: relocation R_AARCH64_ADR_PREL_PG_HI21 cannot be used against symbol 'vtable for std::__ndk1::__shared_ptr_pointer<_cl_mem**, std::__ndk1::default_delete<_cl_mem*>, std::__ndk1::allocator<_cl_mem*>>'; recompile with -fPIC

defined in /data/data/com.termux/files/usr/lib/libclblast.a(clblast.cpp.o)
referenced by clblast.cpp
clblast.cpp.o:(clblast::StatusCode clblast::Swap<std::__ndk1::complex>(unsigned long, _cl_mem*, unsigned long, unsigned long, _cl_mem*, unsigned long, unsigned long, _cl_command_queue**, _cl_event**)) in archive /data/data/com.termux/files/usr/lib/libclblast.a

ld.lld: error: relocation R_AARCH64_ADD_ABS_LO12_NC cannot be used against symbol 'vtable for std::__ndk1::__shared_ptr_pointer<_cl_mem**, std::__ndk1::default_delete<_cl_mem*>, std::__ndk1::allocator<_cl_mem*>>'; recompile with -fPIC

defined in /data/data/com.termux/files/usr/lib/libclblast.a(clblast.cpp.o)
referenced by clblast.cpp
clblast.cpp.o:(clblast::StatusCode clblast::Swap<std::__ndk1::complex>(unsigned long, _cl_mem*, unsigned long, unsigned long, _cl_mem*, unsigned long, unsigned long, _cl_command_queue**, _cl_event**)) in archive /data/data/com.termux/files/usr/lib/libclblast.a

ld.lld: error: relocation R_AARCH64_ADR_PREL_PG_HI21 cannot be used against symbol 'vtable for std::__ndk1::__shared_ptr_pointer<_cl_command_queue**, std::__ndk1::default_delete<_cl_command_queue*>, std::__ndk1::allocator<_cl_command_queue*>>'; recompile with -fPIC

defined in /data/data/com.termux/files/usr/lib/libclblast.a(clblast.cpp.o)
referenced by clblast.cpp
clblast.cpp.o:(clblast::StatusCode clblast::Swap(unsigned long, _cl_mem*, unsigned long, unsigned long, _cl_mem*, unsigned long, unsigned long, _cl_command_queue**, _cl_event**)) in archive /data/data/com.termux/files/usr/lib/libclblast.a

ld.lld: error: relocation R_AARCH64_ADD_ABS_LO12_NC cannot be used against symbol 'vtable for std::__ndk1::__shared_ptr_pointer<_cl_command_queue**, std::__ndk1::default_delete<_cl_command_queue*>, std::__ndk1::allocator<_cl_command_queue*>>'; recompile with -fPIC

defined in /data/data/com.termux/files/usr/lib/libclblast.a(clblast.cpp.o)
referenced by clblast.cpp
clblast.cpp.o:(clblast::StatusCode clblast::Swap(unsigned long, _cl_mem*, unsigned long, unsigned long, _cl_mem*, unsigned long, unsigned long, _cl_command_queue**, _cl_event**)) in archive /data/data/com.termux/files/usr/lib/libclblast.a

ld.lld: error: relocation R_AARCH64_ADR_PREL_PG_HI21 cannot be used against symbol 'vtable for std::__ndk1::__shared_ptr_pointer<_cl_mem**, std::__ndk1::default_delete<_cl_mem*>, std::__ndk1::allocator<_cl_mem*>>'; recompile with -fPIC

defined in /data/data/com.termux/files/usr/lib/libclblast.a(clblast.cpp.o)
referenced by clblast.cpp
clblast.cpp.o:(clblast::StatusCode clblast::Swap(unsigned long, _cl_mem*, unsigned long, unsigned long, _cl_mem*, unsigned long, unsigned long, _cl_command_queue**, _cl_event**)) in archive /data/data/com.termux/files/usr/lib/libclblast.a

ld.lld: error: relocation R_AARCH64_ADD_ABS_LO12_NC cannot be used against symbol 'vtable for std::__ndk1::__shared_ptr_pointer<_cl_mem**, std::__ndk1::default_delete<_cl_mem*>, std::__ndk1::allocator<_cl_mem*>>'; recompile with -fPIC

defined in /data/data/com.termux/files/usr/lib/libclblast.a(clblast.cpp.o)
referenced by clblast.cpp
clblast.cpp.o:(clblast::StatusCode clblast::Swap(unsigned long, _cl_mem*, unsigned long, unsigned long, _cl_mem*, unsigned long, unsigned long, _cl_command_queue**, _cl_event**)) in archive /data/data/com.termux/files/usr/lib/libclblast.a

ld.lld: error: too many errors emitted, stopping now (use --error-limit=0 to see all errors)
clang-16: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [Makefile:326: koboldcpp_clblast] Error 1
u0_a1282@localhost ~/koboldcpp (concedo) [2]>

To clarify, using make to build with OpenBLAS works as expected, but building with CLBlast fails.

More context: when I build something like llama.cpp, I must use CMake; otherwise it fails to locate the CLBlast library. But I remember seeing an error that said not to build that way. Does that still apply to this situation?
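
Editor's note: every ld.lld error in the log above has the same shape, a relocation type followed by a "defined in" line naming an archive. As a rough sketch (the helper name summarize_pic_errors and the shortened sample log are illustrative, not part of koboldcpp), the errors can be grouped by relocation type and defining archive to see which static library was built without -fPIC:

```python
import re
from collections import Counter

# Header line of an ld.lld relocation error, e.g.
#   ld.lld: error: relocation R_AARCH64_ADR_PREL_PG_HI21 cannot be used against symbol '...'
RELOC_RE = re.compile(r"ld\.lld: error: relocation (\S+) cannot be used against symbol")
# Follow-up line naming the archive and member that define the symbol, e.g.
#   defined in /data/data/com.termux/files/usr/lib/libclblast.a(clblast.cpp.o)
DEFINED_RE = re.compile(r"defined in (\S+?)\((\S+)\)")


def summarize_pic_errors(log_text):
    """Count (relocation type, archive path) pairs found in an ld.lld log."""
    counts = Counter()
    pending = None  # relocation type waiting for its "defined in" line
    for line in log_text.splitlines():
        match = RELOC_RE.search(line)
        if match:
            pending = match.group(1)
            continue
        match = DEFINED_RE.search(line)
        if match and pending is not None:
            counts[(pending, match.group(1))] += 1
            pending = None
    return counts


# Shortened sample in the same shape as the log above (symbol names abbreviated).
sample = """\
ld.lld: error: relocation R_AARCH64_ADR_PREL_PG_HI21 cannot be used against symbol 'vtable for X'; recompile with -fPIC

defined in /data/data/com.termux/files/usr/lib/libclblast.a(clblast.cpp.o)
ld.lld: error: relocation R_AARCH64_ADD_ABS_LO12_NC cannot be used against symbol 'vtable for X'; recompile with -fPIC

defined in /data/data/com.termux/files/usr/lib/libclblast.a(clblast.cpp.o)
"""

for (reloc, archive), n in summarize_pic_errors(sample).items():
    print(f"{n}x {reloc} from {archive}")
```

When every entry names the same .a file, as it does in this log, the archive itself was compiled without position-independent code, so it cannot be linked into a shared object such as koboldcpp_clblast.so as-is.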

Here's the clinfo output:

u0_a1282@localhost ~> LD_LIBRARY_PATH=/vendor/lib64 clinfo
Number of platforms                               1
  Platform Name                                   QUALCOMM Snapdragon(TM)
  Platform Vendor                                 QUALCOMM
  Platform Version                                OpenCL 2.0 QUALCOMM build: commit #3dad7f8ed7 changeid #I593c16c433 Date: 10/01/21 Fri Local Branch:  Remote Branch: refs/tags/AU_LINUX_ANDROID_LA.UM.9.1.R1.11.00.00.604.073
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             

  Platform Name                                   QUALCOMM Snapdragon(TM)
Number of devices                                 1
  Device Name                                     QUALCOMM Adreno(TM)
  Device Vendor                                   QUALCOMM
  Device Vendor ID                                0x5143
  Device Version                                  OpenCL 2.0 Adreno(TM) 640
  Driver Version                                  OpenCL 2.0 QUALCOMM build: commit #3dad7f8ed7 changeid #I593c16c433 Date: 10/01/21 Fri Local Branch:  Remote Branch: refs/tags/AU_LINUX_ANDROID_LA.UM.9.1.R1.11.00.00.604.073 Compiler E031.37.12.01
  Device OpenCL C Version                         OpenCL C 2.0 Adreno(TM) 640
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               2
  Max clock frequency                             1MHz
  Device Partition                                (core)
    Max number of sub-devices                     1
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x1024
  Max work group size                             1024
  Preferred work group size multiple (kernel)     128
  Preferred / native vector sizes
    char                                                 1 / 1
    short                                                1 / 1
    int                                                  1 / 1
    long                                                 1 / 0
    half                                                 1 / 1        (cl_khr_fp16)
    float                                                1 / 1
    double                                               0 / 0        (n/a)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (n/a)
  Address bits                                    64, Little-Endian
  Global memory size                              3911952384 (3.643GiB)
  Error Correction support                        No
  Max memory allocation                           977988096 (932.7MiB)
  Unified memory for Host and Device              Yes
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   Yes
    Fine-grained system sharing                   No
    Atomics                                       Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Page size (QCOM)                                4096 bytes
  External memory padding (QCOM)                  0 bytes
  Preferred alignment for atomics
    SVM                                           128 bytes
    Global                                        0 bytes
    Local                                         0 bytes
  Max size for global variable                    65536 (64KiB)
  Preferred total size of global vars             1048576 (1024KiB)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        131072 (128KiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            134217728 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   64 bytes
    Pitch alignment for 2D image buffers          64 pixels
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             16384x16384x2048 pixels
    Max number of read image args                 128
    Max number of write image args                64
    Max number of read/write image args           64
  Max number of pipe args                         16
  Max active pipe reservations                    7680
  Max pipe packet size                            1024
  Local memory type                               Local
  Local memory size                               32768 (32KiB)
  Max number of constant args                     8
  Max constant buffer size                        65536 (64KiB)
  Max size of kernel argument                     1024
  Queue properties (on host)
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Queue properties (on device)
    Out-of-order execution                        Yes
    Profiling                                     Yes
    Preferred size                                655376 (640KiB)
    Max size                                      655376 (640KiB)
  Max queues on device                            1
  Max events on device                            1024
  Prefer user sync for interop                    No
  Profiling timer resolution                      1000ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            No
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                (n/a)
  Device Extensions                               cl_khr_3d_image_writes cl_img_egl_image cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_egl_event cl_khr_egl_image cl_khr_fp16 cl_khr_gl_sharing cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_image2d_from_buffer cl_khr_mipmap_image cl_khr_srgb_image_writes cl_khr_subgroups cl_qcom_create_buffer_from_image cl_qcom_ext_host_ptr cl_qcom_ion_host_ptr cl_qcom_perf_hint cl_qcom_other_image cl_qcom_subgroup_shuffle cl_qcom_vector_image_ops cl_qcom_extract_image_plane cl_qcom_android_native_buffer_host_ptr cl_qcom_protected_context cl_qcom_priority_hint cl_qcom_compressed_yuv_image_read cl_qcom_compressed_image cl_qcom_ext_host_ptr_iocoherent cl_qcom_accelerated_image_ops cl_qcom_ml_ops

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
  clCreateContext(NULL, ...) [default]            No platform
  clCreateContext(NULL, ...) [other]              Success [P0]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 QUALCOMM Snapdragon(TM)
    Device Name                                   QUALCOMM Adreno(TM)
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 QUALCOMM Snapdragon(TM)
    Device Name                                   QUALCOMM Adreno(TM)
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  Invalid device type for platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 QUALCOMM Snapdragon(TM)
    Device Name                                   QUALCOMM Adreno(TM)
u0_a1282@localhost ~>

Thank you!

@gustrd

gustrd commented Jun 17, 2023

I wrote a guide on how to build LlamaCpp in Termux. KoboldCpp is very similar.

It's currently a PR in LlamaCpp's GitHub repo. You can check it there.

@LostRuins, do you think it would be good to bring a similar version to KoboldCpp's README as well?

I had a lot of fun playing with KoboldCpp and SillyTavern on an airplane flight.

@gustrd

gustrd commented Jun 17, 2023

Just checked and it was merged today: ggerganov#1828 (review)

@ghost
Author

ghost commented Jun 17, 2023

Just checked and it was merged today: ggerganov#1828 (review)

I'll try it, thanks. I'll reply and let you know if kobold.cpp builds as expected.

@LostRuins
Owner

@gustrd i'll add a hotlink to that document

@ghost
Author

ghost commented Jun 18, 2023

@gustrd i'll add a hotlink to that document

After ensuring CLBlast is installed, I run:

cp ./include/clblast.h ~/koboldcpp

Then I navigate to koboldcpp and run:
cp /data/data/com.termux/files/usr/include/openblas/cblas.h .
cp /data/data/com.termux/files/usr/include/openblas/openblas_config.h .

It compiles as expected, but it doesn't run with CLBlast.

u0_a1282@localhost ~> cd koboldcpp/
u0_a1282@localhost ~/koboldcpp (concedo)> LD_LIBRARY_PATH=/vendor/lib64:$PREFIX/lib python ~/koboldcpp/koboldcpp.py ~/llama.cpp/models/lima-q5_1.bin --threads 3 --useclblast 0 0
Welcome to KoboldCpp - Version 1.31
Attempting to use CLBlast library for faster prompt ingestion. A compatible clblast will be required.
Initializing dynamic library: koboldcpp_clblast.so
Traceback (most recent call last):
  File "/data/data/com.termux/files/home/koboldcpp/koboldcpp.py", line 850, in <module>
    main(args)
  File "/data/data/com.termux/files/home/koboldcpp/koboldcpp.py", line 745, in main
    init_library() # Note: if blas does not exist and is enabled, program will crash.
    ^^^^^^^^^^^^^^
  File "/data/data/com.termux/files/home/koboldcpp/koboldcpp.py", line 134, in init_library
    handle = ctypes.CDLL(os.path.join(dir_path, libname))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/data/com.termux/files/usr/lib/python3.11/ctypes/__init__.py", line 376, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: dlopen failed: cannot locate symbol "__emutls_get_address" referenced by "/data/data/com.termux/files/home/koboldcpp/koboldcpp_clblast.so"...
u0_a1282@localhost ~/koboldcpp (concedo) [1]>
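
Editor's note: the traceback above ends in a dlopen error that names one unresolved symbol. A minimal sketch, assuming the error text keeps the 'cannot locate symbol "..."' wording shown here (the helper names missing_symbol and describe_load_failure are hypothetical, not koboldcpp functions), that pulls the symbol out so a failed load can be reported more directly:

```python
import ctypes
import re

# Wording taken from the dlopen error in the traceback above.
MISSING_SYMBOL_RE = re.compile(r'cannot locate symbol "([^"]+)"')


def missing_symbol(message):
    """Return the symbol named in a 'cannot locate symbol' error, or None."""
    match = MISSING_SYMBOL_RE.search(message)
    return match.group(1) if match else None


def describe_load_failure(path):
    """Try to load a shared library; return None on success or a short reason."""
    try:
        ctypes.CDLL(path)
    except OSError as exc:
        symbol = missing_symbol(str(exc))
        if symbol:
            return f"{path}: unresolved symbol {symbol}"
        return f"{path}: {exc}"
    return None


# The message from the traceback above, with the trailing "..." kept as-is.
error_text = (
    'dlopen failed: cannot locate symbol "__emutls_get_address" referenced by '
    '"/data/data/com.termux/files/home/koboldcpp/koboldcpp_clblast.so"...'
)
print(missing_symbol(error_text))
```

This only reports which symbol the loader could not resolve; it does not say which library was expected to provide it.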

Using gustrd's method, I get a "no platform" error, i.e.:

GGML_OPENCL_PLATFORM=0
GGML_OPENCL_DEVICE=0
export LD_LIBRARY_PATH=/system/vendor/lib64:$LD_LIBRARY_PATH
python ...

The method below finds a platform, but koboldcpp crashes.

LD_LIBRARY_PATH=/vendor/lib64:$PREFIX/lib python ..

Thanks.
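
Editor's note: the two invocations above differ only in which directories end up in LD_LIBRARY_PATH. A rough sketch for checking which of those directories actually contain an OpenCL library follows. The file name libOpenCL.so and the helper name find_library_candidates are assumptions for illustration, and the real Android linker applies more rules than a simple file check.

```python
import os


def find_library_candidates(search_path, filename="libOpenCL.so"):
    """For each directory in a colon-separated path, report whether it holds `filename`.

    This only checks that the file exists; it does not emulate the dynamic linker.
    """
    results = []
    for directory in search_path.split(":"):
        if not directory:
            continue  # skip empty entries such as a trailing ':'
        present = os.path.isfile(os.path.join(directory, filename))
        results.append((directory, present))
    return results


# Inspect whatever LD_LIBRARY_PATH is set for the current process.
for directory, present in find_library_candidates(os.environ.get("LD_LIBRARY_PATH", "")):
    print(("found   " if present else "missing ") + directory)
```

Running it once with each LD_LIBRARY_PATH value from this comment would show whether /vendor/lib64 and /system/vendor/lib64 differ in what they expose to Termux.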

@gustrd

gustrd commented Jun 18, 2023

Is clinfo working? What is its output?

@gustrd

gustrd commented Jun 18, 2023

Also make sure you are using the latest Termux, unrooted, from F-Droid. It was only tested with it.

The one from Play Store has several issues.

@ghost
Author

ghost commented Jun 18, 2023

Also make sure you are using the latest Termux, unrooted, from F-Droid. It was only tested with it.

The one from Play Store has several issues.

I'm aware, thanks.

clinfo does not produce a platform

Is clinfo working? What is its output?

Brother, please read my message:

LD_LIBRARY_PATH=/vendor/lib64:$PREFIX/lib is required to enable OpenCL through CLBlast on my device.

The method GGML_OPENCL_PLATFORM=0 GGML_OPENCL_DEVICE=0 export LD_LIBRARY_PATH=/system/vendor/lib64:$LD_LIBRARY_PATH has no visible effect: it runs without any confirmation or error.

Commands such as export LD_LIBRARY_PATH=/system/vendor/lib64:$LD_LIBRARY_PATH and LD_LIBRARY_PATH=/system/vendor/lib64:$LD_LIBRARY_PATH do not change anything. They do not let clinfo detect the platform:

LD_LIBRARY_PATH=/system/vendor/lib64:$LD_LIBRARY_PATH clinfo
Number of platforms                               0

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.3.1
  ICD loader Profile                              OpenCL 3.0
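
Editor's note: the comparison being made here comes down to the "Number of platforms" line that clinfo prints under each LD_LIBRARY_PATH value. A small sketch of automating that check follows. The helper names platform_count and run_clinfo are hypothetical, and run_clinfo assumes clinfo is on PATH.

```python
import os
import re
import subprocess


def platform_count(clinfo_output):
    """Return the integer from clinfo's 'Number of platforms' line, or None."""
    match = re.search(r"^Number of platforms\s+(\d+)", clinfo_output, re.MULTILINE)
    return int(match.group(1)) if match else None


def run_clinfo(ld_library_path):
    """Run clinfo with a specific LD_LIBRARY_PATH and return its stdout.

    Defined for illustration only; it is not called in this sketch.
    """
    env = dict(os.environ, LD_LIBRARY_PATH=ld_library_path)
    result = subprocess.run(["clinfo"], env=env, capture_output=True, text=True)
    return result.stdout


# First lines of the two outputs quoted in this thread.
no_platform = "Number of platforms                               0\n"
one_platform = (
    "Number of platforms                               1\n"
    "  Platform Name                                   QUALCOMM Snapdragon(TM)\n"
)
print(platform_count(no_platform), platform_count(one_platform))
```

On the device, platform_count(run_clinfo("/vendor/lib64:" + os.environ["PREFIX"] + "/lib")) would be one way to confirm a given path setting before launching koboldcpp.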

The method LD_LIBRARY_PATH=/vendor/lib64:$PREFIX/lib clinfo allows OpenCL to function:

Number of platforms                               1                                                   Platform Name                                   QUALCOMM Snapdragon(TM)                             Platform Vendor                                 QUALCOMM                                            Platform Version                                OpenCL 2.0 QUALCOMM build: commit #3dad7f8ed7 changeid #I593c16c433 Date: 10/01/21 Fri Local Branch:  Remote Branch: refs/tags/AU_LINUX_ANDROID_LA.UM.9.1.R1.11.00.00.604.073
  Platform Profile                                FULL_PROFILE                                        Platform Extensions                                                                               
  Platform Name                                   QUALCOMM Snapdragon(TM)
Number of devices                                 1
  Device Name                                     QUALCOMM Adreno(TM)                                 Device Vendor                                   QUALCOMM                                            Device Vendor ID                                0x5143                                              Device Version                                  OpenCL 2.0 Adreno(TM) 640                           Driver Version                                  OpenCL 2.0 QUALCOMM build: commit #3dad7f8ed7 changeid #I593c16c433 Date: 10/01/21 Fri Local Branch:  Remote Branch: refs/tags/AU_LINUX_ANDROID_LA.UM.9.1.R1.11.00.00.604.073 Compiler E031.37.12.01      Device OpenCL C Version                         OpenCL C 2.0 Adreno(TM) 640                         Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               2
  Max clock frequency                             1MHz
  Device Partition                                (core)
    Max number of sub-devices                     1
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x1024
  Max work group size                             1024
  Preferred work group size multiple (kernel)     128
  Preferred / native vector sizes
    char                                                 1 / 1
    short                                                1 / 1
    int                                                  1 / 1
    long                                                 1 / 0
    half                                                 1 / 1        (cl_khr_fp16)
    float                                                1 / 1
    double                                               0 / 0        (n/a)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (n/a)
  Address bits                                    64, Little-Endian
  Global memory size                              3911952384 (3.643GiB)
  Error Correction support                        No
  Max memory allocation                           977988096 (932.7MiB)
  Unified memory for Host and Device              Yes
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   Yes
    Fine-grained system sharing                   No
    Atomics                                       Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Page size (QCOM)                                4096 bytes
  External memory padding (QCOM)                  0 bytes
  Preferred alignment for atomics
    SVM                                           128 bytes
    Global                                        0 bytes
    Local                                         0 bytes
  Max size for global variable                    65536 (64KiB)
  Preferred total size of global vars             1048576 (1024KiB)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        131072 (128KiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            134217728 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   64 bytes
    Pitch alignment for 2D image buffers          64 pixels
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             16384x16384x2048 pixels
    Max number of read image args                 128
    Max number of write image args                64
    Max number of read/write image args           64
  Max number of pipe args                         16
  Max active pipe reservations                    7680
  Max pipe packet size                            1024
  Local memory type                               Local
  Local memory size                               32768 (32KiB)
  Max number of constant args                     8
  Max constant buffer size                        65536 (64KiB)
  Max size of kernel argument                     1024
  Queue properties (on host)
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Queue properties (on device)
    Out-of-order execution                        Yes
    Profiling                                     Yes
    Preferred size                                655376 (640KiB)
    Max size                                      655376 (640KiB)
  Max queues on device                            1
  Max events on device                            1024
  Prefer user sync for interop                    No
  Profiling timer resolution                      1000ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            No
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                (n/a)
  Device Extensions                               cl_khr_3d_image_writes cl_img_egl_image cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_egl_event cl_khr_egl_image cl_khr_fp16 cl_khr_gl_sharing cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_image2d_from_buffer cl_khr_mipmap_image cl_khr_srgb_image_writes cl_khr_subgroups cl_qcom_create_buffer_from_image cl_qcom_ext_host_ptr cl_qcom_ion_host_ptr cl_qcom_perf_hint cl_qcom_other_image cl_qcom_subgroup_shuffle cl_qcom_vector_image_ops cl_qcom_extract_image_plane cl_qcom_android_native_buffer_host_ptr cl_qcom_protected_context cl_qcom_priority_hint cl_qcom_compressed_yuv_image_read cl_qcom_compressed_image cl_qcom_ext_host_ptr_iocoherent cl_qcom_accelerated_image_ops cl_qcom_ml_ops

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
  clCreateContext(NULL, ...) [default]            No platform
  clCreateContext(NULL, ...) [other]              Success [P0]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 QUALCOMM Snapdragon(TM)
    Device Name                                   QUALCOMM Adreno(TM)
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 QUALCOMM Snapdragon(TM)
    Device Name                                   QUALCOMM Adreno(TM)
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  Invalid device type for platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 QUALCOMM Snapdragon(TM)
    Device Name                                   QUALCOMM Adreno(TM)

I don't need help running llama.cpp. I came to this repository for assistance with koboldcpp.

@gustrd

gustrd commented Jun 18, 2023

That's strange, because on my device KoboldCpp runs correctly with the same instructions:

~/koboldcpp $ export LD_LIBRARY_PATH=/system/vendor/lib64:$LD_LIBRARY_PATH
~/koboldcpp $ ./sP.sh
Welcome to KoboldCpp - Version 1.30.3
Attempting to use CLBlast library for faster prompt ingestion. A compatible clblast will be required.
Initializing dynamic library: koboldcpp_clblast.so
==========
Loading model: /data/data/com.termux/files/home/models/pygmalion-7b-ggml-q5_K.bin
[Threads: 2, BlasThreads: 2, SmartContext: True]

---
Identified as LLAMA model: (ver 5)
Attempting to Load...
---
System Info: AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
llama.cpp: loading model from /data/data/com.termux/files/home/models/pygmalion-7b-ggml-q5_K.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 17 (mostly Q5_K - Medium)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.07 MB

Platform:0 Device:0  - QUALCOMM Snapdragon(TM) with QUALCOMM Adreno(TM)

ggml_opencl: selecting platform: 'QUALCOMM Snapdragon(TM)'
ggml_opencl: selecting device: 'QUALCOMM Adreno(TM)'
ggml_opencl: device FP16 support: true
CL FP16 temporarily disabled pending further optimization.
llama_model_load_internal: using OpenCL for GPU acceleration
llama_model_load_internal: mem required  = 6384.94 MB (+ 1026.00 MB per state)
llama_model_load_internal: offloading 0 layers to GPU
llama_model_load_internal: total VRAM used: 0 MB
..................................................................................................
llama_init_from_file: kv self size  = 1024.00 MB
Load Model OK: True
Embedded Kobold Lite loaded.
Starting Kobold HTTP Server on port 5001
Please connect to custom endpoint at http://localhost:5001
127.0.0.1 - - [18/Jun/2023 08:57:06] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [18/Jun/2023 08:57:06] "GET /api/v1/model HTTP/1.1" 200 -
127.0.0.1 - - [18/Jun/2023 08:57:06] "GET /api/v1/info/version HTTP/1.1" 200 -
127.0.0.1 - - [18/Jun/2023 08:57:06] "GET /sw.js HTTP/1.1" 404 -
127.0.0.1 - - [18/Jun/2023 08:57:06] "GET /manifest.json HTTP/1.1" 404 -
127.0.0.1 - - [18/Jun/2023 08:57:06] "GET /api/extra/version HTTP/1.1" 200 -

Input: {"n": 1, "max_context_length": 1024, "max_length": 20, "rep_pen": 1.08, "temperature": 0.7, "top_p": 0.92, "top_k": 0, "top_a": 0, "typical": 1, "tfs": 1, "rep_pen_range": 256, "rep_pen_slope": 0.7, "sampler_order": [6, 0, 1, 2, 3, 4, 5], "prompt": "Rebecca was a little  nervous about going to the doctor today.", "quiet": true}

Processing Prompt (17 / 17 tokens)
Generating (20 / 20 tokens)
Time Taken - Processing:18.6s (1094ms/T), Generation:68.4s (3421ms/T), Total:87.0s
Output:  She had never met anyone who  looked exactly like her, and she was afraid he would be angry
127.0.0.1 - - [18/Jun/2023 08:58:40] "POST /api/v1/generate/ HTTP/1.1" 200 -
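
For anyone who wants to repeat this test without the web UI, here is a minimal Python sketch that rebuilds a trimmed version of the request shown in the log. The endpoint and field values are copied from the log; the helper names `build_request` and `generate` are made up for illustration, and the sketch returns the parsed JSON without assuming anything about its shape.

```python
import json
import urllib.request

# Endpoint taken from the log above (port 5001, POST /api/v1/generate/).
URL = "http://localhost:5001/api/v1/generate/"


def build_request(prompt, max_length=20, max_context_length=1024):
    """Build a POST request with a subset of the fields seen in the log."""
    payload = {
        "n": 1,
        "max_context_length": max_context_length,
        "max_length": max_length,
        "rep_pen": 1.08,
        "temperature": 0.7,
        "top_p": 0.92,
        "prompt": prompt,
    }
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        URL,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, timeout=300):
    """Send the request and return the decoded JSON reply as-is."""
    # The run in the log took about 87 s in total, so the timeout is generous.
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        return json.load(resp)
```

With the server from the log running, `generate("Rebecca was a little  nervous about going to the doctor today.")` should send the same prompt; nothing is sent until `generate` is called.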

Good luck.
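
The numbers in the log can be cross-checked with a little arithmetic. The sketch below assumes the KV cache holds one K and one V tensor per layer in 16-bit floats (an assumption, though it matches the reported `kv self size`), and recomputes the per-token times from the rounded seconds in the `Time Taken` line. The function names are illustrative only.

```python
# Values copied from the log above.
N_CTX, N_EMBD, N_LAYER = 2048, 4096, 32
BYTES_F16 = 2  # assumption: K and V entries are stored as 16-bit floats


def kv_cache_mib(n_ctx, n_embd, n_layer, bytes_per_elem=BYTES_F16):
    """Size of the K and V tensors across all layers, in MiB."""
    # Two tensors (K and V) per layer, each n_ctx x n_embd elements.
    return 2 * n_layer * n_ctx * n_embd * bytes_per_elem / 2**20


def ms_per_token(seconds, tokens):
    """Average milliseconds spent per token."""
    return seconds * 1000 / tokens


print(kv_cache_mib(N_CTX, N_EMBD, N_LAYER))  # 1024.0, matches "kv self size"
print(round(ms_per_token(18.6, 17)))         # 1094, matches the prompt processing figure
print(round(ms_per_token(68.4, 20)))         # 3420, the log shows 3421
```

The one-millisecond gap on the generation figure is presumably because the log divides the unrounded time, while this sketch only has the rounded 68.4 s to work with.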

@ghost
Author

ghost commented Jun 18, 2023

It's something else to see the way that PR slipped by without much testing. Then koboldcpp adopted it without question.

Anyway, there's no response about building with CMake, so there's nothing else to troubleshoot.

@ghost ghost closed this as completed Jun 18, 2023
@yassinehub12

I have written a guide on how to build LlamaCpp in Termux. KoboldCpp is very similar.

It is currently in a PR on LlamaCpp's GitHub repo. You can check it out there.

@LostRuins, do you think it would be a good idea to bring a similar version to KoboldCpp's Readme as well?

I had a lot of fun playing with KoboldCpp and SillyTavern on a plane trip.

Yeees exactly 💯

@smkw33d

smkw33d commented Jan 27, 2024

Can confirm this issue still exists on the newest koboldcpp. Device: POCO F5

@LostRuins
Owner

To anyone still having trouble building with Termux, try following the step-by-step build instructions on the Readme. That should at minimum get it up and running. These instructions do not cover GPU acceleration - for that you are on your own.

@gustrd

gustrd commented May 12, 2024

Some new evidence that, sadly, CLBlast on Android is buggy: ggerganov#7016

This issue was closed.