LLMs/VLMs using Llama.cpp
Building llama.cpp
# Install build dependencies
sudo apt update
sudo apt install -y cmake ninja-build curl libcurl4-openssl-dev build-essential

mkdir -p ~/dev/llm

# Symlink the OpenCL shared library
sudo rm -f /usr/lib/libOpenCL.so
sudo ln -s /lib/aarch64-linux-gnu/libOpenCL.so.1.0.0 /usr/lib/libOpenCL.so

# OpenCL headers
cd ~/dev/llm
git clone https://github.com/KhronosGroup/OpenCL-Headers
cd OpenCL-Headers
git checkout 5d52989617e7ca7b8bb83d7306525dc9f58cdd46
mkdir -p build && cd build
cmake .. -G Ninja \
  -DBUILD_TESTING=OFF \
  -DOPENCL_HEADERS_BUILD_TESTING=OFF \
  -DOPENCL_HEADERS_BUILD_CXX_TESTS=OFF \
  -DCMAKE_INSTALL_PREFIX="$HOME/dev/llm/opencl"
cmake --build . --target install

# ICD Loader
cd ~/dev/llm
git clone https://github.com/KhronosGroup/OpenCL-ICD-Loader
cd OpenCL-ICD-Loader
git checkout 02134b05bdff750217bf0c4c11a9b13b63957b04
mkdir -p build && cd build
cmake .. -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_PREFIX_PATH="$HOME/dev/llm/opencl" \
  -DCMAKE_INSTALL_PREFIX="$HOME/dev/llm/opencl"
cmake --build . --target install

# Symlink OpenCL headers
sudo rm -f /usr/include/CL
sudo ln -s ~/dev/llm/opencl/include/CL/ /usr/include/CL

# Clone the llama.cpp repository
cd ~/dev/llm
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp

# We've tested this commit explicitly; you can try master if you want bleeding edge
git checkout f6da8cb86a28f0319b40d9d2a957a26a7d875f8c

# Build with the OpenCL backend enabled
mkdir -p build
cd build
cmake .. -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -DBUILD_SHARED_LIBS=OFF \
  -DGGML_OPENCL=ON
ninja -j`nproc`

# Add the built binaries to your PATH
cd ~/dev/llm/llama.cpp/build/bin
echo "" >> ~/.bash_profile
echo "# Begin llama.cpp" >> ~/.bash_profile
echo "export PATH=\$PATH:$PWD" >> ~/.bash_profile
echo "# End llama.cpp" >> ~/.bash_profile
echo "" >> ~/.bash_profile

# To use the llama.cpp binaries in your current session
source ~/.bash_profile

# Verify the build; the OpenCL backend should detect the Adreno GPU
llama-cli --version
# ggml_opencl: selected platform: 'QUALCOMM Snapdragon(TM)'
# ggml_opencl: device: 'QUALCOMM Adreno(TM) 635 (OpenCL 3.0 Adreno(TM) 635)'
# ggml_opencl: OpenCL driver: OpenCL 3.0 QUALCOMM build: 0808.0.7 Compiler E031.49.02.00
# ggml_opencl: vector subgroup broadcast support: true
Downloading and quantizing a model
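One possible workflow, sketched under assumptions: the model (Qwen2.5-0.5B-Instruct here) is purely an example, and the Hugging Face CLI plus llama.cpp's Python requirements are installed. Download the original checkpoint, convert it to GGUF with llama.cpp's convert_hf_to_gguf.py, then quantize with llama-quantize:

# Example only: any small instruct model works; Qwen2.5-0.5B-Instruct is an assumption
pip install -U "huggingface_hub[cli]"
huggingface-cli download Qwen/Qwen2.5-0.5B-Instruct --local-dir ~/dev/llm/models/qwen2.5-0.5b

# Convert the Hugging Face checkpoint to GGUF (the script ships with llama.cpp)
cd ~/dev/llm/llama.cpp
pip install -r requirements.txt
python3 convert_hf_to_gguf.py ~/dev/llm/models/qwen2.5-0.5b \
  --outfile ~/dev/llm/models/qwen2.5-0.5b-f16.gguf --outtype f16

# Quantize to Q4_K_M, a common size/quality trade-off for mobile-class hardware
llama-quantize ~/dev/llm/models/qwen2.5-0.5b-f16.gguf \
  ~/dev/llm/models/qwen2.5-0.5b-q4_k_m.gguf Q4_K_M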
Running your first LLM using llama-cli
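As a minimal sketch, assuming the quantized model produced above: -ngl offloads layers to the GPU through the OpenCL backend, -c sets the context size, and -n limits the number of generated tokens.

# Run a single prompt; raise or lower -ngl to control GPU offload
llama-cli -m ~/dev/llm/models/qwen2.5-0.5b-q4_k_m.gguf \
  -ngl 99 -c 4096 -n 256 \
  -p "Explain what a GGUF file is in one paragraph."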
Serving LLMs using llama-server
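A hedged sketch of serving the same model: llama-server exposes an OpenAI-compatible HTTP API. The port and host binding below are arbitrary choices, not values from this page.

# Start the server; --host 0.0.0.0 makes it reachable from other machines
llama-server -m ~/dev/llm/models/qwen2.5-0.5b-q4_k_m.gguf \
  -ngl 99 -c 4096 --host 0.0.0.0 --port 8080

# From another terminal: query the OpenAI-compatible chat endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 64}'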

Serving multi-modal LLMs
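Multi-modal models in llama.cpp pair the language model GGUF with a separate multi-modal projector passed via --mmproj. The file names below are placeholders, since this page does not name a specific model.

# Assumption: a vision-capable GGUF and its matching projector file,
# typically distributed together as model.gguf + mmproj-*.gguf
llama-server -m ~/dev/llm/models/vlm-model-q4_k_m.gguf \
  --mmproj ~/dev/llm/models/mmproj-vlm-model-f16.gguf \
  -ngl 99 --host 0.0.0.0 --port 8080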

Tips & tricks
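Two generally useful knobs worth experimenting with, offered as suggestions rather than settings from this page:

# Pin CPU threads to the number of fast cores (4 here is just an example)
llama-cli -m ~/dev/llm/models/qwen2.5-0.5b-q4_k_m.gguf -t 4 -p "Hello" -n 64

# Lock model weights in RAM to avoid page-outs on memory-constrained devices
llama-cli -m ~/dev/llm/models/qwen2.5-0.5b-q4_k_m.gguf --mlock -p "Hello" -n 64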
Comparing CPU performance
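llama-bench, built alongside the other binaries, gives like-for-like throughput numbers. A sketch contrasting CPU-only inference with full OpenCL offload (the token counts are arbitrary):

# -p / -n set prompt-processing and text-generation token counts;
# -ngl 0 keeps everything on the CPU, -ngl 99 offloads all layers to the GPU
llama-bench -m ~/dev/llm/models/qwen2.5-0.5b-q4_k_m.gguf -p 512 -n 128 -ngl 0
llama-bench -m ~/dev/llm/models/qwen2.5-0.5b-q4_k_m.gguf -p 512 -n 128 -ngl 99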