Whisper
Whisper is OpenAI's general-purpose automatic speech recognition (ASR) model. You can use it for audio transcription, translation, and language identification. You can run Whisper on the NPU of your Dragonwing development board using Qualcomm's VoiceAI ASR, or on the CPU using whisper.cpp.
Running Whisper on the NPU with VoiceAI ASR
TODO: VoiceAI ASR is only downloadable from x86 Linux desktop systems - need to fix this, or get permission to redistribute on our CDN.
Open a terminal on your development board, and set up the base requirements for this example:
sudo apt install -y cmake pulseaudio-utils
Install the AI Runtime SDK - Community Edition:
wget -qO- https://cdn.edgeimpulse.com/qc-ai-docs/device-setup/install_ai_runtime_sdk_2.35.sh | bash
Install VoiceAI ASR:
cd ~/

# https://softwarecenter.qualcomm.com/catalog/item/VoiceAI_ASR, temp mirrored here for devrel purposes, TODO: remove before launch
wget https://cdn.edgeimpulse.com/qc-ai-docs/sdk/VoiceAI_ASR_2.1.0.0.zip
unzip VoiceAI_ASR_2.1.0.0.zip -d voiceai_asr
cd voiceai_asr/2.1.0.0/

# Put the path to VoiceAI ASR in your bash_profile (so it's available under VOICEAI_ROOT)
echo "" >> ~/.bash_profile
echo "# Begin VoiceAI ASR" >> ~/.bash_profile
echo "export VOICEAI_ROOT=$PWD" >> ~/.bash_profile
echo "# End VoiceAI ASR" >> ~/.bash_profile
echo "" >> ~/.bash_profile

# Re-load the environment variables
source ~/.bash_profile

# Symlink Whisper libraries
cd $VOICEAI_ROOT/whisper_sdk/libs/npu/rpc_libraries/linux/whisper_all_quantized/
sudo ln -s $PWD/*.so /usr/lib/
Build the voice-ai-ref example:

cd $VOICEAI_ROOT/whisper_sdk/sampleapp/npu_rpc_linux_sample/voice-ai-ref

# Overwrite the main.cpp example
wget -O src/main.cpp https://cdn.edgeimpulse.com/qc-ai-docs/code/voiceai_ref_2.1.0.0_main.cpp

# Symlink Whisper libraries for build
mkdir -p libs/arm64-v8a/
cd libs/arm64-v8a/
ln -s $VOICEAI_ROOT/whisper_sdk/libs/npu/rpc_libraries/linux/whisper_all_quantized/*.so .
cd ../../

mkdir -p build
cd build
cmake ..
make -j`nproc`
Download a precompiled Whisper model for the NPU:
mkdir -p ~/whisper_models/model_qnn_226/
cd ~/whisper_models/model_qnn_226/
ln -s $QAIRT_SRC_ROOT/lib/hexagon-v68/unsigned/libQnnHtpV68Skel.so .
ln -s $VOICEAI_ROOT/whisper_sdk/libs/npu/rpc_libraries/assets/speech_float.eai .

# TODO: Download decoder_model_htp.bin / encoder_model_htp.bin / vocab.bin (ask Jan)
TODO: decoder_model_htp.bin / encoder_model_htp.bin / vocab.bin need to be in AI Hub.
You can now transcribe WAV files:
cd $VOICEAI_ROOT/whisper_sdk/sampleapp/npu_rpc_linux_sample/voice-ai-ref/build

# Download sample file
wget -O jfk.wav https://raw.githubusercontent.com/ggml-org/whisper.cpp/refs/heads/master/samples/jfk.wav

# Transcribe:
./voice-ai-ref -f jfk.wav -l en -t transcribe -m ~/whisper_models/model_qnn_226/ | grep -v "No usable logger handle was found" | grep -v "Logs will be sent to"

# ... Expected result:
# VoiceAIRef final result = And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country. [language: English]
TODO: Figure out where these "No usable logger handle was found" / "Logs will be sent to" messages come from.
TODO: Can't figure out how to quit the application once transcription is complete.
Or even do live transcription:
Connect a microphone to your development board.
Find the name of your microphone:
pactl list short sources
# 49    alsa_output.platform-sound.stereo-fallback.monitor    PipeWire    s24-32le 2ch 48000Hz    SUSPENDED
# 76    alsa_input.usb-046d_C922_Pro_Stream_Webcam_C72F6EDF-02.analog-stereo    PipeWire    s16le 2ch 32000Hz    SUSPENDED

# To use the USB webcam, use "alsa_input.usb-046d_C922_Pro_Stream_Webcam_C72F6EDF-02.analog-stereo" as the name
Run live transcription:
./voice-ai-ref -r -l en -t transcribe -m ~/whisper_models/model_qnn_226/ -d "alsa_input.usb-046d_C922_Pro_Stream_Webcam_C72F6EDF-02.analog-stereo" | grep -v "No usable logger handle was found" | grep -v "Logs will be sent to"

# VoiceAIRef final result = Hi, this is to see if I can do live transcription on my Rubik Pi. [language: English]
🚀 You now have fully offline transcription of audio on your development board! VoiceAI ASR does not have bindings to higher-level languages (like Python), so if you want to use Whisper in your application it's easiest to spawn the voice-ai-ref binary and read its output from stdout.
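For example, here is a minimal Python sketch that spawns voice-ai-ref for a single file and reads the transcript from stdout. The binary path, model path, and the "VoiceAIRef final result =" prefix are assumptions based on the steps and example output above; adjust them for your setup.

#!/usr/bin/env python3
# Minimal sketch: spawn voice-ai-ref and read the transcript from its stdout.
# Assumes VOICEAI_ROOT is set (see ~/.bash_profile above) and the model lives in
# ~/whisper_models/model_qnn_226/.
import os
import subprocess

voiceai_root = os.environ['VOICEAI_ROOT']
binary = os.path.join(voiceai_root,
    'whisper_sdk/sampleapp/npu_rpc_linux_sample/voice-ai-ref/build/voice-ai-ref')
model_dir = os.path.expanduser('~/whisper_models/model_qnn_226/')

proc = subprocess.Popen(
    [binary, '-f', 'jfk.wav', '-l', 'en', '-t', 'transcribe', '-m', model_dir],
    stdout=subprocess.PIPE, stderr=subprocess.DEVNULL, text=True)

transcript = None
for line in proc.stdout:
    line = line.strip()
    if line.startswith('VoiceAIRef final result ='):
        transcript = line.split('=', 1)[1].strip()
        break

# The sample app does not always exit by itself (see the TODO above), so stop it explicitly
proc.terminate()
print('Transcript:', transcript)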
Running Whisper on the CPU with whisper.cpp
Alternatively, you can run Whisper on the CPU (at lower performance) using whisper.cpp (or any of the other popular Whisper libraries).
Here are the instructions for whisper.cpp. Open a terminal on your development board (or an ssh session to it) and run:
Install build dependencies:
sudo apt update
sudo apt install -y libsdl2-dev libsdl2-2.0-0 libasound2-dev
Build whisper.cpp:
mkdir -p ~/dev/llm/
cd ~/dev/llm/

git clone https://github.com/ggml-org/whisper.cpp.git
cd whisper.cpp
git checkout v1.7.6

# Build (CPU)
cmake -B build-cpu -DWHISPER_SDL2=ON
cmake --build build-cpu -j`nproc` --config Release
Add the whisper.cpp paths to your PATH:
cd ~/dev/llm/whisper.cpp/build-cpu/bin

echo "" >> ~/.bash_profile
echo "# Begin whisper.cpp" >> ~/.bash_profile
echo "export PATH=\$PATH:$PWD" >> ~/.bash_profile
echo "# End whisper.cpp" >> ~/.bash_profile
echo "" >> ~/.bash_profile

# To use the whisper.cpp files in your current session
source ~/.bash_profile
You can now transcribe audio using whisper.cpp:
# Download model
cd ~/dev/llm/whisper.cpp
sh ./models/download-ggml-model.sh tiny.en-q5_1

# Transcribe text
whisper-cli -m models/ggml-tiny.en-q5_1.bin -f samples/jfk.wav

# [00:00:00.000 --> 00:00:10.480]
#   and so my fellow Americans ask not what your country can do for you ask what you can do for your country
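The same spawn-and-read pattern from the VoiceAI ASR section works here if you want to call whisper.cpp from your application. A minimal Python sketch, assuming the model and sample paths from the steps above, and that -nt (no timestamps) leaves only the transcript text on stdout:

#!/usr/bin/env python3
# Minimal sketch: run whisper-cli from Python and capture the transcript.
# Run from ~/dev/llm/whisper.cpp so the relative model/sample paths resolve.
import subprocess

result = subprocess.run(
    ['whisper-cli',
     '-m', 'models/ggml-tiny.en-q5_1.bin',
     '-f', 'samples/jfk.wav',
     '-nt'],
    capture_output=True, text=True, check=True)

transcript = result.stdout.strip()
print('Transcript:', transcript)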
You can also live transcribe audio:
Connect a microphone to your development board.
Find your microphone ID:
SDL_AUDIODRIVER=alsa whisper-stream -m models/ggml-tiny.en-q5_1.bin
# init: found 2 capture devices:
# init:    - Capture device #0: 'qcm6490-rb3-vision-snd-card, '
# init:    - Capture device #1: 'Yeti Stereo Microphone, USB Audio'

# If you want "Yeti Stereo Microphone, USB Audio" then the ID is 1
Start live transcribing:
SDL_AUDIODRIVER=alsa whisper-stream -m models/ggml-tiny.en-q5_1.bin -c 1

# main: processing 48000 samples (step = 3.0 sec / len = 10.0 sec / keep = 0.2 sec), 4 threads, lang = en, task = transcribe, timestamps = 0 ...
# main: n_new_line = 2, no_context = 1
#
# [Start speaking]
# This is a test to see if you can transcribe text live on your Qualcomm device
Running on the GPU with OpenCL
You can also build binaries that run on the GPU:
First follow the steps in llama.cpp under "Install the OpenCL headers and ICD loader library".
Build a binary with OpenCL:
cd ~/dev/llm/whisper.cpp

cmake -B build-gpu -DGGML_OPENCL=ON -DWHISPER_SDL2=ON
cmake --build build-gpu -j`nproc` --config Release

# Find the binary in:
# build-gpu/bin/whisper-cli
Note that this does not run faster than the CPU build, at least on the QCS6490 (even with Q4_0-quantized weights).